Substitution of kernel functions based on pattern matching on schedule trees

Zi Xuan Chen, Wuu Yang

Research output: Conference contribution, peer-reviewed

Abstract

With the rise of AI, computing hardware with a variety of architectures has emerged. For some frequently used AI kernels, this hardware provides special accelerators and related instructions. For example, since the Volta architecture, Nvidia GPUs have provided tensor cores to optimize operations related to matrix multiplication. The vector extension of the RISC-V architecture provides instruction-level parallelism for kernels. We design and implement a pattern-matching language in which a user can define patterns for kernels. We identify segments of the schedule trees that match the defined patterns and replace those segments with calls to kernel functions (in libraries) or intrinsics that are optimized for the specific accelerators. In the experiments, the Polybench benchmarks are optimized for (and hence linked with) the following libraries: CBLAS on the x64 platform, CuBLAS with tensor-core instructions on GPU, and OpenBLAS containing vector instructions on the RISC-V platform (software emulation, using the vector-instruction emulation ability provided by the Ara vector unit). The average (geomean) performance improvements on selected BLAS benchmarks are: (1) a run-time speedup of 1.38x for CBLAS on the x64 platform; (2) a run-time improvement of 5.27x for CuBLAS with tensor-core instructions on GPU; (3) a cycle-count speedup of 5.78x for OpenBLAS containing vector instructions on the RISC-V platform.
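
The match-and-replace step described in the abstract can be illustrated with a toy sketch. It does not reproduce the authors' pattern language or their schedule-tree representation, neither of which is shown here. The node classes (`Band`, `Stmt`, `Access`, `KernelCall`), the `"mul_acc"` operator tag, and the functions `match_gemm` and `substitute` are all hypothetical names invented for illustration. The sketch assumes a simplified tree in which each band node carries one loop iterator, and it recognizes only the `i, j, k` loop order of a matrix-multiply nest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """An array access such as C[i][j] (hypothetical, simplified)."""
    array: str
    index: tuple  # iterator names, e.g. ("i", "j")


@dataclass
class Stmt:
    """Leaf statement: one written access, the read accesses, and an operator tag."""
    write: Access
    reads: tuple
    op: str  # "mul_acc" stands for: write += reads[0] * reads[1]


@dataclass
class Band:
    """Loop (band) node with a single iterator and a list of children."""
    iterator: str
    children: list


@dataclass
class KernelCall:
    """Replacement leaf standing in for a call to an optimized library kernel."""
    name: str
    args: tuple


def match_gemm(node):
    """Return (C, A, B) array names if `node` roots a 3-deep matmul nest, else None."""
    iters = []
    cur = node
    for _ in range(3):
        # Each level must be a band with exactly one child.
        if not isinstance(cur, Band) or len(cur.children) != 1:
            return None
        iters.append(cur.iterator)
        cur = cur.children[0]
    if not isinstance(cur, Stmt) or cur.op != "mul_acc" or len(cur.reads) != 2:
        return None
    i, j, k = iters
    a, b = cur.reads
    # Check the access pattern C[i][j] += A[i][k] * B[k][j].
    if cur.write.index == (i, j) and a.index == (i, k) and b.index == (k, j):
        return (cur.write.array, a.array, b.array)
    return None


def substitute(node):
    """Walk the tree top-down, replacing each matched segment with a kernel call."""
    hit = match_gemm(node)
    if hit is not None:
        return KernelCall("gemm", hit)
    if isinstance(node, Band):
        return Band(node.iterator, [substitute(c) for c in node.children])
    return node


# A matmul nest wrapped in an outer loop over t.
body = Stmt(Access("C", ("i", "j")),
            (Access("A", ("i", "k")), Access("B", ("k", "j"))),
            "mul_acc")
tree = Band("t", [Band("i", [Band("j", [Band("k", [body])])])])
result = substitute(tree)
```

In this sketch the outer `t` band is kept and only the matched three-level segment beneath it is replaced by the `KernelCall` leaf. A practical matcher would also need to handle other loop orders, scaling factors, and layout arguments before it could emit a real library call.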

Original language: English
Title of host publication: 53rd International Conference on Parallel Processing, ICPP 2024 - Workshops Proceedings
Publisher: Association for Computing Machinery
Pages: 48-57
Number of pages: 10
ISBN (electronic): 9798400718021
DOIs
Publication status: Published - 12 Aug 2024
Event: 53rd International Conference on Parallel Processing, ICPP 2024 - Gotland, Sweden
Duration: 12 Aug 2024 to 15 Aug 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 53rd International Conference on Parallel Processing, ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 12/08/24 to 15/08/24
