TY - GEN
T1 - MSIM
T2 - 35th IEEE International System-on-Chip Conference, SOCC 2022
AU - Sinha, Aman
AU - Mai, Jhih Yong
AU - Lai, Bo Cheng
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Genome Assembly is an important Big Data analytics which involves massive computations for similarity searches on sequence databases. Being major component of runtime, similarity searches require careful design for scalable performance. MinHash Sketching is an extensively used data structure in Long-read genome assembly pipelines, which involves generating, randomizing and minimizing a set of hashes for all the k-mers in genome sequences. Compute-hungry MinHash sketch processing on commercially available multi-threaded CPUs suffer from the limited bandwidth of the L1-cache, which causes the CPUs to stall. Near-Data Processing (NDP) is an emerging trend in data-bound Big Data analytics to harness the low-latency, highbandwidth available within the Dual In-line Memory Modules (DIMMs). While NDP architectures have generally been utilized for memory-bound computations, MinHash sketching is a potential application that can gain massive throughput by exploiting memory Banks as higher bandwidth L1-cache.In this work, we propose MSIM, a distributed, highly parallel and efficient hardware-software co-design for accelerating MinHash Sketch processing on light-weight components placed on the DRAM hierarchy. Multiple ASIC-based Processing Engines (PEs) placed at the bank-group-level in MSIM provide highparallelism for low-latency computations. The PEs sequentially access data from all Banks within their bank-group with the help of a dedicated Address calculator, which utilizes an optimal data mapping scheme. The PEs are controlled by a custom Arbiter, which is directly activated by the host CPU using general DDR commands, without requiring any modification to the memory controller or the DIMM standard buses. MSIM requires limited area and power overheads, while displaying up-to 384.9x speedup and 1088.4x energy reduction compared to the baseline multithreaded software solution in our experiments. MSIM achieves 4.26x speedup over high-end GPU, while consuming 26.4x lesser energy. Moreover, MSIM design is highly scalable and extendable in nature.
AB - Genome Assembly is an important Big Data analytics which involves massive computations for similarity searches on sequence databases. Being major component of runtime, similarity searches require careful design for scalable performance. MinHash Sketching is an extensively used data structure in Long-read genome assembly pipelines, which involves generating, randomizing and minimizing a set of hashes for all the k-mers in genome sequences. Compute-hungry MinHash sketch processing on commercially available multi-threaded CPUs suffer from the limited bandwidth of the L1-cache, which causes the CPUs to stall. Near-Data Processing (NDP) is an emerging trend in data-bound Big Data analytics to harness the low-latency, highbandwidth available within the Dual In-line Memory Modules (DIMMs). While NDP architectures have generally been utilized for memory-bound computations, MinHash sketching is a potential application that can gain massive throughput by exploiting memory Banks as higher bandwidth L1-cache.In this work, we propose MSIM, a distributed, highly parallel and efficient hardware-software co-design for accelerating MinHash Sketch processing on light-weight components placed on the DRAM hierarchy. Multiple ASIC-based Processing Engines (PEs) placed at the bank-group-level in MSIM provide highparallelism for low-latency computations. The PEs sequentially access data from all Banks within their bank-group with the help of a dedicated Address calculator, which utilizes an optimal data mapping scheme. The PEs are controlled by a custom Arbiter, which is directly activated by the host CPU using general DDR commands, without requiring any modification to the memory controller or the DIMM standard buses. MSIM requires limited area and power overheads, while displaying up-to 384.9x speedup and 1088.4x energy reduction compared to the baseline multithreaded software solution in our experiments. MSIM achieves 4.26x speedup over high-end GPU, while consuming 26.4x lesser energy. Moreover, MSIM design is highly scalable and extendable in nature.
KW - Long read genome assembly
KW - MinHash Sketches
KW - Near Memory Processing
KW - Processing-In-Memory
UR - http://www.scopus.com/inward/record.url?scp=85140725848&partnerID=8YFLogxK
U2 - 10.1109/SOCC56010.2022.9908115
DO - 10.1109/SOCC56010.2022.9908115
M3 - Conference contribution
AN - SCOPUS:85140725848
T3 - International System on Chip Conference
BT - Proceedings - 2022 IEEE 35th International System-on-Chip Conference, SOCC 2022
A2 - Sezer, Sakir
A2 - Buchner, Thomas
A2 - Becker, Jurgen
A2 - Marshall, Andrew
A2 - Siddiqui, Fahad
A2 - Harbaum, Tanja
A2 - McLaughlin, Kieran
PB - IEEE Computer Society
Y2 - 5 September 2022 through 8 September 2022
ER -