MSIM: A Highly Parallel Near-Memory Accelerator for MinHash Sketch

Aman Sinha, Jhih Yong Mai, Bo Cheng Lai

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

Genome assembly is an important Big Data analytics workload that involves massive computation for similarity searches on sequence databases. Because similarity searches account for a major share of the runtime, they require careful design for scalable performance. MinHash sketching is used extensively in long-read genome assembly pipelines; it involves generating, randomizing, and minimizing a set of hashes for all the k-mers in genome sequences. Compute-hungry MinHash sketch processing on commercially available multi-threaded CPUs suffers from the limited bandwidth of the L1 cache, which causes the CPUs to stall. Near-Data Processing (NDP) is an emerging trend in data-bound Big Data analytics that harnesses the low latency and high bandwidth available within Dual In-line Memory Modules (DIMMs). While NDP architectures have generally been used for memory-bound computations, MinHash sketching is a potential application that can gain massive throughput by exploiting memory banks as a higher-bandwidth L1 cache.

In this work, we propose MSIM, a distributed, highly parallel, and efficient hardware-software co-design for accelerating MinHash sketch processing on lightweight components placed in the DRAM hierarchy. Multiple ASIC-based Processing Engines (PEs) placed at the bank-group level in MSIM provide high parallelism for low-latency computations. The PEs sequentially access data from all banks within their bank group with the help of a dedicated address calculator, which uses an optimal data mapping scheme. The PEs are controlled by a custom arbiter, which is activated directly by the host CPU using general DDR commands, without requiring any modification to the memory controller or the DIMM standard buses. MSIM requires limited area and power overheads while delivering up to 384.9x speedup and 1088.4x energy reduction compared to the baseline multithreaded software solution in our experiments. MSIM achieves a 4.26x speedup over a high-end GPU while consuming 26.4x less energy. Moreover, the MSIM design is highly scalable and extensible.
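For readers unfamiliar with the "generate, randomize, minimize" steps the abstract mentions, the following is a minimal software sketch of one common MinHash formulation. It is an illustration only and not the MSIM hardware design: the function names, the BLAKE2b base hash, the xorshift randomizer, and the default `k` and `num_hashes` values are all assumptions chosen for this example.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep all arithmetic within 64 bits


def base_hash(kmer: str) -> int:
    """Generate: map a k-mer to a 64-bit integer (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def xorshift64(x: int) -> int:
    """Randomize: derive the next pseudo-random hash from the previous one."""
    x ^= (x << 13) & MASK64
    x ^= x >> 7
    x ^= (x << 17) & MASK64
    return x & MASK64


def minhash_sketch(seq: str, k: int = 16, num_hashes: int = 64) -> list[int]:
    """Minimize: for each of num_hashes slots, keep the smallest hash seen."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    sketch = [MASK64] * num_hashes
    for i in range(len(seq) - k + 1):
        h = base_hash(seq[i:i + k])
        for j in range(num_hashes):
            if h < sketch[j]:
                sketch[j] = h
            h = xorshift64(h)  # slot j+1 uses a re-randomized value
    return sketch


def estimate_jaccard(a: list[int], b: list[int]) -> float:
    """Estimate k-mer set similarity as the fraction of matching slots."""
    if len(a) != len(b):
        raise ValueError("sketches must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Two sequences can then be compared with `estimate_jaccard(minhash_sketch(s1), minhash_sketch(s2))`. The inner loop performs `num_hashes` update steps per k-mer, which gives a sense of why the workload is compute-hungry and benefits from many parallel processing engines.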

Original language: English
Title of host publication: Proceedings - 2022 IEEE 35th International System-on-Chip Conference, SOCC 2022
Editors: Sakir Sezer, Thomas Buchner, Jurgen Becker, Andrew Marshall, Fahad Siddiqui, Tanja Harbaum, Kieran McLaughlin
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665459853
State: Published - 2022
Event: 35th IEEE International System-on-Chip Conference, SOCC 2022 - Belfast, Northern Ireland, United Kingdom
Duration: 5 Sep 2022 to 8 Sep 2022

Publication series

Name: International System on Chip Conference
Volume: 2022-September
ISSN (Print): 2164-1676
ISSN (Electronic): 2164-1706

Conference

Conference: 35th IEEE International System-on-Chip Conference, SOCC 2022
Country/Territory: United Kingdom
City: Belfast, Northern Ireland
Period: 5/09/22 to 8/09/22

Keywords

  • Long read genome assembly
  • MinHash Sketches
  • Near Memory Processing
  • Processing-In-Memory
