DP2: A Highly Parallel Range Join for Genome Analysis on Distributed Computing Platform

Aman Sinha, Bo Cheng Lai

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The rapid growth of genome data and the intensive computation it requires pose major challenges for downstream genome analytics. Efficient parallel processing and distributed computing are two effective schemes for analyzing big data. Range join is a widely used, effective, yet time-consuming operation that finds the overlaps between two sets of genome features. The widely adopted BEDTools [6] pipeline uses a single-node binary tree approach, while the distributed GenAp scheme fails to exploit the massively parallel computation available on modern throughput processors such as GPUs (Graphics Processing Units). This paper proposes a novel Distributed Parallel P-ary search (DP2) that applies P-ary analysis to enable high parallelism at the algorithmic level and makes extensive use of multiple GPUs at the system and architecture levels. Efficient computation allocation is implemented to leverage distributed computing on clusters. The proposed framework integrates well with the current BEDTools [6] pipeline and achieves an average 25x speedup for the actual range-join operation compared with the binary tree approach of GenAp, and a 13x end-to-end (total execution time) speedup compared with ADAM.
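
To make the two ideas named in the abstract concrete, the following is a minimal single-machine sketch, not the authors' implementation. It assumes intervals are half-open (start, end) pairs, as in the BED format, and it simulates sequentially the P pivot comparisons that a GPU would evaluate in parallel, one per thread. The names `p_ary_lower_bound` and `range_join`, and the default `p=4`, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), half-open: start <= x < end


def p_ary_lower_bound(keys: Sequence[int], target: int, p: int = 4) -> int:
    """Return the first index i with keys[i] >= target (len(keys) if none).

    Each round probes p pivots that split [lo, hi) into p + 1 segments.
    On a GPU the p comparisons would run concurrently; here they are
    evaluated one after another. With p = 1 this is ordinary binary search.
    """
    lo, hi = 0, len(keys)
    while lo < hi:
        n = hi - lo
        # p pivot indices, all inside [lo, hi - 1], in non-decreasing order.
        pivots = [lo + (n * (k + 1)) // (p + 1) for k in range(p)]
        # keys is sorted, so the "below target" pivots form a prefix.
        below = sum(1 for i in pivots if keys[i] < target)
        new_lo = pivots[below - 1] + 1 if below > 0 else lo
        new_hi = pivots[below] if below < p else hi
        lo, hi = new_lo, new_hi
    return lo


def range_join(
    queries: Sequence[Interval], reference: Sequence[Interval], p: int = 4
) -> List[Tuple[Interval, Interval]]:
    """Return every (query, reference) pair whose intervals overlap."""
    ref = sorted(reference)              # sort reference intervals by start
    starts = [s for s, _ in ref]
    pairs = []
    for q_start, q_end in queries:
        # Only reference intervals starting before q_end can overlap.
        bound = p_ary_lower_bound(starts, q_end, p)
        # Simplification: the candidate prefix is filtered linearly by end.
        for r_start, r_end in ref[:bound]:
            if r_end > q_start:
                pairs.append(((q_start, q_end), (r_start, r_end)))
    return pairs
```

For example, `range_join([(5, 10), (20, 25)], [(0, 6), (8, 12), (10, 15), (30, 40)])` returns `[((5, 10), (0, 6)), ((5, 10), (8, 12))]`: the interval (10, 15) is excluded because half-open intervals that merely touch do not overlap. The linear filter over the candidate prefix keeps the sketch short; a production range join would typically use an interval index instead.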

Original language: English
Title of host publication: 2019 International Conference on High Performance Computing and Simulation, HPCS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-362
Number of pages: 5
ISBN (Electronic): 9781728144849
DOIs
State: Published - Jul 2019
Event: 2019 International Conference on High Performance Computing and Simulation, HPCS 2019 - Dublin, Ireland
Duration: 15 Jul 2019 to 19 Jul 2019

Publication series

Name: 2019 International Conference on High Performance Computing and Simulation, HPCS 2019

Conference

Conference: 2019 International Conference on High Performance Computing and Simulation, HPCS 2019
Country/Territory: Ireland
City: Dublin
Period: 15/07/19 to 19/07/19

Keywords

  • Distributed heterogeneous systems
  • Range join
  • Sequence analysis
