Partial Flattening: A Compilation Technique for Irregular Nested Parallelism on GPGPUs

Ming Hsiang Huang, Wuu Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Supporting irregular nested parallelism on modern GPUs requires considerable effort. Parallel tasks should be distributed evenly while memory usage stays reasonable. Moreover, the task distribution should fit the thread hierarchy of the underlying GPU to fully exploit its computing power. We propose partial flattening, an automatic code transformation that translates annotated C programs into CUDA kernels. Thread blocks are treated as flat SIMT processors. Iterations are dynamically organized into batches, and batches are executed in a sequential (depth-first) order. A kernel is treated as multiple independent SIMT processors with an additional task-stealing mechanism. Partial flattening allows easy expression of nested parallelism and synchronization by annotating nested parallel loops or parallel-recursive calls, while the depth-first execution order keeps memory usage reasonable. Our 2-level task distribution scheme requires no special hardware support and fits well with the CUDA thread hierarchy. Experiments show that partial flattening significantly outperforms NESL on most benchmarks and achieves speedups of 2.15x and 67x over CUDA dynamic parallelism on Quicksort and the Bron-Kerbosch algorithm, respectively.
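The central scheduling idea above (spawned iterations grouped into fixed-width batches, with the newest batch run first) can be illustrated with a small sequential C model. This is a sketch under stated assumptions, not the authors' transformation: the batch width WIDTH (32), the Batch and BatchStack types, and the Fibonacci-style parallel recursion used as a workload are all hypothetical. The inter-block task-stealing mechanism and CUDA code generation are omitted, and the lanes of a batch, which a thread block would execute in SIMT fashion, are simply run in a loop. A breadth-first, fully flattened variant is included to compare peak pending tasks.

```c
/* Hypothetical CPU model of depth-first batch scheduling for parallel recursion.
 * WIDTH, Batch, BatchStack, and the fib workload are illustrative assumptions,
 * not the paper's generated code. */
#include <assert.h>
#include <stdlib.h>

#define WIDTH 32        /* assumed lanes per SIMT processor (one batch) */
#define MAX_BATCHES 64  /* capacity of the batch stack */

typedef struct {
    int size;           /* live tasks in this batch (<= WIDTH) */
    int arg[WIDTH];     /* one task argument per lane */
} Batch;

typedef struct {
    Batch items[MAX_BATCHES];
    int top;            /* number of batches on the stack */
    int pending;        /* tasks waiting in all stacked batches */
    int peak;           /* largest value of pending seen so far */
} BatchStack;

static void push_batch(BatchStack *s, const Batch *b) {
    assert(s->top < MAX_BATCHES);  /* depth-first order keeps the stack shallow */
    s->items[s->top++] = *b;
    s->pending += b->size;
    if (s->pending > s->peak) s->peak = s->pending;
}

/* Parallel-recursive fib: each task either adds its leaf value or spawns two
 * child tasks. Children of one batch are packed into new batches, and the
 * newest batch is always executed next (LIFO, i.e. depth-first). */
static long run_depth_first(int n, int *peak_tasks) {
    BatchStack s = { .top = 0, .pending = 0, .peak = 0 };
    Batch root = { .size = 1, .arg = { n } };
    long result = 0;
    push_batch(&s, &root);

    while (s.top > 0) {
        Batch b = s.items[--s.top];
        s.pending -= b.size;
        Batch child = { .size = 0 };
        /* On a GPU these lanes would run in lockstep; here they run in a loop. */
        for (int lane = 0; lane < b.size; lane++) {
            int v = b.arg[lane];
            if (v < 2) { result += v; continue; }
            int kids[2] = { v - 1, v - 2 };
            for (int k = 0; k < 2; k++) {
                child.arg[child.size++] = kids[k];
                if (child.size == WIDTH) {  /* batch full: stack it, start another */
                    push_batch(&s, &child);
                    child.size = 0;
                }
            }
        }
        if (child.size > 0) push_batch(&s, &child);
    }
    *peak_tasks = s.peak;
    return result;
}

/* Full flattening for comparison: every task of one recursion level is
 * materialized before any task of the next level runs (breadth-first). */
static long run_breadth_first(int n, int *peak_tasks) {
    int count = 1;
    int *level = malloc(sizeof *level);
    long result = 0;
    assert(level != NULL);
    level[0] = n;
    *peak_tasks = 1;
    while (count > 0) {
        int *next = malloc(2 * (size_t)count * sizeof *next);
        int next_count = 0;
        assert(next != NULL);
        for (int i = 0; i < count; i++) {
            if (level[i] < 2) {
                result += level[i];
            } else {
                next[next_count++] = level[i] - 1;
                next[next_count++] = level[i] - 2;
            }
        }
        free(level);
        level = next;
        count = next_count;
        if (count > *peak_tasks) *peak_tasks = count;
    }
    free(level);
    return result;
}
```

Calling both functions with n = 25 returns 75025 from each. The depth-first model never holds more than 25 batches (at most 800 pending tasks), because each batch spawns at most two child batches and no task deeper than level 23 spawns children. The breadth-first variant, by contrast, materializes 4096 tasks at recursion depth 12 alone. In simplified form, this mirrors the abstract's point that depth-first execution keeps memory usage reasonable.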

Original language: English
Title of host publication: Proceedings - 45th International Conference on Parallel Processing, ICPP 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 552-561
Number of pages: 10
ISBN (Electronic): 9781509028238
State: Published, 21 Sep 2016
Event: 45th International Conference on Parallel Processing, ICPP 2016, Philadelphia, United States
Duration: 16 Aug 2016 to 19 Aug 2016

Publication series

Name: Proceedings of the International Conference on Parallel Processing
Volume: 2016-September
ISSN (Print): 0190-3918

Conference

Conference: 45th International Conference on Parallel Processing, ICPP 2016
Country/Territory: United States
City: Philadelphia
Period: 16/08/16 to 19/08/16

Keywords

  • CUDA
  • Compiler directive
  • Nested parallelism
  • Program transform
