Abstract
This paper presents the first end-to-end next-generation sequencing (NGS) data analysis accelerator for short-read mapping, haplotype calling, variant calling, and genotyping. It supports both single-end and paired-end short reads (hereafter, reads) and uses the FM-index, a compact index data structure, for the exact-match stage of short-read mapping. For the inexact-match stage, a dynamic programming array is proposed to determine the mapping results. A rapid similarity calculation is designed to reduce the short-read mapping workload, and a rescue technique is adopted to increase the overall sensitivity. In haplotype calling, a parallel k-mer processing engine constructs the de Bruijn graph and assembles the haplotypes. In variant calling, a variant discovery engine determines the variants between a subject and a reference genome sequence. Lastly, a genotype likelihood computing engine computes genotype likelihoods in parallel and outputs the genotypes of all discovered variants with their corresponding Phred-scaled likelihood (PL) values. This work completes end-to-end data analysis for the (Formula presented) PrecisionFDA dataset in an average of 28.2 minutes. It achieves a (Formula presented) higher throughput than existing solutions, with higher precision (99.79%) and sensitivity (99.03%). The chip also achieves a (Formula presented) higher energy efficiency than the Illumina DRAGEN FPGA acceleration system.
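As an illustration of the exact-match step only, FM-index lookup can be sketched in software with the standard backward-search procedure. This is a minimal sketch, not the accelerator's hardware design: the function names `build_fm_index` and `exact_match` are hypothetical, and the naive suffix sorting and the uncompressed occurrence table are assumptions made for readability rather than compactness.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for `text`."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array: fine for a short demo string, not for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    counts = Counter(text)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_match(pattern, sa, c_table, occ):
    """Return sorted start positions of `pattern`, via backward search."""
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:  # interval is empty: no exact match
            return []
    return sorted(sa[lo:hi])


reference = "ACGTACGA"
index = build_fm_index(reference)
hits = exact_match("ACG", *index)
```

Each pattern character narrows the suffix-array interval with two table lookups, so the search cost depends on the read length rather than on the reference length.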
| Original language | English |
|---|---|
| Pages (from-to) | 1105-1119 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Biomedical Circuits and Systems |
| Volume | 19 |
| Issue number | 6 |
| State | Published - 2025 |
Keywords
- Next-generation sequencing (NGS)
- application-specific integrated circuit (ASIC)
- digital CMOS integrated circuits
- genotyping
- haplotype calling
- short-read mapping
- variant calling