TY - GEN
T1 - 21.1 A Fully Integrated Genetic Variant Discovery SoC for Next-Generation Sequencing
AU - Wu, Yi Chung
AU - Chen, Yen Lung
AU - Yang, Chung Hsuan
AU - Lee, Chao Hsi
AU - Yu, Chao Yang
AU - Chang, Nian Shyang
AU - Chen, Ling Chien
AU - Chang, Jia Rong
AU - Lin, Chun Pin
AU - Chen, Hung Lieh
AU - Chen, Chi Shi
AU - Hung, Jui Hung
AU - Yang, Chia Hsiang
PY - 2020/2
Y1 - 2020/2
N2 - Next-generation sequencing (NGS) is now indispensable for genetics research and biomedical applications, such as disease analysis and evolution tracking [1]. However, even with GPU acceleration, it still takes up to a couple of days to analyze all genetic mutations (variants) of a human genome, which consists of 3 billion nucleotides. Fig. 21.1.1 shows an overview of NGS and the data analysis workflow. NGS technology sequences hundreds of millions of DNA segments, anchored and amplified on a microarray, in parallel. In each sequencing cycle, the nucleotides (A, T, C, G) are individually detected by their unique fluorescence labels, and the DNA segments can then be constructed as short reads. The NGS data analysis workflow consists of Preprocessing, Short-Read Mapping (including Exact Matching and Inexact Matching), Haplotype Calling, and Variant Calling [2]. Short reads are first mapped to a reference DNA and then used to assemble the genome of the DNA sample. Preprocessing constructs the data structure used to index the reference DNA. In Short-Read Mapping, a seeding-and-extension scheme is applied to perform both Exact and Inexact Matching. Equal-length subsequences (seeds) of the short reads are used to find exact-match locations on the reference DNA. The seeds are then extended to identify the most likely locations through global alignment, allowing mismatches and insertions/deletions [2]. Next, in Haplotype Calling, the reads mapped to a specific region are assembled to reconstruct the paternal and maternal genomes (i.e., haplotypes) of the DNA sample. Finally, in Variant Calling, the assembled haplotypes are used to determine the variants between the reference DNA and the sample DNA. The outputs of Variant Calling indicate the location and likelihood of each variant.
Dedicated VLSI solutions have been developed for acceleration, but only Suffix-Array (SA) Sorting for Preprocessing and Exact Matching for Short-Read Mapping have been realized on silicon [3]. This work presents a fully integrated SoC for the entire NGS data analysis process.
UR - http://www.scopus.com/inward/record.url?scp=85083845184&partnerID=8YFLogxK
U2 - 10.1109/ISSCC19947.2020.9063002
DO - 10.1109/ISSCC19947.2020.9063002
M3 - Conference contribution
AN - SCOPUS:85083845184
SN - 978-1-7281-3206-8
T3 - IEEE International Solid State Circuits Conference
SP - 322
EP - 324
BT - 2020 IEEE International Solid-State Circuits Conference (ISSCC)
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Solid-State Circuits Conference, ISSCC 2020
Y2 - 16 February 2020 through 20 February 2020
ER -