TY - GEN
T1 - An Effective and Efficient Algorithm for Detecting Exact Deletion Breakpoints from Viral Next-Generation Sequencing Data
AU - Cheng, Ji Hong
AU - Liu, Wen Chun
AU - Chang, Ting Tsung
AU - Hsieh, Sun Yuan
AU - Tseng, Vincent S.
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - The COVID-19 pandemic has caused serious damage to the health, lives, and economic stability of people all over the world. To combat this disease, researchers worldwide, including computer scientists, have begun to engage in cross-regional cooperation on SARS-CoV-2 research. A recent report pointed out that sequence deletion in a specific region of the SARS-CoV-2 genome is related to viral infectivity. In addition, sequence deletion of a specific region is also found in Hepatitis B Virus (HBV) and hepatocellular carcinoma (HCC). Next-generation sequencing (NGS) technology can quickly obtain the sequence data of biological genomes, but the number of short reads generated by NGS is often as high as one million, which amounts to big data. It is challenging to extract from these NGS data the information needed to determine exact sequence deletion breakpoints, especially in the sequence data of highly variable viral genomes. In our previous research, we proposed VirDelect, a bioinformatics tool that can detect exact breakpoints in viral NGS data. In this paper, a new method, One-base Alignment Plus (OAP), is proposed to further enhance the core VirDelect algorithm and improve the correctness of sequence deletion detection. We conduct experiments on simulated SARS-CoV-2 and HBV data with different deletion lengths and on real HBV data to evaluate correctness. The experimental results show that VirDelect+OAP was able to find deletions in the simulated data that VirDelect could not find, and that on the real data the correctness of VirDelect+OAP was effectively improved.
AB - The COVID-19 pandemic has caused serious damage to the health, lives, and economic stability of people all over the world. To combat this disease, researchers worldwide, including computer scientists, have begun to engage in cross-regional cooperation on SARS-CoV-2 research. A recent report pointed out that sequence deletion in a specific region of the SARS-CoV-2 genome is related to viral infectivity. In addition, sequence deletion of a specific region is also found in Hepatitis B Virus (HBV) and hepatocellular carcinoma (HCC). Next-generation sequencing (NGS) technology can quickly obtain the sequence data of biological genomes, but the number of short reads generated by NGS is often as high as one million, which amounts to big data. It is challenging to extract from these NGS data the information needed to determine exact sequence deletion breakpoints, especially in the sequence data of highly variable viral genomes. In our previous research, we proposed VirDelect, a bioinformatics tool that can detect exact breakpoints in viral NGS data. In this paper, a new method, One-base Alignment Plus (OAP), is proposed to further enhance the core VirDelect algorithm and improve the correctness of sequence deletion detection. We conduct experiments on simulated SARS-CoV-2 and HBV data with different deletion lengths and on real HBV data to evaluate correctness. The experimental results show that VirDelect+OAP was able to find deletions in the simulated data that VirDelect could not find, and that on the real data the correctness of VirDelect+OAP was effectively improved.
KW - Big data
KW - COVID-19
KW - Hepatitis B Virus
KW - Next-generation sequencing
KW - Viral deletion detection
UR - http://www.scopus.com/inward/record.url?scp=85102198961&partnerID=8YFLogxK
U2 - 10.1109/ICS51289.2020.00038
DO - 10.1109/ICS51289.2020.00038
M3 - Conference contribution
AN - SCOPUS:85102198961
T3 - Proceedings - 2020 International Computer Symposium, ICS 2020
SP - 147
EP - 152
BT - Proceedings - 2020 International Computer Symposium, ICS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 International Computer Symposium, ICS 2020
Y2 - 17 December 2020 through 19 December 2020
ER -