TY - JOUR
T1 - Integrating transformer and imbalanced multi-label learning to identify antimicrobial peptides and their functional activities
AU - Pang, Yuxuan
AU - Yao, Lantian
AU - Xu, Jingyi
AU - Wang, Zhuo
AU - Lee, Tzong Yi
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press.
PY - 2022/12/13
Y1 - 2022/12/13
N2 - MOTIVATION: Antimicrobial peptides (AMPs) have the potential to inhibit multiple types of pathogens and to heal infections. Computational strategies can assist in characterizing novel AMPs from proteomes or collections of synthetic sequences and in discovering their functional abilities toward different microbial targets without intensive labor. RESULTS: Here, we present a deep learning-based method for computer-aided novel AMP discovery that utilizes the transformer neural network architecture with knowledge from natural language processing to extract peptide sequence information. We implemented the method for two AMP-related tasks: the first is to discriminate AMPs from other peptides, and the second is to identify AMPs' functional activities related to seven different targets (gram-negative bacteria, gram-positive bacteria, fungi, viruses, cancer cells, parasites and mammalian cell inhibition), which is a multi-label problem. In addition, asymmetric loss was adopted to address the intrinsic imbalance of the dataset, particularly for the multi-label scenarios. The evaluation showed that our proposed scheme achieves the best performance for the first task (96.85% balanced accuracy) and gives less biased predictions for the second task (79.83% balanced accuracy averaged across all functional activities) when compared with strategies without imbalanced learning or deep learning. AVAILABILITY AND IMPLEMENTATION: The source code and data of this study are available at https://github.com/BiOmicsLab/TransImbAMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
AB - MOTIVATION: Antimicrobial peptides (AMPs) have the potential to inhibit multiple types of pathogens and to heal infections. Computational strategies can assist in characterizing novel AMPs from proteomes or collections of synthetic sequences and in discovering their functional abilities toward different microbial targets without intensive labor. RESULTS: Here, we present a deep learning-based method for computer-aided novel AMP discovery that utilizes the transformer neural network architecture with knowledge from natural language processing to extract peptide sequence information. We implemented the method for two AMP-related tasks: the first is to discriminate AMPs from other peptides, and the second is to identify AMPs' functional activities related to seven different targets (gram-negative bacteria, gram-positive bacteria, fungi, viruses, cancer cells, parasites and mammalian cell inhibition), which is a multi-label problem. In addition, asymmetric loss was adopted to address the intrinsic imbalance of the dataset, particularly for the multi-label scenarios. The evaluation showed that our proposed scheme achieves the best performance for the first task (96.85% balanced accuracy) and gives less biased predictions for the second task (79.83% balanced accuracy averaged across all functional activities) when compared with strategies without imbalanced learning or deep learning. AVAILABILITY AND IMPLEMENTATION: The source code and data of this study are available at https://github.com/BiOmicsLab/TransImbAMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
UR - http://www.scopus.com/inward/record.url?scp=85144585853&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btac711
DO - 10.1093/bioinformatics/btac711
M3 - Article
C2 - 36326438
AN - SCOPUS:85144585853
SN - 1367-4803
VL - 38
SP - 5368
EP - 5374
JO - Bioinformatics
JF - Bioinformatics
IS - 24
ER -