TY - JOUR
T1 - Predicting allergenic proteins using wavelet transform
AU - Li, Kuo Bin
AU - Issac, Praveen
AU - Krishnan, Arun
N1 - Funding Information:
Chua Gek Huey and Wang Yi are thanked for useful discussions. Dr Francis Tang is acknowledged for his assistance on computation tasks. This work was supported by the Bioinformatics Institute, Singapore.
PY - 2004/11/1
Y1 - 2004/11/1
AB - Motivation: With many transgenic proteins being introduced today, the ability to predict their potential allergenicity has become an important issue. Previous studies were based either on sequence similarity or on protein motifs identified from known allergen databases. Similarity-based approaches, although able to achieve high recall, usually have low precision. Previous motif-based approaches have been shown to improve precision in cross-validation experiments. In this study, we describe a system that combines the advantages of similarity-based and motif-based prediction. Results: The new prediction system uses a clustering algorithm to group the known allergenic proteins into clusters. Proteins within each cluster are assumed to carry one or more common motifs. After multiple sequence alignment, the proteins in each cluster are analyzed by a wavelet analysis program that identifies conserved motifs. A hidden Markov model (HMM) profile is then built for each identified motif. Allergens that do not appear to carry detectable allergen motifs are stored in a small database. The allergenicity of an unknown protein can be predicted by comparing it against the HMM profiles and, if no matching profile is found, against the small allergen database using BLASTP. Cross-validation experiments showed over 70% recall and over 90% precision. Using the entire Swiss-Prot database as the query, we predicted about 2000 potential allergens.
UR - http://www.scopus.com/inward/record.url?scp=8844259689&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bth286
DO - 10.1093/bioinformatics/bth286
M3 - Article
C2 - 15117757
AN - SCOPUS:8844259689
SN - 1367-4803
VL - 20
SP - 2572
EP - 2578
JO - Bioinformatics
JF - Bioinformatics
IS - 16
ER -