TY - JOUR
T1 - Fine-grained protein fold assignment by support vector machines using generalized npeptide coding schemes and jury voting from multiple-parameter sets
AU - Yu, Chin Sheng
AU - Wang, Jung Ying
AU - Yang, Jinn-Moon
AU - Lyu, Ping Chiang
AU - Lin, Chih Jen
AU - Hwang, Jenn Kang
PY - 2003/3/1
Y1 - 2003/3/1
N2 - In the coarse-grained fold assignment of major protein classes, such as all-α, all-β, α + β, and α/β proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine-grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge due to the larger number of folds available. A recent study yielded a reasonable prediction accuracy of 56.0% on an independent set of the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method to fine-grained fold prediction, using a combination of protein descriptors based on properties derived from n-peptide composition together with jury voting, and achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than the previous result. With 10-fold cross-validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence-coding schemes can significantly improve fine-grained fold prediction. Our approach should be useful in structure prediction and modeling.
AB - In the coarse-grained fold assignment of major protein classes, such as all-α, all-β, α + β, and α/β proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine-grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge due to the larger number of folds available. A recent study yielded a reasonable prediction accuracy of 56.0% on an independent set of the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method to fine-grained fold prediction, using a combination of protein descriptors based on properties derived from n-peptide composition together with jury voting, and achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than the previous result. With 10-fold cross-validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence-coding schemes can significantly improve fine-grained fold prediction. Our approach should be useful in structure prediction and modeling.
KW - Fine-grained fold prediction
KW - Global sequence-coding scheme
KW - N-peptide
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=0037341893&partnerID=8YFLogxK
U2 - 10.1002/prot.10313
DO - 10.1002/prot.10313
M3 - Article
C2 - 12577258
AN - SCOPUS:0037341893
SN - 0887-3585
VL - 50
SP - 531
EP - 536
JO - Proteins: Structure, Function and Genetics
JF - Proteins: Structure, Function and Genetics
IS - 4
ER -