TY - JOUR
T1 - Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition
AU - Chen, Yen Kuang
AU - Li, Kuo Bin
N1 - Funding Information:
This work was support by the grants from the National Science Council of Taiwan (NSC 98-2311-B-010-002-MY3 and NSC 101-2221-E-010-010- ).
PY - 2013/2/7
Y1 - 2013/2/7
N2 - The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a Jackknife cross-validation and an independent testing experiments. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier have supported literature evidences. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.
AB - The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a Jackknife cross-validation and an independent testing experiments. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier have supported literature evidences. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.
KW - Protein sequence analysis
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=84870215870&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2012.10.033
DO - 10.1016/j.jtbi.2012.10.033
M3 - Article
C2 - 23137835
AN - SCOPUS:84870215870
SN - 0022-5193
VL - 318
SP - 1
EP - 12
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
ER -