TY - JOUR
T1 - A New Scheme to Characterize and Identify Protein Ubiquitination Sites
AU - Nguyen, Van Nui
AU - Huang, Kai Yao
AU - Huang, Chien Hsun
AU - Lai, K. Robert
AU - Lee, Tzong Yi
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/3/1
Y1 - 2017/3/1
N2 - Protein ubiquitination, involving the conjugation of ubiquitin on lysine residue, serves as an important modulator of many cellular functions in eukaryotes. Recent advancements in proteomic technology have stimulated increasing interest in identifying ubiquitination sites. However, most computational tools for predicting ubiquitination sites are focused on small-scale data. With an increasing number of experimentally verified ubiquitination sites, we were motivated to design a predictive model for identifying lysine ubiquitination sites for large-scale proteome dataset. This work assessed not only single features, such as amino acid composition (AAC), amino acid pair composition (AAPC) and evolutionary information, but also the effectiveness of incorporating two or more features into a hybrid approach to model construction. The support vector machine (SVM) was applied to generate the prediction models for ubiquitination site identification. Evaluation by five-fold cross-validation showed that the SVM models learned from the combination of hybrid features delivered a better prediction performance. Additionally, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs of ubiquitination sites. The SVM models integrating the MDDLogo-identified substrate motifs could yield an average accuracy of 68.70 percent. Furthermore, the independent testing result showed that the MDDLogo-clustered SVM models could provide a promising accuracy (78.50 percent) and perform better than other prediction tools. Two cases have demonstrated the effective prediction of ubiquitination sites with corresponding substrate motifs.
AB - Protein ubiquitination, involving the conjugation of ubiquitin on lysine residue, serves as an important modulator of many cellular functions in eukaryotes. Recent advancements in proteomic technology have stimulated increasing interest in identifying ubiquitination sites. However, most computational tools for predicting ubiquitination sites are focused on small-scale data. With an increasing number of experimentally verified ubiquitination sites, we were motivated to design a predictive model for identifying lysine ubiquitination sites for large-scale proteome dataset. This work assessed not only single features, such as amino acid composition (AAC), amino acid pair composition (AAPC) and evolutionary information, but also the effectiveness of incorporating two or more features into a hybrid approach to model construction. The support vector machine (SVM) was applied to generate the prediction models for ubiquitination site identification. Evaluation by five-fold cross-validation showed that the SVM models learned from the combination of hybrid features delivered a better prediction performance. Additionally, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs of ubiquitination sites. The SVM models integrating the MDDLogo-identified substrate motifs could yield an average accuracy of 68.70 percent. Furthermore, the independent testing result showed that the MDDLogo-clustered SVM models could provide a promising accuracy (78.50 percent) and perform better than other prediction tools. Two cases have demonstrated the effective prediction of ubiquitination sites with corresponding substrate motifs.
KW - Protein ubiquitination
KW - maximal dependence decomposition
KW - substrate motif
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85017139703&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2016.2520939
DO - 10.1109/TCBB.2016.2520939
M3 - Article
C2 - 26887002
AN - SCOPUS:85017139703
SN - 1545-5963
VL - 14
SP - 393
EP - 403
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 2
M1 - 7400980
ER -