Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties

Yuchen Yang, Erwin Tantoso, Kuo Bin Li*

*此作品的通信作者

研究成果: Article同行評審

28 引文 斯高帕斯(Scopus)

摘要

Remote homology detection refers to the detection of structure homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as support vector machine (SVM) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels to better use the pairwise alignment scores or sequence profiles. Moreover, amino acids' physicochemical properties are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using amino acid's physicochemical properties and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select different amino acid indices (up to 10 for a protein family) that are best to characterize the given protein family. The selected amino acid indices may enable us to draw better biological explanation of the protein family classification problem than using other alignment-based methods. An SVM-based classifier will then work on the space described by the RQA metrics. The classification scheme is named as SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acids property-based features. Supplementary information including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.

原文English
頁(從 - 到)145-154
頁數10
期刊Journal of Theoretical Biology
252
發行號1
DOIs
出版狀態Published - 7 5月 2008

指紋

深入研究「Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties」主題。共同形成了獨特的指紋。

引用此