TY - JOUR
T1 - AAIndexLoc
T2 - Predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices
AU - Tantoso, E.
AU - Li, Kuo Bin
PY - 2008/8
Y1 - 2008/8
N2 - Identifying a protein's subcellular localization is an important step toward understanding its function. However, the experimental work involved is usually laborious, time-consuming and costly. Computational prediction is therefore valuable for reducing this burden. Here we provide a method to predict protein subcellular localization using amino acid composition and physicochemical properties. The method concatenates the information extracted from a protein's N-terminal, middle and full sequence. Each part is represented by amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. We divided our dataset into a training set and a testing set. The training set is used to determine the best-performing amino acid index by five-fold cross-validation, whereas the testing set acts as an independent dataset to evaluate the performance of our model. With the novel representation method, we achieve an accuracy of approximately 75% on the independent dataset. We conclude that this new representation performs well and is able to extract the protein sequence information. We have developed a web server for predicting protein subcellular localization. The web server is available at http://aaindexloc.bii.a-star.edu.sg.
KW - Amino acid indices
KW - Subcellular localization
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=46049118177&partnerID=8YFLogxK
U2 - 10.1007/s00726-007-0616-y
DO - 10.1007/s00726-007-0616-y
M3 - Article
C2 - 18163182
AN - SCOPUS:46049118177
SN - 0939-4451
VL - 35
SP - 345
EP - 353
JO - Amino Acids
JF - Amino Acids
IS - 2
ER -