An ensemble method for predicting subnuclear localizations from primary protein structures.

Background: Predicting protein subnuclear localization is a challenging problem. Previous approaches based on non-sequence information, including Gene Ontology annotations and kernel fusion, each have limitations. The aim of this work is twofold: one is to propose a novel indi...

Bibliographic Details
Main authors: Guo Sheng Han, Zu Guo Yu, Vo Anh, Anaththa P D Krishnajith, Yu-Chu Tian
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects: R, Q
Online access: https://doaj.org/article/3c6730d3dbd840c8b04b6d6f85b8859e
id oai:doaj.org-article:3c6730d3dbd840c8b04b6d6f85b8859e
record_format dspace
spelling oai:doaj.org-article:3c6730d3dbd840c8b04b6d6f85b8859e
2021-11-18T07:55:50Z
An ensemble method for predicting subnuclear localizations from primary protein structures.
1932-6203
10.1371/journal.pone.0057225
https://doaj.org/article/3c6730d3dbd840c8b04b6d6f85b8859e
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23460833/?tool=EBI
https://doaj.org/toc/1932-6203
Guo Sheng Han
Zu Guo Yu
Vo Anh
Anaththa P D Krishnajith
Yu-Chu Tian
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 8, Iss 2, p e57225 (2013)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: Predicting protein subnuclear localization is a challenging problem. Previous approaches based on non-sequence information, including Gene Ontology annotations and kernel fusion, each have limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance by using comprehensive information represented as a high-dimensional feature vector obtained from 11 feature extraction methods.
Methodology/principal findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. To speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross-validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; it is 71.7% on the multi-localization dataset and 69.8% on the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.
Conclusions: Our method is effective and valuable for predicting protein subnuclear localizations. A web server implementing the proposed method is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
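The abstract mentions feature extraction based on amino acid classifications and evaluation by leave-one-out cross-validation. The following is a minimal sketch of those two ideas only, under stated assumptions: the three-group residue classification, the function names, and the toy sequences with labels "A" and "B" are all hypothetical illustrations, and a 1-nearest-neighbour rule stands in for the two-stage multiclass support vector machine, which the record does not describe in enough detail to reproduce.

```python
from collections import Counter

# Hypothetical three-group residue classification, used only for illustration.
# The record does not say which classifications the paper actually uses.
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}
AA_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}
GROUP_ORDER = sorted(GROUPS)  # ['hydrophobic', 'neutral', 'polar']


def group_composition(sequence):
    """Return the fraction of residues in each group, in GROUP_ORDER."""
    counts = Counter(
        AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(GROUP_ORDER)
    return [counts[g] / total for g in GROUP_ORDER]


def nearest_neighbor_label(query, training):
    """1-nearest-neighbour stand-in classifier (NOT the paper's SVM)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda item: sq_dist(query, item[0]))[1]


def leave_one_out_accuracy(samples):
    """Hold out each (features, label) pair once; predict it from the rest."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


# Toy sequences and labels, invented for this sketch (not real proteins).
toy = [("KKRRDDEE", "A"), ("RKEDQNKK", "A"), ("LLVVIIMM", "B"), ("CLVIMFWW", "B")]
samples = [(group_composition(seq), label) for seq, label in toy]
accuracy = leave_one_out_accuracy(samples)
```

Here `accuracy` is the fraction of held-out toy samples predicted correctly, which corresponds to the "overall accuracy" figure reported per dataset in the abstract. In the described method, the feature vector would combine 11 extraction methods and the classifier would be the support vector machine with an automatically searched kernel parameter.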
format article
author Guo Sheng Han
Zu Guo Yu
Vo Anh
Anaththa P D Krishnajith
Yu-Chu Tian
title An ensemble method for predicting subnuclear localizations from primary protein structures.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/3c6730d3dbd840c8b04b6d6f85b8859e