Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.

With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used...

Bibliographic Details
Main authors: Alok Sharma, Abdollah Dehzangi, James Lyons, Seiya Imoto, Satoru Miyano, Kenta Nakai, Ashwini Patil
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
R
Q
Online access: https://doaj.org/article/524d9657344049c3aeef6794bb47771b
id oai:doaj.org-article:524d9657344049c3aeef6794bb47771b
record_format dspace
spelling oai:doaj.org-article:524d9657344049c3aeef6794bb47771b
2021-11-18T08:31:14Z
Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
1932-6203
10.1371/journal.pone.0089890
https://doaj.org/article/524d9657344049c3aeef6794bb47771b
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24587103/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features in intrinsically disordered regions and developed a scheme to use them to predict Gene Ontology Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that the residue composition based features have the highest precision while bigram probabilities, based on sequence profiles of intrinsically disordered regions obtained from PSIBlast, have the highest recall. Amino acid bigrams and features based on secondary structure show an intermediate level of precision and recall. Almost all features showed a high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA and DNA binding. However, feature performance varied significantly for different GO Slim terms emphasizing the need for a unique classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor.
Alok Sharma
Abdollah Dehzangi
James Lyons
Seiya Imoto
Satoru Miyano
Kenta Nakai
Ashwini Patil
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 9, Iss 2, p e89890 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Alok Sharma
Abdollah Dehzangi
James Lyons
Seiya Imoto
Satoru Miyano
Kenta Nakai
Ashwini Patil
Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
description With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features in intrinsically disordered regions and developed a scheme to use them to predict Gene Ontology Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that the residue composition based features have the highest precision while bigram probabilities, based on sequence profiles of intrinsically disordered regions obtained from PSIBlast, have the highest recall. Amino acid bigrams and features based on secondary structure show an intermediate level of precision and recall. Almost all features showed a high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA and DNA binding. However, feature performance varied significantly for different GO Slim terms emphasizing the need for a unique classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor.
format article
author Alok Sharma
Abdollah Dehzangi
James Lyons
Seiya Imoto
Satoru Miyano
Kenta Nakai
Ashwini Patil
author_facet Alok Sharma
Abdollah Dehzangi
James Lyons
Seiya Imoto
Satoru Miyano
Kenta Nakai
Ashwini Patil
author_sort Alok Sharma
title Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
title_short Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
title_full Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
title_fullStr Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
title_full_unstemmed Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
title_sort evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/524d9657344049c3aeef6794bb47771b
work_keys_str_mv AT aloksharma evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT abdollahdehzangi evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT jameslyons evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT seiyaimoto evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT satorumiyano evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT kentanakai evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
AT ashwinipatil evaluationofsequencefeaturesfromintrinsicallydisorderedregionsfortheestimationofproteinfunction
_version_ 1718421682530549760