Integrated assessment of genomic correlates of protein evolutionary rate.

Rates of evolution differ widely among proteins, but the causes and consequences of such differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the corre...


Bibliographic Details
Main Authors: Yu Xia, Eric A Franzosa, Mark B Gerstein
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2009
Subjects: Biology (General)
Online Access: https://doaj.org/article/947ef222f5594b82abd33a49fe0278e4
id oai:doaj.org-article:947ef222f5594b82abd33a49fe0278e4
record_format dspace
spelling oai:doaj.org-article:947ef222f5594b82abd33a49fe0278e4
2021-11-25T05:42:21Z
Integrated assessment of genomic correlates of protein evolutionary rate.
1553-734X
1553-7358
10.1371/journal.pcbi.1000413
https://doaj.org/article/947ef222f5594b82abd33a49fe0278e4
2009-06-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19521505/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Yu Xia
Eric A Franzosa
Mark B Gerstein
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 5, Iss 6, p e1000413 (2009)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Rates of evolution differ widely among proteins, but the causes and consequences of such differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the correlations among evolutionary rate and these genomic features remains a major challenge. Here, we use an integrated probabilistic modeling approach to study genomic correlates of protein evolutionary rate in Saccharomyces cerevisiae. We measure and rank degrees of association between (i) an approximate measure of protein evolutionary rate with high genome coverage, and (ii) a diverse list of protein properties (sequence, structural, functional, network, and phenotypic). We observe, among many statistically significant correlations, that slowly evolving proteins tend to be regulated by more transcription factors, deficient in predicted structural disorder, involved in characteristic biological functions (such as translation), biased in amino acid composition, and are generally more abundant, more essential, and enriched for interaction partners. Many of these results are in agreement with recent studies. In addition, we assess information contribution of different subsets of these protein properties in the task of predicting slowly evolving proteins. We employ a logistic regression model on binned data that is able to account for intercorrelation, non-linearity, and heterogeneity within features. Our model considers features both individually and in natural ensembles ("meta-features") in order to assess joint information contribution and degree of contribution independence. Meta-features based on protein abundance and amino acid composition make strong, partially independent contributions to the task of predicting slowly evolving proteins; other meta-features make additional minor contributions. 
The combination of all meta-features yields predictions comparable to those based on paired species comparisons, and approaching the predictive limit of optimal lineage-insensitive features. Our integrated assessment framework can be readily extended to other correlational analyses at the genome scale.
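The description above mentions a logistic regression model fitted on binned data. As a rough illustration of that general idea only, the following minimal sketch bins two continuous features into quantile bins, one-hot encodes the bins, and fits a logistic regression by gradient descent. It does not reproduce the authors' model or their meta-feature grouping: the data are synthetic, and the names `abundance`, `disorder`, `slow`, `bin_feature`, `fit_logistic`, and `predict_proba`, along with the bin count and optimizer settings, are all assumptions made for this example.

```python
import numpy as np


def bin_feature(values, n_bins=5):
    """Assign each value to a quantile bin and return one-hot indicators."""
    # Interior quantile edges; np.digitize then yields indices 0..n_bins-1.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    idx = np.digitize(values, edges)
    return np.eye(n_bins)[idx]


def predict_proba(X, w, b):
    """Logistic (sigmoid) probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


def fit_logistic(X, y, lr=0.5, n_iter=2000, l2=1e-3):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = predict_proba(X, w, b)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


# Synthetic, hypothetical data: NOT taken from the study.
rng = np.random.default_rng(0)
n = 2000
abundance = rng.lognormal(size=n)   # stand-in for protein abundance
disorder = rng.uniform(size=n)      # stand-in for predicted disorder
logit = 1.5 * np.log(abundance) - 2.0 * (disorder - 0.5)
slow = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Each feature contributes one indicator column per bin, so the model
# learns a separate weight per bin rather than a single linear slope.
X = np.hstack([bin_feature(abundance), bin_feature(disorder)])
w, b = fit_logistic(X, slow)
accuracy = np.mean((predict_proba(X, w, b) > 0.5) == (slow == 1))
```

Because every bin gets its own weight, a model of this shape can capture non-linear relationships between a feature and the outcome, which is one reason binning is commonly paired with logistic regression.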
format article
author Yu Xia
Eric A Franzosa
Mark B Gerstein
title Integrated assessment of genomic correlates of protein evolutionary rate.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/947ef222f5594b82abd33a49fe0278e4