Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners.

In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accu...

Bibliographic Details
Main authors: Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt, Andrea Pagnani
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
R
Q
Online access: https://doaj.org/article/d4d88aaa739445c99c721e30223b4003
id oai:doaj.org-article:d4d88aaa739445c99c721e30223b4003
record_format dspace
spelling oai:doaj.org-article:d4d88aaa739445c99c721e30223b4003
2021-11-18T08:26:37Z
Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners.
1932-6203
10.1371/journal.pone.0092721
https://doaj.org/article/d4d88aaa739445c99c721e30223b4003
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24663061/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
Carlo Baldassi
Marco Zamparo
Christoph Feinauer
Andrea Procaccini
Riccardo Zecchina
Martin Weigt
Andrea Pagnani
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 9, Iss 3, p e92721 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
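The core idea in the description (treat one-hot encoded amino acids as continuous Gaussian variables, so that the couplings follow from a single matrix inversion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 20-letter alphabet with gaps as the all-zero reference state, the `shrinkage` parameter, and the Frobenius-norm score are all assumptions made for illustration, and the simple shrinkage regularizer stands in for whatever prior the paper actually uses. Sequence reweighting and score corrections are omitted.

```python
import numpy as np

# Assumed alphabet: 20 amino acids; gaps and unknown symbols map to all zeros.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(msa):
    """Encode M aligned sequences of length L as an M x (L*20) 0/1 matrix."""
    index = {aa: k for k, aa in enumerate(ALPHABET)}
    n_seq, length, q = len(msa), len(msa[0]), len(ALPHABET)
    X = np.zeros((n_seq, length * q))
    for m, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            if aa in index:
                X[m, i * q + index[aa]] = 1.0
    return X


def gaussian_contact_scores(msa, shrinkage=0.5):
    """Return an L x L matrix of coupling strengths between alignment columns.

    Hypothetical helper: fits a multivariate Gaussian to the one-hot data and
    reads the couplings off the negative inverse covariance matrix.
    """
    X = one_hot(msa)
    q = len(ALPHABET)
    n_vars = X.shape[1]
    length = n_vars // q

    # Empirical covariance of the one-hot variables (columns are variables).
    C = np.cov(X, rowvar=False, bias=True)

    # Assumed regularizer: shrink toward a scaled identity so C is invertible.
    C_reg = (1.0 - shrinkage) * C + shrinkage * np.eye(n_vars) / q

    # For a Gaussian model the couplings are minus the precision matrix.
    J = -np.linalg.inv(C_reg)

    # Collapse each 20 x 20 coupling block to a scalar via its Frobenius norm.
    blocks = J.reshape(length, q, length, q)
    S = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(S, 0.0)  # self-couplings are not contact predictions
    return S
```

Calling `gaussian_contact_scores(["AAC", "AAD", "CCC", "CCD"])` on this toy alignment gives a higher score to the column pair (0, 1), which co-varies perfectly, than to (0, 2), which is statistically independent. Column pairs with the largest scores would be the candidate residue contacts.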
format article
author Carlo Baldassi
Marco Zamparo
Christoph Feinauer
Andrea Procaccini
Riccardo Zecchina
Martin Weigt
Andrea Pagnani
title Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/d4d88aaa739445c99c721e30223b4003