From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction.

Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analy...

Bibliographic Details
Main authors: Simona Cocco, Remi Monasson, Martin Weigt
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2013
Subjects:
Online access: https://doaj.org/article/68c16feebb164cbb9f09f23f2fca9ce6
id oai:doaj.org-article:68c16feebb164cbb9f09f23f2fca9ce6
record_format dspace
spelling oai:doaj.org-article:68c16feebb164cbb9f09f23f2fca9ce6
2021-11-18T05:53:40Z
From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction.
1553-734X
1553-7358
10.1371/journal.pcbi.1003176
https://doaj.org/article/68c16feebb164cbb9f09f23f2fca9ce6
2013-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23990764/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Simona Cocco
Remi Monasson
Martin Weigt
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 9, Iss 8, p e1003176 (2013)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Various approaches have explored the covariation of residues in multiple-sequence alignments of homologous proteins to extract functional and structural information. Among those are principal component analysis (PCA), which identifies the most correlated groups of residues, and direct coupling analysis (DCA), a global inference method based on the maximum entropy principle, which aims at predicting residue-residue contacts. In this paper, inspired by the statistical physics of disordered systems, we introduce the Hopfield-Potts model to naturally interpolate between these two approaches. The Hopfield-Potts model allows us to identify relevant 'patterns' of residues from the knowledge of the eigenmodes and eigenvalues of the residue-residue correlation matrix. We show how the computation of such statistical patterns makes it possible to accurately predict residue-residue contacts with a much smaller number of parameters than DCA. This dimensional reduction allows us to avoid overfitting and to extract contact information from multiple-sequence alignments of reduced size. In addition, we show that low-eigenvalue correlation modes, discarded by PCA, are important to recover structural information: the corresponding patterns are highly localized, that is, they are concentrated in few sites, which we find to be in close contact in the three-dimensional protein fold.
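The abstract's central objects, the eigenmodes of the residue-residue correlation matrix and their localization on a few sites, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`one_hot`, `correlation_modes`, `site_localization`), the use of a plain covariance matrix without pseudocounts or reweighting, and the inverse participation ratio as a localization measure are all assumptions made for illustration, and the alignment is random toy data.

```python
import numpy as np

def one_hot(msa, q):
    """Encode an integer alignment of shape (M, L), with states 0..q-1,
    as a binary matrix of shape (M, L*q)."""
    M, L = msa.shape
    X = np.zeros((M, L * q))
    rows = np.repeat(np.arange(M), L)
    cols = (np.arange(L) * q + msa).ravel()   # column index = site*q + state
    X[rows, cols] = 1.0
    return X

def correlation_modes(msa, q):
    """Eigen-decompose the covariance of the one-hot encoded alignment.
    np.linalg.eigh returns eigenvalues in ascending order, so the
    low-eigenvalue modes come first and the PCA-style top modes last.
    Note: the unregularized covariance has exact zero modes (each site's
    one-hot block sums to 1); a real analysis would regularize."""
    C = np.cov(one_hot(msa, q), rowvar=False)
    return np.linalg.eigh(C)

def site_localization(eigvecs, L, q):
    """Inverse participation ratio of each mode over sites (hypothetical
    choice of measure): 1/L for a mode spread evenly over all sites,
    1 for a mode concentrated on a single site."""
    V = eigvecs.reshape(L, q, -1)        # (site, state, mode)
    w = (V ** 2).sum(axis=1)             # weight of each site in each mode
    return (w ** 2).sum(axis=0)

# Toy data only: 200 random sequences, 6 sites, 4 states per site.
rng = np.random.default_rng(0)
msa = rng.integers(0, 4, size=(200, 6))
eigvals, eigvecs = correlation_modes(msa, q=4)
ipr = site_localization(eigvecs, L=6, q=4)
```

On a real alignment, one would compare `ipr` for the lowest and highest entries of `eigvals` to see whether the low-eigenvalue modes are concentrated on fewer sites, as the abstract reports; random data like the above carries no such signal.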
format article
author Simona Cocco
Remi Monasson
Martin Weigt
title From principal component to direct coupling analysis of coevolution in proteins: low-eigenvalue modes are needed for structure prediction.
publisher Public Library of Science (PLoS)
publishDate 2013
url https://doaj.org/article/68c16feebb164cbb9f09f23f2fca9ce6