Disentangling direct from indirect co-evolution of residues in protein alignments.

Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one...
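As a rough illustration of what "statistical dependency between columns" can mean, the sketch below computes the mutual information between two columns of a toy alignment. Mutual information is a generic dependency measure chosen here for illustration only; it is not the Bayesian network procedure described in the article, and the function name `column_mutual_information` and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. This is a generic
    dependency measure used for illustration, not the article's method.
    """
    n = len(alignment)
    # Joint counts of residue pairs and marginal counts per column.
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts.
        mi += p_ab * log2(p_ab * n * n / (col_i[a] * col_j[b]))
    return mi


# Toy data (made up): columns 0 and 1 always change together,
# while column 2 varies independently of column 0.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
coupled = column_mutual_information(toy_alignment, 0, 1)
independent = column_mutual_information(toy_alignment, 0, 2)
```

A pairwise score of this kind cannot by itself tell whether two columns depend on each other directly or only through a chain of intermediate columns, which is the problem the article addresses.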

Bibliographic Details
Main authors: Lukas Burger, Erik van Nimwegen
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2010
Subjects: Biology (General)
Online access: https://doaj.org/article/d444721ac80f405f84279f4fd42201a7
id oai:doaj.org-article:d444721ac80f405f84279f4fd42201a7
record_format dspace
spelling oai:doaj.org-article:d444721ac80f405f84279f4fd42201a7
2021-11-25T05:42:43Z
Disentangling direct from indirect co-evolution of residues in protein alignments.
1553-734X
1553-7358
10.1371/journal.pcbi.1000633
https://doaj.org/article/d444721ac80f405f84279f4fd42201a7
2010-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/20052271/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Lukas Burger
Erik van Nimwegen
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 6, Iss 1, p e1000633 (2010)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
spellingShingle Biology (General)
QH301-705.5
Lukas Burger
Erik van Nimwegen
Disentangling direct from indirect co-evolution of residues in protein alignments.
description Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.
format article
author Lukas Burger
Erik van Nimwegen
author_facet Lukas Burger
Erik van Nimwegen
author_sort Lukas Burger
title Disentangling direct from indirect co-evolution of residues in protein alignments.
title_short Disentangling direct from indirect co-evolution of residues in protein alignments.
title_full Disentangling direct from indirect co-evolution of residues in protein alignments.
title_fullStr Disentangling direct from indirect co-evolution of residues in protein alignments.
title_full_unstemmed Disentangling direct from indirect co-evolution of residues in protein alignments.
title_sort disentangling direct from indirect co-evolution of residues in protein alignments.
publisher Public Library of Science (PLoS)
publishDate 2010
url https://doaj.org/article/d444721ac80f405f84279f4fd42201a7
work_keys_str_mv AT lukasburger disentanglingdirectfromindirectcoevolutionofresiduesinproteinalignments
AT erikvannimwegen disentanglingdirectfromindirectcoevolutionofresiduesinproteinalignments
_version_ 1718414546795757568