TwinCons: Conservation score for uncovering deep sequence similarity and divergence

We have developed the program TwinCons, to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and mathematically determines a ‘cost’ of transforming one group to the other at each position of the alignme...

Bibliographic Details
Main authors: Petar I. Penev, Claudia Alvarez-Carreño, Eric Smith, Anton S. Petrov, Loren Dean Williams
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/e964b131d31148a489815dceb4e20530
id oai:doaj.org-article:e964b131d31148a489815dceb4e20530
record_format dspace
spelling oai:doaj.org-article:e964b131d31148a489815dceb4e20530
2021-11-18T05:49:17Z
TwinCons: Conservation score for uncovering deep sequence similarity and divergence
1553-734X
1553-7358
https://doaj.org/article/e964b131d31148a489815dceb4e20530
2021-10-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8580257/?tool=EBI
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Petar I. Penev
Claudia Alvarez-Carreño
Eric Smith
Anton S. Petrov
Loren Dean Williams
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description We have developed the program TwinCons, to detect noisy signals of deep ancestry of proteins or nucleic acids. As input, the program uses a composite alignment containing pre-defined groups, and mathematically determines a ‘cost’ of transforming one group to the other at each position of the alignment. The output distinguishes conserved, variable and signature positions. A signature is conserved within groups but differs between groups. The method automatically detects continuous characteristic stretches (segments) within alignments. TwinCons provides a convenient representation of conserved, variable and signature positions as a single score, enabling the structural mapping and visualization of these characteristics. Structure is more conserved than sequence. TwinCons highlights alternative sequences of conserved structures. Using TwinCons, we detected highly similar segments between proteins from the translation and transcription systems. TwinCons detects conserved residues within regions of high functional importance for the ribosomal RNA (rRNA) and demonstrates that signatures are not confined to specific regions but are distributed across the rRNA structure. The ability to evaluate both nucleic acid and protein alignments allows TwinCons to be used in combined sequence and structural analysis of signatures and conservation in rRNA and in ribosomal proteins (rProteins). TwinCons detects a strong sequence conservation signal between bacterial and archaeal rProteins related by circular permutation. This conserved sequence is structurally colocalized with conserved rRNA, indicated by TwinCons scores of rRNA alignments of bacterial and archaeal groups. This combined analysis revealed deep co-evolution of rRNA and rProtein buried within the deepest branching points in the tree of life.
Author summary: All species on Earth can be thought of as leaves on the Tree of Life, which are connected by branches representing their ancestral relationships. Biopolymers are evolutionary markers within species, that contain records of evolutionary history. Excavation of molecular evolutionary histories involves collecting sequences from extant species and organizing them into multiple sequence alignments. For the purpose of comparison, the sequences within an alignment can be partitioned into two groups, resulting in a composite alignment. We have developed the program TwinCons, to detect noisy signals of deep ancestry. TwinCons distinguishes conserved, variable and signature positions between the groups of the composite alignment. A signature is a position conserved within each group but differing between groups. TwinCons can further be used to detect uninterrupted ranges of positions (segments) preserved within the composite alignment. TwinCons results can be mapped onto structures of molecules. TwinCons scores can be applied to either proteins or ribonucleic acids (RNA). Using TwinCons we detected highly similar segments across ancient and essential protein components of living cells (translation and transcription) and pinpointed the deepest signatures between bacterial and archaeal RNAs within the ribosome.
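The distinction the description draws between conserved, variable and signature positions, and the idea of a segment as an uninterrupted run of positions, can be illustrated with a small toy example. The sketch below is not the TwinCons scoring method: the record does not give the actual 'cost' formula, so this stand-in uses the fraction of the most common residue in each group, and the function names, the 0.8 threshold and the minimum segment length are all hypothetical choices made for illustration only.

```python
from collections import Counter


def majority(column):
    """Return (most common residue, its fraction) for one group's column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def classify_position(col_a, col_b, threshold=0.8):
    """Label one alignment position (toy rule, hypothetical threshold).

    conserved: both groups are dominated by the same residue
    signature: each group is dominated by a residue, but the residues differ
    variable:  at least one group has no dominant residue
    """
    res_a, frac_a = majority(col_a)
    res_b, frac_b = majority(col_b)
    if frac_a >= threshold and frac_b >= threshold:
        return "conserved" if res_a == res_b else "signature"
    return "variable"


def classify_alignment(group_a, group_b, threshold=0.8):
    """Classify every position of a two-group composite alignment."""
    length = len(group_a[0])
    return [
        classify_position(
            [seq[i] for seq in group_a],
            [seq[i] for seq in group_b],
            threshold,
        )
        for i in range(length)
    ]


def find_segments(labels, wanted=("conserved", "signature"), min_len=2):
    """Return half-open (start, end) ranges of consecutive positions whose label is in `wanted`."""
    segments = []
    start = None
    # A trailing None sentinel closes a run that reaches the end of the alignment.
    for i, label in enumerate(list(labels) + [None]):
        if label in wanted:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Two made-up groups of aligned sequences (not real data).
group_a = ["MKVLA", "MKVIA", "MKVLG"]
group_b = ["MRVLT", "MRVAS", "MRVLC"]

labels = classify_alignment(group_a, group_b)
print(labels)                 # ['conserved', 'signature', 'conserved', 'variable', 'variable']
print(find_segments(labels))  # [(0, 3)]
```

In this toy alignment the second position is a signature (K in every sequence of one group, R in every sequence of the other), and the first three positions form one segment. A real analysis would replace the majority-fraction rule with a proper per-position score.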
format article
author Petar I. Penev
Claudia Alvarez-Carreño
Eric Smith
Anton S. Petrov
Loren Dean Williams
title TwinCons: Conservation score for uncovering deep sequence similarity and divergence
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/e964b131d31148a489815dceb4e20530