Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees.

Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new sourc...

Bibliographic Details
Main authors: Dongying Wu, Martin Wu, Aaron Halpern, Douglas B Rusch, Shibu Yooseph, Marvin Frazier, J Craig Venter, Jonathan A Eisen
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
Medicine (R)
Science (Q)
Online access: https://doaj.org/article/f786eb7b30b248bbbef393be4155698b
id oai:doaj.org-article:f786eb7b30b248bbbef393be4155698b
record_format dspace
spelling oai:doaj.org-article:f786eb7b30b248bbbef393be4155698b
2021-11-18T06:57:09Z
Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees.
1932-6203
10.1371/journal.pone.0018011
https://doaj.org/article/f786eb7b30b248bbbef393be4155698b
2011-03-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21437252/?tool=EBI
https://doaj.org/toc/1932-6203
Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species.
Methodology/principal findings: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.
Conclusions/significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
Dongying Wu
Martin Wu
Aaron Halpern
Douglas B Rusch
Shibu Yooseph
Marvin Frazier
J Craig Venter
Jonathan A Eisen
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 6, Iss 3, p e18011 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
format article
author Dongying Wu
Martin Wu
Aaron Halpern
Douglas B Rusch
Shibu Yooseph
Marvin Frazier
J Craig Venter
Jonathan A Eisen
author_sort Dongying Wu
title Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in marker gene phylogenetic trees.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/f786eb7b30b248bbbef393be4155698b