Generation of Comprehensive Ecosystem-Specific Reference Databases with Species-Level Resolution by High-Throughput Full-Length 16S rRNA Gene Sequencing and Automated Taxonomy Assignment (AutoTax)

ABSTRACT High-throughput 16S rRNA gene amplicon sequencing is an essential method for studying the diversity and dynamics of microbial communities. However, this method is presently hampered by the lack of high-identity reference sequences for many environmental microbes in the public 16S rRNA gene...

Bibliographic Details
Main authors: Morten Simonsen Dueholm, Kasper Skytte Andersen, Simon Jon McIlroy, Jannie Munk Kristensen, Erika Yashiro, Søren Michael Karst, Mads Albertsen, Per Halkjær Nielsen
Format: article
Language: EN
Published: American Society for Microbiology 2020
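Subjects: 16S RNA, gene sequencing, microbial communities, microbial ecology, taxonomy, wastewater treatment, Microbiology, QR1-502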
Online access: https://doaj.org/article/97839cfdc7064442bf889f2a3f10e3d7
id oai:doaj.org-article:97839cfdc7064442bf889f2a3f10e3d7
record_format dspace
spelling oai:doaj.org-article:97839cfdc7064442bf889f2a3f10e3d7
2021-11-15T16:19:09Z
10.1128/mBio.01557-20
2150-7511
https://doaj.org/article/97839cfdc7064442bf889f2a3f10e3d7
2020-10-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mBio.01557-20
https://doaj.org/toc/2150-7511
mBio, Vol 11, Iss 5 (2020)
institution DOAJ
collection DOAJ
language EN
topic 16S RNA
gene sequencing
microbial communities
microbial ecology
taxonomy
wastewater treatment
Microbiology
QR1-502
description ABSTRACT High-throughput 16S rRNA gene amplicon sequencing is an essential method for studying the diversity and dynamics of microbial communities. However, this method is presently hampered by the lack of high-identity reference sequences for many environmental microbes in the public 16S rRNA gene reference databases and by the absence of a systematic and comprehensive taxonomy for the uncultured majority. Here, we demonstrate how high-throughput synthetic long-read sequencing can be applied to create ecosystem-specific full-length 16S rRNA gene amplicon sequence variant (FL-ASV) resolved reference databases that include high-identity references (>98.7% identity) for nearly all abundant bacteria (>0.01% relative abundance), using Danish wastewater treatment systems and anaerobic digesters as an example. In addition, we introduce a novel sequence identity-based approach for automated taxonomy assignment (AutoTax) that provides a complete seven-rank taxonomy for all reference sequences, using the SILVA taxonomy as a backbone, with stable placeholder names for unclassified taxa. The FL-ASVs are well suited for the evaluation of taxonomic resolution and bias associated with primers commonly used for amplicon sequencing, allowing researchers to choose those that are ideal for their ecosystem. Reference databases processed with AutoTax greatly improve the classification of short-read 16S rRNA ASVs at the genus and species levels compared with the commonly used universal reference databases. Importantly, the placeholder names provide a way to explore the unclassified environmental taxa at different taxonomic ranks, which, in combination with in situ analyses, can be used to uncover their ecological roles.
format article
author Morten Simonsen Dueholm
Kasper Skytte Andersen
Simon Jon McIlroy
Jannie Munk Kristensen
Erika Yashiro
Søren Michael Karst
Mads Albertsen
Per Halkjær Nielsen
title Generation of Comprehensive Ecosystem-Specific Reference Databases with Species-Level Resolution by High-Throughput Full-Length 16S rRNA Gene Sequencing and Automated Taxonomy Assignment (AutoTax)
publisher American Society for Microbiology
publishDate 2020
url https://doaj.org/article/97839cfdc7064442bf889f2a3f10e3d7