BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities
ABSTRACT High-throughput 16S rRNA gene sequencing technologies have robust potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Despite recent computation algorithms now permitting exact inferencing of high...
Main authors: | Brendan A. Daisley, Gregor Reid |
---|---|
Format: | article |
Language: | EN |
Published: |
American Society for Microbiology
2021
|
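Subjects: | microbiota, bees, 16S rRNA gene sequencing, microbial ecology, bioinformatics, host-microbe interactions, Microbiology, QR1-502 |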
Online access: | https://doaj.org/article/9c14296e762641a5a6a0c0ef3589a7c0 |
id |
oai:doaj.org-article:9c14296e762641a5a6a0c0ef3589a7c0 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:9c14296e762641a5a6a0c0ef3589a7c0; 2021-12-02T17:07:47Z; BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities; 10.1128/mSystems.00082-21; 2379-5077; https://doaj.org/article/9c14296e762641a5a6a0c0ef3589a7c0; 2021-04-01T00:00:00Z; https://journals.asm.org/doi/10.1128/mSystems.00082-21; https://doaj.org/toc/2379-5077; Brendan A. Daisley; Gregor Reid; American Society for Microbiology; article; microbiota; bees; 16S rRNA gene sequencing; microbial ecology; bioinformatics; host-microbe interactions; Microbiology; QR1-502; EN; mSystems, Vol 6, Iss 2 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
microbiota; bees; 16S rRNA gene sequencing; microbial ecology; bioinformatics; host-microbe interactions; Microbiology; QR1-502 |
spellingShingle |
microbiota; bees; 16S rRNA gene sequencing; microbial ecology; bioinformatics; host-microbe interactions; Microbiology; QR1-502; Brendan A. Daisley; Gregor Reid; BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
description |
ABSTRACT High-throughput 16S rRNA gene sequencing technologies have robust potential to improve our understanding of bee (Hymenoptera: Apoidea)-associated microbial communities and their impact on hive health and disease. Although recent computational algorithms now permit exact inference of high-resolution amplicon sequence variants (ASVs), the taxonomic classification of these ASVs remains a challenge due to inadequate reference databases. To address this, we assemble a comprehensive data set of all publicly available bee-associated 16S rRNA gene sequences, systematically annotate poorly resolved identities via inclusion of 618 placeholder labels for uncultivated microbial dark matter, and correct for phylogenetic inconsistencies using a complementary set of distance-based and maximum likelihood correction strategies. To benchmark the resultant database (BEExact), we compare performance against all existing reference databases in silico using a variety of classifier algorithms to produce probabilistic confidence scores. We also validate realistic classification rates on an independent set of ∼234 million short-read sequences derived from 32 studies encompassing 50 different bee types (36 eusocial and 14 solitary). Species-level classification rates on short-read ASVs range from 80 to 90% using BEExact (with ∼20% due to “bxid” placeholder names), whereas only ∼30% at best can be resolved with current universal databases. A series of data-driven recommendations are developed for future studies. We conclude that BEExact (https://github.com/bdaisley/BEExact) enables accurate and standardized microbiota profiling across a broad range of bee species, two factors of key importance to reproducibility and meaningful knowledge exchange within the scientific community that together can enhance the overall utility and ecological relevance of routine 16S rRNA gene-based sequencing endeavors.
IMPORTANCE The failure of current universal taxonomic databases to support the rapidly expanding field of bee microbiota research has led to many investigators relying on “in-house” reference sets or manual classification of sequence reads (usually based on BLAST searches), often with vague identity thresholds and subjective taxonomy choices. This time-consuming, error- and bias-prone process lacks standardization, cripples the potential for comparative cross-study analysis, and in many cases is likely to incorrectly sway study conclusions. BEExact is structured on and leverages several complementary bioinformatic techniques to enable refined inference of bee host-associated microbial communities without any other methodological modifications necessary. It also bridges the gap between current practical outcomes (i.e., phylotype-to-genus level constraints with 97% operational taxonomic units [OTUs]) and the theoretical resolution (i.e., species-to-strain level classification with 100% ASVs) attainable in future microbiota investigations. Other niche habitats could also likely benefit from customized database curation via implementation of the novel approaches introduced in this study. |
format |
article |
author |
Brendan A. Daisley; Gregor Reid |
author_facet |
Brendan A. Daisley; Gregor Reid |
author_sort |
Brendan A. Daisley |
title |
BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
title_short |
BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
title_full |
BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
title_fullStr |
BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
title_full_unstemmed |
BEExact: a Metataxonomic Database Tool for High-Resolution Inference of Bee-Associated Microbial Communities |
title_sort |
beexact: a metataxonomic database tool for high-resolution inference of bee-associated microbial communities |
publisher |
American Society for Microbiology |
publishDate |
2021 |
url |
https://doaj.org/article/9c14296e762641a5a6a0c0ef3589a7c0 |
work_keys_str_mv |
AT brendanadaisley beexactametataxonomicdatabasetoolforhighresolutioninferenceofbeeassociatedmicrobialcommunities AT gregorreid beexactametataxonomicdatabasetoolforhighresolutioninferenceofbeeassociatedmicrobialcommunities |
_version_ |
1718381570697461760 |