Controlling for Contaminants in Low-Biomass 16S rRNA Gene Sequencing Experiments

Bibliographic Details
Main authors: Lisa Karstens, Mark Asquith, Sean Davin, Damien Fair, W. Thomas Gregory, Alan J. Wolfe, Jonathan Braun, Shannon McWeeney
Format: article
Language: English
Published: American Society for Microbiology, 2019
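Subjects: 16S rRNA gene sequencing; contamination; Decontam; low microbial biomass; microbiome; SourceTracker; Microbiology; QR1-502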
Online access: https://doaj.org/article/89ae3b9ffc60409b9f8dc40bafba3212
id oai:doaj.org-article:89ae3b9ffc60409b9f8dc40bafba3212
record_format dspace
timestamp 2021-12-02T19:46:18Z
doi 10.1128/mSystems.00290-19
issn 2379-5077
publication_date 2019-08-01
fulltext_url https://journals.asm.org/doi/10.1128/mSystems.00290-19
toc_url https://doaj.org/toc/2379-5077
container mSystems, Vol 4, Iss 4 (2019)
institution DOAJ
collection DOAJ
language EN
topic 16S rRNA gene sequencing
contamination
Decontam
low microbial biomass
microbiome
SourceTracker
Microbiology
QR1-502
description ABSTRACT Microbial communities are commonly studied using culture-independent methods, such as 16S rRNA gene sequencing. However, one challenge in accurately characterizing microbial communities is exogenous bacterial DNA contamination, particularly in low-microbial-biomass niches. Computational approaches to identify contaminant sequences have been proposed, but their performance has not been independently evaluated. To determine the impact of decreasing microbial biomass on polymicrobial 16S rRNA gene sequencing experiments, we created a mock microbial community dilution series. We evaluated four computational approaches to identify and remove contaminants: (i) filtering sequences present in a negative control, (ii) filtering sequences based on relative abundance, (iii) identifying sequences whose frequency is inversely correlated with DNA concentration, as implemented in Decontam, and (iv) predicting the proportion of sequences arising from defined contaminant sources, as implemented in SourceTracker. As expected, the proportion of contaminant bacterial DNA increased as the starting microbial biomass decreased; in the most diluted sample, 80.1% of sequences arose from contaminants. Including contaminant sequences inflated diversity estimates and distorted microbiome composition. Every method identified some contaminant sequences, although how many varied with the method parameters and with contaminant prevalence. Notably, removing sequences present in a negative control erroneously removed >20% of expected sequences. SourceTracker removed over 98% of contaminants when the experimental environments were well defined. However, SourceTracker misclassified expected sequences and performed poorly when the experimental environment was unknown, failing to remove >97% of contaminants. In contrast, the Decontam frequency method did not remove expected sequences and removed 70 to 90% of the contaminants.
IMPORTANCE The relative scarcity of microbes in low-microbial-biomass environments makes it challenging to determine community composition accurately. Identifying and controlling for contaminant bacterial DNA are critical steps in understanding microbial communities from these environments. Our study introduces a mock community dilution series as a positive control and evaluates four computational strategies for identifying contaminants in 16S rRNA gene sequencing experiments so that they can be removed from downstream analyses. The appropriate computational approach for removing contaminant sequences depends on prior knowledge of the microbial environment under investigation and can be evaluated with a dilution series of a mock microbial community.
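The first three strategies named in the abstract differ mainly in what evidence they use: presence in a blank, low relative abundance, or a frequency that rises as input DNA falls. The following is a minimal plain-Python sketch on an invented feature table, not the authors' code or the Decontam package. The feature names, read counts, DNA concentrations, and the 0.5% threshold are all hypothetical; the abundance filter assumes a feature is flagged only if it falls below the threshold in every sample; and the frequency test is a simplified comparison of two log-scale fits (slope -1 versus slope 0) in the spirit of the inverse-correlation idea, not Decontam's actual scoring or default threshold.

```python
import math
from typing import Dict, List, Set

# Hypothetical feature table: feature -> read counts, one entry per sample.
# Columns 0-2 are a toy dilution series; column 3 is a negative (blank) control.
Counts = Dict[str, List[int]]


def relative_abundance(counts: Counts) -> Dict[str, List[float]]:
    """Convert raw read counts into per-sample frequencies (relative abundances)."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    return {
        feature: [c[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for feature, c in counts.items()
    }


def negative_control_filter(counts: Counts, blank_index: int) -> Set[str]:
    """(i) Flag every feature with at least one read in the negative control."""
    return {f for f, c in counts.items() if c[blank_index] > 0}


def abundance_filter(counts: Counts, threshold: float) -> Set[str]:
    """(ii) Flag features below `threshold` relative abundance in every sample (assumed rule)."""
    freqs = relative_abundance(counts)
    return {f for f, fr in freqs.items() if max(fr) < threshold}


def _rss(values: List[float]) -> float:
    """Residual sum of squares around the mean, i.e. the best-fit intercept."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def frequency_contaminants(counts: Counts, dna_conc: List[float]) -> Set[str]:
    """(iii) Flag features whose frequency scales like 1 / DNA concentration.

    On a log scale, a contaminant contributing a roughly fixed amount of DNA
    follows log(freq) = -log(conc) + b, while a true community member follows
    log(freq) = b. The fit with the smaller residual error wins.
    Simplified illustration only; not Decontam's actual statistic.
    """
    freqs = relative_abundance(counts)
    flagged: Set[str] = set()
    for feature, fr in freqs.items():
        # Use only samples where both concentration and frequency are positive.
        pts = [
            (math.log(conc), math.log(freq))
            for conc, freq in zip(dna_conc, fr)
            if conc > 0 and freq > 0
        ]
        if len(pts) < 2:
            continue  # too few observations to compare the two models
        rss_member = _rss([lf for _, lf in pts])             # slope 0
        rss_contaminant = _rss([lf + lc for lc, lf in pts])  # slope -1
        if rss_contaminant < rss_member:
            flagged.add(feature)
    return flagged


# Invented toy data: A and B are mock-community members, C is a contaminant
# whose share grows as input DNA falls, and D is a rare sequence.
counts: Counts = {
    "A": [500, 450, 250, 0],
    "B": [489, 450, 250, 3],
    "C": [10, 100, 500, 40],
    "D": [1, 0, 0, 0],
}
dna_conc = [10.0, 1.0, 0.1, 0.0]  # hypothetical input DNA; blank set to 0

print(sorted(negative_control_filter(counts, blank_index=3)))  # ['B', 'C']
print(sorted(abundance_filter(counts, threshold=0.005)))        # ['D']
print(sorted(frequency_contaminants(counts, dna_conc)))         # ['C']
```

On this invented table the three filters disagree: the negative-control filter also flags the community member B because it has a few reads in the blank, the abundance filter catches only the rare D, and the frequency test singles out C. SourceTracker (iv) fits a Bayesian model of defined source environments and is not sketched here.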
format article
author Lisa Karstens
Mark Asquith
Sean Davin
Damien Fair
W. Thomas Gregory
Alan J. Wolfe
Jonathan Braun
Shannon McWeeney
title Controlling for Contaminants in Low-Biomass 16S rRNA Gene Sequencing Experiments
publisher American Society for Microbiology
publishDate 2019
url https://doaj.org/article/89ae3b9ffc60409b9f8dc40bafba3212