Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities

Bibliographic details
Main authors: Ellen S. Cameron, Philip J. Schmidt, Benjamin J.-M. Tremblay, Monica B. Emelko, Kirsten M. Müller
Format: article
Language: EN
Published: Nature Portfolio, 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/ac2f8ee28b1a482b837a75acc69d8a76
Record ID: oai:doaj.org-article:ac2f8ee28b1a482b837a75acc69d8a76 (record format: dspace)
DOI: 10.1038/s41598-021-01636-1 (https://doi.org/10.1038/s41598-021-01636-1)
Journal: Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021); ISSN 2045-2322 (https://doaj.org/toc/2045-2322)
Date: 2021-11-01
Record timestamp: 2021-11-21T12:17:19Z
Institution: DOAJ; collection: DOAJ
Abstract: Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice and the suitability of rarefying, relative to many other normalization approaches, for diversity analysis has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. Specifically, it evaluates which data might have been obtained if a particular sample’s library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
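The rarefying procedure summarized in the abstract can be illustrated with a short Python sketch. It assumes each sample is a dictionary mapping taxon labels to read counts (so the sum of the counts is the library size), draws reads without replacement down to the smallest library size shared by all samples, and repeats that draw many times to show the spread that rarefying introduces into two common diversity metrics, observed richness and the Shannon index. The function names (`rarefy`, `richness`, `shannon`, `repeated_rarefaction`), the choice of metrics, the iteration count and the toy counts are illustrative assumptions; this is not the authors' implementation and uses no data from the article.

```python
import math
import random
from statistics import mean, stdev


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` reads without replacement from one sample.

    `counts` maps a taxon label (hypothetical, e.g. an ASV/OTU ID) to its read
    count; the sum of the counts is the sample's library size.
    """
    library_size = sum(counts.values())
    if depth > library_size:
        raise ValueError("rarefaction depth exceeds library size")
    # Expand the counts into a pool of individual reads, then draw without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(pool, depth)
    rarefied = {}
    for taxon in drawn:
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def repeated_rarefaction(counts, depth, iterations=100, seed=0):
    """Rarefy the same sample many times and collect diversity metrics.

    The spread of the collected values reflects the random variation
    introduced by rarefying this sample to `depth`.
    """
    rng = random.Random(seed)
    results = {"richness": [], "shannon": []}
    for _ in range(iterations):
        sub = rarefy(counts, depth, rng)
        results["richness"].append(richness(sub))
        results["shannon"].append(shannon(sub))
    return results


if __name__ == "__main__":
    # Hypothetical toy samples with unequal library sizes (not data from the article).
    samples = {
        "sample_A": {"taxon1": 500, "taxon2": 300, "taxon3": 150, "taxon4": 50},
        "sample_B": {"taxon1": 120, "taxon2": 60, "taxon3": 15, "taxon5": 5},
    }
    # Normalize every sample to the smallest library size they all share.
    depth = min(sum(c.values()) for c in samples.values())
    for name, counts in samples.items():
        res = repeated_rarefaction(counts, depth, iterations=200)
        print(name, "depth", depth,
              "richness mean", round(mean(res["richness"]), 2),
              "Shannon mean", round(mean(res["shannon"]), 3),
              "Shannon sd", round(stdev(res["shannon"]), 3))
```

Because the depth is the smallest library size, the smallest sample is returned unchanged on every iteration and its spread is zero, while larger samples show a nonzero spread that comes only from the random subsampling. Expanding counts into a list of individual reads keeps the sketch simple but uses memory proportional to the library size; a larger-scale version could instead draw the rarefied counts directly from a multivariate hypergeometric distribution, which describes the same without-replacement sampling.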