A statistical model for describing and simulating microbial community profiles.

Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or...

Bibliographic Details
Main authors: Siyuan Ma, Boyu Ren, Himel Mallick, Yo Sup Moon, Emma Schwager, Sagun Maharjan, Timothy L Tickle, Yiren Lu, Rachel N Carmody, Eric A Franzosa, Lucas Janson, Curtis Huttenhower
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/b5a61e321c084730884928cb547eac54
id oai:doaj.org-article:b5a61e321c084730884928cb547eac54
record_format dspace
spelling oai:doaj.org-article:b5a61e321c084730884928cb547eac54
2021-12-02T19:57:49Z
A statistical model for describing and simulating microbial community profiles.
1553-734X
1553-7358
10.1371/journal.pcbi.1008913
https://doaj.org/article/b5a61e321c084730884928cb547eac54
2021-09-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1008913
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
PLoS Computational Biology, Vol 17, Iss 9, p e1008913 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Many methods have been developed for statistical analysis of microbial community profiles, but due to the complex nature of typical microbiome measurements (e.g. sparsity, zero-inflation, non-independence, and compositionality) and of the associated underlying biology, it is difficult to compare or evaluate such methods within a single systematic framework. To address this challenge, we developed SparseDOSSA (Sparse Data Observations for the Simulation of Synthetic Abundances): a statistical model of microbial ecological population structure, which can be used to parameterize real-world microbial community profiles and to simulate new, realistic profiles of known structure for methods evaluation. Specifically, SparseDOSSA's model captures marginal microbial feature abundances as a zero-inflated log-normal distribution, with additional model components for absolute cell counts and the sequence read generation process, microbe-microbe, and microbe-environment interactions. Together, these allow fully known covariance structure between synthetic features (i.e. "taxa") or between features and "phenotypes" to be simulated for method benchmarking. Here, we demonstrate SparseDOSSA's performance for 1) accurately modeling human-associated microbial population profiles; 2) generating synthetic communities with controlled population and ecological structures; 3) spiking-in true positive synthetic associations to benchmark analysis methods; and 4) recapitulating an end-to-end mouse microbiome feeding experiment. Together, these represent the most common analysis types in assessment of real microbial community environmental and epidemiological statistics, thus demonstrating SparseDOSSA's utility as a general-purpose aid for modeling communities and evaluating quantitative methods. An open-source implementation is available at http://huttenhower.sph.harvard.edu/sparsedossa2.
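The description above says that marginal feature abundances are modeled as a zero-inflated log-normal distribution. A minimal sketch of that one idea follows, for illustration only: it is not the SparseDOSSA implementation or its API. The function names and the per-feature parameter values are hypothetical, features are drawn independently (so the microbe-microbe and microbe-environment components are omitted), and the sequence read generation step is replaced by a simple normalization to relative abundances.

```python
import math
import random


def sample_zero_inflated_lognormal(pi_zero, mu, sigma, rng):
    """Draw one abundance: 0 with probability pi_zero, else exp(Normal(mu, sigma))."""
    if rng.random() < pi_zero:
        return 0.0
    return math.exp(rng.gauss(mu, sigma))


def simulate_profile(params, rng):
    """Simulate one sample: independent draws per feature, normalized to sum to 1."""
    absolute = [
        sample_zero_inflated_lognormal(pi_zero, mu, sigma, rng)
        for (pi_zero, mu, sigma) in params
    ]
    total = sum(absolute)
    if total == 0.0:
        # Every feature was absent in this sample; nothing to normalize.
        return absolute
    return [a / total for a in absolute]


# Hypothetical per-feature parameters (pi_zero, mu, sigma); not fitted to any data.
params = [(0.1, 2.0, 1.0), (0.6, 0.0, 1.5), (0.9, -1.0, 0.5)]

rng = random.Random(0)  # fixed seed so the simulation is reproducible
profiles = [simulate_profile(params, rng) for _ in range(1000)]

# Observed fraction of zeros per feature; should be close to each pi_zero.
zero_fractions = [
    sum(1 for profile in profiles if profile[j] == 0.0) / len(profiles)
    for j in range(len(params))
]
```

Because the parameters that generated each profile are known, a method run on `profiles` can be checked against them, which is the benchmarking use the description refers to.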
format article
author Siyuan Ma
Boyu Ren
Himel Mallick
Yo Sup Moon
Emma Schwager
Sagun Maharjan
Timothy L Tickle
Yiren Lu
Rachel N Carmody
Eric A Franzosa
Lucas Janson
Curtis Huttenhower
title A statistical model for describing and simulating microbial community profiles.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/b5a61e321c084730884928cb547eac54