A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions

Bibliographic Details
Main authors: Xiang Gao, Huaiying Lin, Qunfeng Dong
Format: article
Language: English
Published: American Society for Microbiology, 2017
Subjects: Bayes classifier; Dirichlet-multinomial distribution; disease diagnosis; microbiome; Microbiology; QR1-502
Online access: https://doaj.org/article/76e69d6fdfac45eba841fa8ed8d7a4d8
Record ID: oai:doaj.org-article:76e69d6fdfac45eba841fa8ed8d7a4d8
DOI: 10.1128/mSphereDirect.00536-17
Journal: mSphere, Vol 2, Iss 6 (2017)
ISSN: 2379-5042
Publication date: 2017-12-01
Full text: https://journals.asm.org/doi/10.1128/mSphereDirect.00536-17
Journal contents: https://doaj.org/toc/2379-5042

ABSTRACT Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes’ theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC.

IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
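
The abstract describes three steps: fit one Dirichlet-multinomial distribution per class by maximum likelihood, score a new sample's taxon counts under each fitted distribution, and combine those likelihoods with a class prior through Bayes' theorem. The Python sketch below illustrates those steps under stated assumptions; it is not the DMBC R package or its API. The function names (`dm_log_likelihood`, `fit_dm_mle`, `posterior`) are hypothetical, the maximum-likelihood fit uses Minka's fixed-point iteration (which may differ from the optimizer DMBC actually uses), the cross-validated feature selection is not shown, and the counts and the 10% prevalence prior are made-up toy values, not data from the paper.

```python
import math
from typing import Dict, List, Sequence

def dm_log_likelihood(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log-probability of one taxon count vector under a Dirichlet-multinomial(alpha)."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: log n! - sum_k log x_k!
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # DM terms: Gamma(A)/Gamma(n+A) * prod_k Gamma(x_k + a_k)/Gamma(a_k)
    ll += math.lgamma(a0) - math.lgamma(n + a0)
    ll += sum(math.lgamma(x + a) - math.lgamma(a) for x, a in zip(counts, alpha))
    return ll

def _digamma_diff(x: int, a: float) -> float:
    """psi(x + a) - psi(a) for an integer count x >= 0, via psi(z + 1) = psi(z) + 1/z."""
    return sum(1.0 / (a + j) for j in range(x))

def fit_dm_mle(samples: List[List[int]], n_iter: int = 1000, tol: float = 1e-8) -> List[float]:
    """Maximum-likelihood alpha for one class, using Minka's fixed-point iteration."""
    totals = [sum(s) for s in samples]
    if any(t <= 0 for t in totals):
        raise ValueError("every sample needs at least one read")
    k = len(samples[0])
    # Start from the mean observed proportions, so the initial alpha sums to about 1.
    alpha = [max(sum(s[j] / t for s, t in zip(samples, totals)) / len(samples), 1e-3)
             for j in range(k)]
    for _ in range(n_iter):
        a0 = sum(alpha)
        denom = sum(_digamma_diff(t, a0) for t in totals)
        # alpha_k <- alpha_k * sum_i[psi(x_ik + alpha_k) - psi(alpha_k)] / sum_i[psi(n_i + A) - psi(A)]
        new_alpha = [
            max(alpha[j] * sum(_digamma_diff(s[j], alpha[j]) for s in samples) / denom, 1e-10)
            for j in range(k)
        ]
        converged = max(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha

def posterior(counts: Sequence[int], alphas: Dict[str, List[float]],
              priors: Dict[str, float]) -> Dict[str, float]:
    """P(class | counts) from per-class DM likelihoods and class priors (Bayes' theorem)."""
    log_joint = {c: math.log(priors[c]) + dm_log_likelihood(counts, a) for c, a in alphas.items()}
    m = max(log_joint.values())  # subtract the max before exponentiating to avoid underflow
    z = sum(math.exp(v - m) for v in log_joint.values())
    return {c: math.exp(v - m) / z for c, v in log_joint.items()}

# Toy counts for three taxa (made up for illustration, not data from the paper).
healthy = [[60, 30, 10], [90, 5, 5], [70, 25, 5], [85, 5, 10]]
disease = [[10, 40, 50], [30, 20, 50], [5, 55, 40], [20, 20, 60]]
alphas = {"healthy": fit_dm_mle(healthy), "disease": fit_dm_mle(disease)}
priors = {"healthy": 0.9, "disease": 0.1}  # hypothetical disease prevalence of 10%
print(posterior([80, 15, 5], alphas, priors))
```

Working in log space and normalizing after subtracting the largest log-joint value keeps the posterior computation stable when samples contain thousands of reads. The `_digamma_diff` helper relies on counts being integers; for deep sequencing data a vectorized digamma (for example `scipy.special.digamma`) would be much faster than the explicit sum used here.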