Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data

Abstract. Background: For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated NB models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate is that the p values are un...

Bibliographic Details
Main authors: Wei Bai, Mei Dong, Longhai Li, Cindy Feng, Wei Xu
Format: article
Language: EN
Published: BMC 2021
Subjects: Computer applications to medicine. Medical informatics; Biology (General)
Online access: https://doaj.org/article/f4aa6b907cfa402ea9f39d22418ec6df
id oai:doaj.org-article:f4aa6b907cfa402ea9f39d22418ec6df
record_format dspace
spelling oai:doaj.org-article:f4aa6b907cfa402ea9f39d22418ec6df
2021-11-28T12:11:08Z
10.1186/s12859-021-04371-6
1471-2105
https://doaj.org/article/f4aa6b907cfa402ea9f39d22418ec6df
2021-11-01T00:00:00Z
https://doi.org/10.1186/s12859-021-04371-6
https://doaj.org/toc/1471-2105
BMC Bioinformatics, Vol 22, Iss 1, Pp 1-18 (2021)
institution DOAJ
collection DOAJ
language EN
topic Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
description Abstract. Background: For differential abundance analysis, zero-inflated generalized linear models, typically zero-inflated negative binomial (NB) models, have been increasingly used to model microbiome and other sequencing count data. A common assumption in estimating the false discovery rate (FDR) is that the p values are uniformly distributed under the null hypothesis, which demands that the postulated model fit the count data adequately. Mis-specification of the distribution of the count data may lead to excess false discoveries. Therefore, model checking is critical to control the FDR at a nominal level in differential abundance analysis. A growing number of studies show that the method of randomized quantile residuals (RQRs) performs well in diagnosing count regression models. However, the performance of RQRs in diagnosing zero-inflated generalized linear mixed models (GLMMs) for sequencing count data has not been extensively investigated in the literature. Results: We conduct large-scale simulation studies to investigate the performance of RQRs for zero-inflated GLMMs. The simulation studies show that the type I error rates of the goodness-of-fit (GOF) tests with RQRs are very close to the nominal level; in addition, the scatter plots and Q–Q plots of RQRs are useful in distinguishing good models from bad ones. We also apply RQRs to diagnose six GLMMs fitted to a real microbiome dataset. The results show that the operational taxonomic unit (OTU) counts at the genus level of this dataset (after a truncation treatment) can be modelled well by zero-inflated and zero-modified NB models. Conclusion: RQR is an excellent tool for diagnosing GLMMs for zero-inflated count data, particularly the sequencing count data arising in microbiome studies. In the supplementary materials, we provide two generic R functions, called rqr.glmmtmb and rqr.hurdle.glmmtmb, for calculating the RQRs given fitting outputs of the R package glmmTMB.
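The abstract refers to randomized quantile residuals without defining them. As a rough illustration only, the sketch below computes them for a zero-inflated NB model whose parameters are assumed known. It is not the authors' rqr.glmmtmb function: there are no random effects and no model fitting, and the names nb_pmf, zinb_cdf, rqr, and simulate_zinb, along with the parameter values, are hypothetical choices made for this example. The idea is that, for a discrete response y with model CDF F, one draws u uniformly between F(y - 1) and F(y) and maps it through the standard normal quantile function; under a correctly specified model the resulting residuals are approximately standard normal.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and dispersion parameter size."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def zinb_cdf(y, mu, size, pi0):
    """P(Y <= y) for a zero-inflated NB with structural-zero probability pi0."""
    if y < 0:
        return 0.0
    nb_cdf = min(sum(nb_pmf(k, mu, size) for k in range(y + 1)), 1.0)
    return pi0 + (1.0 - pi0) * nb_cdf


def rqr(y, mu, size, pi0, rng):
    """Randomized quantile residual of one count y under the assumed model."""
    lower = zinb_cdf(y - 1, mu, size, pi0)
    upper = zinb_cdf(y, mu, size, pi0)
    u = rng.uniform(lower, upper)            # randomize within the CDF jump
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # keep the normal quantile finite
    return NormalDist().inv_cdf(u)


def simulate_zinb(mu, size, pi0, rng):
    """Draw one zero-inflated NB count (gamma-Poisson mixture plus extra zeros)."""
    if rng.random() < pi0:
        return 0
    lam = rng.gammavariate(size, mu / size)  # shape, scale: mean equals mu
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:                  # Knuth's Poisson sampler
        k += 1
        prod *= rng.random()
    return k


# Illustrative parameter values (assumptions, not taken from the paper).
rng = random.Random(1)
mu, size, pi0 = 3.0, 2.0, 0.3
counts = [simulate_zinb(mu, size, pi0, rng) for _ in range(2000)]
residuals = [rqr(y, mu, size, pi0, rng) for y in counts]

mean = sum(residuals) / len(residuals)
sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1))
print(f"mean = {mean:.3f}, sd = {sd:.3f}")
```

Because the counts here are simulated from the same model used to compute the residuals, the printed mean and standard deviation should be close to 0 and 1. Computing the residuals under a mis-specified model (for example, with pi0 set to 0) would be expected to show visible departures from normality in a Q–Q plot.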
format article
author Wei Bai
Mei Dong
Longhai Li
Cindy Feng
Wei Xu
author_sort Wei Bai
title Randomized quantile residuals for diagnosing zero-inflated generalized linear mixed models with applications to microbiome count data
publisher BMC
publishDate 2021
url https://doaj.org/article/f4aa6b907cfa402ea9f39d22418ec6df