G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecifi...
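As a rough illustration of the G-computation idea named in the abstract, the sketch below fits an outcome model, predicts each individual's outcome probability in the two counterfactual worlds (everyone exposed, no one exposed), and averages the predictions. It is only a minimal sketch under stated assumptions: a plain logistic regression fitted by gradient ascent stands in for the machine learning methods the article compares, and the simulated data, coefficients, and function names (`simulate`, `fit_logistic`, `g_computation`) are hypothetical, not taken from the article.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n, seed=0):
    # Hypothetical data: one confounder x, binary exposure a, binary outcome y.
    rng = random.Random(seed)
    exposure, covariates, outcome = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1.0 if rng.random() < sigmoid(0.8 * x) else 0.0
        y = 1.0 if rng.random() < sigmoid(-0.5 + 1.0 * a + 1.0 * x) else 0.0
        exposure.append(a)
        covariates.append([x])
        outcome.append(y)
    return exposure, covariates, outcome


def fit_logistic(rows, y, n_iter=1000, lr=1.0):
    # Batch gradient ascent on the mean log-likelihood.
    # Each row must already contain the intercept column.
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, yi in zip(rows, y):
            err = yi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def g_computation(exposure, covariates, outcome):
    # Step 1: fit the outcome model P(Y = 1 | A, L).
    design = [[1.0, a] + list(cov) for a, cov in zip(exposure, covariates)]
    beta = fit_logistic(design, outcome)

    def predict(a, cov):
        return sigmoid(sum(b * v for b, v in zip(beta, [1.0, a] + list(cov))))

    # Step 2: predict every individual's risk in both counterfactual worlds.
    # Step 3: average the predictions to obtain marginal risks.
    n = len(covariates)
    risk_exposed = sum(predict(1.0, cov) for cov in covariates) / n
    risk_unexposed = sum(predict(0.0, cov) for cov in covariates) / n
    return risk_exposed, risk_unexposed, risk_exposed - risk_unexposed


exposure, covariates, outcome = simulate(500, seed=1)
risk1, risk0, risk_difference = g_computation(exposure, covariates, outcome)
print(f"risk if all exposed:   {risk1:.3f}")
print(f"risk if none exposed:  {risk0:.3f}")
print(f"marginal risk difference: {risk_difference:.3f}")
```

In this sketch, only step 1 depends on the choice of learner, so any model that returns predicted outcome probabilities could be substituted for `fit_logistic` without changing steps 2 and 3.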

Bibliographic Details
Main authors: Florent Le Borgne, Arthur Chatton, Maxime Léger, Rémi Lenain, Yohann Foucher
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
Medicine (R)
Science (Q)
Online access: https://doaj.org/article/9a15763593834528a4a5936250ed81c7
id oai:doaj.org-article:9a15763593834528a4a5936250ed81c7
record_format dspace
spelling oai:doaj.org-article:9a15763593834528a4a5936250ed81c7
2021-12-02T15:23:04Z
G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
10.1038/s41598-021-81110-0
2045-2322
https://doaj.org/article/9a15763593834528a4a5936250ed81c7
2021-01-01T00:00:00Z
https://doi.org/10.1038/s41598-021-81110-0
https://doaj.org/toc/2045-2322
Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
Florent Le Borgne
Arthur Chatton
Maxime Léger
Rémi Lenain
Yohann Foucher
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
spellingShingle Medicine
R
Science
Q
Florent Le Borgne
Arthur Chatton
Maxime Léger
Rémi Lenain
Yohann Foucher
G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
description Abstract In clinical research, there is a growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary and is able to deal with small samples. We evaluated the performances of several methods, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner through simulations. We proposed six different scenarios characterised by various sample sizes, numbers of covariates and relationships between covariates, exposure statuses, and outcomes. We have also illustrated the application of these methods, in which they were used to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of GC, for estimating the individual outcome probabilities in two counterfactual worlds, we reported that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation associated with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
format article
author Florent Le Borgne
Arthur Chatton
Maxime Léger
Rémi Lenain
Yohann Foucher
author_facet Florent Le Borgne
Arthur Chatton
Maxime Léger
Rémi Lenain
Yohann Foucher
author_sort Florent Le Borgne
title G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
title_short G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
title_full G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
title_fullStr G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
title_full_unstemmed G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
title_sort g-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/9a15763593834528a4a5936250ed81c7
work_keys_str_mv AT florentleborgne gcomputationandmachinelearningforestimatingthecausaleffectsofbinaryexposurestatusesonbinaryoutcomes
AT arthurchatton gcomputationandmachinelearningforestimatingthecausaleffectsofbinaryexposurestatusesonbinaryoutcomes
AT maximeleger gcomputationandmachinelearningforestimatingthecausaleffectsofbinaryexposurestatusesonbinaryoutcomes
AT remilenain gcomputationandmachinelearningforestimatingthecausaleffectsofbinaryexposurestatusesonbinaryoutcomes
AT yohannfoucher gcomputationandmachinelearningforestimatingthecausaleffectsofbinaryexposurestatusesonbinaryoutcomes
_version_ 1718387341717929984