Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse
This article presents a novel and remarkably efficient method of computing the statistical G-test, made possible by exploiting a connection with the fundamental elements of information theory: by writing the <i>G</i> statistic as a sum of joint entropy terms, its computation is decomposed into easily reusable partial results with no change in the resulting value. This method greatly improves the efficiency of applications that perform series of G-tests on permutations of the same features, such as feature selection and causal inference applications, because the decomposition allows for intensive reuse of these partial results. The efficiency of this method is demonstrated by implementing it as part of an experiment involving IPC–MB, an efficient Markov blanket discovery algorithm, applicable both as a feature selection algorithm and as a causal inference method. The results show outstanding efficiency gains for IPC–MB when the G-test is computed with the proposed method, compared both to the unoptimized G-test and to IPC–MB++, a variant of IPC–MB enhanced with an AD–tree, whether static or dynamic. Although the proposed method of computing the G-test is presented here in the context of IPC–MB, it is bound neither to IPC–MB in particular nor to feature selection or causal inference applications in general, because it targets the information-theoretic concept that underlies the G-test, namely conditional mutual information. This aspect grants it wide applicability in data science.
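The decomposition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy dataset, the feature indexing, and the use of `functools.lru_cache` to memoize the entropy terms are all choices of this sketch. It relies on the standard identity I(X;Y|Z) = H(X,Z) + H(Y,Z) − H(Z) − H(X,Y,Z) and on G = 2N·I(X;Y|Z) (with natural logarithms), so every G-test reduces to four joint-entropy lookups.

```python
import numpy as np
from functools import lru_cache

# Toy discrete dataset (hypothetical, for illustration): 1000 samples,
# 4 categorical features with values in {0, 1, 2}.
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(1000, 4))
N = data.shape[0]

@lru_cache(maxsize=None)
def joint_entropy(cols: frozenset) -> float:
    """Joint entropy (in nats) of the feature subset `cols`.

    The cache is the whole point: across many G-tests on overlapping
    feature sets, each H(...) term is computed only once and reused."""
    if not cols:
        return 0.0  # entropy of the empty variable set is zero
    sub = data[:, sorted(cols)]
    # Count occurrences of each distinct row (joint configuration).
    _, counts = np.unique(sub, axis=0, return_counts=True)
    p = counts / N
    return float(-(p * np.log(p)).sum())

def g_statistic(x: int, y: int, z: tuple = ()) -> float:
    """G = 2 * N * I(X;Y|Z), with the conditional mutual information
    expanded into joint entropies:
        I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    zs = frozenset(z)
    cmi = (joint_entropy(zs | {x}) + joint_entropy(zs | {y})
           - joint_entropy(zs) - joint_entropy(zs | {x, y}))
    return 2 * N * cmi
```

Repeated calls that share a conditioning set, e.g. `g_statistic(0, 1, (2,))` followed by `g_statistic(0, 3, (2,))`, recompute none of the shared entropy terms; this reuse is where the reported speedups come from in algorithms such as IPC–MB, which run many G-tests against the same conditioning sets.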
Saved in:

Main Authors: | Camil Băncioiu, Remus Brad |
---|---|
Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | Markov blanket; feature selection; causal inference; G-test; information theory; computation reuse |
Online Access: | https://doaj.org/article/2a57924c93f741e0b2061a803f960863 |
id |
oai:doaj.org-article:2a57924c93f741e0b2061a803f960863 |
---|---|
record_format |
dspace |
doi |
10.3390/e23111501 |
issn |
1099-4300 |
date |
2021-11-01 |
url |
https://www.mdpi.com/1099-4300/23/11/1501 |
journal_toc |
https://doaj.org/toc/1099-4300 |
source |
Entropy, Vol 23, Iss 11, p 1501 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Markov blanket; feature selection; causal inference; G-test; information theory; computation reuse; Science (Q); Astrophysics (QB460-466); Physics (QC1-999) |
format |
article |
author |
Camil Băncioiu; Remus Brad |
author_sort |
Camil Băncioiu |
title |
Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/2a57924c93f741e0b2061a803f960863 |