Hollow-tree super: A directional and scalable approach for feature importance in boosted tree models



Bibliographic Details
Main authors:	Stephane Doyen, Hugh Taylor, Peter Nicholas, Lewis Crawford, Isabella Young, Michael E. Sughrue
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Medicine (R); Science (Q)
Online access:	https://doaj.org/article/ecc3631e3174436696d67ada24c562be

id	oai:doaj.org-article:ecc3631e3174436696d67ada24c562be
record_format	dspace
spelling	oai:doaj.org-article:ecc3631e3174436696d67ada24c562be; 2021-11-04T06:09:18Z; ISSN 1932-6203; https://doaj.org/article/ecc3631e3174436696d67ada24c562be; 2021-01-01T00:00:00Z; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8544862/?tool=EBI; https://doaj.org/toc/1932-6203; PLoS ONE, Vol 16, Iss 10 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Science Q
description	<h4>Purpose</h4> Current methods for investigating feature importance in boosted tree models do not scale effectively to datasets with a large number of features, particularly when one is investigating both the magnitude and the directionality of each feature's effect on classification into a positive or negative class. This manuscript presents a novel methodology, “Hollow-tree Super” (HOTS), designed to resolve and visualize feature importance in boosted tree models involving a large number of features. HOTS also allows accurate investigation of the direction and magnitude of the effect each feature has on classification, and it incorporates cross-validation to improve the accuracy and validity of the features identified as important.
<h4>Methods</h4> Using the Iris dataset, we first highlight the characteristics of HOTS by comparing it with other commonly used feature-importance techniques, namely Gini Importance, Partial Dependence Plots, and Permutation Importance, and explain how HOTS resolves the weaknesses of each of these three strategies. We then demonstrate how HOTS can be used in high-dimensional settings such as neuroscience: taking 60 subjects with schizophrenia from the publicly available SchizConnect database, we apply the method to determine which brain regions were most important for the positive and negative classification of schizophrenia, as defined by the Positive and Negative Syndrome Scale (PANSS).
<h4>Results</h4> On the Iris dataset, HOTS replicated and supported the feature-importance findings of Gini importance, Partial Dependence Plots, and Permutation importance, identifying ‘petal length’ as the most important feature for both positive and negative classification. On the SchizConnect dataset, HOTS resolved, out of 379 independent features, the 10 most important features for classification, together with the direction of each feature's effect and its magnitude relative to the other features. Cross-validation supported the finding that these same 10 features were used consistently in the decision-making process across multiple trees. These features were localized primarily to the occipital and parietal cortices, brain regions commonly disturbed in people with schizophrenia.
<h4>Conclusion</h4> HOTS effectively overcomes previous challenges in identifying feature importance at scale and can be applied across a wide range of disciplines. As computational power and data volumes continue to grow, it is imperative to have methods that can handle large datasets containing many features. This approach offers a unique way to investigate both the directionality and the magnitude of feature importance at scale within a boosted tree model, and its results can be easily visualized in commonly used software.
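The record does not describe how HOTS itself works, so this sketch does not attempt to reproduce it. It illustrates instead two of the baselines the abstract compares against, Permutation importance and Partial Dependence Plots, and why the abstract stresses directionality: permutation importance reports only a magnitude, whereas the change along a partial-dependence curve carries a sign. Gini importance is left out because it needs the split statistics of fitted trees. Everything here is an assumption for illustration only: `toy_model_proba` is a hypothetical hand-written sum of two decision stumps standing in for a trained boosted-tree classifier, the data are synthetic (not the Iris or SchizConnect data), and log loss is our choice of scoring metric.

```python
import math
import random


def toy_model_proba(row):
    """Hypothetical stand-in for a trained boosted-tree classifier.

    Sums two decision stumps (feature 0 pushes toward the positive class,
    feature 1 toward the negative class), ignores feature 2, and maps the
    summed score to a probability with the logistic function.
    """
    score = 1.0 if row[0] > 0.5 else -1.0
    score += -0.5 if row[1] > 0.5 else 0.5
    return 1.0 / (1.0 + math.exp(-score))


def log_loss(model, X, y):
    """Mean binary cross-entropy of the model's predicted probabilities."""
    total = 0.0
    for row, label in zip(X, y):
        p = model(row)
        total += -math.log(p) if label == 1 else -math.log(1.0 - p)
    return total / len(X)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in log loss when one column is shuffled (magnitude only, no sign)."""
    rng = random.Random(seed)
    baseline = log_loss(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(log_loss(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def partial_dependence(model, X, j, grid):
    """Average predicted probability with column j forced to each grid value."""
    return [
        sum(model(row[:j] + [v] + row[j + 1:]) for row in X) / len(X)
        for v in grid
    ]


# Hypothetical synthetic data: three features in [0, 1); the label is 1 when x0 > x1.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(300)]
y = [1 if row[0] > row[1] else 0 for row in X]

imp = permutation_importance(toy_model_proba, X, y)

# Direction: change in the partial-dependence curve from a low to a high feature value.
direction = []
for j in range(3):
    low, high = partial_dependence(toy_model_proba, X, j, [0.25, 0.75])
    direction.append(high - low)

for j in range(3):
    print(f"feature {j}: permutation importance {imp[j]:.3f}, PD direction {direction[j]:+.3f}")
```

Feature 2 scores exactly zero on both measures because the toy model never reads it. Features 0 and 1 both receive positive permutation importance, but only the sign of the partial-dependence change reveals that feature 0 pushes toward the positive class and feature 1 toward the negative class.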
format	article
author	Stephane Doyen Hugh Taylor Peter Nicholas Lewis Crawford Isabella Young Michael E. Sughrue
title	Hollow-tree super: A directional and scalable approach for feature importance in boosted tree models
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/ecc3631e3174436696d67ada24c562be

