Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring

Abstract Investigation of the genetic basis of traits or clinical outcomes heavily relies on identifying relevant variables in molecular data. However, characteristics such as high dimensionality and complex correlation structures of these data hinder the development of related methods, resulting in...

Bibliographic Details
Main authors: Yuan Zhou, Botao Fa, Ting Wei, Jianle Sun, Zhangsheng Yu, Yue Zhang
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/acbdf7fd5e124f8dbc69d0f1bcef190a
id oai:doaj.org-article:acbdf7fd5e124f8dbc69d0f1bcef190a
record_format dspace
spelling oai:doaj.org-article:acbdf7fd5e124f8dbc69d0f1bcef190a
2021-12-05T12:11:23Z
Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring
10.1038/s41598-021-02706-0
2045-2322
https://doaj.org/article/acbdf7fd5e124f8dbc69d0f1bcef190a
2021-12-01T00:00:00Z
https://doi.org/10.1038/s41598-021-02706-0
https://doaj.org/toc/2045-2322
Abstract Investigation of the genetic basis of traits or clinical outcomes heavily relies on identifying relevant variables in molecular data. However, characteristics such as high dimensionality and complex correlation structures of these data hinder the development of related methods, resulting in the inclusion of false positives and negatives. We developed a variable importance measure method, termed the ECAR scores, that evaluates the importance of variables in the dataset. Based on this score, ranking and selection of variables can be achieved simultaneously. Unlike most current approaches, the ECAR scores aim to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting the ones that are merely predictive. The ECAR scores’ performance is tested and compared to other methods on simulated, semi-synthetic, and real datasets. Results showed that the ECAR scores improve the CAR scores in terms of accuracy of variable selection and high-rank variables’ predictive power. It also outperforms other classic methods such as lasso and stability selection when there is a high degree of correlation among influential variables. As an application, we used the ECAR scores to analyze genes associated with forced expiratory volume in the first second in patients with lung cancer and reported six associated genes.
Yuan Zhou
Botao Fa
Ting Wei
Jianle Sun
Zhangsheng Yu
Yue Zhang
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Investigation of the genetic basis of traits or clinical outcomes heavily relies on identifying relevant variables in molecular data. However, characteristics such as high dimensionality and complex correlation structures of these data hinder the development of related methods, resulting in the inclusion of false positives and negatives. We developed a variable importance measure method, termed the ECAR scores, that evaluates the importance of variables in the dataset. Based on this score, ranking and selection of variables can be achieved simultaneously. Unlike most current approaches, the ECAR scores aim to rank the influential variables as high as possible while maintaining the grouping property, instead of selecting the ones that are merely predictive. The ECAR scores’ performance is tested and compared to other methods on simulated, semi-synthetic, and real datasets. Results showed that the ECAR scores improve the CAR scores in terms of accuracy of variable selection and high-rank variables’ predictive power. It also outperforms other classic methods such as lasso and stability selection when there is a high degree of correlation among influential variables. As an application, we used the ECAR scores to analyze genes associated with forced expiratory volume in the first second in patients with lung cancer and reported six associated genes.
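As background for the CAR scores that the abstract uses as a baseline, the following is a minimal sketch of how plain CAR scores are commonly computed: the marginal correlations between predictors and response, multiplied by the inverse square root of the predictor correlation matrix. It is not the ECAR method, whose definition does not appear in this record. The function name `car_scores`, the `shrinkage` parameter (a simple blend of the correlation matrix with the identity, useful when there are more variables than samples), and the simulated data are all illustrative assumptions.

```python
import numpy as np


def car_scores(X, y, shrinkage=0.0):
    """Plain CAR scores: omega = R^(-1/2) r_xy.

    R is the sample correlation matrix of the predictors and r_xy the
    vector of marginal correlations with the response. `shrinkage` is a
    hypothetical knob that blends R with the identity matrix; it is an
    assumption of this sketch, not a parameter taken from the article.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    # Standardize predictors and response (population standard deviation).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()

    R = Xs.T @ Xs / n        # predictor correlation matrix
    r_xy = Xs.T @ ys / n     # marginal correlations with the response

    # Optional shrinkage toward the identity keeps R well conditioned.
    R = (1.0 - shrinkage) * R + shrinkage * np.eye(R.shape[0])

    # Inverse matrix square root through the eigendecomposition of R.
    evals, evecs = np.linalg.eigh(R)
    evals = np.clip(evals, 1e-12, None)
    R_inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T

    return R_inv_sqrt @ r_xy


# Simulated example: two correlated influential variables and one noise variable.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.3 * rng.normal(size=n),   # influential, correlated with the next column
    z + 0.3 * rng.normal(size=n),   # influential
    rng.normal(size=n),             # pure noise
])
y = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)

omega = car_scores(X, y)
ranking = np.argsort(-omega ** 2)   # most important variable first
print("CAR scores:", omega)
print("Ranking by squared score:", ranking)
```

With `shrinkage=0` and more samples than variables, the squared scores sum to the R² of the ordinary least squares fit, which is why squared CAR scores are often read as a decomposition of explained variance. Ranking by squared score and then cutting the list at a threshold is one way ranking and selection can be carried out together.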
format article
author Yuan Zhou
Botao Fa
Ting Wei
Jianle Sun
Zhangsheng Yu
Yue Zhang
title Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/acbdf7fd5e124f8dbc69d0f1bcef190a