Majority scoring with backward elimination in PLS for high dimensional spectrum data

Abstract: Variable selection is a crucial issue in high-dimensional data modeling, where the sample size is small compared with the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high-dimensional data. Filter measures are not greedy f...

Bibliographic Details
Main author: Freeh N. Alenezi
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/dc55cfc4d44f4cbdb4a9bee95682efe4
id oai:doaj.org-article:dc55cfc4d44f4cbdb4a9bee95682efe4
record_format dspace
spelling oai:doaj.org-article:dc55cfc4d44f4cbdb4a9bee95682efe4; 2021-12-02T18:51:53Z; Majority scoring with backward elimination in PLS for high dimensional spectrum data; 10.1038/s41598-021-96389-2; 2045-2322; https://doaj.org/article/dc55cfc4d44f4cbdb4a9bee95682efe4; 2021-08-01T00:00:00Z; https://doi.org/10.1038/s41598-021-96389-2; https://doaj.org/toc/2045-2322; Freeh N. Alenezi; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 11, Iss 1, Pp 1-11 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract: Variable selection is a crucial issue in high-dimensional data modeling, where the sample size is small compared with the number of variables. Recently, majority scoring of filter measures in PLS (MS-PLS) was introduced for variable selection in high-dimensional data. Filter measures are not greedy for optimal performance; hence, we propose majority scoring with backward elimination in PLS (MSBE-PLS). MSBE-PLS considers two filter measures, variable importance on projection (VIP) and selectivity ratio (SR). In each iteration of backward elimination in PLS, a variable is considered influential if it is selected by both filter indicators. The proposed method is applied to the prediction of corn and diesel contents. The corn contents include protein, oil, starch, and moisture, while the diesel contents include boiling point at 50% recovery, cetane number, density, freezing temperature of the fuel, total aromatics, and viscosity. The proposed method outperforms the reference methods in terms of RMSE. In addition to validating the spectrum models, data properties are examined to explain prediction behavior. Moreover, MSBE-PLS selects a moderate number of influential variables and hence yields a parsimonious model for predicting contents from spectrum data.
format article
author Freeh N. Alenezi
title Majority scoring with backward elimination in PLS for high dimensional spectrum data
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/dc55cfc4d44f4cbdb4a9bee95682efe4
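The selection rule summarized in the abstract (a variable stays only if both VIP and SR select it, repeated over backward-elimination iterations) can be illustrated with a minimal sketch. This is not the author's implementation. The NIPALS PLS1 fit, the cutoffs `vip_cut` and `sr_cut`, the hold-out RMSE used to pick the best iteration, and all function names are assumptions made for illustration, and the data are synthetic.

```python
import numpy as np

def pls1(X, y, n_comp):
    """PLS1 via NIPALS on mean-centered data (single response)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    n, p = Xr.shape
    W, P = np.zeros((p, n_comp)), np.zeros((p, n_comp))
    T, q = np.zeros((n, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)           # unit-norm weight vector
        t = Xr @ w                       # scores
        tt = t @ t
        P[:, a] = Xr.T @ t / tt          # X loadings
        q[a] = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, P[:, a])   # deflate X
        yr = yr - q[a] * t               # deflate y
        W[:, a], T[:, a] = w, t
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
    return W, T, q, b, x_mean, y_mean

def vip(W, T, q):
    """Variable importance on projection for unit-norm weights W."""
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y-variance explained per component
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())

def selectivity_ratio(X, b):
    """Explained-to-residual variance ratio on the target-projected component."""
    Xc = X - X.mean(axis=0)
    t_tp = Xc @ b / np.linalg.norm(b)
    p_tp = Xc.T @ t_tp / (t_tp @ t_tp)
    X_tp = np.outer(t_tp, p_tp)
    residual = np.sum((Xc - X_tp) ** 2, axis=0)
    return np.sum(X_tp ** 2, axis=0) / np.maximum(residual, np.finfo(float).eps)

def msbe_pls(X_tr, y_tr, X_val, y_val, n_comp=3, vip_cut=1.0, sr_cut=1.0, min_vars=2):
    """Backward elimination keeping variables selected by BOTH filters.

    Assumed stopping rule: return the subset with the lowest hold-out RMSE.
    """
    keep = np.arange(X_tr.shape[1])
    best_keep, best_rmse = keep, np.inf
    while keep.size >= min_vars:
        Xk = X_tr[:, keep]
        W, T, q, b, xm, ym = pls1(Xk, y_tr, min(n_comp, keep.size))
        pred = (X_val[:, keep] - xm) @ b + ym
        rmse = np.sqrt(np.mean((y_val - pred) ** 2))
        if rmse < best_rmse:
            best_keep, best_rmse = keep, rmse
        both = (vip(W, T, q) > vip_cut) & (selectivity_ratio(Xk, b) > sr_cut)
        if both.all() or not both.any():
            break                         # nothing left to eliminate or keep
        keep = keep[both]
    return best_keep, best_rmse

# Synthetic example: 5 informative columns out of 30, 60 training and 20 hold-out rows.
rng = np.random.default_rng(0)
z = rng.normal(size=80)
X = rng.normal(size=(80, 30))
X[:, :5] += 3.0 * z[:, None]
y = z + 0.1 * rng.normal(size=80)
kept, rmse = msbe_pls(X[:60], y[:60], X[60:], y[60:])
print(kept, rmse)
```

Because the mean of the squared VIP values is 1 by construction, a cutoff of 1 removes at least one variable in nearly every pass, so the sketch needs a separate criterion to decide which iteration to report. Hold-out RMSE is used here for that purpose; cross-validation would be a natural alternative.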