An Improvised SIMPLS Estimator Based on MRCD-PCA Weighting Function and Its Application to Real Data



Bibliographic Details
Main Authors: Siti Zahariah, Habshah Midi, Mohd Shafie Mustafa
Format: Article
Language: English
Published: MDPI AG, 2021
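Subjects: high dimensional data; high leverage point; minimum regularized covariance determinant; partial least squares regression; principal component analysis; SIMPLS; Mathematics; QA1-939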
Online Access: https://doaj.org/article/9c469f28d9f54a4da0799efc21d4ac98
id oai:doaj.org-article:9c469f28d9f54a4da0799efc21d4ac98
record_format dspace
DOI: 10.3390/sym13112211
ISSN: 2073-8994
Published in: Symmetry, Vol 13, Iss 2211, p 2211 (2021)
Publication date: 2021-11-01
Publisher's version: https://www.mdpi.com/2073-8994/13/11/2211
Journal contents (DOAJ): https://doaj.org/toc/2073-8994
Record datestamp: 2021-11-25T19:07:37Z
institution DOAJ
collection DOAJ
language EN
topic high dimensional data
high leverage point
minimum regularized covariance determinant
partial least squares regression
principal component analysis
SIMPLS
Mathematics
QA1-939
description Multicollinearity often occurs when two or more predictor variables are correlated, especially in high dimensional data (HDD), where the number of predictors far exceeds the sample size ($p \gg n$). The statistically inspired modification of partial least squares (SIMPLS) is a very popular technique for solving partial least squares regression problems because of its efficiency, speed, and ease of understanding. SIMPLS is computed from the empirical covariance matrix of the explanatory and response variables, which makes it very easily affected by outliers. To rectify this problem, a robust iteratively reweighted SIMPLS (RWSIMPLS) was introduced. Nonetheless, RWSIMPLS is still not very efficient, because its weighting function does not specify any method for identifying high leverage points (HLPs), i.e., outlying observations in the X-direction. HLPs have the most detrimental effect on the computed values of various estimates and lead to misleading conclusions about the fitted regression model, so their effects should be reduced by assigning them smaller weights. As a solution, we propose an improvised SIMPLS based on a new weight function obtained from the MRCD-PCA diagnostic method for identifying HLPs in HDD, and name this method MRCD-PCA-RWSIMPLS. A new MRCD-PCA-RWSIMPLS diagnostic plot is also established for classifying observations into four groups, i.e., regular observations, vertical outliers, and good and bad leverage points. The numerical examples and Monte Carlo simulations indicate that MRCD-PCA-RWSIMPLS offers substantial improvements over SIMPLS and RWSIMPLS. The proposed diagnostic plot classifies observations into the correct groups, whereas the SIMPLS and RWSIMPLS plots fail to do so and show masking and swamping effects.
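The four-group classification that diagnostic plots of this kind perform can be sketched as below. This is a minimal illustration, not the authors' MRCD-PCA-RWSIMPLS procedure. It assumes each observation already has a standardized residual (vertical axis) and a robust distance in X-space (horizontal axis). How those are computed, for example from an MRCD- or PCA-based diagnostic, is outside the sketch. The function names, the required `distance_cutoff`, and the default `residual_cutoff=2.5` are hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Labels for the four groups named in the abstract.
REGULAR = "regular observation"
VERTICAL = "vertical outlier"
GOOD_LEVERAGE = "good leverage point"
BAD_LEVERAGE = "bad leverage point"


def classify_point(std_residual: float, distance: float,
                   distance_cutoff: float, residual_cutoff: float = 2.5) -> str:
    """Place one observation into one of four diagnostic-plot groups.

    Hypothetical helper: std_residual and distance are assumed inputs, and
    both cutoffs are assumed thresholds, not values taken from the paper.
    """
    outlying_in_y = abs(std_residual) > residual_cutoff  # large residual
    high_leverage = distance > distance_cutoff           # outlying in X-space
    if high_leverage and outlying_in_y:
        return BAD_LEVERAGE    # far in X and does not follow the fitted model
    if high_leverage:
        return GOOD_LEVERAGE   # far in X but still follows the fitted model
    if outlying_in_y:
        return VERTICAL        # ordinary in X, outlying in the response
    return REGULAR


def classify_all(std_residuals: Sequence[float], distances: Sequence[float],
                 distance_cutoff: float, residual_cutoff: float = 2.5) -> List[str]:
    """Classify every observation; both sequences must have the same length."""
    if len(std_residuals) != len(distances):
        raise ValueError("std_residuals and distances must have the same length")
    return [classify_point(r, d, distance_cutoff, residual_cutoff)
            for r, d in zip(std_residuals, distances)]


if __name__ == "__main__":
    # Toy inputs chosen by hand, not data from the paper.
    residuals = [0.3, 4.1, -0.8, -3.7]
    distances = [1.2, 1.0, 6.5, 7.9]
    for label in classify_all(residuals, distances, distance_cutoff=3.0):
        print(label)
```

With these toy inputs and `distance_cutoff=3.0`, the four observations fall into one group each: regular observation, vertical outlier, good leverage point, and bad leverage point. The comparisons are strict, so a point lying exactly on a cutoff is not flagged. Masking and swamping, as mentioned in the abstract, arise when the distances or residuals fed into such a rule are themselves distorted by the outliers.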