Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method

Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the value of the parameters is unimportant, and only the out-of-sample performance is considered. On the other hand, much of the literature on model estimation assumes that the...


Bibliographic Details
Main Authors: Matthew Dixon, Tyler Ward
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Q
Online Access: https://doaj.org/article/ab0ad735d21a46f785cc82159c6ac2f9
id oai:doaj.org-article:ab0ad735d21a46f785cc82159c6ac2f9
record_format dspace
spelling oai:doaj.org-article:ab0ad735d21a46f785cc82159c6ac2f9 2021-11-25T17:29:31Z
Title: Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
DOI: 10.3390/e23111419
ISSN: 1099-4300
DOAJ record: https://doaj.org/article/ab0ad735d21a46f785cc82159c6ac2f9
Publication date: 2021-10-01
Full text: https://www.mdpi.com/1099-4300/23/11/1419
Journal TOC: https://doaj.org/toc/1099-4300
Abstract: see the description field below
Authors: Matthew Dixon; Tyler Ward
Publisher: MDPI AG
Subjects: generalization error; overfitting; information criteria; entropy; Science (Q); Astrophysics (QB460-466); Physics (QC1-999)
Language: EN
Source: Entropy, Vol 23, Iss 11, p 1419 (2021)
institution DOAJ
collection DOAJ
language EN
topic generalization error
overfitting
information criteria
entropy
Science
Q
Astrophysics
QB460-466
Physics
QC1-999
spellingShingle generalization error
overfitting
information criteria
entropy
Science
Q
Astrophysics
QB460-466
Physics
QC1-999
Matthew Dixon
Tyler Ward
Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
description Modern computational models in supervised machine learning are often highly parameterized universal approximators. As such, the value of the parameters is unimportant, and only the out-of-sample performance is considered. On the other hand, much of the literature on model estimation assumes that the parameters themselves have intrinsic value, and thus is concerned with the bias and variance of parameter estimates, which may not have any simple relationship to out-of-sample model performance. Therefore, within supervised machine learning, heavy use is made of ridge regression (i.e., L2 regularization), which requires the estimation of hyperparameters and can be rendered ineffective by certain model parameterizations. We introduce an objective function, which we refer to as Information-Corrected Estimation (ICE), that reduces KL-divergence-based generalization error for supervised machine learning. ICE attempts to directly maximize a corrected likelihood function as an estimator of the KL divergence. Such an approach is proven, theoretically, to be effective for a wide class of models, with only mild regularity restrictions. Under finite sample sizes, this corrected estimation procedure is shown experimentally to lead to a significant reduction in generalization error compared to maximum likelihood estimation and L2 regularization.
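For orientation, the kind of corrected objective described above can be sketched in information-criterion notation; this is a minimal illustration assuming the correction takes the trace form familiar from the Takeuchi information criterion, and the exact penalty used by the authors may differ in detail. Writing \ell_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log f(x_i \mid \theta) for the sample mean log-likelihood, I(\theta) for the expected outer product of score vectors, and J(\theta) for the expected Hessian of -\log f(x \mid \theta), a corrected estimator maximizes the penalized likelihood directly over \theta, rather than applying a correction only after a maximum likelihood fit:

\hat{\theta} = \arg\max_{\theta} \left[ \ell_n(\theta) - \frac{1}{n} \operatorname{tr}\!\big( I(\theta)\, J(\theta)^{-1} \big) \right].

When the model is well specified, I(\theta) = J(\theta) and the trace term reduces to k/n, the AIC penalty for a k-parameter model, which is why an objective of this form needs no tuned hyperparameter analogous to the ridge penalty weight.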
format article
author Matthew Dixon
Tyler Ward
author_facet Matthew Dixon
Tyler Ward
author_sort Matthew Dixon
title Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
title_short Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
title_full Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
title_fullStr Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
title_full_unstemmed Information-Corrected Estimation: A Generalization Error Reducing Parameter Estimation Method
title_sort information-corrected estimation: a generalization error reducing parameter estimation method
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/ab0ad735d21a46f785cc82159c6ac2f9
work_keys_str_mv AT matthewdixon informationcorrectedestimationageneralizationerrorreducingparameterestimationmethod
AT tylerward informationcorrectedestimationageneralizationerrorreducingparameterestimationmethod
_version_ 1718412304468410368