The validation and assessment of machine learning: a game of prediction from high-dimensional data.

In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies...

Bibliographic Details
Main authors: Tune H Pers, Anders Albrechtsen, Claus Holst, Thorkild I A Sørensen, Thomas A Gerds
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2009
Subjects: R, Q
Online access: https://doaj.org/article/929dd8bf2caa412a8f79aa4c361a325b
id oai:doaj.org-article:929dd8bf2caa412a8f79aa4c361a325b
record_format dspace
spelling oai:doaj.org-article:929dd8bf2caa412a8f79aa4c361a325b
2021-11-25T06:21:13Z
The validation and assessment of machine learning: a game of prediction from high-dimensional data.
1932-6203
10.1371/journal.pone.0006287
https://doaj.org/article/929dd8bf2caa412a8f79aa4c361a325b
2009-08-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19652722/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
Tune H Pers
Anders Albrechtsen
Claus Holst
Thorkild I A Sørensen
Thomas A Gerds
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 4, Iss 8, p e6287 (2009)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide to the appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
format article
author Tune H Pers
Anders Albrechtsen
Claus Holst
Thorkild I A Sørensen
Thomas A Gerds
title The validation and assessment of machine learning: a game of prediction from high-dimensional data.
title_sort validation and assessment of machine learning: a game of prediction from high-dimensional data.
publisher Public Library of Science (PLoS)
publishDate 2009
url https://doaj.org/article/929dd8bf2caa412a8f79aa4c361a325b
_version_ 1718413813745713152
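
The description above says that the strategies are made comparable through a strictly proper scoring rule and bootstrap cross-validation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a binary outcome scored with the Brier score (one example of a strictly proper scoring rule), a single numeric predictor, and two hypothetical toy strategies (`fit_null`, `fit_threshold`) standing in for the players' actual methods. All function names and the toy data are illustrative assumptions.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def fit_null(train):
    """Hypothetical strategy 1: ignore the predictor, predict the training prevalence."""
    prevalence = sum(y for _, y in train) / len(train)
    return lambda x: prevalence


def fit_threshold(train):
    """Hypothetical strategy 2: split at the training median of the predictor
    and predict the outcome rate observed on each side of the split."""
    xs = sorted(x for x, _ in train)
    cut = xs[len(xs) // 2]
    overall = sum(y for _, y in train) / len(train)

    def side_rate(high):
        ys = [y for x, y in train if (x >= cut) == high]
        return sum(ys) / len(ys) if ys else overall

    rates = {True: side_rate(True), False: side_rate(False)}
    return lambda x: rates[x >= cut]


def bootstrap_cv(data, strategies, n_boot=200, seed=0):
    """Average out-of-bag Brier score per strategy.

    Every strategy is trained and tested on the same bootstrap splits,
    so the resulting scores are directly comparable.
    """
    rng = random.Random(seed)
    n = len(data)
    totals = {name: 0.0 for name in strategies}
    used = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        in_bag = set(idx)
        test = [data[i] for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not test:
            continue  # no held-out rows in this draw
        train = [data[i] for i in idx]
        used += 1
        for name, fit in strategies.items():
            model = fit(train)  # the whole strategy is refit on each training set
            probs = [model(x) for x, _ in test]
            totals[name] += brier_score(probs, [y for _, y in test])
    if used == 0:
        raise ValueError("no bootstrap sample left any out-of-bag rows")
    return {name: total / used for name, total in totals.items()}


if __name__ == "__main__":
    # Toy data (an assumption for illustration): outcome is 1 when x >= 20.
    toy = [(i, 1 if i >= 20 else 0) for i in range(40)]
    scores = bootstrap_cv(toy, {"null": fit_null, "threshold": fit_threshold})
    for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(name, round(score, 3))
```

Lower scores are better. Refitting each strategy inside the loop, including any tuning or variable selection it performs, is what keeps the held-out rows from leaking into the fit when the number of subjects is small relative to the number of predictors.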