The validation and assessment of machine learning: a game of prediction from high-dimensional data.
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of an appropriate machine learning tool in a new application. Initial development of an overall strategy thus often implies...
Main authors: | Tune H Pers, Anders Albrechtsen, Claus Holst, Thorkild I A Sørensen, Thomas A Gerds |
---|---|
Format: | article |
Language: | EN |
Published: | Public Library of Science (PLoS), 2009 |
Subjects: | Medicine (R); Science (Q) |
Online access: | https://doaj.org/article/929dd8bf2caa412a8f79aa4c361a325b |
id | oai:doaj.org-article:929dd8bf2caa412a8f79aa4c361a325b |
---|---|
record_format | dspace |
record timestamp | 2021-11-25T06:21:13Z |
ISSN | 1932-6203 |
DOI | 10.1371/journal.pone.0006287 |
publication date | 2009-08-01 |
full text (PMC) | https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/19652722/pdf/?tool=EBI |
journal table of contents | https://doaj.org/toc/1932-6203 |
source | PLoS ONE, Vol 4, Iss 8, p e6287 (2009) |
institution | DOAJ |
collection | DOAJ |
language | EN |
topic | Medicine (R); Science (Q) |
description | In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of an appropriate machine learning tool in a new application. Initial development of an overall strategy thus often involves testing and comparing multiple methods on the same data set. This is particularly difficult in situations prone to over-fitting, where the number of subjects is small compared to the number of potential predictors. The article presents a game that provides a basis for a fair model comparison. Each player selects a modeling strategy for predicting individual responses from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study, where the aim is to predict fat oxidation capacity from conventional factors and high-dimensional metabolomics data. The three players chose support vector machines, LASSO, and random forests, respectively. |
format | article |
author | Tune H Pers, Anders Albrechtsen, Claus Holst, Thorkild I A Sørensen, Thomas A Gerds |
title | The validation and assessment of machine learning: a game of prediction from high-dimensional data. |
publisher | Public Library of Science (PLoS) |
publishDate | 2009 |
url | https://doaj.org/article/929dd8bf2caa412a8f79aa4c361a325b |
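The abstract names two ingredients of the fair comparison, a strictly proper scoring rule and bootstrap cross-validation, without spelling them out. The following is a minimal Python sketch of how such a comparison could be set up, under stated assumptions: the outcome is binary, the scoring rule is the Brier score (one strictly proper scoring rule; this record does not say which rule the article uses), and the two competing strategies (`null_strategy` and `knn_strategy`) are hypothetical stand-ins for the players' actual choices (support vector machines, LASSO, random forests). All function names and the simulated data are illustrative and are not taken from the article.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A strategy is fitted on training data (x, y) and returns a predictor that
# maps a new predictor value x to a predicted probability P(y = 1 | x).
Predictor = Callable[[float], float]
Strategy = Callable[[Sequence[float], Sequence[int]], Predictor]


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between 0/1 outcomes and predicted probabilities.

    The Brier score is a strictly proper scoring rule: its expected value is
    minimised only by predicting the true event probability.
    """
    return mean((yi - pi) ** 2 for yi, pi in zip(y, p))


def null_strategy(x: Sequence[float], y: Sequence[int]) -> Predictor:
    # Ignores the predictor and predicts the training prevalence for everyone.
    prevalence = mean(y)
    return lambda _new_x: prevalence


def knn_strategy(k: int) -> Strategy:
    # Hypothetical example strategy: the proportion of events among the k
    # training subjects whose predictor value is closest to the new x.
    def fit(x: Sequence[float], y: Sequence[int]) -> Predictor:
        train = list(zip(x, y))

        def predict(new_x: float) -> float:
            nearest = sorted(train, key=lambda t: abs(t[0] - new_x))[:k]
            return mean(label for _, label in nearest)

        return predict

    return fit


def bootstrap_cv(
    x: Sequence[float],
    y: Sequence[int],
    strategies: Dict[str, Strategy],
    n_boot: int = 100,
    seed: int = 1,
) -> Dict[str, float]:
    """Fit each strategy on bootstrap samples and score it on out-of-bag subjects.

    The same bootstrap samples are reused for every strategy, so the averaged
    Brier scores are computed on identical train/test splits.
    """
    rng = random.Random(seed)
    n = len(y)
    scores: Dict[str, List[float]] = {name: [] for name in strategies}
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]  # draw n subjects with replacement
        drawn = set(in_bag)
        oob = [i for i in range(n) if i not in drawn]  # subjects never drawn
        if not oob:
            continue  # no test subjects left; skip this bootstrap sample
        x_train = [x[i] for i in in_bag]
        y_train = [y[i] for i in in_bag]
        y_test = [y[i] for i in oob]
        for name, strategy in strategies.items():
            model = strategy(x_train, y_train)
            preds = [model(x[i]) for i in oob]
            scores[name].append(brier_score(y_test, preds))
    return {name: mean(s) for name, s in scores.items()}


if __name__ == "__main__":
    # Simulated toy data (not from the Nugenob Study): one predictor and a
    # binary outcome whose event probability increases with x.
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(60)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * xi)) else 0 for xi in x]
    results = bootstrap_cv(x, y, {"null": null_strategy, "knn": knn_strategy(7)})
    for name, score in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: bootstrap cross-validated Brier score = {score:.3f}")
```

Drawing the bootstrap samples once and reusing them for every strategy is a design choice meant to echo the idea of a common set of rules: each strategy is scored on exactly the same out-of-bag subjects, so differences in the averaged Brier score reflect the strategies rather than differences in resampling. A real comparison would plug in the actual modeling strategies and use more bootstrap samples.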