Comparing Genomic Prediction Models by Means of Cross Validation
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those para...
Main authors: | Matías F. Schrauf, Gustavo de los Campos, Sebastián Munilla |
---|---|
Format: | article |
Language: | EN |
Published: | Frontiers Media S.A., 2021 |
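Subjects: | genomic selection; cross validation; plant breeding; genomic models; model selection; Plant culture; SB1-1110 |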
Online access: | https://doaj.org/article/d8bb4b08605549eeb8fc88dc3c3b7037 |
id |
oai:doaj.org-article:d8bb4b08605549eeb8fc88dc3c3b7037 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:d8bb4b08605549eeb8fc88dc3c3b7037; 2021-11-19T04:52:45Z; Comparing Genomic Prediction Models by Means of Cross Validation; ISSN 1664-462X; DOI 10.3389/fpls.2021.734512; https://doaj.org/article/d8bb4b08605549eeb8fc88dc3c3b7037; 2021-11-01T00:00:00Z; https://www.frontiersin.org/articles/10.3389/fpls.2021.734512/full; https://doaj.org/toc/1664-462X; Matías F. Schrauf, Gustavo de los Campos, Sebastián Munilla; Frontiers Media S.A.; article; genomic selection; cross validation; plant breeding; genomic models; model selection; Plant culture; SB1-1110; EN; Frontiers in Plant Science, Vol 12 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
genomic selection; cross validation; plant breeding; genomic models; model selection; Plant culture; SB1-1110 |
description |
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters of the model that are not typically estimated from the data (so-called “hyper-parameters”). When the focus is on prediction, most of these decisions are made with the aim of optimizing predictive accuracy. Here we discuss, and illustrate with publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power when comparing candidate models, as well as the need to define what constitutes a relevant difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data, either by REML estimation or by using weakly informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders. |
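The abstract's two key ideas, paired k-fold comparisons and equivalence margins, can be illustrated with a minimal sketch. This is not the authors' method or code: it assumes ridge regression (marker-effect BLUP) as the prediction model, two hypothetical shrinkage values λ as the competing "hyper-parameter" choices, simulated markers, and accuracy measured as the per-fold correlation between predicted and observed phenotypes. Both candidates are scored on the same folds, so the comparison uses fold-by-fold differences. Equivalence is judged with a plain two one-sided tests (TOST) procedure and a fixed, hypothetical margin of 0.02 in correlation units. The article instead derives margins from expected genetic gain and introduces its own tests. Plain paired t/TOST also ignores the dependence between folds that share training data. All function names are hypothetical.

```python
import numpy as np


def kfold_ids(n, k, rng):
    """Randomly assign n individuals to k folds of (near) equal size."""
    ids = np.arange(n) % k
    rng.shuffle(ids)
    return ids


def ridge_predict(X_tr, y_tr, X_te, lam):
    """Ridge regression on centred marker codes; lam is the shrinkage hyper-parameter."""
    x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]), Xc.T @ (y_tr - y_mean))
    return y_mean + (X_te - x_mean) @ beta


def paired_kfold_accuracy(X, y, lams, k=10, seed=1):
    """Per-fold accuracy (corr. of predicted vs observed) for each candidate lam.

    Every candidate is evaluated on the SAME folds, so results are paired by fold.
    """
    folds = kfold_ids(len(y), k, np.random.default_rng(seed))
    acc = np.empty((len(lams), k))
    for f in range(k):
        tr, te = folds != f, folds == f
        for i, lam in enumerate(lams):
            y_hat = ridge_predict(X[tr], y[tr], X[te], lam)
            acc[i, f] = np.corrcoef(y_hat, y[te])[0, 1]
    return acc


def tost(d, margin, t_crit):
    """Two one-sided tests on paired differences d.

    Declares equivalence when the (1 - 2*alpha) confidence interval of mean(d)
    lies inside (-margin, margin); t_crit is the one-sided t quantile for len(d)-1 df.
    """
    se = d.std(ddof=1) / np.sqrt(len(d))
    lo, hi = d.mean() - t_crit * se, d.mean() + t_crit * se
    return bool(lo > -margin and hi < margin), (float(lo), float(hi))


# Hypothetical simulated data: 200 lines x 500 SNPs coded 0/1/2, heritability about 0.5.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
g = X @ rng.normal(0.0, 0.05, size=p)
y = g + rng.normal(0.0, g.std(), size=n)

acc = paired_kfold_accuracy(X, y, lams=[100.0, 200.0], k=10)
d = acc[0] - acc[1]  # paired, fold-by-fold differences in accuracy
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
# 1.833 = one-sided 95% t quantile with 9 df (alpha = 0.05, k = 10 folds);
# for another k, look up t_{0.95, k-1}.
equivalent, ci = tost(d, margin=0.02, t_crit=1.833)
print(f"mean accuracy per candidate: {acc.mean(axis=1).round(3)}")
print(f"paired t = {t_stat:.2f}; 90% CI of difference = ({ci[0]:.4f}, {ci[1]:.4f}); equivalent: {equivalent}")
```

Pairing matters because fold-to-fold variation in accuracy is largely shared by both candidates; differencing within folds removes it, so `d` typically has a much smaller spread than either row of `acc`, which is where the gain in power comes from.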
format |
article |
author |
Matías F. Schrauf, Gustavo de los Campos, Sebastián Munilla |
author_sort |
Matías F. Schrauf |
title |
Comparing Genomic Prediction Models by Means of Cross Validation |
publisher |
Frontiers Media S.A. |
publishDate |
2021 |
url |
https://doaj.org/article/d8bb4b08605549eeb8fc88dc3c3b7037 |