Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato

Training set construction is an important prerequisite to Genomic Prediction (GP), and while this has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, such as banana and blueberry, and also potato, which is the thi...

Bibliographic Details
Main authors: Stefan Wilson, Marcos Malosetti, Chris Maliepaard, Han A. Mulder, Richard G. F. Visser, Fred van Eeuwijk
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://doaj.org/article/6ed54ce3c91b4c1cbec52c3206b058aa
id oai:doaj.org-article:6ed54ce3c91b4c1cbec52c3206b058aa
record_format dspace
spelling oai:doaj.org-article:6ed54ce3c91b4c1cbec52c3206b058aa
2021-11-30T17:31:04Z
Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato
1664-462X
10.3389/fpls.2021.771075
https://doaj.org/article/6ed54ce3c91b4c1cbec52c3206b058aa
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fpls.2021.771075/full
https://doaj.org/toc/1664-462X
Training set construction is an important prerequisite to Genomic Prediction (GP), and while this has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, such as banana and blueberry, and also potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out in order to define sub-populations, but since sub-populations accounted for only 16.6% of genetic variation, there were negligible differences between stratified and simple random sampling. For genetic distance sampling, four genetic distance measures were compared and, though they performed similarly, Euclidean distance was the most consistent. In the majority of cases the CDmean method was the best sampling method, and compared to simple random sampling gave improvements of 4–14% in cross-validation scenarios and 2–8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5–10.5% and 0.4–4.5%, respectively. No interaction was found between sampling method and the statistical model for the traits analyzed.
Stefan Wilson
Marcos Malosetti
Chris Maliepaard
Han A. Mulder
Richard G. F. Visser
Fred van Eeuwijk
Frontiers Media S.A.
article
training set construction
potato
sampling technique(s)
genomic prediction (GP)
auto-tetraploid
Plant culture
SB1-1110
EN
Frontiers in Plant Science, Vol 12 (2021)
institution DOAJ
collection DOAJ
language EN
topic training set construction
potato
sampling technique(s)
genomic prediction (GP)
auto-tetraploid
Plant culture
SB1-1110
description Training set construction is an important prerequisite to Genomic Prediction (GP), and while this has been studied in diploids, polyploids have not received the same attention. Polyploidy is a common feature in many crop plants, such as banana and blueberry, and also potato, which is the third most important crop in the world in terms of food consumption, after rice and wheat. The aim of this study was to investigate the impact of different training set construction methods using a publicly available diversity panel of tetraploid potatoes. Four methods of training set construction were compared: simple random sampling, stratified random sampling, genetic distance sampling and sampling based on the coefficient of determination (CDmean). For stratified random sampling, population structure analyses were carried out in order to define sub-populations, but since sub-populations accounted for only 16.6% of genetic variation, there were negligible differences between stratified and simple random sampling. For genetic distance sampling, four genetic distance measures were compared and, though they performed similarly, Euclidean distance was the most consistent. In the majority of cases the CDmean method was the best sampling method, and compared to simple random sampling gave improvements of 4–14% in cross-validation scenarios and 2–8% in scenarios with an independent test set, while genetic distance sampling gave improvements of 5.5–10.5% and 0.4–4.5%, respectively. No interaction was found between sampling method and the statistical model for the traits analyzed.
format article
author Stefan Wilson
Marcos Malosetti
Chris Maliepaard
Han A. Mulder
Richard G. F. Visser
Fred van Eeuwijk
title Training Set Construction for Genomic Prediction in Auto-Tetraploids: An Example in Potato
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/6ed54ce3c91b4c1cbec52c3206b058aa