Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks

The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in geno...

Bibliographic Details
Main authors: Cathy C. Westhues, Gregory S. Mahone, Sofia da Silva, Patrick Thorwarth, Malthe Schmidt, Jan-Christoph Richter, Henner Simianer, Timothy M. Beissinger
Format: article
Language: EN
Published: Frontiers Media S.A. 2021
Subjects: machine learning; genotype-by-environment interactions; gradient boosting; maize; yield; genomic prediction; Plant culture; SB1-1110
Online access: https://doaj.org/article/17737aeb89214a10bc7bc10c5ba2deca
id oai:doaj.org-article:17737aeb89214a10bc7bc10c5ba2deca
record_format dspace
spelling oai:doaj.org-article:17737aeb89214a10bc7bc10c5ba2deca
2021-11-30T10:14:12Z
Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
1664-462X
10.3389/fpls.2021.699589
https://doaj.org/article/17737aeb89214a10bc7bc10c5ba2deca
2021-11-01T00:00:00Z
https://www.frontiersin.org/articles/10.3389/fpls.2021.699589/full
https://doaj.org/toc/1664-462X
Frontiers Media S.A.
article
EN
Frontiers in Plant Science, Vol 12 (2021)
institution DOAJ
collection DOAJ
language EN
topic machine learning
genotype-by-environment interactions
gradient boosting
maize
yield
genomic prediction
Plant culture
SB1-1110
spellingShingle machine learning
genotype-by-environment interactions
gradient boosting
maize
yield
genomic prediction
Plant culture
SB1-1110
Cathy C. Westhues
Gregory S. Mahone
Sofia da Silva
Patrick Thorwarth
Malthe Schmidt
Jan-Christoph Richter
Henner Simianer
Timothy M. Beissinger
Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
description The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
format article
author Cathy C. Westhues
Gregory S. Mahone
Sofia da Silva
Patrick Thorwarth
Malthe Schmidt
Jan-Christoph Richter
Henner Simianer
Timothy M. Beissinger
author_facet Cathy C. Westhues
Gregory S. Mahone
Sofia da Silva
Patrick Thorwarth
Malthe Schmidt
Jan-Christoph Richter
Henner Simianer
Timothy M. Beissinger
author_sort Cathy C. Westhues
title Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
title_short Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
title_full Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
title_fullStr Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
title_full_unstemmed Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
title_sort prediction of maize phenotypic traits with genomic and environmental predictors using gradient boosting frameworks
publisher Frontiers Media S.A.
publishDate 2021
url https://doaj.org/article/17737aeb89214a10bc7bc10c5ba2deca
work_keys_str_mv AT cathycwesthues predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT gregorysmahone predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT sofiadasilva predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT patrickthorwarth predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT maltheschmidt predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT janchristophrichter predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT hennersimianer predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
AT timothymbeissinger predictionofmaizephenotypictraitswithgenomicandenvironmentalpredictorsusinggradientboostingframeworks
_version_ 1718406667145576448
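
The four prediction problems named in the description can be read as rules for splitting multi-environment trial records into training and test sets. The sketch below is only one possible reading, not the authors' procedure: the record does not say how the splits were built, so the function `split_scenario`, the field names `genotype`, `year`, and `site`, and the toy records are all hypothetical. It holds out every record from one year or one site, and for the "only unobserved genotypes" variants it additionally keeps only test genotypes that never appear in the training set.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_scenario(
    records: List[Record],
    factor: str,            # "year" or "site": the environment factor held out
    held_out: object,       # the year or site reserved for testing
    only_unobserved: bool,  # True for problems (2) and (4)
) -> Tuple[List[Record], List[Record]]:
    """Split records into (train, test) for one held-out year or site."""
    # Everything outside the held-out environment is available for training.
    train = [r for r in records if r[factor] != held_out]
    # Everything inside the held-out environment is a candidate test record.
    test = [r for r in records if r[factor] == held_out]
    if only_unobserved:
        # Keep only genotypes that were never seen during training.
        seen = {r["genotype"] for r in train}
        test = [r for r in test if r["genotype"] not in seen]
    return train, test


# Hypothetical toy records, for illustration only.
records: List[Record] = [
    {"genotype": "G1", "year": 2014, "site": "A"},
    {"genotype": "G2", "year": 2014, "site": "B"},
    {"genotype": "G1", "year": 2015, "site": "A"},
    {"genotype": "G3", "year": 2015, "site": "B"},
]

# Problem (1): tested and new genotypes in a new year.
train_1, test_1 = split_scenario(records, "year", 2015, only_unobserved=False)
# Problem (2): only unobserved genotypes in a new year.
train_2, test_2 = split_scenario(records, "year", 2015, only_unobserved=True)
# Problem (3): tested and new genotypes in a new site.
train_3, test_3 = split_scenario(records, "site", "B", only_unobserved=False)
# Problem (4): only unobserved genotypes in a new site.
train_4, test_4 = split_scenario(records, "site", "B", only_unobserved=True)
```

Under this reading, any model (linear, elastic net, or gradient boosting) would be fitted on the training part and scored on the test part, repeating the split once per held-out year or site.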