Relationship between gene regulation network structure and prediction accuracy in high dimensional regression

Abstract The least absolute shrinkage and selection operator (lasso) and principal component regression (PCR) are popular methods of estimating traits from high-dimensional omics data, such as transcriptomes. The prediction accuracy of these estimation methods is highly dependent on the covariance s...

Bibliographic Details
Main authors: Yuichi Okinaga, Daisuke Kyogoku, Satoshi Kondo, Atsushi J. Nagano, Kei Hirose
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/38c9865acb7641c6914c6399fe62353d
id oai:doaj.org-article:38c9865acb7641c6914c6399fe62353d
record_format dspace
spelling oai:doaj.org-article:38c9865acb7641c6914c6399fe62353d
2021-12-02T17:51:29Z
Relationship between gene regulation network structure and prediction accuracy in high dimensional regression
10.1038/s41598-021-90791-6
2045-2322
https://doaj.org/article/38c9865acb7641c6914c6399fe62353d
2021-06-01T00:00:00Z
https://doi.org/10.1038/s41598-021-90791-6
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract The least absolute shrinkage and selection operator (lasso) and principal component regression (PCR) are popular methods of estimating traits from high-dimensional omics data, such as transcriptomes. The prediction accuracy of these estimation methods is highly dependent on the covariance structure, which is characterized by gene regulation networks. However, the manner in which the structure of a gene regulation network together with the sample size affects prediction accuracy has not yet been sufficiently investigated. In this study, Monte Carlo simulations are conducted to investigate the prediction accuracy for several network structures under various sample sizes. When the gene regulation network is a random graph, a sufficiently large number of observations are required to ensure good prediction accuracy with the lasso. The PCR provided poor prediction accuracy regardless of the sample size. However, a real gene regulation network is likely to exhibit a scale-free structure. In such cases, the simulation indicates that a relatively small number of observations, such as $N=300$, is sufficient to allow the accurate prediction of traits from a transcriptome with the lasso.
format article
author Yuichi Okinaga
Daisuke Kyogoku
Satoshi Kondo
Atsushi J. Nagano
Kei Hirose
author_sort Yuichi Okinaga
title Relationship between gene regulation network structure and prediction accuracy in high dimensional regression
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/38c9865acb7641c6914c6399fe62353d
_version_ 1718379219159875584