“Guilt by association” is not competitive with genetic association for identifying autism risk genes

Abstract Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, quest...

Bibliographic Details
Main authors: Margot Gunning, Paul Pavlidis
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/57ad295e685c4a7cb9497eadb8411947
id oai:doaj.org-article:57ad295e685c4a7cb9497eadb8411947
record_format dspace
spelling oai:doaj.org-article:57ad295e685c4a7cb9497eadb8411947
2021-12-02T18:49:36Z
“Guilt by association” is not competitive with genetic association for identifying autism risk genes
10.1038/s41598-021-95321-y
2045-2322
https://doaj.org/article/57ad295e685c4a7cb9497eadb8411947
2021-08-01T00:00:00Z
https://doi.org/10.1038/s41598-021-95321-y
https://doaj.org/toc/2045-2322
Abstract Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
Margot Gunning
Paul Pavlidis
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-15 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Discovering genes involved in complex human genetic disorders is a major challenge. Many have suggested that machine learning (ML) algorithms using gene networks can be used to supplement traditional genetic association-based approaches to predict or prioritize disease genes. However, questions have been raised about the utility of ML methods for this type of task due to biases within the data, and poor real-world performance. Using autism spectrum disorder (ASD) as a test case, we sought to investigate the question: can machine learning aid in the discovery of disease genes? We collected 13 published ASD gene prioritization studies and evaluated their performance using known and novel high-confidence ASD genes. We also investigated their biases towards generic gene annotations, like number of association publications. We found that ML methods which do not incorporate genetics information have limited utility for prioritization of ASD risk genes. These studies perform at a comparable level to generic measures of likelihood for the involvement of genes in any condition, and do not out-perform genetic association studies. Future efforts to discover disease genes should be focused on developing and validating statistical models for genetic association, specifically for association between rare variants and disease, rather than developing complex machine learning methods using complex heterogeneous biological data with unknown reliability.
format article
author Margot Gunning
Paul Pavlidis
title “Guilt by association” is not competitive with genetic association for identifying autism risk genes
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/57ad295e685c4a7cb9497eadb8411947