Genetic classification of populations using supervised learning.

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study ha...

Bibliographic Details
Main authors: Michael Bridges, Elizabeth A Heron, Colm O'Dushlaine, Ricardo Segurado, International Schizophrenia Consortium (ISC), Derek Morris, Aiden Corvin, Michael Gill, Carlos Pinto
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R (Medicine)
Q (Science)
Online access: https://doaj.org/article/a816b14065fb44ddab5fcccfa6dae0ea
id oai:doaj.org-article:a816b14065fb44ddab5fcccfa6dae0ea
record_format dspace
spelling oai:doaj.org-article:a816b14065fb44ddab5fcccfa6dae0ea
2021-11-18T06:54:03Z
Genetic classification of populations using supervised learning.
1932-6203
10.1371/journal.pone.0014802
https://doaj.org/article/a816b14065fb44ddab5fcccfa6dae0ea
2011-05-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/21589856/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 6, Iss 5, p e14802 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large-scale genome-wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large-scale genome-wide association studies.
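The idea of using known population membership to classify individuals can be illustrated with a toy sketch. The code below is not the authors' method: it swaps in a simple nearest-centroid classifier for the neural networks and support vector machines named in the abstract, and it runs on simulated genotypes rather than the Scottish and Bulgarian data. All function names, the number of SNPs, the sample sizes and the allele-frequency shift are hypothetical choices made for illustration only.

```python
import random


def simulate_population(n_individuals, allele_freqs, rng):
    """Draw genotypes (0, 1 or 2 minor alleles per SNP) for one population."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def centroid(genotypes):
    """Mean genotype per SNP across a set of individuals."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]


def squared_distance(a, b):
    """Squared Euclidean distance between two genotype vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(individual, centroid_a, centroid_b):
    """Return 0 if the individual is closer to centroid_a, otherwise 1."""
    dist_a = squared_distance(individual, centroid_a)
    dist_b = squared_distance(individual, centroid_b)
    return 0 if dist_a <= dist_b else 1


def run_demo(seed=0, n_snps=100, n_per_population=200, shift=0.15):
    """Train on labelled individuals, then report held-out accuracy."""
    rng = random.Random(seed)
    # Assumed allele frequencies: population B differs from A at every SNP.
    freqs_a = [rng.uniform(0.2, 0.5) for _ in range(n_snps)]
    freqs_b = [p + rng.choice((-shift, shift)) for p in freqs_a]
    pop_a = simulate_population(n_per_population, freqs_a, rng)
    pop_b = simulate_population(n_per_population, freqs_b, rng)

    # Supervised step: the labels of the training half define the centroids.
    half = n_per_population // 2
    centroid_a = centroid(pop_a[:half])
    centroid_b = centroid(pop_b[:half])

    # Evaluate only on individuals that were not used for training.
    test_set = [(g, 0) for g in pop_a[half:]] + [(g, 1) for g in pop_b[half:]]
    correct = sum(
        classify(g, centroid_a, centroid_b) == label for g, label in test_set
    )
    return correct / len(test_set)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.3f}")
```

The point of the sketch is only that the labels enter the fit directly, which an exploratory technique such as principal components analysis does not do. The simulated frequency shift is far larger than the differences one would expect between closely related real populations, so the accuracy it reports says nothing about the results in the paper.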
format article
author Michael Bridges
Elizabeth A Heron
Colm O'Dushlaine
Ricardo Segurado
International Schizophrenia Consortium (ISC)
Derek Morris
Aiden Corvin
Michael Gill
Carlos Pinto
author_sort Michael Bridges
title Genetic classification of populations using supervised learning.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/a816b14065fb44ddab5fcccfa6dae0ea