Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations

Abstract: Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to ide...
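
The abstract describes PCA projections of genotype data and loci whose removal degrades the PCA structure. The sketch below is only an illustration of those two ideas on made-up data, not the authors' algorithm: the toy matrix `G`, the helper names `first_pc` and `group_gap`, and the choice to rank loci by absolute loading are all assumptions introduced here. It shows that a projection along a principal component is a weighted sum of centered allele counts, and that dropping the highest-weighted loci can erase the separation between two groups.

```python
import numpy as np

def first_pc(genotypes):
    """Return (loadings, projections) for the first principal component."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[0]
    # Each projection is a weighted sum of centered allele counts.
    return loadings, centered @ loadings

def group_gap(projections, split):
    """Absolute difference between the mean projections of two groups."""
    return abs(projections[:split].mean() - projections[split:].mean())

# Hypothetical toy data: 6 individuals x 5 loci, minor-allele counts (0/1/2).
# Rows 0-2 and rows 3-5 stand for two invented populations that differ
# mainly at loci 0 and 1.
G = np.array([
    [2, 2, 1, 0, 1],
    [2, 1, 0, 1, 1],
    [2, 2, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

loadings, proj = first_pc(G)

# Assumed proxy for "informative": the two loci with the largest |loading|.
top_loci = np.argsort(-np.abs(loadings))[:2]
gap_before = group_gap(proj, 3)

# Remove those loci and recompute the projection on what is left.
reduced = np.delete(G, top_loci, axis=1)
_, proj_reduced = first_pc(reduced)
gap_after = group_gap(proj_reduced, 3)

print("top loci:", sorted(top_loci.tolist()))
print("group gap before removal:", round(float(gap_before), 3))
print("group gap after removal:", round(float(gap_after), 3))
```

Ranking by loading magnitude is a deliberately simple stand-in; the paper's removal-based criterion is described only at a high level in the abstract and is not reproduced here.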

Bibliographic Details
Main authors: Sridevi Padakanti, Khong-Loon Tiong, Yan-Bin Chen, Chen-Hsiang Yeang
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/a5c4bbe9b80145ef86edf6f369e464c3
id oai:doaj.org-article:a5c4bbe9b80145ef86edf6f369e464c3
record_format dspace
spelling oai:doaj.org-article:a5c4bbe9b80145ef86edf6f369e464c3
2021-12-02T17:41:13Z
Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations
10.1038/s41598-021-97129-2
2045-2322
https://doaj.org/article/a5c4bbe9b80145ef86edf6f369e464c3
2021-09-01T00:00:00Z
https://doi.org/10.1038/s41598-021-97129-2
https://doaj.org/toc/2045-2322
Sridevi Padakanti
Khong-Loon Tiong
Yan-Bin Chen
Chen-Hsiang Yeang
Nature Portfolio
article
Medicine
R
Science
Q
EN
Scientific Reports, Vol 11, Iss 1, Pp 1-18 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract: Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated at particular locations in the genome and show functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
format article
author Sridevi Padakanti
Khong-Loon Tiong
Yan-Bin Chen
Chen-Hsiang Yeang
author_facet Sridevi Padakanti
Khong-Loon Tiong
Yan-Bin Chen
Chen-Hsiang Yeang
author_sort Sridevi Padakanti
title Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/a5c4bbe9b80145ef86edf6f369e464c3
_version_ 1718379721241133056