Fine population structure analysis method for genomes of many

Abstract Fine population structure can be examined through the clustering of individuals into subpopulations. The clustering of individuals in large sequence datasets into subpopulations makes the calculation of subpopulation specific allele frequency possible, which may shed light on selection of c...

Bibliographic Details
Main authors: Xuedong Pan, Yi Wang, Emily H. M. Wong, Amalio Telenti, J. Craig Venter, Li Jin
Format: article
Language: EN
Published: Nature Portfolio 2017
Subjects:
R
Q
Online access: https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674
id oai:doaj.org-article:c200df0cc40b4ab69b6fd5028613b674
record_format dspace
spelling oai:doaj.org-article:c200df0cc40b4ab69b6fd5028613b674; 2021-12-02T15:06:04Z; Fine population structure analysis method for genomes of many; 10.1038/s41598-017-12319-1; 2045-2322; https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674; 2017-10-01T00:00:00Z; https://doi.org/10.1038/s41598-017-12319-1; https://doaj.org/toc/2045-2322; Xuedong Pan; Yi Wang; Emily H. M. Wong; Amalio Telenti; J. Craig Venter; Li Jin; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 7, Iss 1, Pp 1-9 (2017)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Fine population structure can be examined through the clustering of individuals into subpopulations. The clustering of individuals in large sequence datasets into subpopulations makes the calculation of subpopulation specific allele frequency possible, which may shed light on selection of candidate variants for rare diseases. However, as the magnitude of the data increases, computational burden becomes a challenge in fine population structure analysis. To address this issue, we propose fine population structure analysis (FIPSA), which is an individual-based non-parametric method for dissecting fine population structure. FIPSA maximizes the likelihood ratio of the contingency table of the allele counts multiplied by the group. We demonstrated that its speed and accuracy were superior to existing non-parametric methods when the simulated sample size was up to 5,000 individuals. When applied to real data, the method showed high resolution on the Human Genome Diversity Project (HGDP) East Asian dataset. FIPSA was independently validated on 11,257 human genomes. The group assignment given by FIPSA was 99.1% similar to those assigned based on supervised learning. Thus, FIPSA provides high resolution and is compatible with a real dataset of more than ten thousand individuals.
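The abstract does not spell out the statistic that FIPSA maximizes, so the following is only a minimal sketch under stated assumptions: that "the contingency table of the allele counts multiplied by the group" means a group-by-allele table per SNP, and that the likelihood ratio is the standard G statistic, G = 2 Σ O ln(O/E). The function names `g_statistic` and `total_g` and the toy genotypes are hypothetical, and the search over group assignments is not shown. This is not the authors' implementation.

```python
import math


def g_statistic(table):
    """Likelihood-ratio (G) statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def total_g(genotypes, groups):
    """Sum the per-SNP G statistic of the group-by-allele table over all SNPs.

    genotypes: one list per individual of alternate-allele counts (0, 1 or 2).
    groups: one group label per individual (assumed encoding, for illustration).
    """
    labels = sorted(set(groups))
    total = 0.0
    for s in range(len(genotypes[0])):
        table = []
        for label in labels:
            members = [ind for ind, lab in zip(genotypes, groups) if lab == label]
            alt = sum(ind[s] for ind in members)
            ref = 2 * len(members) - alt  # diploid: two alleles per individual
            table.append([ref, alt])
        total += g_statistic(table)
    return total


# Toy data: two individuals homozygous reference, two homozygous alternate.
toy_genotypes = [[0, 0], [0, 0], [2, 2], [2, 2]]
matching = [0, 0, 1, 1]   # groups follow the genotype pattern
shuffled = [0, 1, 0, 1]   # groups cut across the genotype pattern
print(total_g(toy_genotypes, matching) > total_g(toy_genotypes, shuffled))
```

Under these assumptions, an assignment that separates individuals with different allele frequencies scores higher than one that mixes them, which is the quantity a clustering procedure of this kind would try to increase.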
format article
author Xuedong Pan
Yi Wang
Emily H. M. Wong
Amalio Telenti
J. Craig Venter
Li Jin
title Fine population structure analysis method for genomes of many
publisher Nature Portfolio
publishDate 2017
url https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674