Fine population structure analysis method for genomes of many
Abstract Fine population structure can be examined through the clustering of individuals into subpopulations. The clustering of individuals in large sequence datasets into subpopulations makes the calculation of subpopulation specific allele frequency possible, which may shed light on selection of c...
Main authors: | Xuedong Pan, Yi Wang, Emily H. M. Wong, Amalio Telenti, J. Craig Venter, Li Jin |
---|---|
Format: | article |
Language: | EN |
Published: | Nature Portfolio, 2017 |
Subjects: | Medicine; Science |
Online access: | https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674 |
id |
oai:doaj.org-article:c200df0cc40b4ab69b6fd5028613b674 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:c200df0cc40b4ab69b6fd5028613b674
2021-12-02T15:06:04Z
Fine population structure analysis method for genomes of many
10.1038/s41598-017-12319-1
2045-2322
https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674
2017-10-01T00:00:00Z
https://doi.org/10.1038/s41598-017-12319-1
https://doaj.org/toc/2045-2322
Abstract Fine population structure can be examined through the clustering of individuals into subpopulations. The clustering of individuals in large sequence datasets into subpopulations makes the calculation of subpopulation specific allele frequency possible, which may shed light on selection of candidate variants for rare diseases. However, as the magnitude of the data increases, computational burden becomes a challenge in fine population structure analysis. To address this issue, we propose fine population structure analysis (FIPSA), which is an individual-based non-parametric method for dissecting fine population structure. FIPSA maximizes the likelihood ratio of the contingency table of the allele counts multiplied by the group. We demonstrated that its speed and accuracy were superior to existing non-parametric methods when the simulated sample size was up to 5,000 individuals. When applied to real data, the method showed high resolution on the Human Genome Diversity Project (HGDP) East Asian dataset. FIPSA was independently validated on 11,257 human genomes. The group assignment given by FIPSA was 99.1% similar to those assigned based on supervised learning. Thus, FIPSA provides high resolution and is compatible with a real dataset of more than ten thousand individuals.
Xuedong Pan; Yi Wang; Emily H. M. Wong; Amalio Telenti; J. Craig Venter; Li Jin
Nature Portfolio
article
Medicine R Science Q
EN
Scientific Reports, Vol 7, Iss 1, Pp 1-9 (2017) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
Medicine R Science Q |
spellingShingle |
Medicine R Science Q Xuedong Pan Yi Wang Emily H. M. Wong Amalio Telenti J. Craig Venter Li Jin Fine population structure analysis method for genomes of many |
description |
Abstract Fine population structure can be examined through the clustering of individuals into subpopulations. The clustering of individuals in large sequence datasets into subpopulations makes the calculation of subpopulation specific allele frequency possible, which may shed light on selection of candidate variants for rare diseases. However, as the magnitude of the data increases, computational burden becomes a challenge in fine population structure analysis. To address this issue, we propose fine population structure analysis (FIPSA), which is an individual-based non-parametric method for dissecting fine population structure. FIPSA maximizes the likelihood ratio of the contingency table of the allele counts multiplied by the group. We demonstrated that its speed and accuracy were superior to existing non-parametric methods when the simulated sample size was up to 5,000 individuals. When applied to real data, the method showed high resolution on the Human Genome Diversity Project (HGDP) East Asian dataset. FIPSA was independently validated on 11,257 human genomes. The group assignment given by FIPSA was 99.1% similar to those assigned based on supervised learning. Thus, FIPSA provides high resolution and is compatible with a real dataset of more than ten thousand individuals. |
format |
article |
author |
Xuedong Pan Yi Wang Emily H. M. Wong Amalio Telenti J. Craig Venter Li Jin |
author_facet |
Xuedong Pan Yi Wang Emily H. M. Wong Amalio Telenti J. Craig Venter Li Jin |
author_sort |
Xuedong Pan |
title |
Fine population structure analysis method for genomes of many |
title_short |
Fine population structure analysis method for genomes of many |
title_full |
Fine population structure analysis method for genomes of many |
title_fullStr |
Fine population structure analysis method for genomes of many |
title_full_unstemmed |
Fine population structure analysis method for genomes of many |
title_sort |
fine population structure analysis method for genomes of many |
publisher |
Nature Portfolio |
publishDate |
2017 |
url |
https://doaj.org/article/c200df0cc40b4ab69b6fd5028613b674 |
work_keys_str_mv |
AT xuedongpan finepopulationstructureanalysismethodforgenomesofmany AT yiwang finepopulationstructureanalysismethodforgenomesofmany AT emilyhmwong finepopulationstructureanalysismethodforgenomesofmany AT amaliotelenti finepopulationstructureanalysismethodforgenomesofmany AT jcraigventer finepopulationstructureanalysismethodforgenomesofmany AT lijin finepopulationstructureanalysismethodforgenomesofmany |
_version_ |
1718388608153419776 |
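
Illustrative note: the abstract states that FIPSA maximizes the likelihood ratio of a contingency table of allele counts by group. The sketch below is not the authors' implementation. It assumes that "likelihood ratio" can be read as the standard G-statistic, G = 2 Σ O ln(O/E), computed on a 2 × K table (reference vs. alternate allele copies across K groups) and summed over SNPs. It also assumes diploid genotypes coded as 0, 1 or 2 alternate-allele copies. The function names `g_statistic`, `allele_by_group_table` and `total_g` are hypothetical, and the toy genotypes are made up for illustration.

```python
import math


def g_statistic(table):
    """Likelihood-ratio (G) statistic for a 2D contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing (0 * ln 0 := 0)
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


def allele_by_group_table(alt_counts, groups, n_groups):
    """Build a 2 x n_groups table of allele copies for one diploid SNP.

    Row 0 holds reference-allele copies, row 1 alternate-allele copies.
    """
    table = [[0] * n_groups for _ in range(2)]
    for alt, grp in zip(alt_counts, groups):
        table[0][grp] += 2 - alt
        table[1][grp] += alt
    return table


def total_g(genotypes, groups, n_groups):
    """Sum the G-statistic over SNPs for one candidate group assignment.

    genotypes[s][i] is the alternate-allele count of individual i at SNP s.
    """
    return sum(
        g_statistic(allele_by_group_table(snp, groups, n_groups))
        for snp in genotypes
    )


# Toy data (assumed): 2 SNPs, 4 individuals, two clearly separated pairs.
genotypes = [
    [2, 2, 0, 0],
    [0, 0, 2, 2],
]
separated = [0, 0, 1, 1]  # grouping that matches the allele pattern
mixed = [0, 1, 0, 1]      # grouping that splits each pair

score_separated = total_g(genotypes, separated, 2)
score_mixed = total_g(genotypes, mixed, 2)
print(score_separated > score_mixed)
```

Under these assumptions, a grouping that tracks allele-frequency differences scores higher than one that mixes them, which is the quantity a clustering search would try to increase. How FIPSA actually searches over assignments is not described in this record and is not modelled here.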