Reproducibility in the UK biobank of genome-wide significant signals discovered in earlier genome-wide association studies

Abstract With the establishment of large biobanks, discovery of single nucleotide variants (SNVs, also known as single nucleotide polymorphisms (SNVs)) associated with various phenotypes has accelerated. An open question is whether genome-wide significant SNVs identified in earlier genome-wide assoc...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autores principales: Jack W. O’Sullivan, John P. A. Ioannidis
Formato: article
Lenguaje:EN
Publicado: Nature Portfolio 2021
Materias:
R
Q
Acceso en línea:https://doaj.org/article/e525b95011534433b200d6aaa6cf748d
Etiquetas: Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
Descripción
Sumario:Abstract With the establishment of large biobanks, discovery of single nucleotide variants (SNVs, also known as single nucleotide polymorphisms (SNVs)) associated with various phenotypes has accelerated. An open question is whether genome-wide significant SNVs identified in earlier genome-wide association studies (GWAS) are replicated in later GWAS conducted in biobanks. To address this, we examined a publicly available GWAS database and identified two, independent GWAS on the same phenotype (an earlier, “discovery” GWAS and a later, “replication” GWAS done in the UK biobank). The analysis evaluated 136,318,924 SNVs (of which 6289 reached P < 5e−8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%; although lower for binary than quantitative phenotypes (58.1% versus 94.8% respectively). There was a 18.0% decrease in SNV effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNV effect size, phenotype trait (binary or quantitative), and discovery P value, we built and validated a model that predicted SNV replication with area under the Receiver Operator Curve = 0.90. While non-replication may reflect lack of power rather than genuine false-positives, these results provide insights about which discovered associations are likely to be replicated across subsequent GWAS.