Epidemiological associations with genomic variation in SARS-CoV-2

Bibliographic details
Main authors: Ali Rahnavard, Tyson Dawson, Rebecca Clement, Nathaniel Stearrett, Marcos Pérez-Losada, Keith A. Crandall
Format: article
Language: English
Published: Nature Portfolio, 2021
Subjects: R (Medicine), Q (Science)
Online access: https://doaj.org/article/f38b7a7ac21d46d6bbf4eef2b9aa3065
Record ID: oai:doaj.org-article:f38b7a7ac21d46d6bbf4eef2b9aa3065
Indexed in: DOAJ
Source: Scientific Reports, Vol 11, Iss 1, Pp 1-10 (2021)
ISSN: 2045-2322 (journal page: https://doaj.org/toc/2045-2322)
DOI: https://doi.org/10.1038/s41598-021-02548-w
Date: 2021-11-01

Abstract: SARS-CoV-2 (CoV) is the etiological agent of the COVID-19 pandemic and evolves to evade both host immune systems and intervention strategies. We divided the CoV genome into 29 constituent regions and applied novel analytical approaches to identify associations between CoV genomic features and epidemiological metadata. Our results show that nonstructural protein 3 (nsp3) and Spike protein (S) have the highest variation and greatest correlation with the viral whole-genome variation. S protein variation is correlated with nsp3, nsp6, and 3′-to-5′ exonuclease variation. Country of origin and time since the start of the pandemic were the most influential metadata associated with genomic variation, while host sex and age were the least influential. We define a novel statistic—coherence—and show its utility in identifying geographic regions (populations) with unusually high (many new variants) or low (isolated) viral phylogenetic diversity. Interestingly, at both global and regional scales, we identify geographic locations with high coherence neighboring regions of low coherence; this emphasizes the utility of this metric to inform public health measures for disease spread. Our results provide a direction to prioritize genes associated with outcome predictors (e.g., health, therapeutic, and vaccine outcomes) and to improve DNA tests for predicting disease status.