Constructing germline research cohorts from the discarded reads of clinical tumor sequences

Abstract. Background: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted n...

Bibliographic Details
Main authors:	Alexander Gusev, Stefan Groha, Kodi Taraszka, Yevgeniy R. Semenov, Noah Zaitlen
Format:	article
Language:	EN
Published:	BMC 2021
Subjects:	Medicine R Genetics QH426-470
Online access:	https://doaj.org/article/dfd1ed628f8e4314b50a12954fd90ff0

id	oai:doaj.org-article:dfd1ed628f8e4314b50a12954fd90ff0
record_format	dspace
spelling	oai:doaj.org-article:dfd1ed628f8e4314b50a12954fd90ff0 | 2021-11-14T12:27:40Z | Constructing germline research cohorts from the discarded reads of clinical tumor sequences | 10.1186/s13073-021-00999-4 | 1756-994X | https://doaj.org/article/dfd1ed628f8e4314b50a12954fd90ff0 | 2021-11-01T00:00:00Z | https://doi.org/10.1186/s13073-021-00999-4 | https://doaj.org/toc/1756-994X | Abstract. Background: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted nature of clinical tumor sequencing has a limited scope, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for the recovery of genome-wide germline genotypes through imputation. Methods: We developed a framework for inference of germline variants from tumor panel sequencing, including imputation, quality control, inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmarked our framework on 833 individuals with tumor sequencing and matched germline SNP array data. We then applied our approach to a prospectively collected panel sequencing cohort of 25,889 tumors. Results: We demonstrate high to moderate accuracy of each inferred feature relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86, genetic ancestry was inferred with a correlation of > 0.98, polygenic risk scores were inferred with a correlation of > 0.90, and individual HLA alleles were inferred with a correlation of > 0.80. We demonstrate a minimal influence on the accuracy of somatic copy number alterations and other tumor features. We showcase the feasibility and utility of our framework by analyzing 25,889 tumors and identifying the relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data. Conclusions: We conclude that targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data and make our analysis pipeline publicly available to facilitate this effort. | Alexander Gusev | Stefan Groha | Kodi Taraszka | Yevgeniy R. Semenov | Noah Zaitlen | BMC | article | Medicine | R | Genetics | QH426-470 | EN | Genome Medicine, Vol 13, Iss 1, Pp 1-14 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Medicine R Genetics QH426-470
spellingShingle	Medicine R Genetics QH426-470 Alexander Gusev Stefan Groha Kodi Taraszka Yevgeniy R. Semenov Noah Zaitlen Constructing germline research cohorts from the discarded reads of clinical tumor sequences
description	Abstract. Background: Hundreds of thousands of cancer patients have had targeted (panel) tumor sequencing to identify clinically meaningful mutations. In addition to improving patient outcomes, this activity has led to significant discoveries in basic and translational domains. However, the targeted nature of clinical tumor sequencing has a limited scope, especially for germline genetics. In this work, we assess the utility of discarded, off-target reads from tumor-only panel sequencing for the recovery of genome-wide germline genotypes through imputation. Methods: We developed a framework for inference of germline variants from tumor panel sequencing, including imputation, quality control, inference of genetic ancestry, germline polygenic risk scores, and HLA alleles. We benchmarked our framework on 833 individuals with tumor sequencing and matched germline SNP array data. We then applied our approach to a prospectively collected panel sequencing cohort of 25,889 tumors. Results: We demonstrate high to moderate accuracy of each inferred feature relative to direct germline SNP array genotyping: individual common variants were imputed with a mean accuracy (correlation) of 0.86, genetic ancestry was inferred with a correlation of > 0.98, polygenic risk scores were inferred with a correlation of > 0.90, and individual HLA alleles were inferred with a correlation of > 0.80. We demonstrate a minimal influence on the accuracy of somatic copy number alterations and other tumor features. We showcase the feasibility and utility of our framework by analyzing 25,889 tumors and identifying the relationships between genetic ancestry, polygenic risk, and tumor characteristics that could not be studied with conventional on-target tumor data. Conclusions: We conclude that targeted tumor sequencing can be leveraged to build rich germline research cohorts from existing data and make our analysis pipeline publicly available to facilitate this effort.
format	article
author	Alexander Gusev Stefan Groha Kodi Taraszka Yevgeniy R. Semenov Noah Zaitlen
author_facet	Alexander Gusev Stefan Groha Kodi Taraszka Yevgeniy R. Semenov Noah Zaitlen
author_sort	Alexander Gusev
title	Constructing germline research cohorts from the discarded reads of clinical tumor sequences
title_sort	constructing germline research cohorts from the discarded reads of clinical tumor sequences
publisher	BMC
publishDate	2021
url	https://doaj.org/article/dfd1ed628f8e4314b50a12954fd90ff0
