Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

Bibliographic Details
Main authors: Marleen M. Nieboer, Luan Nguyen, Jeroen de Ridder
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/0b53ef2e3f784aba84c29c9f9d64c804
id oai:doaj.org-article:0b53ef2e3f784aba84c29c9f9d64c804
record_format dspace
spelling oai:doaj.org-article:0b53ef2e3f784aba84c29c9f9d64c804
2021-12-02T16:14:17Z
10.1038/s41598-021-93917-y
2045-2322
2021-07-01T00:00:00Z
https://doi.org/10.1038/s41598-021-93917-y
https://doaj.org/toc/2045-2322
Scientific Reports, Vol 11, Iss 1, Pp 1-13 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance in tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may highly vary between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.
format article
author Marleen M. Nieboer
Luan Nguyen
Jeroen de Ridder
title Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/0b53ef2e3f784aba84c29c9f9d64c804