HiCancer: accurate and complete cancer genome phasing with Hi-C reads

Abstract Due to the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Nevertheless, phasing each chromosome of the cancer genome into two haplotypes according to germline mutations provides a suboptim...

Bibliographic Details
Main Authors: Weihua Pan, Desheng Gong, Da Sun, Haohui Luo
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online Access: https://doaj.org/article/460b0ebe7f5446b4a76b9c5335960bcc
id oai:doaj.org-article:460b0ebe7f5446b4a76b9c5335960bcc
record_format dspace
spelling oai:doaj.org-article:460b0ebe7f5446b4a76b9c5335960bcc; 2021-12-02T13:24:26Z; HiCancer: accurate and complete cancer genome phasing with Hi-C reads; 10.1038/s41598-021-86104-6; 2045-2322; https://doaj.org/article/460b0ebe7f5446b4a76b9c5335960bcc; 2021-03-01T00:00:00Z; https://doi.org/10.1038/s41598-021-86104-6; https://doaj.org/toc/2045-2322; Weihua Pan; Desheng Gong; Da Sun; Haohui Luo; Nature Portfolio; article; Medicine; R; Science; Q; EN; Scientific Reports, Vol 11, Iss 1, Pp 1-10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Due to the high complexity of the cancer genome, it has so far been too difficult to generate a complete cancer genome map containing the sequence of every DNA molecule. Nevertheless, phasing each chromosome of the cancer genome into two haplotypes according to germline mutations provides a suboptimal solution for understanding the cancer genome. However, phasing the cancer genome is also challenging because of limits in experimental and computational technologies. Hi-C data has been widely used for phasing in recent years because of its long-range linkage information, and it provides an opportunity to solve the problem of phasing the cancer genome. Existing Hi-C based phasing methods cannot be applied to the cancer genome directly, because somatic mutations such as somatic SNPs, copy number variations, and structural variations greatly reduce correctness and completeness. Here, we propose a new Hi-C based pipeline for phasing the cancer genome, called HiCancer. HiCancer handles different kinds of somatic mutations and variations, and takes advantage of allelic copy number imbalance and linkage disequilibrium to improve the correctness and completeness of phasing. According to our experiments on the K562 and KBM-7 cell lines, HiCancer is able to generate very high-quality chromosome-level haplotypes for the cancer genome with only Hi-C data.
format article
author Weihua Pan
Desheng Gong
Da Sun
Haohui Luo
title HiCancer: accurate and complete cancer genome phasing with Hi-C reads
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/460b0ebe7f5446b4a76b9c5335960bcc