Efficient iterative Hi-C scaffolder based on N-best neighbors

Abstract. Background: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to yield a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, capable of...

Bibliographic Details
Main authors: Dengfeng Guan, Shane A. McCarthy, Zemin Ning, Guohua Wang, Yadong Wang, Richard Durbin
Format: article
Language: EN
Published: BMC 2021
Subjects: Hi-C; Scaffolding; Computer applications to medicine. Medical informatics; Biology (General)
Online access: https://doaj.org/article/40599c4ac5b24cfa8f1891cdcffd80ae
id oai:doaj.org-article:40599c4ac5b24cfa8f1891cdcffd80ae
record_format dspace
spelling oai:doaj.org-article:40599c4ac5b24cfa8f1891cdcffd80ae
2021-11-28T12:11:04Z
Efficient iterative Hi-C scaffolder based on N-best neighbors
10.1186/s12859-021-04453-5
1471-2105
https://doaj.org/article/40599c4ac5b24cfa8f1891cdcffd80ae
2021-11-01T00:00:00Z
https://doi.org/10.1186/s12859-021-04453-5
https://doaj.org/toc/1471-2105
Dengfeng Guan
Shane A. McCarthy
Zemin Ning
Guohua Wang
Yadong Wang
Richard Durbin
BMC
article
Hi-C
Scaffolding
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
EN
BMC Bioinformatics, Vol 22, Iss 1, Pp 1-16 (2021)
institution DOAJ
collection DOAJ
language EN
topic Hi-C
Scaffolding
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
spellingShingle Hi-C
Scaffolding
Computer applications to medicine. Medical informatics
R858-859.7
Biology (General)
QH301-705.5
Dengfeng Guan
Shane A. McCarthy
Zemin Ning
Guohua Wang
Yadong Wang
Richard Durbin
Efficient iterative Hi-C scaffolder based on N-best neighbors
description Background: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to yield a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, which captures the chromosomal profile of a genome, is now widely used to complete the task. However, existing Hi-C based scaffolding tools either require the chromosome number a priori as input or cannot build highly continuous scaffolds. Results: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which uses contact information from Hi-C reads to construct a scaffolding graph iteratively, based on the N-best neighbors of each contig. After scaffolding, it identifies potential misjoins and breaks them to preserve scaffolding accuracy. In tests on three long-read based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools and generates much more continuous scaffolds, while achieving higher or comparable accuracy. Conclusions: Pin_hic is an efficient Hi-C based scaffolding tool that can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and to make a meaningful contribution.
format article
author Dengfeng Guan
Shane A. McCarthy
Zemin Ning
Guohua Wang
Yadong Wang
Richard Durbin
author_facet Dengfeng Guan
Shane A. McCarthy
Zemin Ning
Guohua Wang
Yadong Wang
Richard Durbin
author_sort Dengfeng Guan
title Efficient iterative Hi-C scaffolder based on N-best neighbors
title_short Efficient iterative Hi-C scaffolder based on N-best neighbors
title_full Efficient iterative Hi-C scaffolder based on N-best neighbors
title_fullStr Efficient iterative Hi-C scaffolder based on N-best neighbors
title_full_unstemmed Efficient iterative Hi-C scaffolder based on N-best neighbors
title_sort efficient iterative hi-c scaffolder based on n-best neighbors
publisher BMC
publishDate 2021
url https://doaj.org/article/40599c4ac5b24cfa8f1891cdcffd80ae
work_keys_str_mv AT dengfengguan efficientiterativehicscaffolderbasedonnbestneighbors
AT shaneamccarthy efficientiterativehicscaffolderbasedonnbestneighbors
AT zeminning efficientiterativehicscaffolderbasedonnbestneighbors
AT guohuawang efficientiterativehicscaffolderbasedonnbestneighbors
AT yadongwang efficientiterativehicscaffolderbasedonnbestneighbors
AT richarddurbin efficientiterativehicscaffolderbasedonnbestneighbors
_version_ 1718408133179604992
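
The abstract in this record describes building a scaffolding graph from the N-best neighbors of each contig. The Python sketch below illustrates that general idea only and is not taken from pin_hic itself. The function names, the `contacts` dictionary, the link weights, and the rule of keeping only mutually ranked pairs are all assumptions made for illustration, and contig orientation is ignored.

```python
from collections import defaultdict


def n_best_neighbors(contacts, n):
    """Return, for each contig, its n highest-weight neighbors.

    contacts maps a contig pair (a, b) to a link weight, for example a
    Hi-C contact count normalized by contig length (an assumption here).
    """
    by_contig = defaultdict(list)
    for (a, b), weight in contacts.items():
        by_contig[a].append((weight, b))
        by_contig[b].append((weight, a))
    best = {}
    for contig, links in by_contig.items():
        # Descending weight; ties are broken by name for determinism.
        links.sort(key=lambda link: (-link[0], link[1]))
        best[contig] = [name for _, name in links[:n]]
    return best


def mutual_edges(contacts, n):
    """Keep pairs in which each contig ranks the other among its n best."""
    best = n_best_neighbors(contacts, n)
    edges = [
        (a, b, weight)
        for (a, b), weight in contacts.items()
        if b in best[a] and a in best[b]
    ]
    # Strongest candidate joins first.
    edges.sort(key=lambda edge: -edge[2])
    return edges


# Hypothetical link weights between four contigs.
contacts = {
    ("c1", "c2"): 90.0,
    ("c2", "c3"): 80.0,
    ("c1", "c3"): 10.0,
    ("c3", "c4"): 70.0,
    ("c1", "c4"): 5.0,
}
best = n_best_neighbors(contacts, 2)
edges = mutual_edges(contacts, 2)
print(edges)
```

With these toy weights and n = 2, the weak c1-c3 and c1-c4 links are dropped, leaving the chain c1-c2-c3-c4 as candidate joins. An iterative scaffolder would then treat the joined contigs as new units and repeat the selection.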