qc3C: Reference-free quality control for Hi-C sequencing data.

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies...

Bibliographic Details
Main authors:	Matthew Z DeMaere, Aaron E Darling
Format:	article
Language:	EN
Published:	Public Library of Science (PLoS) 2021
Subjects:	Biology (General) QH301-705.5
Online access:	https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca

id	oai:doaj.org-article:fb8f07d6b62e4a17be3222af8fc8d2ca
record_format	dspace
spelling	oai:doaj.org-article:fb8f07d6b62e4a17be3222af8fc8d2ca | 2021-11-25T05:42:27Z | qc3C: Reference-free quality control for Hi-C sequencing data. | 1553-734X | 1553-7358 | 10.1371/journal.pcbi.1008839 | https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca | 2021-10-01T00:00:00Z | https://doi.org/10.1371/journal.pcbi.1008839 | https://doaj.org/toc/1553-734X | https://doaj.org/toc/1553-7358 | Matthew Z DeMaere | Aaron E Darling | Public Library of Science (PLoS) | article | Biology (General) | QH301-705.5 | EN | PLoS Computational Biology, Vol 17, Iss 10, p e1008839 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Biology (General) QH301-705.5
description	Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators have, thus far, relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, because an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first reference-free Hi-C QC tool, and it also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
format	article
author	Matthew Z DeMaere Aaron E Darling
title	qc3C: Reference-free quality control for Hi-C sequencing data.
publisher	Public Library of Science (PLoS)
publishDate	2021
url	https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca
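
The observation in the abstract, that proximity ligation tends to create k-mers which would not naturally occur in the sample, can be illustrated with a toy sketch. This is not qc3C's implementation. It assumes a DpnII-style digest (site GATC) in which end fill-in and blunt ligation duplicate the site to give the junction GATCGATC, and every function name, the in-memory k-mer counter, and the `ratio` threshold are hypothetical choices made for illustration only.

```python
from collections import Counter

# Assumption: DpnII/MboI-style digestion (site GATC); fill-in plus blunt
# ligation duplicates the site, giving the junction sequence below.
JUNCTION = "GATCGATC"


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mers across all reads (a toy stand-in for a k-mer database)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def looks_like_ligation(read, counts, k, junction=JUNCTION, ratio=0.5):
    """Flag a read whose junction-spanning k-mer is rare relative to its flanks.

    Only the first occurrence of the junction in the read is examined.
    """
    pos = read.find(junction)
    if pos < 0:
        return False
    mid = pos + len(junction) // 2          # boundary between the two ligated ends
    start = mid - k // 2                    # k-mer centred on that boundary
    if start < 0 or start + k > len(read):
        return False
    spanning = read[start:start + k]
    # Flanking k-mers stop at the boundary, so each covers one side only.
    flanks = []
    if mid - k >= 0:
        flanks.append(read[mid - k:mid])
    if mid + k <= len(read):
        flanks.append(read[mid:mid + k])
    if not flanks:
        return False
    background = max(counts[f] for f in flanks)
    return counts[spanning] < ratio * background


def ligation_fraction(reads, k, junction=JUNCTION, ratio=0.5):
    """Fraction of reads flagged as putative proximity-ligation products."""
    counts = count_kmers(reads, k)
    flagged = sum(looks_like_ligation(r, counts, k, junction, ratio) for r in reads)
    return flagged / len(reads) if reads else 0.0
```

The design choice here is to compare the junction-spanning k-mer against its own flanks rather than against a fixed cutoff, so a junction-like sequence that is genuinely present in the genome, and therefore as common as its surroundings, is not flagged. `ligation_fraction` expects a list of read strings, since it iterates over the reads twice.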
