qc3C: Reference-free quality control for Hi-C sequencing data.

Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies...

Bibliographic Details
Main authors: Matthew Z DeMaere, Aaron E Darling
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca
id oai:doaj.org-article:fb8f07d6b62e4a17be3222af8fc8d2ca
record_format dspace
spelling oai:doaj.org-article:fb8f07d6b62e4a17be3222af8fc8d2ca
2021-11-25T05:42:27Z
qc3C: Reference-free quality control for Hi-C sequencing data.
1553-734X
1553-7358
10.1371/journal.pcbi.1008839
https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca
2021-10-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1008839
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Matthew Z DeMaere
Aaron E Darling
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 10, p e1008839 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and, more recently, the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while the key post-sequencing quality indicators have thus far relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first to implement reference-free Hi-C QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples; it also provides reference-based QC. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
format article
author Matthew Z DeMaere
Aaron E Darling
title qc3C: Reference-free quality control for Hi-C sequencing data.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/fb8f07d6b62e4a17be3222af8fc8d2ca
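
The abstract's central observation, that proximity ligation is likely to create k-mers that would not naturally occur in the sample, can be illustrated with a small Python sketch. This is a toy illustration only and is not qc3C's algorithm or API: the function names, the example sequences, and the assumption of a 5'-overhang enzyme whose ends are filled in and blunt-ligated are all hypothetical choices made here. The record itself does not describe how qc3C turns such observations into an estimate of the proximity-ligation fraction.

```python
def ligation_junction(site, cut):
    """Junction formed when a 5'-overhang cut site is filled in and blunt-ligated.

    `site` is the recognition sequence and `cut` the 0-based top-strand cut
    offset (assumption: enzyme leaves a 5' overhang).
    DpnII (^GATC): site="GATC", cut=0 gives "GATCGATC".
    HindIII (A^AGCTT): site="AAGCTT", cut=1 gives "AAGCTAGCTT".
    """
    return site[:len(site) - cut] + site[cut:]


def kmers(seq, k):
    """Return the set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(read, known, k):
    """k-mers present in `read` but absent from the `known` k-mer set."""
    return kmers(read, k) - known


def junction_read_fraction(reads, junction):
    """Naive fraction of reads that contain the junction sequence.

    Illustration only: a raw count like this is not a calibrated estimate.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if junction in r.upper())
    return hits / len(reads)


if __name__ == "__main__":
    # Hypothetical toy "genome" containing a single DpnII site (GATC).
    genome = "AAACCCGATCTTTGGG"
    junction = ligation_junction("GATC", 0)

    # A made-up read spanning a ligation junction between two cut ends.
    chimeric_read = "AAACCCGATC" + "GATCTTTGGG"

    # k-mers spanning the junction do not occur in the source sequence.
    print(sorted(novel_kmers(chimeric_read, kmers(genome, 8), 8)))

    reads = ["ACGTGATCGATCTTGA", "ACGTACGTACGTACGT", "TTGATCAAGGCC", "ccgatcgatcaa"]
    print(junction_read_fraction(reads, junction))
```

In this sketch the junction-spanning k-mers stand out because they are absent from the k-mer set of the unligated sequence. With real data, sequencing errors and natural occurrences of the junction sequence would both confound a simple count like this.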