Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks

Background Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality. Methods Raw, unprocessed WGS reads from E...

Bibliographic Details
Main authors: Darlene D. Wagner, Heather A. Carleton, Eija Trees, Lee S. Katz
Format: article
Language: EN
Published: PeerJ Inc. 2021
Subjects:
SNP
R
Online access: https://doaj.org/article/ca440bb680684ba09abb720b0731d66d
id oai:doaj.org-article:ca440bb680684ba09abb720b0731d66d
record_format dspace
spelling oai:doaj.org-article:ca440bb680684ba09abb720b0731d66d
2021-11-27T15:05:04Z
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
10.7717/peerj.12446
2167-8359
https://doaj.org/article/ca440bb680684ba09abb720b0731d66d
2021-11-01T00:00:00Z
https://peerj.com/articles/12446.pdf
https://peerj.com/articles/12446/
https://doaj.org/toc/2167-8359
Darlene D. Wagner
Heather A. Carleton
Eija Trees
Lee S. Katz
PeerJ Inc.
article
SNP
Read cleaning
Read healing
Multiheal
Assembly
Medicine
R
EN
PeerJ, Vol 9, p e12446 (2021)
institution DOAJ
collection DOAJ
language EN
topic SNP
Read cleaning
Read healing
Multiheal
Assembly
Medicine
R
spellingShingle SNP
Read cleaning
Read healing
Multiheal
Assembly
Medicine
R
Darlene D. Wagner
Heather A. Carleton
Eija Trees
Lee S. Katz
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
description Background: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality.
Methods: Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNPs analysis results in E. coli and S. enterica clusters.
Results: Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read healing pipelines for SNPs analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNPs pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads.
Conclusions: PHRED scores will continue to be a crucial quality metric albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNPs analyses, different read healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies.
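Three of the read-level metrics named in the abstract (PHRED score, read length, ambiguous nucleotide composition) and the two trimming styles it mentions can be illustrated on a single read. The following is only a minimal sketch, not the authors' read-healing pipelines: the function names are hypothetical, Phred+33 quality encoding is assumed, and the threshold of 20 is an arbitrary illustrative value. Library insert size is omitted because it requires paired-end alignment.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into PHRED scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def read_metrics(sequence, quality_line):
    """Return mean PHRED score, read length, and fraction of ambiguous (N) bases."""
    scores = phred_scores(quality_line)
    length = len(sequence)
    return {
        "mean_phred": mean(scores) if scores else 0.0,
        "length": length,
        "ambiguous_fraction": sequence.upper().count("N") / length if length else 0.0,
    }


def fixed_width_trim(sequence, quality_line, left=0, right=0):
    """Remove a fixed number of bases from each end, regardless of quality."""
    end = max(left, len(sequence) - right)
    return sequence[left:end], quality_line[left:end]


def quality_trim(sequence, quality_line, min_phred=20):
    """Trim from both ends until a base with PHRED >= min_phred is reached.

    min_phred=20 is an illustrative assumption, not a value from the article.
    """
    scores = phred_scores(quality_line)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_phred:
        start += 1
    while end > start and scores[end - 1] < min_phred:
        end -= 1
    return sequence[start:end], quality_line[start:end]


# Toy read: low-quality ambiguous bases at both ends ('#' = Q2, 'I' = Q40).
seq = "NACGTACGTN"
qual = "#IIIIIIII#"

raw_metrics = read_metrics(seq, qual)
healed_seq, healed_qual = quality_trim(seq, qual)
healed_metrics = read_metrics(healed_seq, healed_qual)
fixed_seq, fixed_qual = fixed_width_trim(seq, qual, left=1, right=1)
```

On this toy read, both trimming styles drop the two low-quality end bases, which raises the mean PHRED score and removes the ambiguous nucleotides at the cost of read length. That trade-off is the kind of effect the abstract's metrics are meant to capture.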
format article
author Darlene D. Wagner
Heather A. Carleton
Eija Trees
Lee S. Katz
author_facet Darlene D. Wagner
Heather A. Carleton
Eija Trees
Lee S. Katz
author_sort Darlene D. Wagner
title Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
title_short Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
title_full Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
title_fullStr Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
title_full_unstemmed Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
title_sort evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
publisher PeerJ Inc.
publishDate 2021
url https://doaj.org/article/ca440bb680684ba09abb720b0731d66d