Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks
Background: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality. Methods: Raw, unprocessed WGS reads from E...
Main authors: | Darlene D. Wagner, Heather A. Carleton, Eija Trees, Lee S. Katz |
---|---|
Format: | article |
Language: | EN |
Published: | PeerJ Inc., 2021 |
Subjects: | SNP; Read cleaning; Read healing; Multiheal; Assembly; Medicine; R |
Online access: | https://doaj.org/article/ca440bb680684ba09abb720b0731d66d |
id |
oai:doaj.org-article:ca440bb680684ba09abb720b0731d66d |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:ca440bb680684ba09abb720b0731d66d; 2021-11-27T15:05:04Z; Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks; 10.7717/peerj.12446; 2167-8359; https://doaj.org/article/ca440bb680684ba09abb720b0731d66d; 2021-11-01T00:00:00Z; https://peerj.com/articles/12446.pdf; https://peerj.com/articles/12446/; https://doaj.org/toc/2167-8359; Darlene D. Wagner; Heather A. Carleton; Eija Trees; Lee S. Katz; PeerJ Inc.; article; SNP; Read cleaning; Read healing; Multiheal; Assembly; Medicine; R; EN; PeerJ, Vol 9, p e12446 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
SNP; Read cleaning; Read healing; Multiheal; Assembly; Medicine; R |
description |
Background: Whole genome sequencing (WGS) has gained increasing importance in responses to enteric bacterial outbreaks. Common analysis procedures for WGS, single nucleotide polymorphisms (SNPs) and genome assembly, are highly dependent upon WGS data quality. Methods: Raw, unprocessed WGS reads from Escherichia coli, Salmonella enterica, and Shigella sonnei outbreak clusters were characterized for four quality metrics: PHRED score, read length, library insert size, and ambiguous nucleotide composition. PHRED scores were strongly correlated with improved SNPs analysis results in E. coli and S. enterica clusters. Results: Assembly quality showed only moderate correlations with PHRED scores and library insert size, and then only for Salmonella. To improve SNP analyses and assemblies, we compared seven read-healing pipelines to improve these four quality metrics and to see how well they improved SNP analysis and genome assembly. The most effective read healing pipelines for SNPs analysis incorporated quality-based trimming, fixed-width trimming, or both. The Lyve-SET SNPs pipeline showed a more marked improvement than the CFSAN SNP Pipeline, but the latter performed better on raw, unhealed reads. For genome assembly, SPAdes enabled significant improvements in healed E. coli reads only, while Skesa yielded no significant improvements on healed reads. Conclusions: PHRED scores will continue to be a crucial quality metric albeit not of equal impact across all types of analyses for all enteric bacteria. While trimming-based read healing performed well for SNPs analyses, different read healing approaches are likely needed for genome assembly or other, emerging WGS analysis methodologies. |
format |
article |
author |
Darlene D. Wagner; Heather A. Carleton; Eija Trees; Lee S. Katz |
title |
Evaluating whole-genome sequencing quality metrics for enteric pathogen outbreaks |
publisher |
PeerJ Inc. |
publishDate |
2021 |
url |
https://doaj.org/article/ca440bb680684ba09abb720b0731d66d |