To Dereplicate or Not To Dereplicate?

ABSTRACT Metagenome-assembled genomes (MAGs) expand our understanding of microbial diversity, evolution, and ecology. Concerns have been raised on how sequencing, assembly, binning, and quality assessment tools may result in MAGs that do not reflect single populations in nature. Here, we reflect on...

Bibliographic Details
Main authors: Jacob T. Evans, Vincent J. Denef
Format: article
Language: EN
Published: American Society for Microbiology 2020
Subjects:
MAG
Online access: https://doaj.org/article/59970f89bc9b44a7abe2121e3000f68c
id oai:doaj.org-article:59970f89bc9b44a7abe2121e3000f68c
record_format dspace
spelling oai:doaj.org-article:59970f89bc9b44a7abe2121e3000f68c | 2021-11-15T15:30:14Z | To Dereplicate or Not To Dereplicate? | 10.1128/mSphere.00971-19 | 2379-5042 | https://doaj.org/article/59970f89bc9b44a7abe2121e3000f68c | 2020-06-01T00:00:00Z | https://journals.asm.org/doi/10.1128/mSphere.00971-19 | https://doaj.org/toc/2379-5042 | Jacob T. Evans | Vincent J. Denef | American Society for Microbiology | article | MAG | binning | dereplication | metagenomics | population genomics | software | Microbiology | QR1-502 | EN | mSphere, Vol 5, Iss 3 (2020)
institution DOAJ
collection DOAJ
language EN
topic MAG
binning
dereplication
metagenomics
population genomics
software
Microbiology
QR1-502
description ABSTRACT Metagenome-assembled genomes (MAGs) expand our understanding of microbial diversity, evolution, and ecology. Concerns have been raised on how sequencing, assembly, binning, and quality assessment tools may result in MAGs that do not reflect single populations in nature. Here, we reflect on another issue, i.e., how to handle highly similar MAGs assembled from independent data sets. Obtaining multiple genomic representatives for a species is highly valuable, as it allows for population genomic analyses; however, when retaining genomes of closely related populations, it complicates MAG quality assessment and abundance inferences. We show that (i) published data sets contain a large fraction of MAGs sharing >99% average nucleotide identity, (ii) different software packages and parameters used to resolve this redundancy remove very different numbers of MAGs, and (iii) the removal of closely related genomes leads to losses of population-specific auxiliary genes. Finally, we highlight some approaches that can infer strain-specific dynamics across a sample series without dereplication.
format article
author Jacob T. Evans
Vincent J. Denef
title To Dereplicate or Not To Dereplicate?
publisher American Society for Microbiology
publishDate 2020
url https://doaj.org/article/59970f89bc9b44a7abe2121e3000f68c