phyloFlash: Rapid Small-Subunit rRNA Profiling and Targeted Assembly from Metagenomes

Bibliographic Details
Main authors: Harald R. Gruber-Vodicka, Brandon K. B. Seah, Elmar Pruesse
Format: article
Language: English
Published: American Society for Microbiology 2020
Subjects: SSU, gene assembly, metagenomics, taxonomic profiling, Microbiology (QR1-502)
Online access: https://doaj.org/article/62e65737d0f24c2bbc340a5ce90c8a30
Journal: mSystems, Vol. 5, Issue 5 (2020)
DOI: 10.1128/mSystems.00920-20
ISSN: 2379-5077
Date: 2020-10-01
Full text: https://journals.asm.org/doi/10.1128/mSystems.00920-20
Journal table of contents: https://doaj.org/toc/2379-5077

ABSTRACT The small-subunit rRNA (SSU rRNA) gene is the key marker in molecular ecology for all domains of life, but it is largely absent from metagenome-assembled genomes that often are the only resource available for environmental microbes. Here, we present phyloFlash, a pipeline to overcome this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based binning of full metagenomic assemblies. We show that a cleanup of artifacts is pivotal even with a curated reference database. With such a filtered database, the general-purpose mapper BBmap extracts SSU rRNA reads five times faster than the rRNA-specialized tool SortMeRNA with similar sensitivity and higher selectivity on simulated metagenomes. Reference-based targeted assemblers yielded either highly fragmented assemblies or high levels of chimerism, so we employ the general-purpose genomic assembler SPAdes. Our optimized implementation is independent of reference database composition and has satisfactory levels of chimera formation. phyloFlash quickly processes Illumina (meta)genomic data, is straightforward to use, even as part of high-throughput quality control, and has user-friendly output reports. The software is available at https://github.com/HRGV/phyloFlash (GPL3 license) and is documented with an online manual.

IMPORTANCE To track organisms across all domains of life, the SSU rRNA gene is the gold standard. Many environmental microbes are known only from high-throughput sequence data, but the SSU rRNA gene, the key to visualization by molecular probes and link to existing literature, is often missing from metagenome-assembled genomes (MAGs). The easy-to-use phyloFlash software suite tackles this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based linking to MAGs. Starting from a cleaned reference database, phyloFlash profiles the taxonomic diversity and assembles the sorted SSU rRNA reads. The phyloFlash design is domain agnostic and covers eukaryotes, archaea, and bacteria alike. phyloFlash also provides utilities to visualize multisample comparisons and to integrate the recovered SSU rRNAs in a metagenomics workflow by linking them to MAGs using assembly graph parsing.
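The read-based taxonomic profiling step mentioned in the abstract can be illustrated with a short sketch. It assumes, hypothetically, that each extracted SSU rRNA read has already been assigned a SILVA-style, semicolon-delimited taxonomy string from its best reference hit. The function `summarize_taxa`, its input format, and the example reads are illustrative inventions, not phyloFlash's actual code or output. Reads are tallied per taxon after truncating each taxonomy path to a chosen rank (1 = domain).

```python
from collections import Counter
from typing import Iterable


def summarize_taxa(read_taxonomies: Iterable[str], level: int) -> Counter:
    """Count reads per taxon after truncating each taxonomy path to `level` ranks.

    Hypothetical helper for illustration only; it assumes SILVA-style
    semicolon-delimited strings such as "Bacteria;Proteobacteria;...".
    """
    if level < 1:
        raise ValueError("level must be >= 1 (1 = domain)")
    counts: Counter = Counter()
    for taxonomy in read_taxonomies:
        # Drop empty fields, e.g. those produced by a trailing semicolon.
        ranks = [rank.strip() for rank in taxonomy.split(";") if rank.strip()]
        if not ranks:
            # Reads without any taxonomic assignment are skipped.
            continue
        # Paths shallower than `level` are kept at their deepest available rank.
        counts[";".join(ranks[:level])] += 1
    return counts


# Hypothetical best-hit taxonomies for four reads.
reads = [
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;",
    "Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Yersiniaceae;",
    "Archaea;Euryarchaeota;Methanobacteria;Methanobacteriales;",
    "Eukaryota;Opisthokonta;",
]

# Print a simple read-count table, most abundant taxon first.
for taxon, n in summarize_taxa(reads, level=4).most_common():
    print(f"{n}\t{taxon}")
```

Keeping shallow assignments at their deepest available rank (here the eukaryotic read) avoids silently dropping reads, at the cost of mixing ranks in one table; a production pipeline may resolve this differently.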