Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes

ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparati...

Bibliographic Details
Main authors: Robert A. Petit, Timothy D. Read
Format: article
Language: EN
Published: American Society for Microbiology 2020
Subjects: annotation; assembly; bacteria; genomics; Lactobacillus; software; Microbiology; QR1-502
Online access: https://doaj.org/article/a758e480a17348c4a30a7dfee665d185
id oai:doaj.org-article:a758e480a17348c4a30a7dfee665d185
record_format dspace
spelling oai:doaj.org-article:a758e480a17348c4a30a7dfee665d185
2021-12-02T18:44:37Z
Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
10.1128/mSystems.00190-20
2379-5077
https://doaj.org/article/a758e480a17348c4a30a7dfee665d185
2020-08-01T00:00:00Z
https://journals.asm.org/doi/10.1128/mSystems.00190-20
https://doaj.org/toc/2379-5077
ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.
IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.
Robert A. Petit
Timothy D. Read
American Society for Microbiology
article
annotation
assembly
bacteria
genomics
Lactobacillus
software
Microbiology
QR1-502
EN
mSystems, Vol 5, Iss 4 (2020)
institution DOAJ
collection DOAJ
language EN
topic annotation
assembly
bacteria
genomics
Lactobacillus
software
Microbiology
QR1-502
spellingShingle annotation
assembly
bacteria
genomics
Lactobacillus
software
Microbiology
QR1-502
Robert A. Petit
Timothy D. Read
Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
description ABSTRACT Sequencing of bacterial genomes using Illumina technology has become such a standard procedure that often data are generated faster than can be conveniently analyzed. We created a new series of pipelines called Bactopia, built using Nextflow workflow software, to provide efficient comparative genomic analyses for bacterial species or genera. Bactopia consists of a data set setup step (Bactopia Data Sets [BaDs]), which creates a series of customizable data sets for the species of interest, the Bactopia Analysis Pipeline (BaAP), which performs quality control, genome assembly, and several other functions based on the available data sets and outputs the processed data to a structured directory format, and a series of Bactopia Tools (BaTs) that perform specific postprocessing on some or all of the processed data. BaTs include pan-genome analysis, computing average nucleotide identity between samples, extracting and profiling the 16S genes, and taxonomic classification using highly conserved genes. It is expected that the number of BaTs will increase to fill specific applications in the future. As a demonstration, we performed an analysis of 1,664 public Lactobacillus genomes, focusing on Lactobacillus crispatus, a species that is a common part of the human vaginal microbiome. Bactopia is an open source system that can scale from projects as small as one bacterial genome to ones including thousands of genomes and that allows for great flexibility in choosing comparison data sets and options for downstream analysis. Bactopia code can be accessed at https://www.github.com/bactopia/bactopia.
IMPORTANCE It is now relatively easy to obtain a high-quality draft genome sequence of a bacterium, but bioinformatic analysis requires organization and optimization of multiple open source software tools. We present Bactopia, a pipeline for bacterial genome analysis, as an option for processing bacterial genome data. Bactopia also automates downloading of data from multiple public sources and species-specific customization. Because the pipeline is written in the Nextflow language, analyses can be scaled from individual genomes on a local computer to thousands of genomes using cloud resources. As a usage example, we processed 1,664 Lactobacillus genomes from public sources and used comparative analysis workflows (Bactopia Tools) to identify and analyze members of the L. crispatus species.
format article
author Robert A. Petit
Timothy D. Read
author_facet Robert A. Petit
Timothy D. Read
author_sort Robert A. Petit
title Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_short Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_full Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_fullStr Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_full_unstemmed Bactopia: a Flexible Pipeline for Complete Analysis of Bacterial Genomes
title_sort bactopia: a flexible pipeline for complete analysis of bacterial genomes
publisher American Society for Microbiology
publishDate 2020
url https://doaj.org/article/a758e480a17348c4a30a7dfee665d185
work_keys_str_mv AT robertapetit bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes
AT timothydread bactopiaaflexiblepipelineforcompleteanalysisofbacterialgenomes
_version_ 1718377697090994176