An integrated pipeline for de novo assembly of microbial genomes.

Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi...

Bibliographic Details
Main authors: Andrew Tritt, Jonathan A Eisen, Marc T Facciotti, Aaron E Darling
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects:
R
Q
Online access: https://doaj.org/article/d171c67994c14df3adb5755987004490
id oai:doaj.org-article:d171c67994c14df3adb5755987004490
record_format dspace
spelling oai:doaj.org-article:d171c67994c14df3adb5755987004490
2021-11-18T07:05:45Z
An integrated pipeline for de novo assembly of microbial genomes.
1932-6203
10.1371/journal.pone.0042304
https://doaj.org/article/d171c67994c14df3adb5755987004490
2012-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/23028432/?tool=EBI
https://doaj.org/toc/1932-6203
Andrew Tritt
Jonathan A Eisen
Marc T Facciotti
Aaron E Darling
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 7, Iss 9, p e42304 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages, by integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning required by the other assembly algorithm. In particular, the assemblies produced by A5 exhibit 50% or more reduction in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics to mechanically sheared libraries. Finally, A5 has modest compute requirements, and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
format article
author Andrew Tritt
Jonathan A Eisen
Marc T Facciotti
Aaron E Darling
title An integrated pipeline for de novo assembly of microbial genomes.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/d171c67994c14df3adb5755987004490