Assessment of metagenomic assembly using simulated next generation sequencing data.

Bibliographic Details
Main authors: Daniel R Mende, Alison S Waller, Shinichi Sunagawa, Aino I Järvelin, Michelle M Chan, Manimozhiyan Arumugam, Jeroen Raes, Peer Bork
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects:
R (Medicine)
Q (Science)
Online access: https://doaj.org/article/b0b0ef110d5a498db462510b84130a34
id oai:doaj.org-article:b0b0ef110d5a498db462510b84130a34
record_format dspace
ISSN 1932-6203
DOI 10.1371/journal.pone.0031386
Source PLoS ONE, Vol 7, Iss 2, p e31386 (2012)
Full text (PMC) https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22384016/pdf/?tool=EBI
Journal contents https://doaj.org/toc/1932-6203
Record datestamp 2021-11-18T07:26:57Z
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods on metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of the resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes), all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes), Illumina produced the best assemblies, which also most closely matched the expected functional composition. For the most complex community (400 genomes), there was very little assembly of reads from any sequencing technology. However, owing to their longer read length, the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding contigs using paired-end Illumina reads. Scaffolding dramatically increased contig lengths for the simple community and yielded minor improvements for the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available.
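The abstract does not describe how the simulators work internally. As a rough illustration of what a metagenomic read simulator with a position-dependent base-error model does, the Python sketch below draws each read's source genome by relative abundance, samples a fragment from a random position and strand, and adds substitution errors whose rate rises along the read (a loosely Illumina-like profile). The function name `simulate_metagenome`, its parameters, and the default error rates are hypothetical and are not taken from the authors' simulators. The sketch models substitutions only, so it does not capture insertion/deletion errors such as the homopolymer-length errors typical of pyrosequencing.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"


def simulate_metagenome(genomes, abundances, n_reads, read_len,
                        err_first=0.001, err_last=0.02, seed=0):
    """Return a list of (source_genome_name, read) pairs.

    Hypothetical sketch, not the published simulators:
    - the source genome of each read is drawn with probability proportional
      to its abundance weight (genome length is ignored);
    - a fragment is taken from a uniformly random position and strand;
    - substitution errors are added with a per-base probability that rises
      linearly from err_first (first base) to err_last (last base).
    Indels, quality strings and paired ends are not modelled.
    """
    rng = random.Random(seed)
    names = list(genomes)
    weights = [abundances[name] for name in names]
    for name in names:
        if len(genomes[name]) < read_len:
            raise ValueError(f"genome {name!r} is shorter than read_len")

    reads = []
    for _ in range(n_reads):
        # Pick the source genome according to the community composition.
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        fragment = genome[start:start + read_len]
        if rng.random() < 0.5:
            # Read from the reverse strand half of the time.
            fragment = fragment.translate(COMPLEMENT)[::-1]
        read = list(fragment)
        for i, base in enumerate(read):
            # Per-position error probability, linearly interpolated.
            p = err_first + (err_last - err_first) * i / max(read_len - 1, 1)
            if rng.random() < p:
                read[i] = rng.choice([b for b in BASES if b != base])
        reads.append((name, "".join(read)))
    return reads


if __name__ == "__main__":
    toy = {"g1": "ACGT" * 50, "g2": "GGCCTTAA" * 40}
    for source, read in simulate_metagenome(toy, {"g1": 0.8, "g2": 0.2},
                                            n_reads=5, read_len=30):
        print(source, read)
```

Returning the source genome alongside each read is a deliberate choice: knowing where every read came from is what makes it possible to check assembled contigs for accuracy and chimericity against a known truth, which is the main reason to assess assembly on simulated rather than real data.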
format article
author Daniel R Mende
Alison S Waller
Shinichi Sunagawa
Aino I Järvelin
Michelle M Chan
Manimozhiyan Arumugam
Jeroen Raes
Peer Bork
title Assessment of metagenomic assembly using simulated next generation sequencing data.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/b0b0ef110d5a498db462510b84130a34