Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach.

Background: The quantity of transcriptome data is rapidly increasing for non-model organisms. As sequencing technology advances, focus shifts towards solving bioinformatic challenges, of which sequence read assembly is the first task. Recent studies have compared the performance o...

Bibliographic Details
Main authors: Marvin Mundry, Erich Bornberg-Bauer, Michael Sammeth, Philine G D Feulner
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2012
Subjects:
R
Q
Online access: https://doaj.org/article/c413be02574b4a71961982f054b944cd
id oai:doaj.org-article:c413be02574b4a71961982f054b944cd
record_format dspace
spelling oai:doaj.org-article:c413be02574b4a71961982f054b944cd
2021-11-18T07:26:44Z
Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach.
1932-6203
10.1371/journal.pone.0031410
https://doaj.org/article/c413be02574b4a71961982f054b944cd
2012-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22384018/?tool=EBI
https://doaj.org/toc/1932-6203
Marvin Mundry
Erich Bornberg-Bauer
Michael Sammeth
Philine G D Feulner
Public Library of Science (PLoS)
article
Medicine
R
Science
Q
EN
PLoS ONE, Vol 7, Iss 2, p e31410 (2012)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: The quantity of transcriptome data is rapidly increasing for non-model organisms. As sequencing technology advances, focus shifts towards solving bioinformatic challenges, of which sequence read assembly is the first task. Recent studies have compared the performance of different software to establish a best practice for transcriptome assembly. Here, we adapted a simulation approach to evaluate specific features of assembly programs on 454 data. The novelty of our study is that the simulation allows us to calculate a model assembly as a reference point for comparison.
Findings: The simulation approach allows us to compare basic metrics of assemblies computed by different software applications (CAP3, MIRA, Newbler, and Oases) to a known optimal solution. We found that MIRA and CAP3 are conservative in merging reads. This resulted in a comparatively high number of short contigs. In contrast, Newbler more readily merged reads into longer contigs, while Oases produced the overall shortest assembly. Due to the simulation approach, reads could be traced back to their correct placement within the transcriptome. Together with mapping reads onto the assembled contigs, we were able to evaluate ambiguity in the assemblies. This analysis further supported the conservative nature of MIRA and CAP3, which resulted in low proportions of chimeric contigs, but high redundancy. Newbler produced less redundancy, but the proportion of chimeric contigs was higher.
Conclusion: Our evaluation of four assemblers suggested that MIRA and Newbler slightly outperformed the other programs, while showing contrasting characteristics. Oases did not perform very well on the 454 reads. Our evaluation indicated that the software was either conservative (MIRA) or liberal (Newbler) about merging reads into contigs. This suggested that, in choosing an assembly program, researchers should carefully consider their follow-up analysis and the consequences of the chosen approach to obtaining an assembly.
format article
author Marvin Mundry
Erich Bornberg-Bauer
Michael Sammeth
Philine G D Feulner
title Evaluating characteristics of de novo assembly software on 454 transcriptome data: a simulation approach.
publisher Public Library of Science (PLoS)
publishDate 2012
url https://doaj.org/article/c413be02574b4a71961982f054b944cd