Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

Background: The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services h...

Bibliographic Details
Main Authors: Samuel V Angiuoli, James R White, Malcolm Matalka, Owen White, W Florian Fricke
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2011
Subjects:
R
Q
Online Access: https://doaj.org/article/5ed3e6931a5a4900925e47768e9c78b7
id oai:doaj.org-article:5ed3e6931a5a4900925e47768e9c78b7
record_format dspace
spelling oai:doaj.org-article:5ed3e6931a5a4900925e47768e9c78b7
2021-11-18T07:36:13Z
Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.
1932-6203
10.1371/journal.pone.0026624
https://doaj.org/article/5ed3e6931a5a4900925e47768e9c78b7
2011-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22028928/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 6, Iss 10, p e26624 (2011)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results: We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions: Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
format article
author Samuel V Angiuoli
James R White
Malcolm Matalka
Owen White
W Florian Fricke
title Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.
publisher Public Library of Science (PLoS)
publishDate 2011
url https://doaj.org/article/5ed3e6931a5a4900925e47768e9c78b7