CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.

Background: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workl...

Bibliographic Details
Main authors: Wei-Chun Chung, Chien-Chih Chen, Jan-Ming Ho, Chung-Yen Lin, Wen-Lian Hsu, Yu-Chun Wang, D T Lee, Feipei Lai, Chih-Wei Huang, Yu-Jung Chang
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2014
Subjects:
R
Q
Online access: https://doaj.org/article/e7264bcae4304649936f0264f83955d0
id oai:doaj.org-article:e7264bcae4304649936f0264f83955d0
record_format dspace
spelling oai:doaj.org-article:e7264bcae4304649936f0264f83955d0
2021-11-18T08:17:07Z
CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.
1932-6203
10.1371/journal.pone.0098146
https://doaj.org/article/e7264bcae4304649936f0264f83955d0
2014-01-01T00:00:00Z
https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24897343/pdf/?tool=EBI
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 9, Iss 6, p e98146 (2014)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Background: Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/MapReduce has been successfully adopted in bioinformatics for genome assembly, mapping reads to genomes, and finding single nucleotide polymorphisms. Major cloud providers offer Hadoop cloud services to their users. However, it remains technically challenging to deploy a Hadoop cloud for those who prefer to run MapReduce programs in a cluster without built-in Hadoop/MapReduce.
Results: We present CloudDOE, a platform-independent software package implemented in Java. CloudDOE encapsulates technical details behind a user-friendly graphical interface, thus liberating scientists from having to perform complicated operational procedures. Users are guided through the user interface to deploy a Hadoop cloud within in-house computing environments and to run applications specifically targeted for bioinformatics, including CloudBurst, CloudBrush, and CloudRS. One may also use CloudDOE on top of a public cloud. CloudDOE consists of three wizards, i.e., Deploy, Operate, and Extend wizards. Deploy wizard is designed to aid the system administrator to deploy a Hadoop cloud. It installs Java runtime environment version 1.6 and Hadoop version 0.20.203, and initiates the service automatically. Operate wizard allows the user to run a MapReduce application on the dashboard list. To extend the dashboard list, the administrator may install a new MapReduce application using Extend wizard.
Conclusions: CloudDOE is a user-friendly tool for deploying a Hadoop cloud. Its smart wizards substantially reduce the complexity and costs of deployment, execution, enhancement, and management. Interested users may collaborate to improve the source code of CloudDOE to further incorporate more MapReduce bioinformatics tools into CloudDOE and support next-generation big data open source tools, e.g., Hadoop BigTop and Spark.
Availability: CloudDOE is distributed under Apache License 2.0 and is freely available at http://clouddoe.iis.sinica.edu.tw/.
format article
author Wei-Chun Chung
Chien-Chih Chen
Jan-Ming Ho
Chung-Yen Lin
Wen-Lian Hsu
Yu-Chun Wang
D T Lee
Feipei Lai
Chih-Wei Huang
Yu-Jung Chang
author_sort Wei-Chun Chung
title CloudDOE: a user-friendly tool for deploying Hadoop clouds and analyzing high-throughput sequencing data with MapReduce.
publisher Public Library of Science (PLoS)
publishDate 2014
url https://doaj.org/article/e7264bcae4304649936f0264f83955d0