Evaluation of serverless computing for scalable execution of a joint variant calling workflow.

Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been wi...

Bibliographic Details
Main authors: Aji John, Kathleen Muenzen, Kristiina Ausmees
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Medicine (R); Science (Q)
Online access: https://doaj.org/article/2b72b067a19a473b82e50d3191c4398e
id oai:doaj.org-article:2b72b067a19a473b82e50d3191c4398e
record_format dspace
spelling oai:doaj.org-article:2b72b067a19a473b82e50d3191c4398e
2021-12-02T20:15:31Z
1932-6203
10.1371/journal.pone.0254363
https://doaj.org/article/2b72b067a19a473b82e50d3191c4398e
2021-01-01T00:00:00Z
https://doi.org/10.1371/journal.pone.0254363
https://doaj.org/toc/1932-6203
PLoS ONE, Vol 16, Iss 7, p e0254363 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70.
format article
author Aji John
Kathleen Muenzen
Kristiina Ausmees
title Evaluation of serverless computing for scalable execution of a joint variant calling workflow.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/2b72b067a19a473b82e50d3191c4398e