Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.

The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in r...

Bibliographic Details
Main authors: Maximilian Hanussek, Felix Bartusch, Jens Krüger
Format: article
Language: EN
Published: Public Library of Science (PLoS) 2021
Subjects: Biology (General)
Online access: https://doaj.org/article/7a947ed7b5e046708e7eb5329b011793
id oai:doaj.org-article:7a947ed7b5e046708e7eb5329b011793
record_format dspace
spelling oai:doaj.org-article:7a947ed7b5e046708e7eb5329b011793
2021-12-02T19:57:23Z
Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.
1553-734X
1553-7358
10.1371/journal.pcbi.1009244
https://doaj.org/article/7a947ed7b5e046708e7eb5329b011793
2021-07-01T00:00:00Z
https://doi.org/10.1371/journal.pcbi.1009244
https://doaj.org/toc/1553-734X
https://doaj.org/toc/1553-7358
Maximilian Hanussek
Felix Bartusch
Jens Krüger
Public Library of Science (PLoS)
article
Biology (General)
QH301-705.5
EN
PLoS Computational Biology, Vol 17, Iss 7, p e1009244 (2021)
institution DOAJ
collection DOAJ
language EN
topic Biology (General)
QH301-705.5
description The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high-performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known bioinformatics applications with respect to their scaling performance, different virtual environments, and different datasets, using our benchmarking tool suite BOOTABLE. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users find the best amount of resources for their analysis. Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community's awareness of the efficient usage of computing resources.
format article
author Maximilian Hanussek
Felix Bartusch
Jens Krüger
author_sort Maximilian Hanussek
title Performance and scaling behavior of bioinformatic applications in virtualization environments to create awareness for the efficient use of compute resources.
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/7a947ed7b5e046708e7eb5329b011793