Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP)

Abstract The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because cu...

Bibliographic Details
Main authors: Sebastian Malkusch, Lisa Hahnefeld, Robert Gurke, Jörn Lötsch
Format: article
Language: EN
Published: Wiley 2021
Subjects: Therapeutics. Pharmacology
Online access: https://doaj.org/article/850b9423372b45f3bd26413a3af60024
id oai:doaj.org-article:850b9423372b45f3bd26413a3af60024
record_format dspace
spelling oai:doaj.org-article:850b9423372b45f3bd26413a3af60024
2021-11-15T18:41:53Z
Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP)
2163-8306
10.1002/psp4.12704
https://doaj.org/article/850b9423372b45f3bd26413a3af60024
2021-11-01T00:00:00Z
https://doi.org/10.1002/psp4.12704
https://doaj.org/toc/2163-8306
Sebastian Malkusch
Lisa Hahnefeld
Robert Gurke
Jörn Lötsch
Wiley
article
Therapeutics. Pharmacology
RM1-950
EN
CPT: Pharmacometrics & Systems Pharmacology, Vol 10, Iss 11, Pp 1371-1381 (2021)
institution DOAJ
collection DOAJ
language EN
topic Therapeutics. Pharmacology
RM1-950
description Abstract The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current tools available for this purpose often require programming skills, preprocessing tools with graphical user interfaces that can be used interactively are needed. In collaboration between data scientists and experts in bioanalytical diagnostics, a graphical software package for data preprocessing called pguIMP is proposed, which contains a fixed sequence of preprocessing steps to enable reproducible interactive data preprocessing. As an R-based package, it also allows direct integration into this data science environment without requiring any programming knowledge. The implementation of contemporary data processing methods, including machine-learning-based imputation techniques, ensures the generation of corrected and cleaned bioanalytical data sets that preserve data structures such as clusters better than is possible with classical methods. This was evaluated on bioanalytical data sets from lipidomics and drug research using k-nearest-neighbors-based imputation followed by k-means clustering and density-based spatial clustering of applications with noise. The R package provides a Shiny-based web interface designed to be easy to use for non–data analysis experts. It is demonstrated that the spectrum of methods provided is suitable as a standard pipeline for preprocessing bioanalytical data in biomedical research domains. The R package pguIMP is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/pguIMP/index.html).
format article
author Sebastian Malkusch
Lisa Hahnefeld
Robert Gurke
Jörn Lötsch
author_sort Sebastian Malkusch
title Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP)
publisher Wiley
publishDate 2021
url https://doaj.org/article/850b9423372b45f3bd26413a3af60024
_version_ 1718426853671174144
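
Note on the abstract: it mentions k-nearest-neighbors-based imputation as one of the preprocessing methods. The sketch below illustrates the general idea only and is not taken from pguIMP, which is an R package with its own implementation. The function name `knn_impute`, the parameter `k`, the toy data, and the choice of a root-mean-square distance over the columns two rows share are all assumptions made for this illustration.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows in which each None is replaced by the mean of
    that column over the k nearest rows that have the column observed.

    Illustrative sketch only; not the pguIMP implementation.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor must be a different row with column j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only the columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy measurements: three similar samples and one distant one.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

Here the missing value is filled from the two closest samples rather than from the column mean, so the distant fourth sample does not pull the estimate toward its own cluster. This is the sense in which neighbor-based imputation can preserve cluster structure better than a global replacement.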