Distance-based clustering challenges for unbiased benchmarking studies

Abstract Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clust...


Bibliographic Details
Main author: Michael C. Thrun
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/80c3566d0abb42cbb16bd430d6a3c752
id oai:doaj.org-article:80c3566d0abb42cbb16bd430d6a3c752
record_format dspace
spelling oai:doaj.org-article:80c3566d0abb42cbb16bd430d6a3c752 | 2021-12-02T18:14:08Z | Distance-based clustering challenges for unbiased benchmarking studies | 10.1038/s41598-021-98126-1 | 2045-2322 | https://doaj.org/article/80c3566d0abb42cbb16bd430d6a3c752 | 2021-09-01T00:00:00Z | https://doi.org/10.1038/s41598-021-98126-1 | https://doaj.org/toc/2045-2322 | Michael C. Thrun | Nature Portfolio | article | Medicine | R | Science | Q | EN | Scientific Reports, Vol 11, Iss 1, Pp 1-12 (2021)
institution DOAJ
collection DOAJ
language EN
topic Medicine
R
Science
Q
description Abstract Benchmark datasets with predefined cluster structures and high-dimensional biomedical datasets outline the challenges of cluster analysis: clustering algorithms are limited in their clustering ability in the presence of clusters defining distance-based structures resulting in a biased clustering solution. Data sets might not have cluster structures. Clustering yields arbitrary labels and often depends on the trial, leading to varying results. Moreover, recent research indicated that all partition comparison measures can yield the same results for different clustering solutions. Consequently, algorithm selection and parameter optimization by unsupervised quality measures (QM) are always biased and misleading. Only if the predefined structures happen to meet the particular clustering criterion and QM, can the clusters be recovered. Results are presented based on 41 open-source algorithms which are particularly useful in biomedical scenarios. Furthermore, comparative analysis with mirrored density plots provides a significantly more detailed benchmark than that with the typically used box plots or violin plots.
format article
author Michael C. Thrun
title Distance-based clustering challenges for unbiased benchmarking studies
publisher Nature Portfolio
publishDate 2021
url https://doaj.org/article/80c3566d0abb42cbb16bd430d6a3c752