Adaptive kernel fuzzy clustering for missing data

Bibliographic details
Main authors: Anny K. G. Rodrigues, Raydonal Ospina, Marcelo R. P. Ferreira
Format: article
Language: EN
Published: Public Library of Science (PLoS), 2021
Subjects: R (Medicine), Q (Science)
Online access: https://doaj.org/article/566a041732bf443083188d6298371576
id oai:doaj.org-article:566a041732bf443083188d6298371576
source PLoS ONE, Vol 16, Iss 11 (2021)
issn 1932-6203
fulltext https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589222/?tool=EBI
toc https://doaj.org/toc/1932-6203
updated 2021-11-25T06:10:58Z
institution DOAJ
collection DOAJ
language EN
topic Medicine (R); Science (Q)
description Many machine learning procedures, including clustering analysis, are often affected by missing values. This work proposes and evaluates a Kernel Fuzzy C-means clustering algorithm with kernelization of the metric and local adaptive distances (VKFCM-K-LP) under three strategies for dealing with missing data. The first strategy, called Whole Data Strategy (WDS), performs clustering only on the complete part of the dataset, i.e., it discards all instances with missing data. The second, the Partial Distance Strategy (PDS), computes partial distances over all available (observed) variables and then rescales them by the reciprocal of the proportion of observed values. The third technique, called Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The best performance of the clustering algorithm was achieved under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically in the optimization process. The clustering results under OCS were also superior to those obtained by applying VKFCM-K-LP to a version of the data in which missing values had first been imputed by the mean or the median of the observed values.
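To make the three strategies concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The record gives no equations, so we assume a per-variable Gaussian kernel with kernel-induced distance 2(1 - K), cluster-specific variable weights whose product is constrained to 1 (our reading of "local adaptive distances"), and fuzzifier m = 2. The names `vkfcm_sketch`, `kernel`, `kdist`, `memberships`, the bandwidth `s2`, `iters`, the farthest-first initialisation and the OCS update (a fixed-point step obtained by setting the derivative of the assumed objective with respect to each missing entry to zero) are all hypothetical. It also assumes every row has at least one observed value and, for WDS and PDS, at least `c` complete rows.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def kernel(a: float, b: float, s2: float) -> float:
    """Gaussian kernel on one variable (bandwidth s2 is an assumed parameter)."""
    return math.exp(-((a - b) ** 2) / (2.0 * s2))


def kdist(a: float, b: float, s2: float) -> float:
    """Kernel-induced squared distance 2 * (1 - K(a, b))."""
    return 2.0 * (1.0 - kernel(a, b, s2))


def memberships(d: List[float], m: float) -> List[float]:
    """Fuzzy c-means membership update from the distances to the c prototypes."""
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:  # the point coincides with one or more prototypes
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    e = 1.0 / (m - 1.0)
    return [1.0 / sum((dk / dh) ** e for dh in d) for dk in d]


def vkfcm_sketch(data: Sequence[Row], c: int, strategy: str = "pds",
                 m: float = 2.0, s2: float = 1.0, iters: int = 50):
    """Hypothetical VKFCM-K-LP-style loop; strategy is 'wds', 'pds' or 'ocs'."""
    rows = [list(r) for r in data]
    if strategy == "wds":  # WDS: discard every instance with a missing value
        rows = [r for r in rows if None not in r]
    n, p = len(rows), len(rows[0])
    missing = [(i, j) for i in range(n) for j in range(p) if rows[i][j] is None]
    if strategy == "ocs":  # OCS: start missing entries at the observed column mean
        means = []
        for j in range(p):
            col = [r[j] for r in rows if r[j] is not None]
            means.append(sum(col) / len(col))
        for i, j in missing:
            rows[i][j] = means[j]
    # PDS: use only observed variables and rescale by p / (number observed).
    obs = [[j for j in range(p) if r[j] is not None] for r in rows]
    scale = [p / len(o) for o in obs]
    complete = [r for r in rows if None not in r]
    protos = [list(complete[0])]
    while len(protos) < c:  # farthest-first initialisation (an assumption)
        protos.append(list(max(complete, key=lambda r: min(
            sum((a - b) ** 2 for a, b in zip(r, g)) for g in protos))))
    lam = [[1.0] * p for _ in range(c)]  # local weights with prod_j lam[k][j] == 1

    def dist(i: int, k: int) -> float:
        return scale[i] * sum(lam[k][j] * kdist(rows[i][j], protos[k][j], s2) for j in obs[i])

    u = [memberships([dist(i, k) for k in range(c)], m) for i in range(n)]
    for _ in range(iters):
        um = [[u[i][k] ** m for k in range(c)] for i in range(n)]
        for k in range(c):
            new = []  # prototypes: kernel-weighted fixed-point mean of observed values
            for j in range(p):
                ws = [(um[i][k] * scale[i] * kernel(rows[i][j], protos[k][j], s2), rows[i][j])
                      for i in range(n) if j in obs[i]]
                den = sum(w for w, _ in ws)
                new.append(sum(w * x for w, x in ws) / den if den > 0 else protos[k][j])
            protos[k] = new
            # weights: lam_kj = (prod_l D_kl)^(1/p) / D_kj, so the product stays 1
            D = [sum(um[i][k] * scale[i] * kdist(rows[i][j], protos[k][j], s2)
                     for i in range(n) if j in obs[i]) + 1e-12 for j in range(p)]
            geo = math.exp(sum(math.log(x) for x in D) / p)
            lam[k] = [geo / x for x in D]
        if strategy == "ocs":  # OCS: fixed-point step for dJ/dx_ij = 0
            for i, j in missing:
                ws = [um[i][k] * lam[k][j] * kernel(rows[i][j], protos[k][j], s2)
                      for k in range(c)]
                if sum(ws) > 0:
                    rows[i][j] = sum(w * protos[k][j] for k, w in enumerate(ws)) / sum(ws)
        u = [memberships([dist(i, k) for k in range(c)], m) for i in range(n)]
    return u, protos, lam, rows
```

For example, `vkfcm_sketch(data, c=2, strategy="ocs")` returns the memberships, prototypes, local weights and the completed rows. Filling the gaps beforehand with column means or medians and then calling it with `strategy="pds"` (every value is then observed, so no rescaling happens) gives the kind of pre-imputation baseline the abstract compares OCS against.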
format article
author Anny K. G. Rodrigues; Raydonal Ospina; Marcelo R. P. Ferreira
title Adaptive kernel fuzzy clustering for missing data
publisher Public Library of Science (PLoS)
publishDate 2021
url https://doaj.org/article/566a041732bf443083188d6298371576