Research on parallel data processing of data mining platform in the background of cloud computing

Bibliographic Details
Main authors: Bu Lingrui, Zhang Hui, Xing Haiyan, Wu Lijun
Format: article
Language: EN
Published: De Gruyter 2021
Subjects: Q
Online access: https://doaj.org/article/b02a41ca611640df84daad8271d7b978
id oai:doaj.org-article:b02a41ca611640df84daad8271d7b978
Source: Journal of Intelligent Systems, Vol 30, Iss 1, Pp 479-486 (2021)
ISSN: 0334-1860, 2191-026X
DOI: 10.1515/jisys-2020-0113 (https://doi.org/10.1515/jisys-2020-0113)
Date: 2021-02-01
institution DOAJ
collection DOAJ
topic cloud computing
data mining
parallel processing
hadoop platform
clustering algorithm
Science
Q
Electronic computers. Computer science
QA75.5-76.95
description Efficient processing of large-scale data is of great practical value. In this study, a data mining platform based on the Hadoop Distributed File System (HDFS) was designed, and the K-means algorithm was improved using the idea of max-min distance. On the HDFS platform, the algorithm was parallelized with MapReduce. Finally, its data processing performance was analyzed on the Iris data set. The results showed that the parallel algorithm correctly clustered more samples than the traditional algorithm; in a single-machine environment, the parallel algorithm ran longer; on large data sets, the traditional algorithm ran out of memory, whereas the parallel algorithm completed the computation; and the speedup (acceleration ratio) of the parallel algorithm increased as the cluster size and data set size grew, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm for big data processing and contribute to further improving the efficiency of data mining.
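The abstract names two techniques: max-min distance seeding for K-means and a MapReduce parallelization on Hadoop. The authors' implementation is not reproduced in this record, so the following is only a minimal single-process Python sketch under stated assumptions: Euclidean distance, the first sample as the initial seed, a fixed k instead of a distance-threshold stopping rule, and the Hadoop map, shuffle and reduce phases simulated with plain functions and a dictionary. The function names `max_min_init`, `kmeans_map`, `kmeans_reduce` and `parallel_kmeans` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_init(points, k):
    """Max-min distance seeding (assumed variant: fixed k, first sample as seed).

    Each new center is the point whose distance to its nearest
    already-chosen center is largest.
    """
    centers = [points[0]]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(farthest)
    return centers


def kmeans_map(point, centers):
    """Simulated map phase: emit (index of nearest center, (point, 1))."""
    nearest = min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Simulated reduce phase: average all points emitted for one center key."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for point, n in values:
        for j in range(dim):
            sums[j] += point[j]
        count += n
    return [s / count for s in sums]


def parallel_kmeans(points, k, max_iter=20, tol=1e-6):
    """Driver loop: one simulated MapReduce job per K-means iteration."""
    centers = max_min_init(points, k)
    for _ in range(max_iter):
        # Map + shuffle: group mapper output by center index.
        groups = defaultdict(list)
        for p in points:
            key, value = kmeans_map(p, centers)
            groups[key].append(value)
        # Reduce: recompute each center; keep the old one if its cluster is empty.
        new_centers = [kmeans_reduce(groups[i]) if groups[i] else list(centers[i])
                       for i in range(k)]
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


if __name__ == "__main__":
    # Toy two-cluster data made up for illustration (not the Iris data set).
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    print(parallel_kmeans(data, k=2))
```

Because everything here runs in one process, the sketch shows only the data flow of each iteration (map to nearest center, group by key, average per key); it cannot reproduce the speedup measurements the abstract reports for a real Hadoop cluster.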