Research on parallel data processing of data mining platform in the background of cloud computing
The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was then improved with the idea of max-min distance. On the Hadoop distributed file system platform, the para...
Saved in:
Main Authors: | Bu Lingrui, Zhang Hui, Xing Haiyan, Wu Lijun |
---|---|
Format: | article |
Language: | EN |
Published: |
De Gruyter
2021
|
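Subjects: | cloud computing; data mining; parallel processing; hadoop platform; clustering algorithm; Science (Q); Electronic computers. Computer science (QA75.5-76.95) |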
Online Access: | https://doaj.org/article/b02a41ca611640df84daad8271d7b978 |
id |
oai:doaj.org-article:b02a41ca611640df84daad8271d7b978 |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:b02a41ca611640df84daad8271d7b978 2021-12-05T14:10:51Z Research on parallel data processing of data mining platform in the background of cloud computing 0334-1860 2191-026X 10.1515/jisys-2020-0113 https://doaj.org/article/b02a41ca611640df84daad8271d7b978 2021-02-01T00:00:00Z https://doi.org/10.1515/jisys-2020-0113 https://doaj.org/toc/0334-1860 https://doaj.org/toc/2191-026X The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was then improved with the idea of max-min distance. On the Hadoop distributed file system platform, parallelization was realized with MapReduce. Finally, the data processing effect of the algorithm was analyzed with the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm; in the single-machine environment, the parallel algorithm ran longer; when faced with large data sets, the traditional algorithm ran out of memory, but the parallel algorithm completed the calculation task; the acceleration ratio of the parallel algorithm rose as the cluster size and data set size grew, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm in big data processing, which contributes to further improving the efficiency of data mining. Bu Lingrui Zhang Hui Xing Haiyan Wu Lijun De Gruyter article cloud computing data mining parallel processing hadoop platform clustering algorithm Science Q Electronic computers. Computer science QA75.5-76.95 EN Journal of Intelligent Systems, Vol 30, Iss 1, Pp 479-486 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
cloud computing data mining parallel processing hadoop platform clustering algorithm Science Q Electronic computers. Computer science QA75.5-76.95 |
spellingShingle |
cloud computing data mining parallel processing hadoop platform clustering algorithm Science Q Electronic computers. Computer science QA75.5-76.95 Bu Lingrui Zhang Hui Xing Haiyan Wu Lijun Research on parallel data processing of data mining platform in the background of cloud computing |
description |
The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on the Hadoop distributed file system was designed, and the K-means algorithm was then improved with the idea of max-min distance. On the Hadoop distributed file system platform, parallelization was realized with MapReduce. Finally, the data processing effect of the algorithm was analyzed with the Iris data set. The results showed that the parallel algorithm classified more samples correctly than the traditional algorithm; in the single-machine environment, the parallel algorithm ran longer; when faced with large data sets, the traditional algorithm ran out of memory, but the parallel algorithm completed the calculation task; the acceleration ratio of the parallel algorithm rose as the cluster size and data set size grew, showing a good parallel effect. The experimental results verify the reliability of the parallel algorithm in big data processing, which contributes to further improving the efficiency of data mining. |
format |
article |
author |
Bu Lingrui Zhang Hui Xing Haiyan Wu Lijun |
author_facet |
Bu Lingrui Zhang Hui Xing Haiyan Wu Lijun |
author_sort |
Bu Lingrui |
title |
Research on parallel data processing of data mining platform in the background of cloud computing |
title_short |
Research on parallel data processing of data mining platform in the background of cloud computing |
title_full |
Research on parallel data processing of data mining platform in the background of cloud computing |
title_fullStr |
Research on parallel data processing of data mining platform in the background of cloud computing |
title_full_unstemmed |
Research on parallel data processing of data mining platform in the background of cloud computing |
title_sort |
research on parallel data processing of data mining platform in the background of cloud computing |
publisher |
De Gruyter |
publishDate |
2021 |
url |
https://doaj.org/article/b02a41ca611640df84daad8271d7b978 |
work_keys_str_mv |
AT bulingrui researchonparalleldataprocessingofdataminingplatforminthebackgroundofcloudcomputing AT zhanghui researchonparalleldataprocessingofdataminingplatforminthebackgroundofcloudcomputing AT xinghaiyan researchonparalleldataprocessingofdataminingplatforminthebackgroundofcloudcomputing AT wulijun researchonparalleldataprocessingofdataminingplatforminthebackgroundofcloudcomputing |
_version_ |
1718371683603054592 |
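
The description field names two techniques without giving details: K-means seeded by the max-min distance idea, and a MapReduce-style split of each iteration. The following is a minimal Python sketch of how those two ideas are commonly combined, not the authors' implementation. The function names (`max_min_seeds`, `map_assign`, `reduce_update`, `kmeans`), the choice of the first point as the starting seed, and the `chunks` parameter are all assumptions made for illustration. The chunk loop runs sequentially here and only stands in for map tasks that a Hadoop cluster would run in parallel.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_seeds(points, k):
    """Pick k initial centers by the max-min distance rule: repeatedly add
    the point whose distance to its nearest chosen center is largest."""
    centers = [points[0]]  # assumption: the first point is the starting seed
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers


def map_assign(chunk, centers):
    """Map step: emit (index of nearest center, point) for one data chunk."""
    return [
        (min(range(len(centers)), key=lambda i: dist(p, centers[i])), p)
        for p in chunk
    ]


def reduce_update(pairs, centers):
    """Reduce step: replace each center with the mean of its assigned points.
    A center with no assigned points is left unchanged."""
    groups = {}
    for idx, p in pairs:
        groups.setdefault(idx, []).append(p)
    new_centers = list(centers)
    for idx, members in groups.items():
        new_centers[idx] = tuple(sum(col) / len(members) for col in zip(*members))
    return new_centers


def kmeans(points, k, chunks=2, max_iter=50, tol=1e-9):
    """Iterate map and reduce steps until the centers stop moving."""
    centers = max_min_seeds(points, k)
    for _ in range(max_iter):
        pairs = []
        for i in range(chunks):  # each slice stands in for one map task
            pairs.extend(map_assign(points[i::chunks], centers))
        updated = reduce_update(pairs, centers)
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers
```

Calling `kmeans(points, k)` with `points` as a list of numeric tuples returns `k` center tuples. Seeding by max-min distance spreads the initial centers apart, which is one common way to reduce the sensitivity of K-means to a poor random start.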