Minimum threshold determination method based on dataset characteristics in association rule mining

Abstract: Association rule mining is a widely used data mining technique. It identifies interesting relationships between sets of items in a dataset and predicts associative behavior for new data. Before rules are formed, it must be determined in advance which items w...

Bibliographic Details
Main authors: Erna Hikmawati, Nur Ulfa Maulidevi, Kridanto Surendro
Format: article
Language: EN
Published: SpringerOpen 2021
Subjects:
Online access: https://doaj.org/article/1a6effe4028547f6974d48e0b242a861
id oai:doaj.org-article:1a6effe4028547f6974d48e0b242a861
record_format dspace
spelling oai:doaj.org-article:1a6effe4028547f6974d48e0b242a861
2021-11-28T12:03:18Z
10.1186/s40537-021-00538-3
2196-1115
https://doaj.org/article/1a6effe4028547f6974d48e0b242a861
2021-11-01T00:00:00Z
https://doi.org/10.1186/s40537-021-00538-3
https://doaj.org/toc/2196-1115
Journal of Big Data, Vol 8, Iss 1, Pp 1-17 (2021)
institution DOAJ
collection DOAJ
language EN
topic Minimum threshold
Adaptive rule
Association rule
Computer engineering. Computer hardware
TK7885-7895
Information technology
T58.5-58.64
Electronic computers. Computer science
QA75.5-76.95
description Abstract: Association rule mining is a widely used data mining technique. It identifies interesting relationships between sets of items in a dataset and predicts associative behavior for new data. Before rules are formed, the items that will be involved, called the frequent itemsets, must be determined. In this step, a threshold known as the minimum support is used to eliminate items that do not belong to a frequent itemset. The threshold also plays an important role in determining the number of rules generated. Setting the wrong threshold, however, causes association rule mining to fail to obtain rules. Currently, users determine the minimum support value arbitrarily. The problem is worse for users who are unfamiliar with the dataset characteristics, and it consumes considerable memory and time because the rule formation process is repeated until the desired number of rules is found. In the adaptive support model, the minimum support value is determined from the average and total number of items in each transaction, as well as their support values. The proposed method also uses certain criteria as thresholds so that the resulting rules match user needs. The minimum support value in the proposed method is obtained by dividing the average utility value by the total number of transactions. Experiments were carried out on 8 datasets with different characteristics to determine the association rules. The proposed adaptive support method was tested with 2 basic association rule algorithms, Apriori and FP-Growth. The tests were repeated to determine the highest and lowest minimum support values. The results showed that 6 of the 8 datasets produced minimum and maximum support values for both the Apriori and FP-Growth algorithms. This means that the proposed adaptive support value is able to generate rules; in terms of quality, adaptive support produces rules with a lift ratio > 1. The dataset characteristics obtained from the experimental results can be used as a factor in determining the minimum threshold value.
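The quantities named in the abstract (support, a minimum support obtained as average utility divided by the number of transactions, and lift) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how a utility value is computed, so the sketch simply assumes each item's utility is its occurrence count multiplied by a made-up criterion weight. The toy transactions, the `weights` table, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Lift of the rule antecedent -> consequent.

    A value above 1 means the two sides co-occur more often than they
    would if they were independent.
    """
    joint = support(frozenset(antecedent) | frozenset(consequent), transactions)
    return joint / (support(antecedent, transactions) * support(consequent, transactions))


def adaptive_min_support(utilities, n_transactions):
    """Average utility value divided by the total number of transactions.

    The division follows the abstract; how each utility is derived is an
    assumption made by the caller.
    """
    return mean(list(utilities.values())) / n_transactions


# Toy data (hypothetical).
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# ASSUMPTION: utility = occurrence count * user-chosen criterion weight.
weights = {"bread": 1.0, "milk": 0.5, "butter": 0.5}
utilities = {
    item: w * sum(item in t for t in transactions)
    for item, w in weights.items()
}

min_sup = adaptive_min_support(utilities, len(transactions))

# Keep only the item pairs whose support reaches the adaptive threshold.
frequent_pairs = [
    pair for pair in combinations(sorted(weights), 2)
    if support(pair, transactions) >= min_sup
]

# Score each surviving pair as a rule first -> second.
rule_lifts = {pair: lift([pair[0]], [pair[1]], transactions) for pair in frequent_pairs}
```

In this toy run the threshold is derived from the data rather than picked by hand, and the lift values in `rule_lifts` separate rules with a positive association (lift above 1) from those without one.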
format article
author Erna Hikmawati
Nur Ulfa Maulidevi
Kridanto Surendro
title Minimum threshold determination method based on dataset characteristics in association rule mining
publisher SpringerOpen
publishDate 2021
url https://doaj.org/article/1a6effe4028547f6974d48e0b242a861