Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function

Due to the rapid growth of digital data, especially from mobile usage and social media, data deduplication has become a vital and cost-effective approach for removing redundant data segments, reducing the pressure imposed by the enormous volumes of data that must be stored. As part of the data deduplicati...

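Bibliographic Details
Main authors: Ahmed Sardar M. Saeed, Loay E. George
Format: article
Language: EN
Published: MDPI AG 2021
Subjects: data deduplication; mathematical bounded linear hashing algorithm; hash lookup; hashing index table; Mathematics; QA1-939
Online access: https://doaj.org/article/d824968fbe98427286e582545b72422c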
id oai:doaj.org-article:d824968fbe98427286e582545b72422c
record_format dspace
spelling 2021-11-25T19:05:43Z
10.3390/sym13111978
2073-8994
2021-10-01T00:00:00Z
https://www.mdpi.com/2073-8994/13/11/1978
https://doaj.org/toc/2073-8994
Symmetry, Vol 13, Iss 11, p 1978 (2021)
institution DOAJ
collection DOAJ
language EN
topic data deduplication
mathematical bounded linear hashing algorithm
hash lookup
hashing index table
Mathematics
QA1-939
description Due to the rapid growth of digital data, especially from mobile usage and social media, data deduplication has become a vital and cost-effective approach for removing redundant data segments, reducing the pressure imposed by the enormous volumes of data that must be stored. As part of the data deduplication process, fingerprints are employed to represent and identify identical data blocks. However, as the amount of data increases, the number of fingerprints grows as well, and because memory is limited, the speed of data deduplication suffers dramatically. Various deduplication solutions show a bottleneck in matching lookups and chunk fingerprint calculations, which is paid for in the storage and processing resources needed to store hashes. Using a fast hash algorithm to improve fingerprint lookup performance is an appealing challenge. Thus, this study focuses on enhancing the deduplication system by proposing a novel and effective mathematical bounded linear hashing algorithm that reduces the hashing time by more than a factor of two compared to MD5 and SHA-1 and reduces the size of the hash index table by 50%. Because of the enormous number of chunk hash values, looking up and comparing hash values takes longer for large datasets; this work offers a hierarchical fingerprint lookup strategy that reduces the hash comparison time by up to 78%. The proposed system reduces the high latency imposed by deduplication procedures, primarily in the hashing and matching phases. The symmetry of our work is based on the balance between the performance of the proposed hashing algorithm and its effect on system efficiency, as well as on evaluating the approximate symmetries of the hashing and lookup phases compared to other deduplication systems.
format article
author Ahmed Sardar M. Saeed
Loay E. George
title Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/d824968fbe98427286e582545b72422c
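
The description above outlines a general pipeline: split data into chunks, compute a fingerprint per chunk, and look the fingerprint up in an index to detect duplicates. The following Python sketch illustrates that pipeline only. It is not the authors' algorithm: the record does not define the bounded linear hash or the hierarchical lookup, so `linear_fingerprint` is a stand-in (a simple multiply-and-add hash reduced modulo a fixed bound), and the two-level index (a 16-bit prefix, then the full fingerprint, then a byte comparison) is one plausible reading of "hierarchical". The names `CHUNK_SIZE`, `MODULUS`, `MULTIPLIER`, `DedupStore`, `deduplicate`, and `restore` are all hypothetical.

```python
from typing import Dict, List

CHUNK_SIZE = 4096          # assumed fixed chunk size (not from the paper)
MODULUS = (1 << 61) - 1    # assumed upper bound on fingerprint values
MULTIPLIER = 1_000_003     # assumed weight for the running combination


def linear_fingerprint(chunk: bytes) -> int:
    """Stand-in fingerprint: multiply-and-add over the bytes, kept below MODULUS."""
    h = 0
    for byte in chunk:
        h = (h * MULTIPLIER + byte + 1) % MODULUS
    return h


class DedupStore:
    """Stores each unique chunk once, using a two-level fingerprint index."""

    def __init__(self) -> None:
        self.chunks: List[bytes] = []
        # Level 1: 16-bit prefix -> Level 2: full fingerprint -> chunk ids.
        self.index: Dict[int, Dict[int, List[int]]] = {}

    def add(self, chunk: bytes) -> int:
        """Return the id of the stored chunk, inserting it only if it is new."""
        fp = linear_fingerprint(chunk)
        bucket = self.index.setdefault(fp & 0xFFFF, {})
        for chunk_id in bucket.get(fp, []):
            # A weak hash can collide, so confirm with a byte comparison.
            if self.chunks[chunk_id] == chunk:
                return chunk_id
        self.chunks.append(chunk)
        new_id = len(self.chunks) - 1
        bucket.setdefault(fp, []).append(new_id)
        return new_id


def deduplicate(data: bytes, store: DedupStore) -> List[int]:
    """Split data into fixed-size chunks and return the list of chunk ids."""
    return [
        store.add(data[i:i + CHUNK_SIZE])
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def restore(recipe: List[int], store: DedupStore) -> bytes:
    """Rebuild the original data from its chunk-id recipe."""
    return b"".join(store.chunks[i] for i in recipe)
```

Because a short non-cryptographic fingerprint can collide, this sketch confirms every fingerprint match with a byte-for-byte comparison before treating a chunk as a duplicate; whether the published system does the same is not stated in this record.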