Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function
Due to the rapid growth of digital data, especially from mobile usage and social media, data deduplication has become a vital and cost-effective approach for removing redundant data segments, easing the pressure of the enormous volumes of data that must be stored. As part of the data deduplicati...
Main authors: | Ahmed Sardar M. Saeed, Loay E. George |
---|---|
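Format: | article |
Language: | EN |
Published: | MDPI AG, 2021 |
Subjects: | data deduplication; mathematical bounded linear hashing algorithm; hash lookup; hashing index table; Mathematics; QA1-939 |
Online access: | https://doaj.org/article/d824968fbe98427286e582545b72422c |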
id |
oai:doaj.org-article:d824968fbe98427286e582545b72422c |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:d824968fbe98427286e582545b72422c; 2021-11-25T19:05:43Z; Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function; 10.3390/sym13111978; 2073-8994; https://doaj.org/article/d824968fbe98427286e582545b72422c; 2021-10-01T00:00:00Z; https://www.mdpi.com/2073-8994/13/11/1978; https://doaj.org/toc/2073-8994; Ahmed Sardar M. Saeed; Loay E. George; MDPI AG; article; data deduplication; mathematical bounded linear hashing algorithm; hash lookup; hashing index table; Mathematics; QA1-939; EN; Symmetry, Vol 13, Iss 1978, p 1978 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
data deduplication; mathematical bounded linear hashing algorithm; hash lookup; hashing index table; Mathematics; QA1-939 |
spellingShingle |
data deduplication; mathematical bounded linear hashing algorithm; hash lookup; hashing index table; Mathematics; QA1-939; Ahmed Sardar M. Saeed; Loay E. George; Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
description |
Due to the rapid growth of digital data, especially from mobile usage and social media, data deduplication has become a vital and cost-effective approach for removing redundant data segments, easing the pressure of the enormous volumes of data that must be stored. As part of the data deduplication process, fingerprints are used to represent and identify identical data blocks. However, as the amount of data increases, the number of fingerprints grows as well, and because memory is limited, deduplication speed suffers dramatically. Many deduplication solutions are bottlenecked by matching lookups and chunk fingerprint calculations, which are paid for in the storage and processing resources needed for the hashes. Using a fast hash algorithm to improve fingerprint lookup performance is therefore an appealing challenge. This study focuses on enhancing the deduplication system by proposing a novel and effective mathematical bounded linear hashing algorithm that reduces hashing time by more than a factor of two compared to MD5 and SHA-1 and reduces the size of the hash index table by 50%. Because the number of chunk hash values is enormous, looking up and comparing hash values takes longer for large datasets; this work offers a hierarchical fingerprint lookup strategy that cuts the hash comparison time by up to 78%. The proposed system reduces the high latency imposed by deduplication procedures, primarily in the hashing and matching phases. The symmetry of our work is based on the balance between the performance of the proposed hashing algorithm and its effect on system efficiency, as well as on evaluating the approximate symmetries of the hashing and lookup phases compared to other deduplication systems. |
format |
article |
author |
Ahmed Sardar M. Saeed; Loay E. George |
author_facet |
Ahmed Sardar M. Saeed; Loay E. George |
author_sort |
Ahmed Sardar M. Saeed |
title |
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
title_short |
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
title_full |
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
title_fullStr |
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
title_full_unstemmed |
Fingerprint-Based Data Deduplication Using a Mathematical Bounded Linear Hash Function |
title_sort |
fingerprint-based data deduplication using a mathematical bounded linear hash function |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/d824968fbe98427286e582545b72422c |
work_keys_str_mv |
AT ahmedsardarmsaeed fingerprintbaseddatadeduplicationusingamathematicalboundedlinearhashfunction AT loayegeorge fingerprintbaseddatadeduplicationusingamathematicalboundedlinearhashfunction |
_version_ |
1718410308330979328 |
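
The description above says that fingerprints are used to represent and identify identical data blocks. As a rough illustration of that general idea only, the following Python sketch splits data into fixed-size chunks, fingerprints each chunk, and stores a chunk only the first time its fingerprint is seen. It is not the authors' method: this record does not describe the bounded linear hash or the hierarchical lookup, so SHA-1 (one of the baselines named in the description) stands in as the fingerprint function. The function names, the 4096-byte chunk size, and the dictionary index are all assumptions made for the example.

```python
import hashlib


def chunk_fixed(data: bytes, size: int = 4096) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, size: int = 4096):
    """Return (store, recipe): the unique chunks and the order to replay them.

    SHA-1 is an assumed stand-in fingerprint, not the hash proposed in the article.
    """
    index: dict[bytes, int] = {}   # fingerprint -> position of the chunk in store
    store: list[bytes] = []        # each distinct chunk, kept once
    recipe: list[int] = []         # store positions needed to rebuild the input
    for chunk in chunk_fixed(data, size):
        fingerprint = hashlib.sha1(chunk).digest()
        position = index.get(fingerprint)
        if position is None:       # first time this fingerprint is seen
            position = len(store)
            store.append(chunk)
            index[fingerprint] = position
        recipe.append(position)
    return store, recipe


def restore(store: list[bytes], recipe: list[int]) -> bytes:
    """Rebuild the original data from the unique chunks and the recipe."""
    return b"".join(store[i] for i in recipe)


if __name__ == "__main__":
    sample = b"A" * 4096 + b"B" * 4096 + b"A" * 4096 + b"C" * 100
    store, recipe = deduplicate(sample)
    print(len(chunk_fixed(sample)), "chunks,", len(store), "stored")
    print(restore(store, recipe) == sample)
```

In this sketch the in-memory dictionary plays the role of the hash index table, which is the structure whose size and lookup time the description identifies as the bottleneck when the number of fingerprints grows.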