Understanding the Performance Characteristics of Computational Storage Drives: A Case Study with SmartSSD

Bibliographic details
Main authors: Hwajung Kim, Heon Y. Yeom, Hanul Sung
Format: article
Language: English
Published: MDPI AG 2021
Online access: https://doaj.org/article/0775209fe0b148cfa40ef7ae9eca2bd7
Description
Summary: Emerging computational storage drives (CSDs) provide new opportunities by moving computation closer to the storage. Performing computation within the storage drive enables data pre/post-processing without expensive data transfers. Moreover, large amounts of data can be processed in parallel thanks to the field-programmable gate array (FPGA) included in CSDs. A CSD supports several implementation techniques for parallel processing, each of which provides a different degree of parallelism. However, without a sufficient understanding of these techniques, offloading a task can incur overhead from misuse rather than deliver a benefit. Thus, to get the best performance from CSDs, it is important to properly adjust the degree of parallelism of each implementation technique. In this paper, we study how CSD performance differs across various combinations of parallel processing techniques. To investigate these differences, we implement a data verification algorithm, offload it to the CSD, and analyze its performance and resource utilization. The experimental results show that implementing the data verification algorithm with a sufficient understanding of the CSD’s parallel processing techniques can improve performance by up to 20 times. Moreover, even with the same degree of parallelism, performance can differ by 59% depending on the combination of implementation techniques. These results imply that proper orchestration of different implementation techniques leads to better performance and more efficient resource utilization.