Power Efficient Design of High-Performance Convolutional Neural Networks Hardware Accelerator on FPGA: A Case Study With GoogLeNet

Convolutional neural networks (CNNs) have dominated image recognition and object detection models in recent years. They achieve the highest accuracies in several application domains, such as automotive and biomedical applications. CNNs are usually implemented on Graphics Processing Units (GPUs) or general-purpose processors. Although GPUs can perform the complex computations CNNs require, their power consumption is huge compared to general-purpose processors. Moreover, current general-purpose processors cannot cope with the growing CNN demand for computation performance. Hardware accelerators are therefore the best choice for providing the computation performance CNNs need at an affordable power consumption. Several techniques, such as pruning and quantization, are adopted in hardware accelerators. In this paper, a low-power dedicated CNN hardware accelerator is proposed, with the GoogLeNet CNN as a case study. Weight pruning and quantization are applied to reduce the memory size by 57.6×. Consequently, only FPGA on-chip memory is used to store weights and activations, without off-chip DRAMs (Dynamic Random Access Memories). In addition, the proposed accelerator uses zero DSP (Digital Signal Processing) units, as all multiplications are replaced by shift operations. The accelerator is built on a time-sharing/pipelined architecture that processes the CNN model layer by layer, and it introduces a new data-fetching mechanism that increases data reuse. The accelerator units are implemented in native RTL (Register Transfer Level). The accelerator classifies 25.1 frames per second (fps) at only 3.92 W, which is more power-efficient than other GoogLeNet implementations on FPGA in the literature. In addition, it achieves an average classification efficiency of 91%, significantly higher than comparable architectures, and surpasses popular CPUs such as the Intel Core i7 and GPUs such as the GTX 1080 Ti in the number of frames processed per Watt.
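The zero-DSP claim rests on replacing every multiplication with a shift, which is possible when weights are quantized to signed powers of two. A minimal sketch of that idea follows; the exponent range, the pruning convention, and the function names are illustrative assumptions, not details taken from the paper:

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round a weight to the nearest signed power of two; zero means pruned."""
    if w == 0.0:
        return 0, 0  # pruned weight contributes nothing
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return sign, exp

def shift_multiply(x_fixed, sign, exp):
    """Multiply a fixed-point activation by a power-of-two weight via shifting."""
    if sign == 0:
        return 0
    # exp <= 0 in this sketch, so the 'multiply' is a right shift
    return sign * (x_fixed >> -exp)

# Example: weight 0.26 quantizes to 2^-2, so multiplying activation 96
# becomes a two-bit right shift.
sign, exp = quantize_pow2(0.26)
print(sign, exp)                      # 1 -2
print(shift_multiply(96, sign, exp))  # 96 >> 2 = 24
```

In hardware, each such shift is wiring plus a small adder tree rather than a DSP48 multiplier, which is how a design like this can report zero DSP utilization.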


Bibliographic Details
Main Authors: Ahmed J. Abd El-Maksoud, Mohamed Ebbed, Ahmed H. Khalil, Hassan Mostafa
Format: article
Language: EN
Published: IEEE, 2021
Subjects: Convolutional neural networks (CNNs); field programmable gate arrays (FPGAs); GoogLeNet; hardware accelerators; object classification; parallel computing
LCC: TK1-9971 (Electrical engineering. Electronics. Nuclear engineering)
Online Access: https://doaj.org/article/76953a014d404d07a2d8a929652c98f7
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3126838
Also available at: https://ieeexplore.ieee.org/document/9610053/
Source: IEEE Access, Vol 9, Pp 151897-151911 (2021)