Power Efficient Design of High-Performance Convolutional Neural Networks Hardware Accelerator on FPGA: A Case Study With GoogLeNet

Convolutional neural networks (CNNs) have dominated image recognition and object detection models in recent years. They achieve the highest accuracies in several application domains, such as automotive and biomedical applications. CNNs are usually implemented on Graphics Processing Units (GPUs) or general-purpose processors. Although GPUs can perform the complex computations CNNs require, their power consumption is huge compared to general-purpose processors. Moreover, current general-purpose processors cannot cope with the growing CNN demand for computation performance. Hardware accelerators are therefore the best choice for providing the computation performance CNNs need at an affordable power consumption. Several techniques, such as pruning and quantization, are adopted in hardware accelerators. In this paper, a low-power dedicated CNN hardware accelerator is proposed, with the GoogLeNet CNN as a case study. Weight pruning and quantization are applied to reduce the memory size by 57.6×. Consequently, only FPGA on-chip memory is used to store weights and activations, without off-chip DRAMs (Dynamic Random Access Memories). In addition, the proposed accelerator uses zero DSP (Digital Signal Processing) units, as all multiplications are replaced by shift operations. The accelerator is built on a time-sharing/pipelined architecture that processes the CNN model layer by layer, and it introduces a new data-fetching mechanism that increases data reuse. The accelerator units are implemented in native RTL (Register Transfer Level). The accelerator classifies 25.1 frames per second (fps) at only 3.92 W, which is more power-efficient than other GoogLeNet implementations on FPGA in the literature. In addition, it achieves an average classification efficiency of 91%, significantly higher than comparable architectures, and surpasses popular CPUs such as the Intel Core i7 and GPUs such as the GTX 1080 Ti in the number of frames processed per Watt.
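The zero-DSP claim rests on replacing every multiplication with a shift, which is possible when weights are quantized to signed powers of two. A minimal sketch of that idea follows; the exponent range, the pruning convention, and the function names are illustrative assumptions, not details taken from the paper:

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Round a weight to the nearest signed power of two; zero means pruned."""
    if w == 0.0:
        return 0, 0  # pruned weight contributes nothing
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return sign, exp

def shift_multiply(x_fixed, sign, exp):
    """Multiply a fixed-point activation by a power-of-two weight via shifting."""
    if sign == 0:
        return 0
    # exp <= 0 in this sketch, so the 'multiply' is a right shift
    return sign * (x_fixed >> -exp)

# Example: weight 0.26 quantizes to 2^-2, so multiplying activation 96
# becomes a two-bit right shift.
sign, exp = quantize_pow2(0.26)
print(sign, exp)                      # 1 -2
print(shift_multiply(96, sign, exp))  # 96 >> 2 = 24
```

In hardware, each such shift is wiring plus a small adder tree rather than a DSP48 multiplier, which is how a design like this can report zero DSP utilization.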


Bibliographic Details
Main Authors: Ahmed J. Abd El-Maksoud, Mohamed Ebbed, Ahmed H. Khalil, Hassan Mostafa
Format: article
Language: EN
Published: IEEE, 2021
Subjects: Convolutional neural networks (CNNs); field programmable gate arrays (FPGAs); GoogLeNet; hardware accelerators; object classification; parallel computing
LCC: TK1-9971 (Electrical engineering. Electronics. Nuclear engineering)
Online Access: https://doaj.org/article/76953a014d404d07a2d8a929652c98f7
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3126838
Also available at: https://ieeexplore.ieee.org/document/9610053/
Source: IEEE Access, Vol 9, Pp 151897-151911 (2021)