Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks

Sparse convolutional neural network (CNN) models reduce the massive compute and memory bandwidth requirements inherently present in dense CNNs without a significant loss in accuracy. Sparse CNNs, however, present their own set of challenges including non-linear data accesses and complex design of CN...

Bibliographic Details
Main authors:	Mahmood Azhar Qureshi, Arslan Munir
Format:	article
Language:	EN
Published:	IEEE 2021
Subjects:	Convolutional neural networks (CNNs); hardware accelerators; multi-threaded; sparsity; high-throughput; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Online access:	https://doaj.org/article/cb34629c94244243822ec50d59d8eb02

id	oai:doaj.org-article:cb34629c94244243822ec50d59d8eb02
record_format	dspace
spelling	oai:doaj.org-article:cb34629c94244243822ec50d59d8eb02 | 2021-11-17T00:00:36Z | Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks | 2169-3536 | 10.1109/ACCESS.2021.3126708 | https://doaj.org/article/cb34629c94244243822ec50d59d8eb02 | 2021-01-01T00:00:00Z | https://ieeexplore.ieee.org/document/9606879/ | https://doaj.org/toc/2169-3536 | Mahmood Azhar Qureshi | Arslan Munir | IEEE | article | Convolutional neural networks (CNNs) | hardware accelerators | multi-threaded | sparsity | high-throughput | Electrical engineering. Electronics. Nuclear engineering | TK1-9971 | EN | IEEE Access, Vol 9, Pp 151458-151475 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	Convolutional neural networks (CNNs); hardware accelerators; multi-threaded; sparsity; high-throughput; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
spellingShingle	Convolutional neural networks (CNNs); hardware accelerators; multi-threaded; sparsity; high-throughput; Electrical engineering. Electronics. Nuclear engineering; TK1-9971; Mahmood Azhar Qureshi; Arslan Munir; Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
description	Sparse convolutional neural network (CNN) models reduce the massive compute and memory bandwidth requirements inherent in dense CNNs without a significant loss in accuracy. Sparse CNNs, however, present their own challenges, including non-linear data accesses and complex design of CNN processing elements (PEs). Recently proposed accelerators such as SCNN, Eyeriss v2, and SparTen exploit two-sided sparsity, that is, sparsity in both the input activations and the weights, to accelerate CNN inference. These accelerators, however, suffer from problems that limit their applicability: inefficient micro-architecture (SCNN, Eyeriss v2), complex PE design (Eyeriss v2), and no support for non-unit-stride convolutions (SCNN) or FC layers (SparTen, SCNN). To address these issues in contemporary sparse CNN accelerators, we propose Sparse-PE, a multi-threaded and flexible CNN PE capable of handling both dense and sparse CNNs. The Sparse-PE core uses a binary mask representation and actively skips computations involving zeros in favor of non-zero computations, thereby drastically increasing effective throughput and hardware utilization. Unlike previous designs, the Sparse-PE core is generic and not targeted at a specific accelerator, and thus can also be used as a standalone sparse dot-product compute engine. We evaluate the performance of the core using a custom-built, cycle-accurate simulator. Our simulations show that the Sparse-PE core-based accelerator provides a performance gain of 12× over a recently proposed dense accelerator (NeuroMAX). Against sparse accelerators, it provides performance gains of 4.2×, 2.38×, and 1.98× over SCNN, Eyeriss v2, and SparTen, respectively.
format	article
author	Mahmood Azhar Qureshi Arslan Munir
author_facet	Mahmood Azhar Qureshi Arslan Munir
author_sort	Mahmood Azhar Qureshi
title	Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
title_short	Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
title_full	Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
title_fullStr	Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
title_full_unstemmed	Sparse-PE: A Performance-Efficient Processing Engine Core for Sparse Convolutional Neural Networks
title_sort	sparse-pe: a performance-efficient processing engine core for sparse convolutional neural networks
publisher	IEEE
publishDate	2021
url	https://doaj.org/article/cb34629c94244243822ec50d59d8eb02
work_keys_str_mv	AT mahmoodazharqureshi sparsepeaperformanceefficientprocessingenginecoreforsparseconvolutionalneuralnetworks AT arslanmunir sparsepeaperformanceefficientprocessingenginecoreforsparseconvolutionalneuralnetworks
_version_	1718426073691062272
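
The abstract in this record mentions a binary mask representation that lets the core skip multiplications involving zeros. The sketch below illustrates that general idea in software for a two-sided sparse dot product. It is not the authors' hardware design: the function names `to_masked` and `sparse_dot`, the list-based storage, and the example vectors are all hypothetical and chosen only for illustration.

```python
def to_masked(dense):
    """Split a dense vector into (binary mask, packed non-zero values).

    Hypothetical helper: mask[i] is 1 where dense[i] is non-zero, and
    values holds only the non-zero entries, in their original order.
    """
    mask = [1 if x != 0 else 0 for x in dense]
    values = [x for x in dense if x != 0]
    return mask, values


def sparse_dot(act_mask, act_vals, wt_mask, wt_vals):
    """Dot product that multiplies only where both masks are 1.

    Returns the result and the number of multiply-accumulates performed,
    so the saving relative to a dense dot product is visible.
    """
    total = 0
    macs = 0
    ai = wi = 0  # read positions in the two packed value lists
    for a_bit, w_bit in zip(act_mask, wt_mask):
        if a_bit and w_bit:
            # Both operands are non-zero: this is the only useful work.
            total += act_vals[ai] * wt_vals[wi]
            macs += 1
        # Advance each packed index only when its mask bit is set.
        ai += a_bit
        wi += w_bit
    return total, macs


# Made-up example data with sparsity on both sides.
activations = [0, 3, 0, 0, 5, 2, 0, 1]
weights = [4, 0, 0, 7, 2, 0, 0, 3]

result, macs = sparse_dot(*to_masked(activations), *to_masked(weights))
print(result, macs)
```

In this toy case only two of the eight positions hold a non-zero activation and a non-zero weight, so two multiplications are performed instead of eight. A hardware engine would locate matching positions with mask logic rather than a sequential loop; the loop here only shows which products are actually needed.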
