Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks

In this paper, we present hardware accelerators created with high-level synthesis techniques for sparse and dense matrix multiplication operations. The cores can operate at different precisions and are designed to be integrated into a heterogeneous CPU-FPGA system for Edge AI applications. The methodology involves quantization- and sparsity-aware training and is applied to a case study on human activity classification. We initially investigate the effects of quantization and sparsity on the accuracy of neural networks with convolution, dense and recurrent layers, observing better tolerance to pruning when recurrent layers are present. We then propose hardware accelerators that can switch precision at run time and work with any matrix size up to a maximum configured at compile time. We compare the performance of these accelerators at different precision and sparsity levels and create a performance model to enable workload balancing. The results show that the proposed sparse matrix multipliers can outperform dense multipliers when sparsity levels exceed 70%, and the improvements are more evident when higher-precision arithmetic or structural pruning is used. Additionally, sparsity levels as high as 99% can maintain the accuracy required by the network, especially when recurrent layers are deployed. Overall, the balance between sparse and dense performance depends on matrix shape, precision, structural pruning and sparsity level, and performance modelling can be used to balance concurrent execution in a heterogeneous configuration.
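The cores described above are hardware blocks, but the operation they accelerate can be sketched in ordinary C++. The sketch below is illustrative only and is not the authors' HLS design: it assumes CSR storage for the pruned weight matrix, int8/int16 operand modes to mimic the run-time precision switch, and hypothetical names (CsrMatrix, spmm_csr).

```cpp
// Illustrative software sketch of the kind of operation the paper accelerates in
// hardware. CSR storage and the int8/int16 modes are assumptions, not the
// authors' implementation.
#include <cstdint>
#include <vector>

// CSR representation of a pruned (sparse) weight matrix.
struct CsrMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;     // size rows + 1
    std::vector<int> col_idx;     // size nnz
    std::vector<int16_t> values;  // size nnz; int8 weights are stored widened
};

enum class Precision { Int8, Int16 };

// C = A_sparse * B_dense, where B is cols x n (row-major) and C is rows x n.
// Only the non-zero weights are visited, which is why sparse execution can win
// once the sparsity level is high enough to amortise the index overhead.
void spmm_csr(const CsrMatrix& a,
              const std::vector<int16_t>& b, int n,
              std::vector<int32_t>& c,
              Precision prec) {
    c.assign(static_cast<size_t>(a.rows) * n, 0);
    for (int i = 0; i < a.rows; ++i) {
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            const int j = a.col_idx[k];
            // Run-time precision switch: in Int8 mode operands are narrowed
            // before the multiply.
            const int32_t w = (prec == Precision::Int8)
                                  ? static_cast<int8_t>(a.values[k])
                                  : a.values[k];
            for (int col = 0; col < n; ++col) {
                const int32_t x = (prec == Precision::Int8)
                                      ? static_cast<int8_t>(b[j * n + col])
                                      : b[j * n + col];
                c[i * n + col] += w * x;
            }
        }
    }
}
```

A dense multiplier performs all rows × cols × n multiply-accumulates regardless of the zeros, so the sparse path only pays off once the multiplies it skips outweigh the irregular accesses driven by col_idx; the abstract places that crossover above roughly 70% sparsity for the proposed cores.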


Saved in:
Bibliographic Details
Main Authors: Jose Nunez-Yanez, Mohammad Hosseinabady
Format: article
Language: EN
Published: Elsevier 2021
Subjects: Neural network; FPGA; Quantization; Pruning; Matrix multiplication acceleration; Convolution
Online Access: https://doaj.org/article/dd166273a0504d5681c710bd4c82c5be
id oai:doaj.org-article:dd166273a0504d5681c710bd4c82c5be
record_format dspace
last_updated 2021-11-10T04:40:24Z
issn 2590-0056
doi 10.1016/j.array.2021.100101
publication_date 2021-12-01
fulltext_url http://www.sciencedirect.com/science/article/pii/S259000562100045X
journal_toc https://doaj.org/toc/2590-0056
source Array, Vol 12, Pp 100101 (2021)
institution DOAJ
collection DOAJ
language EN
topic Neural network
FPGA
Quantization
Pruning
Matrix multiplication acceleration
Convolution
Computer engineering. Computer hardware
TK7885-7895
Electronic computers. Computer science
QA75.5-76.95
description In this paper, we present hardware accelerators created with high-level synthesis techniques for sparse and dense matrix multiplication operations. The cores can operate with different precisions and are designed to be integrated in a heterogeneous CPU-FPGA system for Edge AI applications. The methodology involves quantization-sparsity aware training and it is applied to a case study consisting of human activity classification. We initially investigate the effects of quantization and sparsity on the accuracy of neural networks with convolution, dense and recurrent layers observing better tolerance to pruning when recurrent layers are present. Then, we propose the hardware accelerators that can switch precision at run-time and work with any matrix size up to a maximum configured at compile time. We compare the performance of these accelerators at different levels of precision and sparsity levels and create a performance model to enable workload balancing. The results show that the proposed sparse matrix multipliers can outperform dense multipliers when sparsity levels are higher than 70% and the improvements are more evident when higher precision arithmetic or structural pruning is used. Additionally, sparsity levels as high as 99% can maintain the level of accuracy required in the network especially when recurrent layers are deployed. Overall, the balance between sparse and dense performance depends on matrix shape, precision, structural pruning and sparsity levels and performance modelling can be used to balance concurrent execution in a heterogeneous configuration.
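The description mentions a performance model used to balance concurrent sparse and dense execution across the heterogeneous system. As a rough illustration of how such a dispatch rule might look, the sketch below compares two cycle estimates; the linear cost-model form, the constants and the names (MatMulJob, prefer_sparse, index_overhead) are assumptions for illustration, not the model from the paper.

```cpp
// Toy dispatch rule based on a simple cost model. The cycle-cost constants and
// the linear model form are illustrative placeholders, not measurements from
// the paper; they only show how a sparsity crossover falls out of such a model.
#include <cstdint>

struct MatMulJob {
    int64_t rows, cols, n;  // (rows x cols) * (cols x n)
    double sparsity;        // fraction of zero weights, 0.0 .. 1.0
};

// Estimated cycles for a dense core: proportional to all multiply-accumulates.
double dense_cost(const MatMulJob& j, double cycles_per_mac) {
    return static_cast<double>(j.rows) * j.cols * j.n * cycles_per_mac;
}

// Estimated cycles for a sparse core: only non-zeros are processed, but each
// carries extra overhead for index decoding and irregular memory access.
double sparse_cost(const MatMulJob& j, double cycles_per_mac, double index_overhead) {
    const double nnz = static_cast<double>(j.rows) * j.cols * (1.0 - j.sparsity);
    return nnz * j.n * (cycles_per_mac + index_overhead);
}

// Pick the core predicted to finish first.
bool prefer_sparse(const MatMulJob& j) {
    // Hypothetical calibration constants for a given precision.
    const double cycles_per_mac = 1.0;
    const double index_overhead = 2.5;  // puts the crossover near ~70% sparsity
    return sparse_cost(j, cycles_per_mac, index_overhead) <
           dense_cost(j, cycles_per_mac);
}
```

With these placeholder constants the sparse core is preferred when (1 - sparsity) * 3.5 < 1, i.e. above about 71% sparsity, which mirrors the qualitative behaviour reported in the abstract; in practice the constants would be calibrated per precision, matrix shape and pruning structure.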
format article
author Jose Nunez-Yanez
Mohammad Hosseinabady
title Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks
publisher Elsevier
publishDate 2021
url https://doaj.org/article/dd166273a0504d5681c710bd4c82c5be