Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks

In this paper, we present hardware accelerators created with high-level synthesis techniques for sparse and dense matrix multiplication operations. The cores can operate at different precisions and are designed to be integrated into a heterogeneous CPU-FPGA system for Edge AI applications. The methodology involves quantization- and sparsity-aware training and is applied to a case study on human activity classification. We initially investigate the effects of quantization and sparsity on the accuracy of neural networks with convolution, dense and recurrent layers, observing better tolerance to pruning when recurrent layers are present. We then propose hardware accelerators that can switch precision at run time and work with any matrix size up to a maximum configured at compile time. We compare the performance of these accelerators at different precision and sparsity levels and create a performance model to enable workload balancing. The results show that the proposed sparse matrix multipliers can outperform dense multipliers when sparsity levels exceed 70%, and the improvements are more evident when higher-precision arithmetic or structural pruning is used. Additionally, sparsity levels as high as 99% can maintain the accuracy required by the network, especially when recurrent layers are deployed. Overall, the balance between sparse and dense performance depends on matrix shape, precision, structural pruning and sparsity level, and performance modelling can be used to balance concurrent execution in a heterogeneous configuration.
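The cores described above are hardware blocks, but the operation they accelerate can be sketched in ordinary C++. The sketch below is illustrative only and is not the authors' HLS design: it assumes CSR storage for the pruned weight matrix, int8/int16 operand modes to mimic the run-time precision switch, and hypothetical names (CsrMatrix, spmm_csr).

```cpp
// Illustrative software sketch of the kind of operation the paper accelerates in
// hardware. CSR storage and the int8/int16 modes are assumptions, not the
// authors' implementation.
#include <cstdint>
#include <vector>

// CSR representation of a pruned (sparse) weight matrix.
struct CsrMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;     // size rows + 1
    std::vector<int> col_idx;     // size nnz
    std::vector<int16_t> values;  // size nnz; int8 weights are stored widened
};

enum class Precision { Int8, Int16 };

// C = A_sparse * B_dense, where B is cols x n (row-major) and C is rows x n.
// Only the non-zero weights are visited, which is why sparse execution can win
// once the sparsity level is high enough to amortise the index overhead.
void spmm_csr(const CsrMatrix& a,
              const std::vector<int16_t>& b, int n,
              std::vector<int32_t>& c,
              Precision prec) {
    c.assign(static_cast<size_t>(a.rows) * n, 0);
    for (int i = 0; i < a.rows; ++i) {
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            const int j = a.col_idx[k];
            // Run-time precision switch: in Int8 mode operands are narrowed
            // before the multiply.
            const int32_t w = (prec == Precision::Int8)
                                  ? static_cast<int8_t>(a.values[k])
                                  : a.values[k];
            for (int col = 0; col < n; ++col) {
                const int32_t x = (prec == Precision::Int8)
                                      ? static_cast<int8_t>(b[j * n + col])
                                      : b[j * n + col];
                c[i * n + col] += w * x;
            }
        }
    }
}
```

A dense multiplier performs all rows × cols × n multiply-accumulates regardless of the zeros, so the sparse path only pays off once the multiplies it skips outweigh the irregular accesses driven by col_idx; the abstract places that crossover above roughly 70% sparsity for the proposed cores.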


Saved in:
Bibliographic Details
Main Authors: Jose Nunez-Yanez, Mohammad Hosseinabady
Format: article
Language: EN
Published: Elsevier 2021
Subjects: Neural network; FPGA; Quantization; Pruning; Matrix multiplication acceleration; Convolution
Online Access: https://doaj.org/article/dd166273a0504d5681c710bd4c82c5be
id oai:doaj.org-article:dd166273a0504d5681c710bd4c82c5be
record_format dspace
last_updated 2021-11-10T04:40:24Z
issn 2590-0056
doi 10.1016/j.array.2021.100101
publication_date 2021-12-01
fulltext_url http://www.sciencedirect.com/science/article/pii/S259000562100045X
journal_toc https://doaj.org/toc/2590-0056
source Array, Vol 12, Pp 100101 (2021)
institution DOAJ
collection DOAJ
language EN
topic Neural network
FPGA
Quantization
Pruning
Matrix multiplication acceleration
Convolution
Computer engineering. Computer hardware
TK7885-7895
Electronic computers. Computer science
QA75.5-76.95
description In this paper, we present hardware accelerators created with high-level synthesis techniques for sparse and dense matrix multiplication operations. The cores can operate with different precisions and are designed to be integrated in a heterogeneous CPU-FPGA system for Edge AI applications. The methodology involves quantization-sparsity aware training and it is applied to a case study consisting of human activity classification. We initially investigate the effects of quantization and sparsity on the accuracy of neural networks with convolution, dense and recurrent layers observing better tolerance to pruning when recurrent layers are present. Then, we propose the hardware accelerators that can switch precision at run-time and work with any matrix size up to a maximum configured at compile time. We compare the performance of these accelerators at different levels of precision and sparsity levels and create a performance model to enable workload balancing. The results show that the proposed sparse matrix multipliers can outperform dense multipliers when sparsity levels are higher than 70% and the improvements are more evident when higher precision arithmetic or structural pruning is used. Additionally, sparsity levels as high as 99% can maintain the level of accuracy required in the network especially when recurrent layers are deployed. Overall, the balance between sparse and dense performance depends on matrix shape, precision, structural pruning and sparsity levels and performance modelling can be used to balance concurrent execution in a heterogeneous configuration.
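The description mentions a performance model used to balance concurrent sparse and dense execution across the heterogeneous system. As a rough illustration of how such a dispatch rule might look, the sketch below compares two cycle estimates; the linear cost-model form, the constants and the names (MatMulJob, prefer_sparse, index_overhead) are assumptions for illustration, not the model from the paper.

```cpp
// Toy dispatch rule based on a simple cost model. The cycle-cost constants and
// the linear model form are illustrative placeholders, not measurements from
// the paper; they only show how a sparsity crossover falls out of such a model.
#include <cstdint>

struct MatMulJob {
    int64_t rows, cols, n;  // (rows x cols) * (cols x n)
    double sparsity;        // fraction of zero weights, 0.0 .. 1.0
};

// Estimated cycles for a dense core: proportional to all multiply-accumulates.
double dense_cost(const MatMulJob& j, double cycles_per_mac) {
    return static_cast<double>(j.rows) * j.cols * j.n * cycles_per_mac;
}

// Estimated cycles for a sparse core: only non-zeros are processed, but each
// carries extra overhead for index decoding and irregular memory access.
double sparse_cost(const MatMulJob& j, double cycles_per_mac, double index_overhead) {
    const double nnz = static_cast<double>(j.rows) * j.cols * (1.0 - j.sparsity);
    return nnz * j.n * (cycles_per_mac + index_overhead);
}

// Pick the core predicted to finish first.
bool prefer_sparse(const MatMulJob& j) {
    // Hypothetical calibration constants for a given precision.
    const double cycles_per_mac = 1.0;
    const double index_overhead = 2.5;  // puts the crossover near ~70% sparsity
    return sparse_cost(j, cycles_per_mac, index_overhead) <
           dense_cost(j, cycles_per_mac);
}
```

With these placeholder constants the sparse core is preferred when (1 - sparsity) * 3.5 < 1, i.e. above about 71% sparsity, which mirrors the qualitative behaviour reported in the abstract; in practice the constants would be calibrated per precision, matrix shape and pruning structure.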
format article
author Jose Nunez-Yanez
Mohammad Hosseinabady
title Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks
publisher Elsevier
publishDate 2021
url https://doaj.org/article/dd166273a0504d5681c710bd4c82c5be