Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
Abstract: Resource utilization plays a crucial role in the successful implementation of fast real-time inference for deep neural networks (DNNs) and convolutional neural networks (CNNs) on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To fulfil the needs of the triggers under development for the upgraded LHC detectors, we have developed a multi-stage compression approach that combines conventional compression strategies (pruning and quantization), which reduce the memory footprint of the model, with knowledge-transfer techniques, which are crucial to streamline the DNNs, simplify the synthesis phase in the FPGA firmware, and improve explainability. We present the developed methodologies and the results of their implementation in a working engineering pipeline used as a pre-processing stage for high-level synthesis tools (HLS4ML, Xilinx Vivado HLS, etc.). We show how ultra-light deep neural networks can be built in practice by applying the method to a realistic HEP use case: a toy simulation of one of the triggers planned for the HL-LHC.
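The abstract names pruning and quantization as the conventional compression stages of the pipeline. As a rough illustration of what those stages do to a network's weights, here is a minimal, self-contained Python/Keras sketch; it is not the authors' code, and the helper names, the 80% sparsity target, and the <8,1> fixed-point format are assumptions made for the example.

```python
# Minimal sketch of the two "conventional" compression stages named in the
# abstract: magnitude pruning and fixed-point weight quantization. NOT the
# authors' pipeline; helper names, sparsity target, and bit widths are
# illustrative assumptions.
import numpy as np
import tensorflow as tf


def prune_by_magnitude(model, sparsity=0.8):
    """Zero out the smallest-magnitude fraction `sparsity` of each kernel."""
    for layer in model.layers:
        weights = layer.get_weights()
        if not weights:
            continue  # e.g. activation layers carry no trainable weights
        kernel = weights[0]
        threshold = np.quantile(np.abs(kernel), sparsity)
        weights[0] = np.where(np.abs(kernel) < threshold, 0.0, kernel)
        layer.set_weights(weights)
    return model


def quantize_fixed_point(model, total_bits=8, int_bits=1):
    """Round every weight onto a signed fixed-point grid, ap_fixed-style <total_bits, int_bits>."""
    step = 2.0 ** -(total_bits - int_bits)  # smallest representable increment
    lo = -(2.0 ** (int_bits - 1))           # most negative representable value
    hi = 2.0 ** (int_bits - 1) - step       # most positive representable value
    for layer in model.layers:
        quantized = [np.clip(np.round(w / step) * step, lo, hi)
                     for w in layer.get_weights()]
        layer.set_weights(quantized)
    return model


# Toy stand-in for a trigger classifier (architecture is made up).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2),
])
model = quantize_fixed_point(prune_by_magnitude(model, sparsity=0.8))
```

In a production pipeline, pruning would be applied gradually during training and quantization made training-aware (e.g. with QKeras, whose quantized layers hls4ml ingests directly), so the network can adapt to the reduced precision; the post-training version above only shows the arithmetic.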
Saved in: DOAJ
Main Authors: Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi
Format: article
Language: EN
Published: SpringerOpen, 2021
Subjects: Astrophysics (QB460-466); Nuclear and particle physics. Atomic energy. Radioactivity (QC770-798)
Online Access: https://doaj.org/article/39c990620026419e9435862b94fc5b24
id
oai:doaj.org-article:39c990620026419e9435862b94fc5b24
record_format
dspace
spelling
oai:doaj.org-article:39c990620026419e9435862b94fc5b24 (datestamp 2021-11-08T10:43:22Z)
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
DOI: 10.1140/epjc/s10052-021-09770-w
ISSN: 1434-6044; 1434-6052
Published: 2021-11-01
Online access: https://doaj.org/article/39c990620026419e9435862b94fc5b24; https://doi.org/10.1140/epjc/s10052-021-09770-w; https://doaj.org/toc/1434-6044; https://doaj.org/toc/1434-6052
Abstract: see the description field below.
Authors: Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi
Publisher: SpringerOpen
Format: article
Subjects: Astrophysics (QB460-466); Nuclear and particle physics. Atomic energy. Radioactivity (QC770-798)
Language: EN
Source: European Physical Journal C: Particles and Fields, Vol 81, Iss 11, Pp 1-10 (2021)
institution
DOAJ
collection
DOAJ
language
EN
topic
Astrophysics (QB460-466); Nuclear and particle physics. Atomic energy. Radioactivity (QC770-798)
spellingShingle
Astrophysics (QB460-466); Nuclear and particle physics. Atomic energy. Radioactivity (QC770-798); Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi; Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
description
Abstract: Resource utilization plays a crucial role in the successful implementation of fast real-time inference for deep neural networks (DNNs) and convolutional neural networks (CNNs) on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To fulfil the needs of the triggers under development for the upgraded LHC detectors, we have developed a multi-stage compression approach that combines conventional compression strategies (pruning and quantization), which reduce the memory footprint of the model, with knowledge-transfer techniques, which are crucial to streamline the DNNs, simplify the synthesis phase in the FPGA firmware, and improve explainability. We present the developed methodologies and the results of their implementation in a working engineering pipeline used as a pre-processing stage for high-level synthesis tools (HLS4ML, Xilinx Vivado HLS, etc.). We show how ultra-light deep neural networks can be built in practice by applying the method to a realistic HEP use case: a toy simulation of one of the triggers planned for the HL-LHC.
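The "knowledge transfer" the description refers to is, in its generic form, teacher-student distillation: a large trained network supervises a much smaller student through its temperature-softened outputs, so the student approximates the teacher's behaviour with far fewer parameters. Below is a minimal sketch of that generic technique, not the authors' exact recipe; the temperature, loss weighting, and the assumption that both models emit raw logits are illustrative choices.

```python
# Minimal teacher-student distillation sketch (generic technique, not the
# paper's exact recipe). Both `teacher` and `student` are assumed to output
# raw logits (no final softmax).
import tensorflow as tf

TEMPERATURE = 4.0  # assumed value: softens the teacher's distribution
ALPHA = 0.5        # assumed value: blend of hard-label and soft-label terms


def distillation_loss(y_true, teacher_logits, student_logits):
    # Hard term: ordinary cross-entropy of the student against true labels.
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    # Soft term: cross-entropy against the teacher's softened outputs,
    # scaled by T^2 to keep its gradient magnitude comparable (Hinton et al.).
    soft_targets = tf.nn.softmax(teacher_logits / TEMPERATURE)
    log_probs = tf.nn.log_softmax(student_logits / TEMPERATURE)
    soft = -tf.reduce_sum(soft_targets * log_probs, axis=-1) * TEMPERATURE ** 2
    return tf.reduce_mean(ALPHA * hard + (1.0 - ALPHA) * soft)


def train_step(teacher, student, optimizer, x, y):
    teacher_logits = teacher(x, training=False)  # teacher stays frozen
    with tf.GradientTape() as tape:
        student_logits = student(x, training=True)
        loss = distillation_loss(y, teacher_logits, student_logits)
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```

The compressed, distilled student is what would then be handed to the high-level synthesis stage. With hls4ml, which the description names explicitly, that hand-off reduces to a few calls; the output directory and FPGA part below are placeholders, not values from the paper.

```python
import hls4ml

# Derive a per-layer HLS configuration from the trained Keras student.
config = hls4ml.utils.config_from_keras_model(student, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    student,
    hls_config=config,
    output_dir='hls_prj',        # placeholder project directory
    part='xcvu9p-flga2104-2-e',  # placeholder Xilinx part number
)
hls_model.compile()  # C-simulation build; hls_model.build() runs Vivado HLS
```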
format
article
author
Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi
author_facet
Simone Francescato; Stefano Giagu; Federica Riti; Graziella Russo; Luigi Sabetta; Federico Tortonesi
author_sort
Simone Francescato
title
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
title_short
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
title_full
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
title_fullStr
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
title_full_unstemmed
Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
title_sort
model compression and simplification pipelines for fast deep neural network inference in fpgas in hep
publisher
SpringerOpen
publishDate
2021
url
https://doaj.org/article/39c990620026419e9435862b94fc5b24
work_keys_str_mv
AT simonefrancescato modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
AT stefanogiagu modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
AT federicariti modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
AT graziellarusso modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
AT luigisabetta modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
AT federicotortonesi modelcompressionandsimplificationpipelinesforfastdeepneuralnetworkinferenceinfpgasinhep
_version_
1718442702561869824