Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP

Abstract Resource utilization plays a crucial role in the successful implementation of fast, real-time inference for deep neural networks (DNNs) and convolutional neural networks (CNNs) on the latest generation of hardware accelerators (FPGAs, SoCs, ACAPs, GPUs). To fulfil the needs of the triggers in development for the upgraded LHC detectors, we have developed a multi-stage compression approach based on conventional compression strategies (pruning and quantization), which reduce the memory footprint of the model, and on knowledge-transfer techniques, which are crucial to streamline the DNNs, simplifying the synthesis phase in the FPGA firmware and improving explainability. We present the developed methodologies and the results of their implementation in a working engineering pipeline used as a pre-processing stage for high-level synthesis tools (HLS4ML, Xilinx Vivado HLS, etc.). We show how it is possible to build ultra-light deep neural networks in practice, by applying the method to a realistic HEP use case: a toy simulation of one of the triggers planned for the HL-LHC.
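The conventional compression stages named in the abstract, pruning and quantization, map naturally onto open-source tooling. The sketch below is a minimal, hypothetical illustration (not the authors' actual code) using TensorFlow Model Optimization for magnitude pruning and QKeras for quantization-aware training; the architecture, sparsity target, and bit widths are assumed placeholders.

```python
# Hypothetical pruning + quantization stage; layer sizes, sparsity target,
# and bit widths are illustrative, not the paper's configuration.
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

def build_quantized_model(n_inputs=16, n_classes=2, bits=6):
    """Toy dense classifier standing in for a trigger-style DNN."""
    return tf.keras.Sequential([
        QDense(32, input_shape=(n_inputs,),
               kernel_quantizer=quantized_bits(bits, 0, alpha=1),
               bias_quantizer=quantized_bits(bits, 0, alpha=1)),
        QActivation(quantized_relu(bits)),
        QDense(n_classes,
               kernel_quantizer=quantized_bits(bits, 0, alpha=1),
               bias_quantizer=quantized_bits(bits, 0, alpha=1)),
        tf.keras.layers.Activation('softmax'),
    ])

model = build_quantized_model()

# Magnitude pruning: ramp the sparsity to 75% over the fine-tuning steps.
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.75,
    begin_step=0, end_step=1000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=schedule)

pruned.compile(optimizer='adam', loss='categorical_crossentropy')
# pruned.fit(x_train, y_train, epochs=5,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before handing the model downstream.
deployable = tfmot.sparsity.keras.strip_pruning(pruned)
```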

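The knowledge-transfer stage can be illustrated with a standard teacher-student distillation loop; this is a generic sketch rather than the authors' method, and the `distillation_loss`/`distill_step` helpers, temperature, and loss weighting are all assumed values.

```python
# Hypothetical knowledge-transfer (distillation) step: a small student is
# trained to match a larger teacher's softened predictions.
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and a soft-label KL term."""
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.nn.softmax(student_logits))
    soft = tf.keras.losses.kl_divergence(
        tf.nn.softmax(teacher_logits / temperature),
        tf.nn.softmax(student_logits / temperature)) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft

@tf.function
def distill_step(teacher, student, optimizer, x, y):
    """One optimizer step of the student against the frozen teacher."""
    teacher_logits = teacher(x, training=False)
    with tf.GradientTape() as tape:
        student_logits = student(x, training=True)
        loss = tf.reduce_mean(
            distillation_loss(teacher_logits, student_logits, y))
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```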
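Downstream, the compressed Keras model can be handed to one of the high-level synthesis front-ends named in the abstract; the sketch below uses the hls4ml Python API. The FPGA part number, fixed-point precision, and project directory are illustrative assumptions, not the paper's setup.

```python
# Hypothetical hand-off of the compressed model ("deployable" from the first
# sketch) to hls4ml, as a pre-processing stage for FPGA firmware synthesis.
import hls4ml

# Derive a per-layer configuration from the Keras model.
config = hls4ml.utils.config_from_keras_model(deployable, granularity='name')
config['Model']['Precision'] = 'ap_fixed<16,6>'  # global fixed-point type (placeholder)

hls_model = hls4ml.converters.convert_from_keras_model(
    deployable,
    hls_config=config,
    output_dir='hls_prj',          # HLS project directory (placeholder)
    part='xcvu9p-flga2104-2-e')    # example Xilinx part (placeholder)

hls_model.compile()                # C emulation library for bit-accurate checks
# y_hls = hls_model.predict(x_test)         # compare against the Keras output
# hls_model.build(csim=False, synth=True)   # launch Vivado HLS synthesis
```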
Bibliographic Details
Main Authors: Simone Francescato, Stefano Giagu, Federica Riti, Graziella Russo, Luigi Sabetta, Federico Tortonesi
Format: article
Language: EN
Published: SpringerOpen, 2021
Subjects: Astrophysics (QB460-466); Nuclear and particle physics. Atomic energy. Radioactivity (QC770-798)
Online Access: https://doaj.org/article/39c990620026419e9435862b94fc5b24 | https://doi.org/10.1140/epjc/s10052-021-09770-w
id oai:doaj.org-article:39c990620026419e9435862b94fc5b24
record_format dspace
spelling oai:doaj.org-article:39c990620026419e9435862b94fc5b24 (last updated 2021-11-08T10:43:22Z)
doi 10.1140/epjc/s10052-021-09770-w
issn 1434-6044; 1434-6052
published 2021-11-01
url https://doaj.org/article/39c990620026419e9435862b94fc5b24
url https://doi.org/10.1140/epjc/s10052-021-09770-w
url https://doaj.org/toc/1434-6044
url https://doaj.org/toc/1434-6052
source European Physical Journal C: Particles and Fields, Vol 81, Iss 11, Pp 1-10 (2021)
institution DOAJ
collection DOAJ
language EN
topic Astrophysics
QB460-466
Nuclear and particle physics. Atomic energy. Radioactivity
QC770-798
format article
author Simone Francescato
Stefano Giagu
Federica Riti
Graziella Russo
Luigi Sabetta
Federico Tortonesi
title Model compression and simplification pipelines for fast deep neural network inference in FPGAs in HEP
publisher SpringerOpen
publishDate 2021
url https://doaj.org/article/39c990620026419e9435862b94fc5b24