CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme

The number theoretic transform (NTT) is widely used to speed up polynomial multiplication, the critical computational bottleneck in many cryptographic algorithms such as lattice-based post-quantum cryptography (PQC) and homomorphic encryption (HE). One trend in NTT hardware archi...

Bibliographic Details
Main authors:	Xiangren Chen, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu
Format:	article
Language:	EN
Published:	Ruhr-Universität Bochum 2021
Subjects:	number theoretic transform; polynomial multiplication; algorithm-hardware co-design; radix-4; conflict-free memory mapping scheme; Computer engineering. Computer hardware; TK7885-7895; Information technology; T58.5-58.64
Online access:	https://doaj.org/article/a86caa24d7eb468896a0318c37509d9b

id	oai:doaj.org-article:a86caa24d7eb468896a0318c37509d9b
record_format	dspace
spelling	oai:doaj.org-article:a86caa24d7eb468896a0318c37509d9b; 2021-11-19T14:36:13Z; CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme; 10.46586/tches.v2022.i1.94-126; 2569-2925; https://doaj.org/article/a86caa24d7eb468896a0318c37509d9b; 2021-11-01T00:00:00Z; https://tches.iacr.org/index.php/TCHES/article/view/9291; https://doaj.org/toc/2569-2925; Xiangren Chen; Bohan Yang; Shouyi Yin; Shaojun Wei; Leibo Liu; Ruhr-Universität Bochum; article; number theoretic transform; polynomial multiplication; algorithm-hardware co-design; radix-4; conflict-free memory mapping scheme; Computer engineering. Computer hardware; TK7885-7895; Information technology; T58.5-58.64; EN; Transactions on Cryptographic Hardware and Embedded Systems, Vol 2022, Iss 1 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	number theoretic transform; polynomial multiplication; algorithm-hardware co-design; radix-4; conflict-free memory mapping scheme; Computer engineering. Computer hardware; TK7885-7895; Information technology; T58.5-58.64
description	The number theoretic transform (NTT) is widely used to speed up polynomial multiplication, the critical computational bottleneck in many cryptographic algorithms such as lattice-based post-quantum cryptography (PQC) and homomorphic encryption (HE). One trend in NTT hardware architecture is to support diverse security parameters and to meet resource constraints on different computing platforms. Flexibility and Area-Time Product (ATP) therefore become two crucial metrics in NTT hardware design. Flexibility with respect to vector size and modulus can be obtained directly. However, the varying memory-access strides of the in-place NTT make it difficult to design for different radices and different numbers of parallel butterfly units. This paper proposes an efficient conflict-free memory mapping scheme that supports configuration of both multiple parallel butterfly units and an arbitrary NTT radix. Compared to other approaches, this scheme has broader applicability and facilitates the parallelization of non-radix-2 NTT hardware designs. Based on this scheme, we propose a scalable radix-2 and radix-4 NTT multiplication architecture through algorithm-hardware co-design. A dedicated scheduling method reduces the number of modular additions/subtractions and modular multiplications in the radix-4 butterfly unit by 20% and 33%, respectively. To avoid the bit-reversal cost and reduce the memory footprint of arbitrary-radix NTT/INTT, we put forward a general method that rearranges the loop structure and reuses the twiddle factors. Hardware-level optimization is achieved by exploiting the symmetric operators in the radix-4 butterfly unit, which saves almost 50% of the hardware resources compared to a straightforward implementation. Through experimental results and theoretical analysis, we show that the radix-4 NTT outperforms the radix-2 NTT with the same number of parallel butterfly units in area-time performance in an interleaved memory system. This advantage grows as the number of parallel butterfly units increases. For example, when processing a 1024-point, 14-bit NTT with 8 parallel butterfly units, the LUT/FF/DSP/BRAM ATP of the radix-4 NTT core is approximately 2.2×/1.2×/1.1×/1.9× lower than that of the radix-2 NTT core on a similar FPGA platform.
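The memory conflict described in the abstract can be illustrated with a small Python sketch. It is not the mapping scheme proposed in the paper, which this record does not specify. It uses a textbook digit-sum banking rule (an assumption chosen for illustration) and covers only a single butterfly unit with as many banks as the radix. The names `bank_of`, `naive_bank`, `butterfly_groups`, and `is_conflict_free` are hypothetical. The operands of one in-place radix-r butterfly differ in exactly one base-r digit, so summing the digits modulo r sends them to r distinct banks at every stage, whereas plain low-order interleaving does not.

```python
def bank_of(addr: int, radix: int, num_digits: int) -> int:
    """Digit-sum banking (illustrative): sum of base-`radix` digits, mod radix."""
    total = 0
    for _ in range(num_digits):
        total += addr % radix
        addr //= radix
    return total % radix


def naive_bank(addr: int, radix: int, num_digits: int) -> int:
    """Plain low-order interleaving, shown only as a contrast."""
    return addr % radix


def butterfly_groups(n: int, radix: int):
    """Yield the operand addresses of every butterfly of an in-place NTT.

    Strides run n/radix, n/radix^2, ..., 1, so the operands of one
    butterfly differ in exactly one base-`radix` digit.
    """
    stride = n // radix
    while stride >= 1:
        for base in range(n):
            # `base` leads a butterfly when its digit at this stage is 0.
            if (base // stride) % radix == 0:
                yield [base + k * stride for k in range(radix)]
        stride //= radix


def is_conflict_free(n: int, radix: int, bank_fn=bank_of) -> bool:
    """True if every butterfly reads its operands from distinct banks."""
    num_digits, size = 0, 1
    while size < n:
        size *= radix
        num_digits += 1
    if size != n:
        raise ValueError("n must be a power of the radix")
    for addrs in butterfly_groups(n, radix):
        banks = {bank_fn(a, radix, num_digits) for a in addrs}
        if len(banks) != radix:
            return False
    return True


if __name__ == "__main__":
    for radix in (2, 4):
        print(radix,
              is_conflict_free(1024, radix),
              is_conflict_free(1024, radix, naive_bank))
```

For radix 2 the digit sum reduces to the parity of the address bits. Extending the idea to several butterfly units working in parallel requires more banks and a more careful mapping, which is the problem the paper addresses.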
format	article
author	Xiangren Chen, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu
author_sort	Xiangren Chen
title	CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme
publisher	Ruhr-Universität Bochum
publishDate	2021
url	https://doaj.org/article/a86caa24d7eb468896a0318c37509d9b
