Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments

The presented paper is concerned with detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training, which ensures high phoneme recognition accuracy...

Bibliographic Details
Main Authors:	Kacper Kubicki, Paweł Kapusta, Krzysztof Ślot
Format:	article
Language:	EN
Published:	MDPI AG 2021
Subjects:	biometrics; presentation attack detection; mel-spectrogram; phoneme classification; deep neural networks; Chemical technology; TP1-1185
Online Access:	https://doaj.org/article/112f95b6c1a243e887b9bd989ee67fc9

id	oai:doaj.org-article:112f95b6c1a243e887b9bd989ee67fc9
record_format	dspace
spelling	oai:doaj.org-article:112f95b6c1a243e887b9bd989ee67fc9 | 2021-11-25T18:58:49Z | Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments | 10.3390/s21227728 | 1424-8220 | https://doaj.org/article/112f95b6c1a243e887b9bd989ee67fc9 | 2021-11-01T00:00:00Z | https://www.mdpi.com/1424-8220/21/22/7728 | https://doaj.org/toc/1424-8220 | Kacper Kubicki; Paweł Kapusta; Krzysztof Ślot | MDPI AG | article | biometrics; presentation attack detection; mel-spectrogram; phoneme classification; deep neural networks; Chemical technology; TP1-1185 | EN | Sensors, Vol 21, Iss 22, p 7728 (2021)
institution	DOAJ
collection	DOAJ
language	EN
topic	biometrics; presentation attack detection; mel-spectrogram; phoneme classification; deep neural networks; Chemical technology; TP1-1185
spellingShingle	biometrics presentation attack detection mel-spectrogram phoneme classification deep neural networks Chemical technology TP1-1185 Kacper Kubicki Paweł Kapusta Krzysztof Ślot Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
description	The presented paper is concerned with the detection of presentation attacks against unsupervised remote biometric speaker verification, using a well-known challenge–response scheme. We propose a novel approach to convolutional phoneme classifier training, which ensures high phoneme recognition accuracy even for significantly simplified network architectures, thus enabling efficient utterance verification on resource-limited hardware, such as mobile phones or embedded devices. We consider Deep Convolutional Neural Networks operating on windows of speech Mel-Spectrograms as a means for phoneme recognition, and we show that one can boost the performance of highly simplified neural architectures by modifying the principle underlying training set construction. Instead of generating training examples by slicing spectrograms using a sliding window, as is commonly done, we propose to maximize the consistency of the phoneme-related spectrogram structures that are to be learned, by choosing only spectrogram chunks from the central regions of phoneme articulation intervals. This approach enables better utilization of the limited capacity of the considered simplified networks, as it significantly reduces within-class data scatter. We show that neural architectures comprising as few as tens of thousands of parameters can solve the 39-phoneme recognition task with an accuracy of up to 76% (we use the English-language TIMIT database for experimental verification of the method). We also show that ensembling of simple classifiers, using a basic bagging method, boosts the recognition accuracy by another 2–3%, offering Phoneme Error Rates at the level of 23%, which approaches the accuracy of state-of-the-art deep neural architectures that are one to two orders of magnitude more complex than the proposed solution. This, in turn, enables reliable presentation attack detection, based on challenges just a few syllables long, on highly resource-limited computing hardware.
format	article
author	Kacper Kubicki, Paweł Kapusta, Krzysztof Ślot
author_facet	Kacper Kubicki, Paweł Kapusta, Krzysztof Ślot
author_sort	Kacper Kubicki
title	Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_short	Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_full	Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_fullStr	Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_full_unstemmed	Presentation Attack Detection on Limited-Resource Devices Using Deep Neural Classifiers Trained on Consistent Spectrogram Fragments
title_sort	presentation attack detection on limited-resource devices using deep neural classifiers trained on consistent spectrogram fragments
publisher	MDPI AG
publishDate	2021
url	https://doaj.org/article/112f95b6c1a243e887b9bd989ee67fc9
work_keys_str_mv	AT kacperkubicki presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments AT pawełkapusta presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments AT krzysztofslot presentationattackdetectiononlimitedresourcedevicesusingdeepneuralclassifierstrainedonconsistentspectrogramfragments
_version_	1718410446944337920
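
Illustrative note: the abstract contrasts two ways of building a phoneme training set from a mel-spectrogram, namely slicing with a sliding window versus taking chunks only from the central region of each phoneme articulation interval. The sketch below is a minimal illustration of that contrast and is not the authors' implementation. The frame layout, the interval format `(start, end, label)`, the function names, and the choice of exactly one centred window per phoneme are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a spectrogram is a list of frames, each frame a list of
# mel-band values. Intervals are (start_frame, end_frame_exclusive, label).
Frame = Sequence[float]
Interval = Tuple[int, int, str]
Fragment = Tuple[Sequence[Frame], str]


def sliding_window_fragments(
    frames: Sequence[Frame], intervals: Sequence[Interval], width: int, hop: int = 1
) -> List[Fragment]:
    """Baseline: slice the whole utterance; label each window by its centre frame."""
    out: List[Fragment] = []
    for start in range(0, len(frames) - width + 1, hop):
        centre = start + width // 2
        for lo, hi, label in intervals:
            if lo <= centre < hi:
                out.append((frames[start:start + width], label))
                break
    return out


def central_fragments(
    frames: Sequence[Frame], intervals: Sequence[Interval], width: int
) -> List[Fragment]:
    """One window per phoneme, centred on the midpoint of its interval (assumption)."""
    out: List[Fragment] = []
    for lo, hi, label in intervals:
        mid = (lo + hi) // 2
        start = mid - width // 2
        if start < 0 or start + width > len(frames):
            continue  # window would run past the utterance; skip it
        out.append((frames[start:start + width], label))
    return out


# Toy utterance: 20 one-band frames and three hypothetical phoneme intervals.
toy_frames = [[float(i)] for i in range(20)]
toy_intervals = [(0, 6, "s"), (6, 14, "ih"), (14, 20, "t")]

baseline = sliding_window_fragments(toy_frames, toy_intervals, width=4)
central = central_fragments(toy_frames, toy_intervals, width=4)
print(len(baseline), "sliding-window fragments;", len(central), "central fragments")
```

The central variant yields far fewer examples, but none of them straddles a phoneme boundary, which is the kind of within-class consistency the abstract refers to. How many windows the authors take per phoneme and how wide the central region is are not stated in this record.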

Similar Items