Federated Learning for Privacy-Preserving Speaker Recognition

State-of-the-art speaker recognition systems are usually trained on a single computer using speech data collected from multiple users. However, these speech samples may contain private information that users may not be willing to share. To overcome potential breaches of privacy, we investigate the use of federated learning with and without secure aggregators for both supervised and unsupervised speaker recognition systems. Federated learning enables training of a shared model without sharing private data, by training the models on the edge devices where the data resides. In the proposed system, each edge device trains an individual model, which is subsequently sent to a secure aggregator or directly to the main server. To provide contrasting data without transmitting data, we use a generative adversarial network to generate imposter data at the edge. Afterwards, the secure aggregator or the main server merges the individual models, builds a global model, and transmits the global model back to the edge devices. Experimental results on the Voxceleb-1 dataset show that the use of federated learning for both supervised and unsupervised speaker recognition systems provides two advantages. Firstly, it retains privacy, since the raw data does not leave the edge devices. Secondly, the aggregated model provides a better average equal error rate than the individual models when the federated model does not use a secure aggregator. Thus, our results quantify the challenges in the practical application of privacy-preserving training of speaker recognition systems, in particular in terms of the trade-off between privacy and accuracy.
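As a rough illustration of the model-merging step described in the abstract, the following Python sketch averages the parameters of edge-trained models into a global model (a minimal FedAvg-style aggregation under assumed data structures; the function and variable names are hypothetical and not taken from the paper):

import numpy as np

def aggregate_models(edge_models, weights=None):
    """Merge edge-trained models by (weighted) parameter averaging.

    edge_models: list of dicts mapping parameter names to numpy arrays,
                 one dict per edge device.
    weights: optional per-device weights (e.g. proportional to local data
             size); defaults to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(edge_models)] * len(edge_models)
    global_model = {}
    for name in edge_models[0]:
        # Weighted sum of each parameter across the individual models.
        global_model[name] = sum(w * m[name] for w, m in zip(weights, edge_models))
    return global_model

# Example: three edge devices, each contributing a tiny two-parameter model.
edge_models = [
    {"w": np.array([0.1, 0.2]), "b": np.array([0.0])},
    {"w": np.array([0.3, 0.1]), "b": np.array([0.1])},
    {"w": np.array([0.2, 0.3]), "b": np.array([-0.1])},
]
global_model = aggregate_models(edge_models)
print(global_model)  # averaged parameters, broadcast back to the edge devices

In the setting the paper describes, this merging would be performed either by the main server or by a secure aggregator, so that the server only ever sees model parameters (or their aggregate), never the raw speech data.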


Bibliographic Details
Main Authors: Abraham Woubie, Tom Backstrom
Format: article
Language: EN
Published: IEEE 2021
Subjects: Edge computation; federated learning; privacy; secure aggregator; speaker recognition; Electrical engineering. Electronics. Nuclear engineering (TK1-9971)
Online Access: https://doaj.org/article/20568ae1fca8458dab50743444594d16
Record ID: oai:doaj.org-article:20568ae1fca8458dab50743444594d16
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3124029
Published in: IEEE Access, Vol 9, Pp 149477-149485 (2021)
Online at IEEE Xplore: https://ieeexplore.ieee.org/document/9592761/