Federated Learning for Privacy-Preserving Speaker Recognition
State-of-the-art speaker recognition systems are usually trained on a single computer using speech data collected from multiple users. However, these speech samples may contain private information which users may not be willing to share. To overcome potential breaches of privacy, we investigate the use of federated learning, with and without secure aggregators, for both supervised and unsupervised speaker recognition systems. Federated learning enables training of a shared model without sharing private data, by training the models on the edge devices where the data resides. In the proposed system, each edge device trains an individual model, which is subsequently sent to a secure aggregator or directly to the main server. To provide contrasting data without transmitting raw data, we use a generative adversarial network to generate imposter data at the edge. The secure aggregator or the main server then merges the individual models, builds a global model and transmits the global model back to the edge devices. Experimental results on the VoxCeleb-1 dataset show that the use of federated learning for both supervised and unsupervised speaker recognition systems provides two advantages. Firstly, it preserves privacy, since the raw data never leaves the edge devices. Secondly, the aggregated model achieves a better average equal error rate than the individual models when the federated model does not use a secure aggregator. Our results thus quantify the challenges in the practical application of privacy-preserving training of speaker recognition systems, in particular the trade-off between privacy and accuracy.
Saved in: DOAJ (Directory of Open Access Journals)
Main Authors: | Abraham Woubie, Tom Backstrom |
---|---|
Format: | article |
Language: | EN |
Published: | IEEE, 2021 |
Published in: | IEEE Access, Vol 9, Pp 149477-149485 (2021) |
DOI: | 10.1109/ACCESS.2021.3124029 |
ISSN: | 2169-3536 |
Subjects: | Edge computation; federated learning; privacy; secure aggregator; speaker recognition; Electrical engineering. Electronics. Nuclear engineering (TK1-9971) |
Online Access: | https://doaj.org/article/20568ae1fca8458dab50743444594d16 ; https://ieeexplore.ieee.org/document/9592761/ |
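The abstract describes a standard federated averaging workflow: each edge device trains a local speaker model on data that never leaves the device (with a GAN generating imposter examples locally), and a secure aggregator or the main server merges the local models into a global model that is sent back to the devices. Below is a minimal, illustrative Python sketch of that loop, not the authors' implementation: the speaker model is reduced to a toy logistic classifier on synthetic features, GAN-based imposter generation and the secure aggregator's cryptographic masking are only indicated in comments, and all function names (`local_update`, `aggregate`) are assumptions made for this example.

```python
# Minimal sketch of the federated averaging loop described in the abstract.
# The toy linear "speaker model" and all names here are illustrative
# assumptions; the paper's actual models are not reproduced.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, features, labels, lr=0.1, epochs=5):
    """Train a toy logistic speaker/imposter classifier on one edge device.

    In the paper, genuine utterances come from the device owner and imposter
    examples are produced locally by a GAN, so no speech ever leaves the
    device; here both are stand-in random features.
    """
    w = weights.copy()
    for _ in range(epochs):
        logits = features @ w
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = features.T @ (probs - labels) / len(labels)
        w -= lr * grad
    return w

def aggregate(client_weights):
    """Merge step (main server or secure aggregator): plain FedAvg mean.

    With a secure aggregator, the server would only see this aggregate,
    never the individual client models; the masking itself is omitted.
    """
    return np.mean(client_weights, axis=0)

dim, n_clients, n_rounds = 16, 4, 3
global_w = np.zeros(dim)

# Toy per-device data: "genuine" (label 1) vs. GAN-style "imposter" (label 0).
clients = []
for _ in range(n_clients):
    x = rng.normal(size=(64, dim))
    y = (rng.random(64) > 0.5).astype(float)
    clients.append((x, y))

for rnd in range(n_rounds):
    updates = [local_update(global_w, x, y) for x, y in clients]  # on-device training
    global_w = aggregate(updates)                                 # build global model
    print(f"round {rnd}: global weight norm = {np.linalg.norm(global_w):.3f}")
```

As the abstract notes, the merged model outperformed the individual edge models only when aggregation was done without a secure aggregator, which is the source of the trade-off between privacy and accuracy quantified in the paper.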