Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs
Deep convolutional neural networks (DCNNs) have achieved breakthrough performance in bird species identification from spectrograms of bird vocalizations. To address class imbalance in the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and modified, weighte...
Main authors: | Feiyu Zhang, Luyang Zhang, Hongxiang Chen, Jiangjian Xie |
---|---|
Format: | article |
Language: | EN |
Published: |
MDPI AG
2021
|
Subjects: | bird vocalization; spectrogram feature; multi-channel; deep convolutional neural |
Online access: | https://doaj.org/article/2ca38a7e0955475892901a44421f3ddc |
id |
oai:doaj.org-article:2ca38a7e0955475892901a44421f3ddc |
---|---|
record_format |
dspace |
spelling |
oai:doaj.org-article:2ca38a7e0955475892901a44421f3ddc; 2021-11-25T17:30:21Z; Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs; 10.3390/e23111507; 1099-4300; https://doaj.org/article/2ca38a7e0955475892901a44421f3ddc; 2021-11-01T00:00:00Z; https://www.mdpi.com/1099-4300/23/11/1507; https://doaj.org/toc/1099-4300; Feiyu Zhang; Luyang Zhang; Hongxiang Chen; Jiangjian Xie; MDPI AG; article; bird vocalization; spectrogram feature; multi-channel; deep convolutional neural; Science (Q); Astrophysics (QB460-466); Physics (QC1-999); EN; Entropy, Vol 23, Iss 1507, p 1507 (2021) |
institution |
DOAJ |
collection |
DOAJ |
language |
EN |
topic |
bird vocalization; spectrogram feature; multi-channel; deep convolutional neural; Science (Q); Astrophysics (QB460-466); Physics (QC1-999) |
spellingShingle |
bird vocalization; spectrogram feature; multi-channel; deep convolutional neural; Science (Q); Astrophysics (QB460-466); Physics (QC1-999); Feiyu Zhang; Luyang Zhang; Hongxiang Chen; Jiangjian Xie; Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
description |
Deep convolutional neural networks (DCNNs) have achieved breakthrough performance in bird species identification from spectrograms of bird vocalizations. To address class imbalance in the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and a modified weighted cross-entropy loss function was proposed. To further improve identification accuracy, two multi-channel fusion methods were built from three SFIMs: one fuses the outputs of the feature extraction parts of the three SFIMs (feature fusion mode), and the other fuses the outputs of their classifiers (result fusion mode). The SFIMs were trained on three different kinds of spectrograms, calculated through the short-time Fourier transform, the mel-frequency cepstrum transform and the chirplet transform, respectively. To cope with the large number of trainable model parameters, transfer learning was used in the multi-channel models. Using our own vocalization dataset as a sample set, the result fusion mode model outperforms the other proposed models, with a best mean average precision (MAP) of 0.914. Of the three spectrogram durations compared (100 ms, 300 ms and 500 ms), 300 ms is the best for our own dataset; the duration should be chosen based on the duration distribution of bird syllables. With the BirdCLEF2019 training dataset, the highest classification mean average precision (cmAP) reached 0.135, which suggests that the proposed model has some generalization ability. |
format |
article |
author |
Feiyu Zhang, Luyang Zhang, Hongxiang Chen, Jiangjian Xie |
author_facet |
Feiyu Zhang, Luyang Zhang, Hongxiang Chen, Jiangjian Xie |
author_sort |
Feiyu Zhang |
title |
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
title_short |
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
title_full |
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
title_fullStr |
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
title_full_unstemmed |
Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs |
title_sort |
bird species identification using spectrogram based on multi-channel fusion of dcnns |
publisher |
MDPI AG |
publishDate |
2021 |
url |
https://doaj.org/article/2ca38a7e0955475892901a44421f3ddc |
work_keys_str_mv |
AT feiyuzhang birdspeciesidentificationusingspectrogrambasedonmultichannelfusionofdcnns AT luyangzhang birdspeciesidentificationusingspectrogrambasedonmultichannelfusionofdcnns AT hongxiangchen birdspeciesidentificationusingspectrogrambasedonmultichannelfusionofdcnns AT jiangjianxie birdspeciesidentificationusingspectrogrambasedonmultichannelfusionofdcnns |
_version_ |
1718412274846138368 |
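
The description field distinguishes two fusion modes and mentions a weighted cross-entropy loss. As a rough illustration only, the sketch below shows one common way such ideas are realized in plain Python. Everything in it is an assumption rather than the authors' method: the record does not say how the outputs are fused (averaging probabilities and concatenating features are stand-ins), the "modified" form of the loss is not given (a plain class-weighted cross-entropy is used), and the logits, feature vectors and weights are made-up values.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def result_fusion(per_channel_logits):
    """Late fusion (assumed rule): average the class probabilities
    produced by each channel's classifier."""
    probs = [softmax(l) for l in per_channel_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def feature_fusion(per_channel_features):
    """Early fusion (assumed rule): concatenate the per-channel feature
    vectors so a single shared classifier can consume them."""
    fused = []
    for features in per_channel_features:
        fused.extend(features)
    return fused


def weighted_cross_entropy(probs, true_class, class_weights):
    """Plain class-weighted cross-entropy for one sample; rare classes
    can be given larger weights to counter dataset imbalance."""
    return -class_weights[true_class] * math.log(probs[true_class])


# Hypothetical logits from three single-feature models (one per
# spectrogram type) for a 3-class problem.
logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2], [0.3, 2.2, 0.1]]
fused_probs = result_fusion(logits)
predicted = max(range(len(fused_probs)), key=fused_probs.__getitem__)

# Hypothetical 2-dimensional feature vectors from the same three channels.
fused_features = feature_fusion([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Hypothetical weights: class 1 is assumed to be under-represented.
loss = weighted_cross_entropy(fused_probs, 1, [1.0, 2.0, 1.0])
```

In this sketch the result fusion path keeps each channel's classifier independent, while the feature fusion path would need a new classifier trained on the concatenated vector.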