Frog calling activity detection using lightweight CNN with multi-view spectrogram: A case study on Kroombit tinker frog
Main Authors: | , , , , , |
---|---|
Format: | article |
Language: | EN |
Published: | Elsevier, 2022 |
Subjects: | |
Online Access: | https://doaj.org/article/67e7710c4b124390be626699700b55f3 |
Summary: | Frogs play an important role in ecosystems, yet frog species across the globe are threatened and declining. An automated system for estimating frog populations is therefore valuable. Following the success of deep learning (DL) in various pattern recognition tasks, previous studies have applied DL-based methods to frog call analysis. However, the performance of DL-based systems depends heavily on their input (the feature representation). In this study, we develop a frog calling activity detection system for continuous field recordings using a lightweight convolutional neural network (CNN) with multi-view spectrograms. Specifically, a sliding window is first applied to the continuous recordings to obtain audio segments of fixed duration. The background noise is then filtered out. Next, each segment is characterized by a multi-view spectrogram, which carries more distinctive information than a single-view spectrogram. Finally, a lightweight CNN model trained with a twin loss detects frog calling activity, and different train and test sets are used to validate the model's robustness. Our experimental results indicate that the highest macro F1-scores were 99.6 ± 0.2 and 96.4 ± 2.0 when using the 2016 and 2017 data for training, respectively, with CNN-GAP as the model and the multi-view spectrogram as the input. |
---|
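The first and third steps of the pipeline described in the summary (sliding-window segmentation and multi-view spectrogram extraction) can be illustrated with a minimal NumPy sketch. The abstract does not say how the views are constructed, so this sketch assumes, purely for illustration, that each view is a log-magnitude STFT computed with a different FFT size and resampled to a common grid. The function names, window and hop durations, FFT sizes, and output shape are all hypothetical, and the noise-filtering and CNN stages are omitted.

```python
import numpy as np


def sliding_segments(signal, sr, win_s=1.0, hop_s=0.5):
    """Split a 1-D recording into fixed-duration, overlapping segments.

    win_s and hop_s are illustrative values, not taken from the paper.
    """
    win = int(round(win_s * sr))
    hop = int(round(hop_s * sr))
    if len(signal) < win:
        return np.empty((0, win), dtype=signal.dtype)
    n = 1 + (len(signal) - win) // hop
    return np.stack([signal[i * hop:i * hop + win] for i in range(n)])


def log_spectrogram(segment, n_fft, hop):
    """Log-magnitude STFT with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack(
        [segment[t * hop:t * hop + n_fft] * window for t in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def multi_view(segment, n_ffts=(256, 512, 1024), out_shape=(64, 64)):
    """Stack spectrograms of different time-frequency resolution as channels.

    ASSUMPTION: "views" are STFTs with different FFT sizes; the source does
    not define them. Each view is resampled to out_shape by nearest-neighbour
    indexing so the views can be stacked into one (n_views, H, W) array.
    """
    views = []
    for n_fft in n_ffts:
        spec = log_spectrogram(segment, n_fft, n_fft // 4)
        rows = np.linspace(0, spec.shape[0] - 1, out_shape[0]).round().astype(int)
        cols = np.linspace(0, spec.shape[1] - 1, out_shape[1]).round().astype(int)
        views.append(spec[np.ix_(rows, cols)])
    return np.stack(views)
```

Under these assumptions, a three-second recording sampled at 8 kHz yields five one-second segments, and each segment becomes a three-channel array that a small CNN could take as input in the same way it would take an RGB image.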