Deep convolution stack for waveform in underwater acoustic target recognition

Bibliographic Details
Main authors: Shengzhao Tian, Duanbing Chen, Hang Wang, Jingfa Liu
Format: article
Language: EN
Published: Nature Portfolio 2021
Subjects:
R
Q
Online access: https://doaj.org/article/0a2e8f2785584d68a2ccc8d0c380e1cc
Description
Summary: In underwater acoustic target recognition, deep learning methods have proved effective at recognizing targets directly from the original signal waveform. Previous methods often use large convolutional kernels to extract features at the beginning of the network, which leaves the network shallow and structurally imbalanced, so the power of nonlinear transformation offered by deep networks is not fully used. The deep convolution stack is a network framework with a flexible and balanced structure; it has not been well explored in underwater acoustic target recognition, even though it has proven effective in other deep learning fields. In this paper, a multiscale residual unit (MSRU) is proposed for constructing deep convolution stack networks. Based on MSRU, a multiscale residual deep neural network (MSRDN) is presented to classify underwater acoustic targets. A dataset acquired in a real-world scenario is used to verify the proposed unit and model. The validity of MSRU is demonstrated by adding it to generative adversarial networks. Finally, MSRDN achieves the best recognition accuracy of 83.15%, an improvement of 6.99% over structurally related networks that take the original signal waveform as input and of 4.48% over networks that take a time-frequency representation as input.
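The summary contrasts one large convolutional kernel at the network input with a deep stack of smaller, multiscale residual units. This record does not describe the internals of MSRU, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a unit made of parallel 1-D convolutions with different kernel sizes, each followed by a ReLU, averaged and added back to the input through an identity skip connection. The function names, the averaging step, and the fixed placeholder weights are all hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k)) for i in range(len(x))]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def multiscale_residual_unit(x, kernels):
    """Hypothetical unit: parallel branches at several kernel sizes plus an identity skip.

    This is an illustrative assumption, not the MSRU defined in the paper.
    """
    branches = [relu(conv1d_same(x, k)) for k in kernels]
    merged = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [xi + mi for xi, mi in zip(x, merged)]


def receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    return 1 + sum(k - 1 for k in kernel_sizes)


# Toy waveform and placeholder (not learned) averaging kernels of sizes 3, 5, 7.
waveform = [math.sin(0.3 * n) for n in range(64)]
kernels = [[1.0 / 3] * 3, [1.0 / 5] * 5, [1.0 / 7] * 7]

# Stack four units, in the spirit of a deep convolution stack.
out = waveform
for _ in range(4):
    out = multiscale_residual_unit(out, kernels)

# Four stacked units whose widest branch is 7 samples cover as many input
# samples as one 25-sample kernel, but apply four nonlinearities instead of one.
stacked_rf = receptive_field([7] * 4)
single_rf = receptive_field([25])
```

Under these assumptions, the stack reaches the same receptive field as a single wide kernel while passing the signal through several nonlinear stages, which is the depth argument the summary makes. In a real model the kernel weights would be learned and each unit would typically carry many channels.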