Memory-Augmented Transformer for Remote Sensing Image Semantic Segmentation

The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of the same-class instances. Such requirements make it necessary for the segmentation methods to extract discriminative local features between...

Bibliographic Details
Main authors: Xin Zhao, Jiayi Guo, Yueting Zhang, Yirong Wu
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Q
Online access: https://doaj.org/article/95a9750f55f84f13a6ae12e9424ee1a0
id oai:doaj.org-article:95a9750f55f84f13a6ae12e9424ee1a0
record_format dspace
spelling oai:doaj.org-article:95a9750f55f84f13a6ae12e9424ee1a0
2021-11-25T18:53:51Z
Memory-Augmented Transformer for Remote Sensing Image Semantic Segmentation
10.3390/rs13224518
2072-4292
https://doaj.org/article/95a9750f55f84f13a6ae12e9424ee1a0
2021-11-01T00:00:00Z
https://www.mdpi.com/2072-4292/13/22/4518
https://doaj.org/toc/2072-4292
Xin Zhao
Jiayi Guo
Yueting Zhang
Yirong Wu
MDPI AG
article
semantic segmentation
remote sensing imagery
memory-augmented transformer
memory mechanism
self-attention
Science
Q
EN
Remote Sensing, Vol 13, Iss 4518, p 4518 (2021)
institution DOAJ
collection DOAJ
language EN
topic semantic segmentation
remote sensing imagery
memory-augmented transformer
memory mechanism
self-attention
Science
Q
description The semantic segmentation of remote sensing images requires distinguishing local regions of different classes and exploiting a uniform global representation of same-class instances. Segmentation methods must therefore extract discriminative local features between different classes and learn representative features for all instances of a given class. While common deep convolutional neural networks (DCNNs) can effectively focus on local features, their limited receptive field makes it difficult to obtain consistent global information. In this paper, we propose a memory-augmented transformer (MAT) to effectively model both the local and global information. The feature extraction pipeline of the MAT is split into a memory-based global relationship guidance module and a local feature extraction module. The local feature extraction module mainly consists of a transformer, which is used to extract features from the input images. The global relationship guidance module maintains a memory bank for the consistent encoding of the global information. Global guidance is performed by memory interaction. Bidirectional information flow between the global and local branches is carried out by a memory-query module and a memory-update module. Experimental results on the ISPRS Potsdam and ISPRS Vaihingen datasets demonstrate that our method performs competitively with state-of-the-art methods.
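The description above names a memory bank, a memory-query module, and a memory-update module but does not give their equations. The following is only an illustrative sketch of how such bidirectional flow might look, not the authors' implementation. It assumes scaled dot-product attention in both directions, a residual connection on the local features, and a momentum-style memory update. The names `memory_query`, `memory_update`, and `momentum` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_query(features, memory):
    """Global -> local (assumed form): each local feature reads from the memory bank.

    features: (N, d) local features, memory: (K, d) memory slots.
    """
    d = features.shape[-1]
    attn = softmax(features @ memory.T / np.sqrt(d))  # (N, K) weights over slots
    return features + attn @ memory                   # residual global guidance


def memory_update(features, memory, momentum=0.9):
    """Local -> global (assumed form): each slot aggregates the local features.

    The momentum blend is an assumption made for this sketch.
    """
    d = features.shape[-1]
    attn = softmax(memory @ features.T / np.sqrt(d))  # (K, N) weights over features
    return momentum * memory + (1.0 - momentum) * (attn @ features)


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))  # 16 local tokens, 8 channels (toy sizes)
memory = rng.normal(size=(4, 8))     # 4 memory slots (toy size)

guided = memory_query(features, memory)
new_memory = memory_update(features, memory)
print(guided.shape, new_memory.shape)
```

In this sketch the query step leaves the memory untouched and the update step leaves the features untouched, so the two directions of information flow can be inspected separately.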
format article
author Xin Zhao
Jiayi Guo
Yueting Zhang
Yirong Wu
author_sort Xin Zhao
title Memory-Augmented Transformer for Remote Sensing Image Semantic Segmentation
title_sort memory-augmented transformer for remote sensing image semantic segmentation
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/95a9750f55f84f13a6ae12e9424ee1a0
work_keys_str_mv AT xinzhao memoryaugmentedtransformerforremotesensingimagesemanticsegmentation
AT jiayiguo memoryaugmentedtransformerforremotesensingimagesemanticsegmentation
AT yuetingzhang memoryaugmentedtransformerforremotesensingimagesemanticsegmentation
AT yirongwu memoryaugmentedtransformerforremotesensingimagesemanticsegmentation
_version_ 1718410602937843712