Building Extraction from Remote Sensing Images with Sparse Token Transformers

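Deep learning methods have achieved considerable progress in building extraction from remote sensing images. Most building extraction methods are based on Convolutional Neural Networks (CNNs). Recently, vision transformers have offered a better way to model long-range context in images, but they usually suffer from high computational complexity and memory usage. In this paper, we explore the potential of using transformers for efficient building extraction. We design an efficient dual-pathway transformer structure that learns long-range dependencies among tokens in both their spatial and channel dimensions and achieves state-of-the-art accuracy on benchmark building extraction datasets. Since individual buildings in remote sensing images usually occupy only a very small fraction of the image pixels, we represent buildings as a set of “sparse” feature vectors in their feature space by introducing a new module called the “sparse token sampler”. With this design, the computational complexity of the transformers can be reduced by more than an order of magnitude. We refer to our method as Sparse Token Transformers (STT). Experiments conducted on the Wuhan University Aerial Building Dataset (WHU) and the Inria Aerial Image Labeling Dataset (INRIA) suggest the effectiveness and efficiency of our method. Compared with some widely used segmentation methods and some state-of-the-art building extraction methods, STT achieves the best performance with low time cost.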
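The abstract names two mechanisms: a sparse token sampler that keeps only a small subset of feature vectors, and a dual-pathway transformer that attends over both the spatial and the channel dimension. The following NumPy sketch illustrates that general idea only; it is not the authors' STT implementation. Assumptions, all hypothetical: tokens are sampled by taking the top-k positions of a per-pixel score map (`scores`), the two pathways are plain single-head self-attention over positions and over channels, their outputs are fused by summation with a residual, and the refined tokens are scattered back into the full feature map. Every function name here is illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_sparse_tokens(feat, scores, k):
    """Hypothetical sampler: keep the k highest-scoring spatial positions.

    feat:   (H, W, C) feature map
    scores: (H, W) per-pixel score (assumed, e.g. a coarse building probability)
    Returns flat indices (k,) and the sampled tokens (k, C).
    """
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)
    idx = np.argsort(scores.reshape(-1))[::-1][:k]
    return idx, tokens[idx]


def spatial_attention(tokens, wq, wk, wv):
    # Scaled dot-product self-attention over the k sampled tokens:
    # the attention matrix is (k, k) instead of (H*W, H*W).
    q, key, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ key.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v


def channel_attention(tokens):
    # Attention across channels: a (C, C) affinity between feature channels.
    t = tokens.T  # (C, k)
    attn = softmax(t @ t.T / np.sqrt(t.shape[-1]), axis=-1)
    return (attn @ t).T  # back to (k, C)


def sparse_token_block(feat, scores, k, wq, wk, wv):
    h, w, c = feat.shape
    idx, tokens = sample_sparse_tokens(feat, scores, k)
    # Dual pathway fused by summation plus a residual (assumed fusion rule).
    refined = tokens + spatial_attention(tokens, wq, wk, wv) + channel_attention(tokens)
    out = feat.reshape(h * w, c).copy()
    out[idx] = refined  # scatter back; unsampled pixels are left unchanged
    return out.reshape(h, w, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, K = 32, 32, 16, 64
    feat = rng.standard_normal((H, W, C))
    scores = rng.random((H, W))  # random placeholder scores
    wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
    out = sparse_token_block(feat, scores, K, wq, wk, wv)
    print(out.shape)  # (32, 32, 16)
    n = H * W
    print(n * n / (K * K))  # attention-matrix size ratio: 256.0
```

With these illustrative sizes, 1,024 spatial positions reduced to 64 sampled tokens shrink the spatial attention matrix by a factor of 256, which is the kind of saving the abstract attributes to sparse sampling. The reduction actually reported for STT depends on the paper's own feature sizes and token counts, which this record does not give.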

Bibliographic Details
Main authors: Keyan Chen, Zhengxia Zou, Zhenwei Shi
Format: article
Language: EN
Published: MDPI AG 2021
Subjects:
Q
Online access: https://doaj.org/article/7be1cf5f7cac4b119ac30e1cde47d610
id oai:doaj.org-article:7be1cf5f7cac4b119ac30e1cde47d610
record_format dspace
doi 10.3390/rs13214441
issn 2072-4292
date 2021-11-01
url_fulltext https://www.mdpi.com/2072-4292/13/21/4441
url_journal https://doaj.org/toc/2072-4292
source Remote Sensing, Vol 13, Iss 21, Article 4441 (2021)
institution DOAJ
collection DOAJ
language EN
topic remote sensing images
building extraction
transformers
sparse token sampler
Science
Q
description Deep learning methods have achieved considerable progress in remote sensing image building extraction. Most building extraction methods are based on Convolutional Neural Networks (CNN). Recently, vision transformers have provided a better perspective for modeling long-range context in images, but usually suffer from high computational complexity and memory usage. In this paper, we explored the potential of using transformers for efficient building extraction. We design an efficient dual-pathway transformer structure that learns the long-term dependency of tokens in both their spatial and channel dimensions and achieves state-of-the-art accuracy on benchmark building extraction datasets. Since single buildings in remote sensing images usually only occupy a very small part of the image pixels, we represent buildings as a set of “sparse” feature vectors in their feature space by introducing a new module called “sparse token sampler”. With such a design, the computational complexity in transformers can be greatly reduced over an order of magnitude. We refer to our method as Sparse Token Transformers (STT). Experiments conducted on the Wuhan University Aerial Building Dataset (WHU) and the Inria Aerial Image Labeling Dataset (INRIA) suggest the effectiveness and efficiency of our method. Compared with some widely used segmentation methods and some state-of-the-art building extraction methods, STT has achieved the best performance with low time cost.
format article
author Keyan Chen
Zhengxia Zou
Zhenwei Shi
title Building Extraction from Remote Sensing Images with Sparse Token Transformers
publisher MDPI AG
publishDate 2021
url https://doaj.org/article/7be1cf5f7cac4b119ac30e1cde47d610