A Comparison of Deep Learning Architectures for Semantic Mapping of Very High Resolution Images


Semantic mapping of land cover is a key but challenging problem in remote sensing. Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have shown outstanding performance on this task. To develop a refined deep learning pipeline that meets the rising need for accurate semantic mapping of remote sensing images, this paper studies and compares several advanced deep learning segmentation architectures that have achieved state-of-the-art results in computer vision contests such as Pascal VOC. To further analyze and compare the effectiveness of the elaborate layers and underlying structures introduced by these architectures, we re-implement them and train and test them on the ISPRS Potsdam dataset. Our results show that promising performance, with an overall F1 score above 87% and an mIoU of 79%, can be obtained using only the RGB images, without any post-processing such as conditional random field (CRF) smoothing. Finally, we propose several possible approaches for further enhancing deep learning architectures to better handle high-resolution aerial images. We therefore consider this work to be helpful for the remote sensing research community.
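The overall F1 score and mIoU reported above are standard semantic segmentation metrics, both derivable from a class confusion matrix. The sketch below (function names are ours, not from the paper) shows one common way to compute them from flattened ground-truth and predicted label maps:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix
    from flattened integer label maps (rows: true, cols: predicted)."""
    idx = num_classes * y_true + y_pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def f1_and_miou(cm):
    """Per-class F1 scores and mean IoU from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, actually another class
    fn = cm.sum(axis=1) - tp  # actually class c, predicted as another class
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    iou = tp / np.maximum(tp + fp + fn, 1)
    return f1, iou.mean()
```

In practice the confusion matrix is accumulated tile by tile over the whole test set before the final scores are computed, so that large and small tiles are weighted by pixel count.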