Integrating RGB and DSM Data for Enhanced Building Segmentation in UAV Images

Kanokwan Khiewwan
Duangduen Asavasuthirakul
Sutasinee Chimlek

Abstract

Accurate building segmentation in unmanned aerial vehicle (UAV) orthophotos remains a significant challenge due to the visual similarity between buildings and non-target elements such as trees, roads, and background clutter. This study proposes an enhanced segmentation method—referred to as RGB-DSM-IMP (M3)—which integrates RGB imagery, Digital Surface Model (DSM) data, and a novel background removal preprocessing step. The Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework was employed to evaluate three segmentation strategies: a baseline model using only RGB imagery, a second model combining RGB imagery with DSM data, and the proposed model that incorporates both data types along with preprocessing. All models were trained and tested on drone-acquired images representing a variety of building types and environmental conditions. Performance was evaluated using precision, recall, F1-score, average precision (AP), mean intersection over union (mIoU), and mean average precision (mAP). The enhanced model achieved the highest results across all metrics, with an average F1-score of 0.74, mIoU of 0.74, and mAP of 0.63. These findings highlight the benefit of integrating elevation data to enhance spatial differentiation and demonstrate the effectiveness of background removal in reducing misclassifications caused by visually similar objects. In addition, the method maintained a practical inference time per image, supporting its real-world applicability. Overall, the study demonstrates that combining height-based information with strategic preprocessing significantly improves the accuracy and robustness of building segmentation in complex aerial imagery.
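The abstract reports per-image precision, recall, F1-score, and intersection over union (IoU) for binary building masks. As an illustration only (not the authors' evaluation code), the standard pixel-level definitions of these metrics can be sketched as follows; the function name and the toy 4×4 masks are invented for demonstration:

```python
import numpy as np

def mask_metrics(pred, gt):
    """Pixel-level precision, recall, F1, and IoU for two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # building pixels predicted correctly
    fp = np.logical_and(pred, ~gt).sum()     # background predicted as building
    fn = np.logical_and(~pred, gt).sum()     # building pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou

# Toy example: ground-truth building footprint vs. an imperfect prediction
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
p, r, f1, iou = mask_metrics(pred, gt)
# With 3 true positives, 1 false positive, and 1 false negative:
# precision = recall = F1 = 0.75, IoU = 3/5 = 0.6
```

In the paper's reported results, mIoU and mAP average such per-image (or per-class) scores over the test set; the averaging convention is defined by the study, not by this sketch.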

Article Details

How to Cite
Khiewwan, K., Asavasuthirakul, D., & Chimlek, S. (2025). Integrating RGB and DSM data for enhanced building segmentation in UAV images. Current Applied Science and Technology, e0265709. https://doi.org/10.55003/cast.2025.265709
Section
Original Research Articles

References

Al-Najjar, H. A. H., Kalantar, B., Pradhan, B., Saeidi, V., Halin, A. A., Ueda, N., & Mansor, S. (2019). Land cover classification from fused DSM and UAV images using convolutional neural networks. Remote Sensing, 11(12), 1-18. https://doi.org/10.3390/rs11121461

Amo-Boateng, M., Sey, N., Amproche, A., & Domfeh, M. (2022). Instance segmentation scheme for roofs in rural areas based on Mask R-CNN. Egyptian Journal of Remote Sensing and Space Science, 25, 569-577.

Boonpook, W., Tan, Y., & Xu, B. (2020). Deep learning-based multi-feature semantic segmentation in building extraction from images of UAV photogrammetry. International Journal of Remote Sensing, 42(1), 1-19. https://doi.org/10.1080/01431161.2020.1788742

Chea, C., Saengprachatanarug, K., Posom, J., Wongphati, M., & Taira, E. (2019). Sugarcane canopy detection using high spatial resolution UAS images and digital surface model. Engineering and Applied Science Research, 46(4), 312-317.

Chen, J., Wang, G., Luo, L., Gong, W., & Cheng, Z. (2021). Building area estimation in drone aerial images based on Mask R-CNN. IEEE Geoscience and Remote Sensing Letters, 18(5), 891-894. https://doi.org/10.1109/LGRS.2020.2988326

Chueprasert, T., Udomchaiporn, A., & Intagosum, S. (2025). Comparative analysis of deep learning models for building extraction from high-resolution satellite imagery. Current Applied Science and Technology, 25(1), Article e0260846. https://doi.org/10.55003/cast.2024.260846

Dawn, K. (2024, November 20). Enhancing image segmentation using U2-Net: An approach to efficient background removal. https://learnopencv.com/u2-net-image-segmentation/

Dombrowski, M., Reynaud, H., Baugh, M., & Kainz, B. (2022). Foreground-background separation through concept distillation from generative image foundation models. https://arxiv.org/pdf/2212.14306

Hu, Z., & Yu, T. (2023). Dynamic spectrum mixer for visual recognition. https://arxiv.org/pdf/2309.06721

Jiang, N., Zhang, J., Li, H., & Lin, X. (2008). Object-oriented building extraction by DSM and very high-resolution orthoimages. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 37, 441-446.

Kamarulzaman, A. M. M., Jaafar, W. S. W. M., Saad, S. N. M., & Mohan, M. (2023). UAV implementations in urban planning and related sectors of rapidly developing nations: A review and future perspectives for Malaysia. Remote Sensing, 15(11), Article 2845. https://doi.org/10.3390/rs15112845

Kampffmeyer, M., Salberg, A. B., & Jenssen, R. (2016). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1-9). IEEE. https://doi.org/10.1109/CVPRW.2016.105

KC, K., Yin, Z., Li, D., & Wu, Z. (2021). Impacts of background removal on convolutional neural networks for plant disease classification in-situ. Agriculture, 11, Article 827. https://doi.org/10.3390/agriculture11090827

Khiewwan, K., & Asavasuthirakul, D. (2025, March). Building segmentation in drone photos across varied flight altitudes using Mask R-CNN. ICIC Express Letters Part B: Applications, 16(3), 309-316. https://doi.org/10.24507/icicelb.16.03.309

Li, J., Cai, X., & Qi, J. (2021). AMFNet: An attention-based multi-level feature fusion network for ground objects extraction from mining area's UAV-based RGB images and digital surface model. Journal of Applied Remote Sensing, 15(3), Article 036506. https://doi.org/10.1117/1.JRS.15.036506

Ma, X., Zhang, X., Pun, M.-O., & Huang, B. (2024). MANet: Fine-tuning segment anything model for multimodal remote sensing semantic segmentation. https://arxiv.org/pdf/2410.11160

Müller, D., Soto-Rey, I., & Kramer, F. (2022). Towards a guideline for evaluation metrics in medical image segmentation. BMC Research Notes, 15, Article 210. https://doi.org/10.1186/s13104-022-06096-y

Mungklachaiya, S., & Salaiwarakul, A. (2024). Exploring deep learning features and bag-of-visual-words for scene classification. ICIC Express Letters Part B: Applications, 15(10), 1081-1088. https://doi.org/10.24507/icicelb.15.10.1081

Ran, X., Xue, L., Zhang, Y., Liu, Z., Sang, X., & He, J. (2019). Rock classification from field image patches analyzed using a deep convolutional neural network. Mathematics, 7, Article 755. https://doi.org/10.3390/math7080755

Rizk, H., Nishimur, Y., Yamaguchi, H., & Higashino, T. (2022). Drone-based water level detection in flood disasters. International Journal of Environmental Research and Public Health, 19(1), Article 237. https://doi.org/10.3390/ijerph19010237

Snyder, B., Kama, S., & ElGalaind, K. (2021, December 13). Distributed Mask R-CNN training with Amazon SageMakerCV. https://aws.amazon.com/blogs/machine-learning/distributed-mask-rcnn-training-with-amazon-sagemakercv

Wang, Q., Yan, L., Sun, Y., Cui, X., Mortimer, H., & Li, Y. (2018). True orthophoto generation using line segment matches. Photogrammetric Record, 33, 113-130.

Wang, Y., Li, S., Teng, F., Lin, Y., Wang, M., & Cai, H. (2022). Improved Mask R-CNN for rural building roof type recognition from UAV high-resolution images: A case study in Hunan Province, China. Remote Sensing, 14(2), Article 265. https://doi.org/10.3390/rs14020265

Yang, C., Rottensteiner, F., & Heipke, C. (2018). Classification of land cover and land use based on convolutional neural networks. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4(3), 251-258. https://doi.org/10.5194/isprs-annals-IV-3-251-2018

Yi, Y., Zhang, Z., Zhang, W., Zhang, C., Li, W., & Zhao, T. (2019). Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network. Remote Sensing, 11(15), Article 1774. https://doi.org/10.3390/rs11151774

Zhang, Y., & Liu, Y. (2020). Image segmentation evaluation: A survey of methods. Artificial Intelligence Review, 53(8), 5637-5674.