Integrating RGB and DSM Data for Enhanced Building Segmentation in UAV Images
Abstract
Accurate building segmentation in unmanned aerial vehicle (UAV) orthophotos remains challenging because buildings often look similar to non-target elements such as trees, roads, and background clutter. This study proposes an enhanced segmentation method, referred to as RGB-DSM-IMP (M3), which integrates RGB imagery, Digital Surface Model (DSM) data, and a novel background removal preprocessing step. The Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework was used to evaluate three segmentation strategies: a baseline model using only RGB imagery, a second model combining RGB imagery with DSM data, and the proposed model, which combines both data types with the preprocessing step. All models were trained and tested on drone-acquired images covering a variety of building types and environmental conditions. Performance was evaluated using precision, recall, F1-score, average precision (AP), mean intersection over union (mIoU), and mean average precision (mAP). The proposed model achieved the highest scores on all metrics, with an average F1-score of 0.74, an mIoU of 0.74, and an mAP of 0.63. These findings indicate that elevation data helps the model separate buildings from visually similar objects, and that background removal reduces the misclassifications those objects cause. In addition, the method kept inference time per image at a practical level, supporting its real-world applicability. Overall, the study demonstrates that combining height-based information with targeted preprocessing substantially improves the accuracy and robustness of building segmentation in complex aerial imagery.
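The abstract does not say how the RGB and DSM data are combined or how the background removal step works. The sketch below is a minimal illustration under stated assumptions only: the RGB orthophoto and DSM are co-registered rasters of the same size, they are fused early by stacking the DSM as a fourth channel, and "background" is removed with a simple height-above-ground threshold. The functions `fuse_rgb_dsm` and `remove_low_background`, the per-tile min-max scaling, the 5th-percentile ground estimate, and the 2.5 m threshold are all hypothetical choices, not the authors' method.

```python
import numpy as np


def fuse_rgb_dsm(rgb: np.ndarray, dsm: np.ndarray) -> np.ndarray:
    """Stack an RGB tile (H, W, 3) and a DSM tile (H, W) into one (H, W, 4) float32 array.

    Assumption: RGB is scaled to [0, 1] and the DSM is min-max scaled to [0, 1]
    per tile. This is an illustrative normalization, not the paper's.
    """
    if rgb.shape[:2] != dsm.shape:
        raise ValueError("RGB and DSM must be co-registered with the same height and width")
    rgb_f = rgb.astype(np.float32) / 255.0
    dsm_f = dsm.astype(np.float32)
    span = dsm_f.max() - dsm_f.min()
    # A perfectly flat tile has no height contrast; encode it as all zeros.
    dsm_n = (dsm_f - dsm_f.min()) / span if span > 0 else np.zeros_like(dsm_f)
    return np.concatenate([rgb_f, dsm_n[..., None]], axis=-1)


def remove_low_background(fused: np.ndarray, dsm: np.ndarray,
                          min_height: float = 2.5,
                          ground_percentile: float = 5.0) -> np.ndarray:
    """HYPOTHETICAL background removal: zero out every channel of pixels whose
    height above an estimated ground level is below `min_height`
    (same units as the DSM, e.g. metres).

    The ground level is estimated as a low percentile of the DSM, which assumes
    roughly flat terrain within the tile.
    """
    if fused.shape[:2] != dsm.shape:
        raise ValueError("fused array and DSM must have the same height and width")
    ground = np.percentile(dsm, ground_percentile)
    above_ground = (dsm - ground) >= min_height
    return fused * above_ground[..., None]


# Usage: a flat 4x4 tile with a 2x2 "roof" standing 6 m above the ground.
rgb = np.full((4, 4, 3), 200, dtype=np.uint8)
dsm = np.zeros((4, 4), dtype=np.float32)
dsm[1:3, 1:3] = 6.0
x = remove_low_background(fuse_rgb_dsm(rgb, dsm), dsm)
print(x.shape)
```

A height threshold of this kind suppresses roads and other ground-level clutter but would keep trees, so on its own it cannot explain all of the gains reported above. Also note that ImageNet-pretrained Mask R-CNN backbones expect three input channels, so a four-channel input usually requires adapting the first convolution layer; the abstract does not state how this was handled.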
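For reference, the pixel-level forms of several of the reported metrics can be computed from a predicted and a ground-truth binary mask as below. This is a minimal sketch: the abstract does not say whether the scores were computed per pixel, per instance, or per image, and AP/mAP (which rank detections by confidence and match them to ground truth at IoU thresholds) are not reproduced here. The helper names `mask_metrics` and `mean_iou` are illustrative.

```python
import numpy as np


def mask_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Pixel-wise precision, recall, F1 and IoU for one pair of binary masks.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # building pixels correctly predicted
    fp = np.logical_and(pred, ~gt).sum()  # background predicted as building
    fn = np.logical_and(~pred, gt).sum()  # building pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": float(precision), "recall": float(recall),
            "f1": float(f1), "iou": float(iou)}


def mean_iou(pairs) -> float:
    """Average the IoU over an iterable of (pred, gt) mask pairs."""
    ious = [mask_metrics(p, g)["iou"] for p, g in pairs]
    return float(np.mean(ious)) if ious else 0.0


# Usage: the prediction covers the 2x2 building plus two extra background pixels.
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
print(mask_metrics(pred, gt))  # precision 2/3, recall 1.0, F1 0.8, IoU 2/3
```

For binary masks the F1-score computed this way equals the Dice coefficient, and IoU is always less than or equal to F1 for the same mask pair.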
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright Transfer Statement
The copyright of this article is transferred to Current Applied Science and Technology journal with effect if and when the article is accepted for publication. The copyright transfer covers the exclusive right to reproduce and distribute the article, including reprints, translations, photographic reproductions, electronic form (offline, online) or any other reproductions of similar nature.
The author warrants that this contribution is original and that he/she has full power to make this grant. The author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors.