Diagnostic Performance of Artificial Intelligence and Radiologists in 3D Automated Breast Ultrasound System (ABUS) Interpretation: A Pilot Study
DOI:
https://doi.org/10.64960/srimedj.v41i3.271311
Keywords:
Automated Breast Ultrasound System, benign, malignant, artificial intelligence
Abstract
Background and objectives: Breast cancer is currently the most commonly diagnosed cancer among women worldwide. In 2022, the World Health Organization reported approximately 2.29 million new cases and 666,103 deaths. In Thailand, breast cancer likewise ranks as the most frequently diagnosed cancer in women, with 21,628 new cases and 7,599 deaths. Mammography, the standard screening tool, has limitations in women with extremely dense breast tissue, where its sensitivity is only 61%, compared with 86% in fatty breasts. The 3D Automated Breast Ultrasound System (3D ABUS) is well suited to dense breasts and provides standardized imaging. Artificial intelligence (AI) integration may improve diagnostic accuracy, reduce false positives, and support areas with limited access to breast imaging specialists. This study aimed to evaluate the accuracy, sensitivity, and specificity of AI in classifying breast cancer risk from 3D ABUS images compared with radiologists of different expertise and experience levels, and to examine the advantages and limitations of radiologists versus AI in breast cancer screening with 3D ABUS.
Methods: This retrospective study included 50 3D ABUS images from the TDSC-ABUS2023 dataset, without stratification by breast tissue density, comprising 25 benign and 25 malignant cases. Three radiologists participated: a breast imaging specialist (R1), a radiologist with over 15 years of experience (R2), and a radiologist with less than 15 years of experience (R3). Cases 1–10 were designated as a familiarization set to allow the radiologists to acclimate to the imaging characteristics and were excluded from the final analysis. The remaining 40 cases (cases 11–50) constituted the evaluation set, in which the radiologists classified each lesion as benign or malignant and recorded the time spent interpreting each case (in seconds) and their diagnostic confidence on a scale of 1 to 10. R3 analyzed the images both independently and with AI assistance (R3&AI), with an inter-reading washout period of approximately 9 months. 3D Slicer software was used for image analysis. The AI system combined Faster R-CNN with ResNet50 for detection, U-Net for segmentation, and radiomics with machine learning for classification. Accuracy, sensitivity, specificity, AUC, reading time, and confidence scores were measured, and differences in diagnostic performance were tested for statistical significance.
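The three headline metrics named in the Methods follow directly from each reader's benign/malignant calls. The sketch below is illustrative only: the function name `reader_metrics`, the 0/1 label encoding, and the toy data are assumptions made for this example and are not taken from the study or its software.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReaderMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def reader_metrics(truth: Sequence[int], calls: Sequence[int]) -> ReaderMetrics:
    """Compute accuracy, sensitivity and specificity for one reader.

    Assumed encoding (hypothetical): 1 = malignant, 0 = benign.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")

    # Tally the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign case")

    return ReaderMetrics(
        accuracy=(tp + tn) / len(truth),
        sensitivity=tp / (tp + fn),  # malignant cases called malignant
        specificity=tn / (tn + fp),  # benign cases called benign
    )


# Toy data, not study data: 4 malignant and 4 benign cases.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0]
toy_calls = [1, 1, 1, 0, 0, 0, 1, 1]
toy = reader_metrics(toy_truth, toy_calls)
print(toy)
```

With a balanced case mix, as in the toy data, accuracy is simply the mean of sensitivity and specificity.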
Results: R1 and AI demonstrated the highest diagnostic accuracy (72.50%), followed by R3&AI (65.00%), R3 (62.50%), and R2 (60.00%). Of the 40 evaluated cases, 12 (30.0%) were correctly classified by all reader groups and AI. For sensitivity, R1 had the highest at 80.00%, followed by R3, R3&AI, and AI (75.00% each), and R2 (70.00%). For specificity, AI had the highest at 70.00%, followed by R1 (65.00%), R3&AI (55.00%), and R2 and R3 (50.00% each). AUC values were highest for R1 and AI (0.7250), followed by R3&AI (0.6500), R3 (0.6250), and R2 (0.6000). Comparison of diagnostic performance between each group and the AI revealed no statistically significant differences across all metrics (p > 0.05 for all groups). Pairwise comparisons of AUC values between radiologist groups and AI, as well as among radiologist groups themselves, showed no statistically significant differences in any comparison (p > 0.05). The interobserver reliability of confidence scores among radiologists was assessed using the intraclass correlation coefficient (ICC), which was 0.069 (95% CI: -2.543–0.976, p = 0.345). Reading times for R1, R2, R3, and R3&AI were 47.40, 47.05, 53.70, and 41.20 seconds, respectively. The R3&AI condition showed a 23.28% reduction in reading time relative to R3 alone, but the differences in reading time were not statistically significant (p > 0.05). The AI required an average processing time of 12–15 seconds per case. R3&AI showed the highest confidence score (median = 8.0) and the greatest consistency (IQR = 2.0).
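The reported figures can be cross-checked with simple arithmetic. The sketch below assumes the 40-case evaluation set contained 20 malignant and 20 benign cases; the abstract states only the 25/25 split of the full 50-case set, so this split is an inference from the reported percentages, not a stated fact. It also uses the standard identity that a reader giving only a binary call (a single operating point) has AUC = (sensitivity + specificity) / 2. All variable names are hypothetical.

```python
# Assumed class counts in the evaluation set (inferred, not stated in the abstract).
N_MALIGNANT = 20
N_BENIGN = 20

# Reported values: (sensitivity %, specificity %, accuracy %, AUC).
reported = {
    "R1":    (80.00, 65.00, 72.50, 0.7250),
    "R2":    (70.00, 50.00, 60.00, 0.6000),
    "R3":    (75.00, 50.00, 62.50, 0.6250),
    "R3&AI": (75.00, 55.00, 65.00, 0.6500),
    "AI":    (75.00, 70.00, 72.50, 0.7250),
}


def derive(sens_pct: float, spec_pct: float):
    """Back-derive counts, accuracy (%) and single-point AUC from percentages."""
    tp = round(sens_pct / 100 * N_MALIGNANT)  # malignant cases called malignant
    tn = round(spec_pct / 100 * N_BENIGN)     # benign cases called benign
    accuracy_pct = 100 * (tp + tn) / (N_MALIGNANT + N_BENIGN)
    auc = (sens_pct + spec_pct) / 200         # mean of sensitivity and specificity
    return tp, tn, accuracy_pct, auc


derived = {name: derive(sens, spec) for name, (sens, spec, _, _) in reported.items()}

for name, (tp, tn, acc, auc) in derived.items():
    print(f"{name:6s} TP={tp:2d} TN={tn:2d} accuracy={acc:.2f}% AUC={auc:.4f}")

# Relative reading-time reduction for R3 with AI versus R3 alone (seconds).
r3_alone, r3_with_ai = 53.70, 41.20
reduction_pct = round(100 * (r3_alone - r3_with_ai) / r3_alone, 2)
print(f"Reading-time reduction: {reduction_pct}%")
```

Under the assumed 20/20 split, the derived accuracy and AUC match the reported values for every reader group, and the reading-time reduction reproduces the reported 23.28% when R3 alone is taken as the baseline.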
Conclusions: AI demonstrated accuracy equivalent to that of the breast imaging specialist and the highest specificity, reducing false-positive diagnoses. AI can enhance the performance of less experienced radiologists by improving accuracy, specificity, and confidence. However, the breast imaging specialist maintained the highest sensitivity for cancer detection. Furthermore, automation bias was observed in the R3&AI group, where increased confidence scores did not correspond to improved diagnostic accuracy, representing an important caution for clinical AI implementation. This study demonstrates the potential of integrating 3D ABUS with AI in breast cancer screening programs, especially in regions facing a shortage of subspecialized radiologists. However, policy-level implementation warrants further validation through large-scale prospective studies in real-world populations.
License
Copyright (c) 2026 Srinagarind Medical Journal

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
