Explainable and Bias-Aware AI Models for Clinical Decision Support in U.S. Healthcare Systems

Sudip Sharma; Kevin Leziga Giami; Uche Stanley Chukwuemeka

doi:10.54536/ijphn.v2i1.6988

Authors

Sudip Sharma, Morgan State University, Department of Computer Science, Baltimore, Maryland, USA
Kevin Leziga Giami, Modinfra Technologies Ltd, England
Uche Stanley Chukwuemeka, Prairie View A&M University, USA

Keywords:

Algorithmic Bias and Fairness, Bias Mitigation and Equity Auditing, Clinical Decision Support Systems (CDSS), Explainable Artificial Intelligence (XAI), U.S. Healthcare Deployment and Governance

Abstract

Although AI-enabled clinical decision support systems (CDSS) are becoming more prevalent in U.S. healthcare, inequitable and opaque models threaten patient safety and clinician trust. This scoping review mapped the evidence on explainable and bias-aware clinical AI systems to inform their equitable deployment. Following the PRISMA-ScR guidelines and a Population, Concept, Context (PCC) framework, we searched MEDLINE/PubMed and Embase for English-language, peer-reviewed studies published between 2015 and 2025. Data from the full texts of eligible studies were charted across eight domains, including data modality, CDSS use case, model approach, documentation bias, explanation technique, implications for trust and outcomes, and mitigation and governance actions. Of the 464 records identified, 18 studies met the inclusion criteria. The evidence spanned imaging (predominantly chest radiography), EHR-based risk prediction, and emergency department operational and safety models, and was largely retrospective. Explainability was most defensible when used as a safety audit, supporting reviewable rationales and detecting shortcut learning, typically via feature attribution, perturbation tests, or visualisation. Bias arose from demographic signal leakage, temporal or label leakage, proxy targets, subgroup error disparities, and site confounding factors, which can inflate apparent performance. The most common mitigation strategies combined reweighting or fairness-aware selection, data augmentation, and setting-specific recalibration, but post-deployment monitoring was inconsistently reported. A trustworthy CDSS requires explicit equity objectives, multi-site evaluation, standardised documentation, and continuous surveillance for drift and emergent disparities.
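The abstract names subgroup error disparities as a form of bias and reweighting as a common mitigation, without showing what either involves. The Python sketch that follows is illustrative only and is not taken from the review or the included studies. It audits per-group sensitivity and false-negative rate (the quantity behind underdiagnosis-style disparities), reports the largest between-group gap, and computes one standard pre-processing reweighting scheme in which each (group, label) cell receives the weight n_g * n_y / (n * n_gy), making group membership and outcome independent in the weighted training data. The function names and the toy cohort are hypothetical, and the reviewed studies may have used other reweighting variants.

```python
import math
from collections import Counter, defaultdict


def subgroup_error_rates(y_true, y_pred, groups):
    """Per-group sensitivity (TPR), false-negative rate and false-positive rate.

    y_true and y_pred hold 0/1 labels; groups holds a subgroup label per patient.
    """
    counts = defaultdict(Counter)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else math.nan,
            "fnr": c["fn"] / pos if pos else math.nan,
            "fpr": c["fp"] / neg if neg else math.nan,
            "n": pos + neg,
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference in one metric, ignoring undefined (NaN) values."""
    values = [r[metric] for r in rates.values() if not math.isnan(r[metric])]
    return max(values) - min(values) if values else math.nan


def reweighting_weights(groups, labels):
    """Per-sample weight n_g * n_y / (n * n_gy); group and label become
    independent in the weighted data, and the weights sum to n."""
    n = len(labels)
    n_g = Counter(groups)
    n_y = Counter(labels)
    n_gy = Counter(zip(groups, labels))
    return [n_g[g] * n_y[y] / (n * n_gy[(g, y)]) for g, y in zip(groups, labels)]


# Hypothetical toy cohort: two subgroups, binary outcome and model prediction.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = subgroup_error_rates(y_true, y_pred, groups)
fnr_gap = max_gap(rates, "fnr")
weights = reweighting_weights(groups, y_true)

for g, r in sorted(rates.items()):
    print(g, r)
print("FNR gap:", fnr_gap)
```

On this toy cohort, group B's false-negative rate is 0.5 versus 0.25 for group A, a gap of 0.25; after reweighting, the weighted outcome prevalence is 0.6 in both groups, matching the overall prevalence. In practice the weights would be passed to a learner that accepts per-sample weights, and the same audit would be repeated on held-out, multi-site data and after deployment to catch drift and emergent disparities.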
