Integrating Machine Learning and Big Data Analytics for Early Disease Detection in U.S. Health Systems
Keywords:
Big Data Analytics, Early Disease Detection, Machine Learning, Multi-modal Data, Predictive Modelling
Abstract
This systematic review explores the use of machine learning and big data analytics for early disease detection in U.S. health systems. The study aims to map current applications, assess predictive performance, and identify implementation challenges and enablers. A search of four major databases covering 2014 to 2024 identified 11 studies that met strict inclusion criteria (a U.S. setting and clinical validation). The results show that machine learning algorithms achieved high accuracy, exceeding 90%, with notable success in neurological, metabolic, and infectious diseases. Integrating multi-modal data, such as imaging, genetic, and electronic health record data, further improves predictive performance. Despite these technical results, the review found substantial barriers to translating models into routine clinical practice, including challenges with data quality, interoperability, and model interpretability, as well as a lack of external validation. Most models remain at the proof-of-concept stage, leaving a significant gap between development and practical use. The review also identifies critical gaps in the representation of diverse patient populations, links to long-term outcomes, and infrastructure readiness. Realizing the full potential of machine learning for early diagnosis will require investment in data governance, interdisciplinary implementation teams, and continuous model monitoring. The review concludes that machine learning offers a transformative opportunity for proactive healthcare, and that strategy should shift toward implementation science, external validation, and equitable deployment to achieve meaningful clinical impact.
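The abstract calls for continuous monitoring of deployed models. As a minimal sketch of what that could look like, assuming confirmed diagnoses eventually become available for each batch of predictions, the Python below tracks per-batch accuracy against a validation baseline and flags batches that fall by more than a chosen margin. The function names, the 0.05 tolerance, the 0.92 baseline, and the toy labels are hypothetical illustrations, not drawn from the reviewed studies. Accuracy is used only because it is the metric the abstract reports; a real deployment would likely also track sensitivity, specificity, and calibration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class BatchReport:
    """Summary of one monitoring window (hypothetical structure)."""
    batch_index: int
    accuracy: float
    flagged: bool


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the confirmed diagnosis."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def monitor_batches(
    batches: Iterable[Tuple[Sequence[int], Sequence[int]]],
    baseline_accuracy: float,
    max_drop: float = 0.05,  # hypothetical tolerance below the validation baseline
) -> List[BatchReport]:
    """Flag any batch whose accuracy falls more than `max_drop` below baseline.

    Each element of `batches` is a (y_true, y_pred) pair, for example one
    pair per month of deployed predictions once outcomes are confirmed.
    """
    reports = []
    for i, (y_true, y_pred) in enumerate(batches):
        acc = accuracy(y_true, y_pred)
        reports.append(BatchReport(i, acc, acc < baseline_accuracy - max_drop))
    return reports


if __name__ == "__main__":
    # Toy, made-up labels: the first batch is fully correct, the second has 8/10 correct.
    toy_batches = [
        ([1, 0, 0, 1, 0, 0, 1, 0, 0, 0], [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]),
        ([1, 0, 0, 1, 0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 1, 1, 0, 0, 0]),
    ]
    for report in monitor_batches(toy_batches, baseline_accuracy=0.92):
        print(report)
```

With a baseline of 0.92, the second toy batch (accuracy 0.8) is flagged for review, while the first is not. A fixed-threshold check like this is deliberately simple; it shows the idea of comparing live performance with validation performance rather than any specific monitoring method used in the reviewed studies.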
License
Copyright (c) 2026 Daniel Abaneme, Benita Chinemerem, Yusuf Kolawole Adebakin, Oladapo Omobayo Aiyenitaju, Albert Darko

This work is licensed under a Creative Commons Attribution 4.0 International License.