Integrating Machine Learning and Big Data Analytics for Early Disease Detection in U.S. Health Systems

Daniel Abaneme; Benita Chinemerem; Yusuf Kolawole Adebakin; Oladapo Omobayo Aiyenitaju; Albert Darko

Authors

Daniel Abaneme, Research Department, CentraCare, Minnesota, USA
Benita Chinemerem, Rensselaer Polytechnic Institute, Troy, New York, USA
Yusuf Kolawole Adebakin, Department of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA
Oladapo Omobayo Aiyenitaju, Coolbet, Tallinn, Estonia
Albert Darko, Department of Applied Machine Intelligence, College of Professional Studies, Northeastern University, Portland, ME, USA

Keywords:

Big Data Analytics, Early Disease Detection, Machine Learning, Multi-modal Data, Predictive Modelling

Abstract

This systematic review examines the use of machine learning and big data analytics for early disease detection in U.S. health systems. Its objectives are to describe current applications, assess their predictive performance, and identify the challenges and enablers of implementation. A search of four major databases covering 2014 to 2024 identified 11 studies that met strict inclusion criteria, including a U.S. setting and clinical validation. The included studies report machine learning accuracies exceeding 90%, with notable success in neurological, metabolic, and infectious diseases. Integrating multi-modal data (imaging, genetic, and electronic health record data) further improves predictive performance. Despite these technical results, substantial barriers hinder the translation of models into routine clinical workflows, including data quality, interoperability, model interpretability, and the absence of external validation. Most models remain at the proof-of-concept stage, leaving a significant gap between development and practical use. The review also identifies critical gaps in the representation of diverse patient populations, in links to long-term outcomes, and in infrastructure readiness. Realizing the full potential of machine learning for early diagnosis will require investment in data governance, interdisciplinary implementation teams, and continuous model monitoring. The review concludes that machine learning offers a transformative opportunity for proactive healthcare, and that strategy should shift toward implementation science, external validation, and equitable use to achieve meaningful clinical impact.
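The abstract names the absence of external validation as a key barrier and recommends continuous model monitoring. As a minimal sketch, assuming a binary early-detection model that outputs a risk score per patient, the check below compares discrimination (AUROC, computed with the Mann-Whitney pairwise formulation) on the development cohort against an independent external cohort and flags a large drop. The function names, the 0.05 tolerance, and the toy data are hypothetical illustrations, not methods or results from the reviewed studies.

```python
from typing import Dict, Sequence, Tuple

Cohort = Tuple[Sequence[int], Sequence[float]]  # (labels 0/1, risk scores)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability a positive case outscores a negative one.

    Pairwise (Mann-Whitney U) formulation; tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def validation_gap(internal: Cohort, external: Cohort,
                   max_drop: float = 0.05) -> Dict[str, object]:
    """Compare AUROC on the development cohort with an external cohort.

    max_drop is a hypothetical tolerance chosen for illustration,
    not a clinical or regulatory standard.
    """
    auc_int = auroc(*internal)
    auc_ext = auroc(*external)
    drop = auc_int - auc_ext
    return {"internal_auroc": auc_int, "external_auroc": auc_ext,
            "drop": drop, "flag": drop > max_drop}


if __name__ == "__main__":
    # Toy, made-up cohorts purely to exercise the functions.
    internal = ([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
    external = ([0, 0, 0, 1, 1, 1], [0.1, 0.6, 0.3, 0.5, 0.8, 0.4])
    r = validation_gap(internal, external)
    print(f"internal={r['internal_auroc']:.3f} external={r['external_auroc']:.3f} "
          f"drop={r['drop']:.3f} flag={r['flag']}")
    # prints: internal=1.000 external=0.778 drop=0.222 flag=True
```

For continuous monitoring, the same comparison could be rerun on rolling windows of recent patients in place of the external cohort; a real deployment would also track calibration and subgroup performance, which this sketch omits.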
