Integration of Artificial Intelligence (AI) into the Data Extraction Phase of a Scoping Review

Authors

  • Paige Maylott
  • Shaminder Dhillon, School of Rehabilitation Science, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
  • Dina Brooks, School of Rehabilitation Science, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada
  • Sarah Wojkowski, School of Rehabilitation Science, Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada

DOI:

https://doi.org/10.54536/jir.v3i1.3946

Keywords:

Artificial Intelligence, Data Extraction, Large Language Models, Methodology, Scoping Review

Abstract

This paper describes how artificial intelligence (AI) was used to assist with the data extraction phase of a scoping review. It compares several AI models and assesses the accuracy of AI-assisted data extraction against human extraction. Scoping reviews map the existing literature on a topic and are useful for complex or under-reviewed subjects. Integrating AI, particularly large language models, can enhance processing speed and data analysis. Three models, ChatGPT-3.5 and ChatGPT-4 (both developed by OpenAI) and Copilot (by Microsoft), were compared to identify the best model for AI-assisted data extraction. Adobe Acrobat Pro’s Optical Character Recognition (OCR) feature and ‘ChatGPT Splitter’ were used to manage image-based content and large sections of data. A custom script was iteratively generated and applied to the source material. AI-assisted extraction results were compared to text extracted by an independent reviewer. ChatGPT-4 was used to enhance the efficiency and accuracy of data extraction from 234 sources. Human extraction was more specific with verbatim information, whereas AI-assisted extraction was faster, averaging 20 minutes per source compared to one hour for human extraction, and sometimes provided a more nuanced understanding. ChatGPT-4’s superior text processing capabilities made it the optimal choice. Although AI advancements have streamlined data extraction, human oversight remains crucial to ensure accuracy and address biases. This methodology is especially beneficial for smaller research teams and emphasizes the importance of structured prompts and rigorous review. Careful planning and oversight can mitigate risks, ultimately improving the quality and efficiency of the review process.
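The abstract mentions splitting large sections of data before passing them to a language model. The following is a minimal sketch of that general idea, not the authors' script and not the actual behavior of ‘ChatGPT Splitter’. The function names, the `max_chars` limit, and the "[Part i of n]" label format are hypothetical choices made for illustration.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int = 8000) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Breaks fall on whitespace so words stay intact; a single word longer
    than max_chars is hard-split. max_chars is an assumed, illustrative
    limit, not a documented limit of any model.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: List[str] = []
    current: List[str] = []   # words collected for the chunk being built
    current_len = 0           # character length of " ".join(current)

    for word in text.split():
        # Hard-split any word that cannot fit in a chunk on its own.
        while len(word) > max_chars:
            if current:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            chunks.append(word[:max_chars])
            word = word[max_chars:]

        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current_len + extra > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += extra

    if current:
        chunks.append(" ".join(current))
    return chunks


def label_chunks(chunks: List[str]) -> List[str]:
    """Prefix each chunk with its position so the parts can be pasted in order."""
    total = len(chunks)
    return [
        f"[Part {i} of {total}]\n{chunk}"
        for i, chunk in enumerate(chunks, start=1)
    ]
```

In a workflow like the one described, each labeled part would be submitted in sequence before the extraction prompt is issued, and the output would still be checked by a human reviewer.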

Published

2025-03-25

How to Cite

Maylott, P., Dhillon, S., Brooks, D., & Wojkowski, S. (2025). Integration of Artificial Intelligence (AI) into the Data Extraction Phase of a Scoping Review. Journal of Innovative Research, 3(1), 89–94. https://doi.org/10.54536/jir.v3i1.3946