Accuracy and Reliability of AI-Generated Text Detection Tools: A Literature Review

Authors

  • Jezreel Edriene J. Gotoman, Department of Information Technology, Cavite State University, Silang Campus, Biga 1 Silang, Cavite, 4112, Philippines
  • Harenz Lloyd T. Luna, Department of Information Technology, Cavite State University, Silang Campus, Biga 1 Silang, Cavite, 4112, Philippines
  • John Carlo S. Sangria, Department of Information Technology, Cavite State University, Silang Campus, Biga 1 Silang, Cavite, 4112, Philippines
  • Cereneo S. Santiago Jr., Department of Information Technology, Cavite State University, Silang Campus, Biga 1 Silang, Cavite, 4112, Philippines
  • Danel Dave Barbuco, Department of Information Technology, Cavite State University, Silang Campus, Biga 1 Silang, Cavite, 4112, Philippines

DOI:

https://doi.org/10.54536/ajirb.v4i1.3795

Keywords:

Artificial Intelligence, AI Detection, AI-Generated Text, AI Text Detector, AI Text Accuracy

Abstract

Artificial intelligence has become a significant tool for completing a wide range of tasks, from simple to complex, though its use is subject to various considerations and preferences. This study explored one application of artificial intelligence: the detection of AI-generated text (AIGT). A literature review was conducted in which pertinent studies were gathered and selected to identify potential implications. The research objectives were to assess the accuracy and reliability of AI text detectors and to identify which detectors had been evaluated. Three online databases were searched for relevant literature, and 34 articles were included in the final selection. Results show that although most detectors attained accuracy above 50%, they remain unreliable. Paid tools generally perform better than free ones, but there are concerns about bias against non-native English speakers. These tools also struggle with sophisticated AI content and with evasion techniques such as paraphrasing, so it is important to use them carefully and to rely on human judgment to avoid unfairly discrediting someone’s work. AI-generated text detection technology still has considerable room for improvement. Users should not rely on these tools completely but should use them alongside their own judgment to determine the true author of a text. Hence, authorities who use AI detectors should trust them only partially, because they are imperfect and can still make mistakes.
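
The point that a detector can score above 50% accuracy and still be unreliable can be illustrated with a small worked example. The sketch below is not taken from the reviewed studies: the counts are hypothetical, and the names `DetectorCounts`, `accuracy`, and `false_positive_rate` are introduced here only for illustration. It shows how overall accuracy can look acceptable while a large share of human-written texts is still wrongly flagged as AI-generated.

```python
from dataclasses import dataclass


@dataclass
class DetectorCounts:
    """Hypothetical outcome counts for one detector on a labelled test set."""
    true_pos: int   # AI-written texts flagged as AI
    false_neg: int  # AI-written texts passed as human
    true_neg: int   # human-written texts passed as human
    false_pos: int  # human-written texts wrongly flagged as AI


def accuracy(c: DetectorCounts) -> float:
    """Share of all texts that were classified correctly."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return (c.true_pos + c.true_neg) / total


def false_positive_rate(c: DetectorCounts) -> float:
    """Share of human-written texts that were wrongly flagged as AI."""
    return c.false_pos / (c.false_pos + c.true_neg)


# Illustrative numbers only (an assumption, not data from the review):
# 100 AI-written and 100 human-written texts.
counts = DetectorCounts(true_pos=80, false_neg=20, true_neg=70, false_pos=30)

print(f"accuracy: {accuracy(counts):.2f}")
print(f"false positive rate: {false_positive_rate(counts):.2f}")
```

With these assumed counts the detector is correct on 75% of texts, yet 30% of the human-written texts are flagged as AI-generated, which is the kind of error that can unfairly discredit a writer.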

Published

2025-02-18

How to Cite

Gotoman, J. E. J., Luna, H. L. T., Sangria, J. C. S., Santiago Jr., C. S., & Barbuco, D. D. (2025). Accuracy and Reliability of AI-Generated Text Detection Tools: A Literature Review. American Journal of IR 4.0 and Beyond, 4(1), 1–9. https://doi.org/10.54536/ajirb.v4i1.3795