So Many Opinions, So Many LLMs: Comparing Large Language Models to Traditional Machine Learning for Open-Ended Survey Analysis

Authors

  • Abdullah Akinde Austin Peay State University, United States
  • Mariam Akinde Austin Peay State University, United States
  • Rasheedat Emiola Austin Peay State University, United States
  • Ahmed Akinsola Austin Peay State University, United States

DOI:

https://doi.org/10.54536/ari.v4i1.6586

Keywords:

Computational Social Science, Large Language Models (LLMs), NSSE Open-Ended Survey Data, Qualitative Data Analysis, Sentiment Analysis

Abstract

Open-ended surveys offer valuable insights, but they are notoriously difficult to analyze at scale. Building on previous work that employed traditional machine learning to classify text (“So Many Responses, So Little Time: A Machine-Learning Approach to Analyzing Open-Ended Survey Data”), this study investigates how different large language models (LLMs) understand and analyze NSSE open-ended survey responses. We focus on several cutting-edge LLMs, including OpenAI’s GPT-4-Turbo, Anthropic’s Claude 3.5, Meta’s LLaMA 3 (70B), and Perplexity’s Sonar Pro, and compare their performance to that of the earlier machine learning models on tasks such as sentiment analysis and thematic classification.
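The study’s prompts are not reproduced here; as a minimal sketch of the kind of zero-shot labeling the abstract describes, the snippet below asks GPT-4-Turbo for a sentiment label, a theme, and a one-sentence rationale for a single survey response. The prompt wording, label set, and use of the OpenAI Python SDK are illustrative assumptions, not the authors’ protocol.

```python
# Minimal sketch of zero-shot sentiment/theme labeling of one open-ended
# survey response with an LLM. Prompt wording, labels, and model name are
# illustrative assumptions, not the paper's actual protocol.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABEL_PROMPT = (
    "You are coding open-ended university survey responses.\n"
    "Classify the response below and return JSON with keys: "
    "sentiment (positive|negative|neutral), theme (a short phrase), "
    "and rationale (one sentence).\n\n"
    "Response: {text}"
)

def classify(text: str) -> dict:
    """Ask the model for a sentiment label, a theme, and its reasoning."""
    completion = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,  # reduce run-to-run variation in labels
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": LABEL_PROMPT.format(text=text)}],
    )
    return json.loads(completion.choices[0].message.content)

print(classify("The advising office never answers emails, "
               "but my professors have been great."))
```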
Our analysis assesses model agreement, classification accuracy, and the interpretability of each model’s reasoning. The findings reveal that current LLMs consistently outperform classical machine learning models in classification accuracy, particularly in capturing nuanced sentiment and thematic patterns in student responses. Yet despite their superior accuracy, the LLMs differ greatly in how explicitly and consistently they justify their predictions and apply category boundaries. These differences highlight a crucial trade-off in using LLMs for qualitative analysis: greater predictive power comes with challenges of consistency and explainability. Our findings illustrate the benefits and drawbacks of the various LLMs for large-scale qualitative research, and we offer practical guidance for researchers seeking to balance automation with interpretive rigor.
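The abstract does not name the agreement statistic used; a common choice for pairwise model agreement is Cohen’s kappa, sketched below with scikit-learn’s cohen_kappa_score. The model names and label sequences are made up for illustration.

```python
# Sketch of pairwise inter-model agreement on a shared label set.
# Cohen's kappa is an assumption; the abstract does not state which
# agreement statistic the study computed.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical sentiment labels from four models over the same five responses.
labels = {
    "gpt-4-turbo": ["pos", "neg", "neu", "neg", "pos"],
    "claude-3.5":  ["pos", "neg", "neg", "neg", "pos"],
    "llama-3-70b": ["pos", "neu", "neu", "neg", "pos"],
    "sonar-pro":   ["pos", "neg", "neu", "neg", "neu"],
}

# Kappa corrects raw percent agreement for agreement expected by chance.
for a, b in combinations(labels, 2):
    kappa = cohen_kappa_score(labels[a], labels[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```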

References

Michael, A., & Abdullah, A. (2024). So many responses, so little time: A machine-learning approach to analyzing open-ended survey data. Analyses of Social Issues and Public Policy, 24(1), Article e30377. https://doi.org/10.1002/au.30377

Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press. https://doi.org/10.1017/CBO9781139084789

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135.

Bostan, L. A., & Klinger, R. (2018). An analysis of annotated corpora for emotion classification in text. In E. M. Bender, L. Derczynski, & P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) (pp. 2734–2746). Association for Computational Linguistics.

Saravia, E., Liu, C.-H., Huang, J.-H., Wu, J.-X., & Chen, Y.-S. (2018). CARER: Contextualized affect representations for emotion recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) (pp. 3687–3697).

Jurafsky, D., & Martin, J. H. (2026). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models (3rd ed., online manuscript). https://web.stanford.edu/~jurafsky/slp3

Poria, S., Hazarika, D., Majumder, N., & Mihalcea, R. (2020). Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Transactions on Affective Computing.

Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S. M., Nori, H., Palangi, H., Ribeiro, M. T., & Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv. https://arxiv.org/abs/2303.12712

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Kaiser, Ł., … Fiedel, N. (2022). PaLM: Scaling language modeling with Pathways. arXiv. https://arxiv.org/abs/2204.02311

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and efficient foundation language models. arXiv. https://arxiv.org/abs/2302.13971

Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2022). Finetuned language models are zero-shot learners. Proceedings of the Tenth International Conference on Learning Representations (ICLR 2022), 1–10. https://openreview.net/forum?id=rJ4p2XrF6A

Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 3338–3346.

Zhong, R., Wang, S., Zou, D., & Klein, D. (2023). On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. arXiv. https://arxiv.org/abs/2303.07205

Wang, S., & Manning, C. D. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 90–94). https://aclanthology.org/P12-2018/

Deng, X., Bashlovkina, V., Han, F., Baumgartner, S., & Bendersky, M. (2023). LLMs to the moon? Reddit market sentiment analysis with large language models. Companion Proceedings of the ACM Web Conference 2023 (WWW 2023), 1014–1019. https://doi.org/10.1145/3544422.3556244

Ainslie, J., Lee, J., Chen, M., Tran, T., Pang, R., & Ontañón, S. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. arXiv. https://arxiv.org/abs/2305.13245

Anthropic. (2024). Claude 3.5 Sonnet model card (Version 1.2). https://www.anthropic.com/news/claude-3-5-sonnet

OpenAI. (2024). GPT-4 Turbo and API updates (Technical report, November 2023 release). https://platform.openai.com/docs/models/gpt-4-turbo

Meta AI. (2024, April 18). LLaMA 3: Open foundation and fine-tuned language models. https://ai.meta.com/blog/meta-llama-3/

Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Finn, C., & Levine, S. (2023). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36. https://proceedings.neurips.cc/paper_files/paper/2023/file/a8b139544e501e11e70b3c3c2a21d720-Paper-Conference.pdf

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415

Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87. https://doi.org/10.1145/2347736.2347755

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning (ECML) (pp. 137–142). https://doi.org/10.1007/BFb0026683

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874. http://jmlr.org/papers/v9/fan08a.html

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324

Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys, 51(5), Article 93. https://doi.org/10.1145/3236009

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251

Craven, M., & Shavlik, J. W. (1996). Extracting tree-structured representations of trained networks. Advances in Neural Information Processing Systems, 8, 24–30. https://proceedings.neurips.cc/paper/1995/file/303ed4c69846ab36c2904d3ba8573050-Paper.pdf

Published

2026-04-29

How to Cite

Akinde, A., Akinde, M., Emiola, R., & Akinsola, A. (2026). So Many Opinions, So Many LLMs: Comparing Large Language Models to Traditional Machine Learning for Open-Ended Survey Analysis. Applied Research and Innovation, 4(1), 87–96. https://doi.org/10.54536/ari.v4i1.6586
