So Many Opinions, So Many LLMs: Comparing Large Language Models to Traditional Machine Learning for Open-Ended Survey Analysis
DOI: https://doi.org/10.54536/ari.v4i1.6586

Keywords: Computational Social Science, Large Language Models (LLMs), NSSE Open-Ended Survey Data, Qualitative Data Analysis, Sentiment Analysis

Abstract
Open-ended surveys offer valuable insights, but they are notoriously difficult to analyze at scale. Building on previous work that employed traditional machine learning to classify text (“So Many Responses, So Little Time: A Machine-Learning Approach to Analyzing Open-Ended Survey Data”), this study investigates how different large language models (LLMs) understand and analyze NSSE open-ended survey responses. We focus on several cutting-edge LLMs, including OpenAI’s GPT-4-Turbo, Anthropic’s Claude 3.5, Meta’s LLaMA 3 (70B), and Perplexity’s Sonar Pro, and compare their performance with the earlier machine learning models on tasks such as sentiment analysis and thematic classification.
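To make the classification task concrete, the sketch below shows one way such an LLM call might look. It is a minimal illustration assuming the OpenAI Python SDK (v1+); the prompt wording, label set, and classify_response helper are hypothetical choices for illustration, not the study’s actual protocol.

```python
# Minimal sketch (assumes OpenAI Python SDK v1+ and OPENAI_API_KEY set).
# Prompt, labels, and helper name are illustrative, not the study's protocol.
from openai import OpenAI

client = OpenAI()

LABELS = ["positive", "negative", "neutral", "mixed"]

def classify_response(text: str) -> str:
    """Ask the model for one sentiment label plus a one-sentence rationale."""
    completion = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,  # reduce run-to-run variation in the assigned label
        messages=[
            {
                "role": "system",
                "content": (
                    "You label open-ended student survey responses. "
                    f"Reply with exactly one label from {LABELS}, "
                    "followed by a one-sentence justification."
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return completion.choices[0].message.content

print(classify_response(
    "Advising never answered my emails, but my professors were great."
))
```

Asking for a justification alongside the label is one way to surface the model’s reasoning for the kind of interpretability comparison described below.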
Our analysis assesses model agreement, classification accuracy, and the interpretability of each model’s reasoning. The findings reveal that current LLMs routinely outperform classic machine learning models in classification accuracy, particularly in capturing complex sentiment and thematic patterns in student responses. Yet while LLMs achieve superior accuracy, they differ greatly in how explicitly and consistently they justify their predictions and apply category boundaries. These distinctions highlight a crucial trade-off in using LLMs for qualitative analysis: greater predictive power comes with challenges of consistency and explainability. Our findings illustrate the benefits and drawbacks of various LLMs for large-scale qualitative research, and we offer practical advice for researchers seeking to balance automation and interpretive rigor.
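As a rough illustration of the evaluation side, the sketch below computes classification accuracy against human-coded labels and pairwise inter-model agreement via Cohen’s kappa with scikit-learn. The label arrays are toy data, not the study’s results.

```python
# Minimal sketch: accuracy vs. human codes and pairwise model agreement.
# Label arrays are toy data, not the study's results.
from sklearn.metrics import accuracy_score, cohen_kappa_score

human  = ["positive", "negative", "neutral", "negative", "mixed"]
gpt4   = ["positive", "negative", "neutral", "negative", "negative"]
claude = ["positive", "negative", "mixed",   "negative", "mixed"]

print("GPT-4 accuracy vs. human:  ", accuracy_score(human, gpt4))
print("Claude accuracy vs. human: ", accuracy_score(human, claude))
print("GPT-4/Claude kappa:        ", cohen_kappa_score(gpt4, claude))
```

Kappa is preferable to raw percent agreement here because it discounts the agreement two models would reach by chance on a small label set.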
References
Michael, A., & Abdullah, A. (2024). So many responses, so little time: A machine-learning approach to analyzing open-ended survey data. Analyses of Social Issues and Public Policy, 24(1), Article e30377. https://doi.org/10.1002/au.30377
Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge University Press. https://doi.org/10.1017/CBO9781139084789
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval, 2(1–2), 1–135.
Bostan, L. A., & Klinger, R. (2018). An analysis of annotated corpora for emotion classification in text. In E. M. Bender, L. Derczynski, & P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018) (pp. 2734–2746). Association for Computational Linguistics.
Saravia, E., Liu, C.-H., Huang, J.-H., Wu, J.-X., & Chen, Y.-S. (2018). CARER: Contextualized affect representations for emotion recognition. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) (pp. 3687–3697).
Jurafsky, D., & Martin, J. H. (2026). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models (3rd ed., online manuscript). https://web.stanford.edu/~jurafsky/slp3
Poria, S., Hazarika, D., Majumder, N., & Mihalcea, R. (2020). Beneath the tip of the iceberg: Current challenges and new directions in sentiment analysis research. IEEE Transactions on Affective Computing.
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y. T., Li, Y., Lundberg, S. M., Nori, H., Palangi, H., Ribeiro, M. T., & Zhang, Y. (2023). Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv. https://arxiv.org/abs/2303.12712
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Kaiser, Ł., … Fiedel, N. (2022). PaLM: Scaling language modeling with Pathways. arXiv. https://arxiv.org/abs/2204.02311
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and efficient foundation language models. arXiv. https://arxiv.org/abs/2302.13971
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2022). Finetuned language models are zero-shot learners. Proceedings of the Tenth International Conference on Learning Representations (ICLR 2022). https://openreview.net/forum?id=rJ4p2XrF6A
Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 3338–3346.
Zhong, R., Wang, S., Zou, D., & Klein, D. (2023). On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. arXiv. https://arxiv.org/abs/2303.07205
Wang, S., & Manning, C. D. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 90–94). https://aclanthology.org/P12-2018/
Deng, X., Bashlovkina, V., Han, F., Baumgartner, S., & Bendersky, M. (2023). LLMs to the moon? Reddit market sentiment analysis with large language models. Companion Proceedings of the ACM Web Conference 2023 (WWW 2023), 1014–1019. https://doi.org/10.1145/3544422.3556244
Ainslie, J., Lee, J., Chen, M., Tran, T., Pang, R., & Ontañón, S. (2023). GQA: Training generalized multi-query transformer models from multi-head checkpoints. arXiv. https://arxiv.org/abs/2305.13245
Anthropic. (2024). Claude 3.5 Sonnet model card (Version 1.2). https://www.anthropic.com/news/claude-3-5-sonnet
OpenAI. (2024). GPT-4 Turbo and API updates (Technical report, November 2023 release). https://platform.openai.com/docs/models/gpt-4-turbo
Meta AI. (2024, April 18). LLaMA 3: Open foundation and fine-tuned language models. https://ai.meta.com/blog/meta-llama-3/
Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Finn, C., & Levine, S. (2023). Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87. https://doi.org/10.1145/2347736.2347755
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning (ECML) (pp. 137–142). https://doi.org/10.1007/BFb0026683
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874. http://jmlr.org/papers/v9/fan08a.html
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys, 51(5), Article 93. https://doi.org/10.1145/3236009
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. https://doi.org/10.1007/BF00116251
Craven, M., & Shavlik, J. W. (1996). Extracting tree-structured representations of trained networks. Advances in Neural Information Processing Systems, 8, 24–30. https://proceedings.neurips.cc/paper/1995/file/303ed4c69846ab36c2904d3ba8573050-Paper.pdf
License
Copyright (c) 2026 Abdullah Akinde, Mariam Akinde, Rasheedat Emiola, Ahmed Akinsola

This work is licensed under a Creative Commons Attribution 4.0 International License.