The rapid adoption of Large Language Models (LLMs) such as ChatGPT-4 is transforming teaching and assessment practices at higher education institutions. Our study investigates the impact of LLMs on student performance in an open-ended exam scenario. While the existing literature suggests that LLMs generally enhance performance across various tasks and contribute to a democratizing effect, especially benefiting lower-performing individuals, our research presents a more nuanced picture. Using a mixed-methods approach, we conducted an experimental lab study (N=146) with business school students analyzing an Organizational Behaviour case study. Students first solved a task without access to LLMs and were then randomly assigned access to ChatGPT-4 for a second task. Our findings reveal an "equalizing effect": low-performing students significantly improved their performance with LLM assistance, while high-performing students experienced a decline, bringing them to the level of their lower-performing peers. Qualitative follow-up interviews (N=19) highlighted that high performers struggled to effectively integrate LLM outputs into their work, mirroring the challenges faced by low performers, who often resorted to simple copy-paste strategies. These results underscore the need for a deeper understanding of how LLMs can be leveraged to benefit all learners without inadvertently disadvantaging high achievers.
Leveling Up or Leveling Down? The Impact of Large Language Models on Student Performance in Higher Education
Date
Publisher
SSRN
What is the application?
Assessment in higher education: an open-ended exam task based on an Organizational Behaviour case study.
Who is the user?
Business school students.
What age?
Why use AI?
To support students in solving an open-ended case-study task during an exam.
Study design
Mixed methods: an experimental lab study (N=146) with randomized access to ChatGPT-4 in the second of two tasks, followed by 19 qualitative interviews.