Date:
Publisher: arXiv
With the recent rapid increase in digitization across all major industries,
the need to acquire programming skills has driven up the demand for
introductory programming courses. As a result, universities have integrated
programming courses into a wide range of curricula, spanning not only
technical studies but also business and management programs.
Consequently, additional resources are needed for teaching, grading, and
tutoring students with diverse educational backgrounds and skills. As part of
this, Automated Programming Assessment Systems (APASs) have emerged, providing
scalable, high-quality assessment with efficient evaluation and instant
feedback. APASs commonly rely heavily on predefined unit tests to generate
feedback, which often limits the scope and level of detail of the feedback
that can be provided to students. With the rise of Large Language Models
(LLMs) in recent years, new opportunities have emerged, as these technologies
can enhance feedback quality and personalization.
To investigate how different feedback mechanisms in APASs are perceived by
students, and how effective they are in supporting problem-solving, we
conducted a large-scale study with over 200 students from two different
universities. Specifically, we compare baseline Compiler Feedback, standard
Unit Test Feedback, and advanced LLM-based Feedback regarding perceived quality
and impact on student performance.
Results indicate that while students rate unit test feedback as the most
helpful, AI-generated feedback leads to significantly better performance.
These findings suggest combining unit tests and AI-driven guidance to optimize
automated feedback mechanisms and improve learning outcomes in programming
education.
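
A minimal sketch of how such a combination might look in an APAS back end is
given below. The test runner, report format, and all function names are
assumptions chosen for illustration, not details taken from the paper.

# Illustrative sketch only: feed unit test results into an LLM prompt to
# produce personalized feedback. All names and tools here are hypothetical.
import json
import subprocess


def run_unit_tests(submission_dir: str) -> dict:
    """Run the predefined unit tests against a student submission.

    Assumes pytest with the pytest-json-report plugin; any runner that
    yields per-test pass/fail data would serve the same purpose.
    """
    subprocess.run(
        ["pytest", submission_dir, "--json-report", "--json-report-file=report.json"],
        capture_output=True,
    )
    with open("report.json") as fh:
        report = json.load(fh)
    return {
        "failed": report["summary"].get("failed", 0),
        "failures": [t["nodeid"] for t in report.get("tests", [])
                     if t["outcome"] == "failed"],
    }


def build_feedback_prompt(source_code: str, results: dict) -> str:
    """Combine the student's code and failing tests into a tutoring prompt."""
    return (
        "You are a tutor in an introductory programming course.\n"
        f"Student code:\n{source_code}\n\n"
        f"Failing tests: {', '.join(results['failures'])}\n"
        "Explain the likely mistakes and give hints without revealing the solution."
    )


def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM back end the APAS integrates."""
    raise NotImplementedError("Plug in the institution's LLM client here.")


def generate_feedback(source_code: str, submission_dir: str) -> str:
    results = run_unit_tests(submission_dir)
    if results["failed"] == 0:
        return "All unit tests passed - well done!"
    return call_llm(build_feedback_prompt(source_code, results))

In this kind of design the unit tests still decide correctness, while the LLM
only rewords the failure information into hints, which is one way to combine
the two feedback mechanisms the results point toward.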
What is the application?
Who is the user?
What age are the users?
Why use AI?
