Date
Publisher
arXiv
Recent advances in AI-assisted education have encouraged the integration of
vision-language models (VLMs) into academic assessment, particularly for tasks
that require both quantitative and qualitative evaluation. However, existing
VLM based approaches struggle with complex educational artifacts, such as
programming tasks with executable components and measurable outputs, that
require structured reasoning and alignment with clearly defined evaluation
criteria. We introduce AGACCI, a multi-agent system that distributes
specialized evaluation roles across collaborative agents to improve accuracy,
interpretability, and consistency in code-oriented assessment. To evaluate the
framework, we collected 360 graduate-level code-based assignments from 60
participants, each annotated by domain experts with binary rubric scores and
qualitative feedback. Experimental results demonstrate that AGACCI outperforms
a single GPT-based baseline in terms of rubric and feedback accuracy,
relevance, consistency, and coherence, while preserving the instructional
intent and evaluative depth of expert assessments. Although performance varies
across task types, AGACCI highlights the potential of multi-agent systems for
scalable and context-aware educational evaluation.
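
The abstract describes the architecture only at a high level: rubric-oriented
evaluation roles distributed across collaborating agents whose verdicts are
collected for each code submission. The following minimal Python sketch
illustrates that general idea under stated assumptions; the role names, the
criteria, and the call_vlm stub are hypothetical and are not taken from the
paper or its implementation.

    # Illustrative sketch only: role-specialised "agents" each judge one
    # rubric criterion of a code submission, and their verdicts are collected.
    # Roles, criteria, and call_vlm are assumptions for illustration.
    from dataclasses import dataclass


    @dataclass
    class AgentVerdict:
        role: str
        rubric_pass: bool   # binary rubric score, as in the annotated dataset
        feedback: str       # qualitative feedback


    def call_vlm(prompt: str) -> str:
        """Hypothetical stand-in for a vision-language model call."""
        return f"[model response to: {prompt[:40]}...]"


    def run_agent(role: str, criterion: str, submission: str) -> AgentVerdict:
        """One specialised agent evaluates a single rubric criterion."""
        response = call_vlm(
            f"As the {role} agent, decide whether this submission satisfies "
            f"the criterion '{criterion}':\n{submission}"
        )
        # A real system would parse the model output into a pass/fail score;
        # here the score is fixed so the sketch stays self-contained.
        return AgentVerdict(role=role, rubric_pass=True, feedback=response)


    def evaluate(submission: str) -> list[AgentVerdict]:
        """Distribute evaluation roles across agents and collect verdicts."""
        roles = {
            "execution": "the code runs and produces the required output",
            "rubric": "the solution meets the assignment's stated criteria",
            "feedback": "the work warrants constructive stylistic comments",
        }
        return [run_agent(r, c, submission) for r, c in roles.items()]


    if __name__ == "__main__":
        for verdict in evaluate("def add(a, b):\n    return a + b"):
            print(verdict.role, verdict.rubric_pass, verdict.feedback)

Splitting criteria across agents in this way is what the abstract credits for
improved accuracy, interpretability, and consistency relative to a single
GPT-based baseline; how AGACCI actually coordinates its agents is detailed in
the paper itself.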
What is the application?
Who is the user?
What is the user's age?
Why use AI?
Study design
