Date
Publisher
arXiv
K-12 educators are increasingly using Large Language Models (LLMs) to create
instructional materials. These systems excel at producing fluent, coherent
content, but often lack support for high-quality teaching. The reason is
twofold: first, commercial LLMs such as ChatGPT and Gemini, which are among the
most widely accessible to teachers, do not come preloaded with the depth of
pedagogical theory needed to design truly effective activities; second,
although sophisticated prompt engineering can bridge this gap, most teachers
lack the time or expertise to encode such pedagogical nuance into their
requests. This study shifts pedagogical expertise from the
user's prompt to the LLM's internal architecture. We embed the well-established
Knowledge-Learning-Instruction (KLI) framework into a Multi-Agent System (MAS)
to act as a sophisticated instructional designer. We tested three systems for
generating secondary Math and Science learning activities: a Single-Agent
baseline simulating typical teacher prompts; a role-based MAS where agents work
sequentially; and a collaborative MAS-CMD where agents co-construct activities
through conquer-and-merge discussion (CMD). The generated materials were evaluated by
20 practicing teachers and a complementary LLM-as-a-judge system using the
Quality Matters (QM) K-12 standards. While the rubric scores showed only small,
often statistically non-significant differences between the systems, the
qualitative feedback from educators painted a clear and compelling picture.
Teachers strongly preferred the activities from the collaborative MAS-CMD,
describing them as markedly more creative, contextually relevant, and
classroom-ready. Our findings show that embedding pedagogical principles into
LLM systems offers a scalable path for creating high-quality educational
content.
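
The three system designs compared above can be illustrated with a minimal sketch. This is not the authors' implementation: the role names, prompts, the `stub_llm` stand-in for a real model call, and the single merge step are all assumptions made for illustration, and the conquer-and-merge pattern is shown in only one possible reading (independent drafts followed by a merge).

```python
from typing import Callable, List

# (role, prompt) -> generated text. Hypothetical interface, not a real API.
LLM = Callable[[str, str], str]

# Hypothetical agent roles, loosely named after the KLI framework.
ROLES: List[str] = ["knowledge", "learning", "instruction"]


def stub_llm(role: str, prompt: str) -> str:
    """Deterministic stand-in for a model call so the sketch runs offline."""
    return f"<{role}> {prompt}"


def single_agent(topic: str, llm: LLM) -> str:
    """Baseline: one call that mimics a typical teacher prompt."""
    return llm("teacher", f"Create a learning activity on {topic}")


def sequential_mas(topic: str, llm: LLM, roles: List[str] = ROLES) -> str:
    """Role-based MAS: each agent revises the previous agent's draft in turn."""
    draft = f"Create a learning activity on {topic}"
    for role in roles:
        draft = llm(role, draft)
    return draft


def collaborative_mas(topic: str, llm: LLM, roles: List[str] = ROLES) -> str:
    """Assumed conquer-and-merge flow: independent drafts, then one merge call."""
    # "Conquer": every agent drafts its own contribution independently.
    parts = [llm(role, f"Draft your part of an activity on {topic}") for role in roles]
    # "Merge": a final agent combines the contributions into one activity.
    return llm("merger", " | ".join(parts))


if __name__ == "__main__":
    for build in (single_agent, sequential_mas, collaborative_mas):
        print(build.__name__, "->", build("fractions", stub_llm))
```

The structural difference is in how drafts flow: the sequential variant passes a single draft down a chain, so later agents can only revise what earlier agents produced, while the collaborative variant lets every agent contribute before anything is combined.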
What is the application?
Who is the user?
What age?
Why use AI?
Study design
