Recent advances in large reasoning models (LRMs) show strong performance in structured domains like math and programming, but they lack pedagogical coherence and real-world teaching behaviors. To bridge this gap, we introduce Pedagogy-R1, a framework that tailors LRMs for classroom use via three innovations: (1) a distillation-based pipeline that filters and refines model outputs for instruction tuning, (2) the Well-balanced Educational Benchmark (WBEB), which measures performance across subject knowledge, pedagogy, tracing, essay scoring, and teacher decision-making, and (3) Chain-of-Pedagogy (CoP) prompts to generate and elicit teacher-style reasoning. Our mixed-method evaluation combines quantitative metrics and qualitative analysis, offering the first systematic assessment of LRMs’ pedagogical strengths and limitations.
Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
Date
Publisher
arXiv
What is the application?
Who is the user?
Why use AI?
Study design