Date
Publisher
arXiv
Recent advances in large reasoning models (LRMs) show strong performance in structured domains like math and programming, but they lack pedagogical coherence and real-world teaching behaviors. To bridge this gap, we introduce Pedagogy-R1, a framework that tailors LRMs for classroom use via three innovations: (1) a distillation-based pipeline that filters and refines model outputs for instruction tuning, (2) the Well-balanced Educational Benchmark (WBEB), which measures performance across subject knowledge, pedagogical knowledge, tracing, essay scoring, and teacher decision-making, and (3) a Chain-of-Pedagogy (CoP) prompting strategy for generating and eliciting teacher-style reasoning. Our mixed-method evaluation combines quantitative metrics and qualitative analysis, offering the first systematic assessment of LRMs' pedagogical strengths and limitations.
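The abstract outlines the distillation-and-prompting pipeline only at a high level. The sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together: a CoP-style prompt template, a quality filter over teacher-style reasoning traces, and conversion into instruction-tuning pairs. The prompt wording, filtering criteria, and all names (`COP_TEMPLATE`, `quality_filter`, `build_sft_dataset`) are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch only: prompt text, filter heuristics, and data format
# are hypothetical stand-ins for the pipeline described in the abstract.
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical Chain-of-Pedagogy (CoP) prompt: ask the model to reason like a
# teacher (diagnose misconceptions, plan scaffolding) before answering.
COP_TEMPLATE = (
    "You are an experienced teacher.\n"
    "Student question: {question}\n"
    "First diagnose likely misconceptions, then plan a scaffolded explanation, "
    "and finally give the answer."
)

@dataclass
class Trace:
    question: str
    reasoning: str   # teacher-style reasoning elicited from a large LRM
    answer: str

def quality_filter(trace: Trace, min_len: int = 50) -> bool:
    """Toy stand-in for the distillation filter: keep traces that are long
    enough and contain pedagogical moves (purely illustrative criteria)."""
    pedagogical_cues = ("misconception", "scaffold", "example", "check understanding")
    return len(trace.reasoning) >= min_len and any(
        cue in trace.reasoning.lower() for cue in pedagogical_cues
    )

def build_sft_dataset(traces: List[Trace],
                      keep: Callable[[Trace], bool] = quality_filter) -> List[dict]:
    """Filter raw traces and format them as instruction-tuning pairs."""
    return [
        {"prompt": COP_TEMPLATE.format(question=t.question),
         "response": f"{t.reasoning}\n\nAnswer: {t.answer}"}
        for t in traces
        if keep(t)
    ]

if __name__ == "__main__":
    raw = [Trace(
        "Why is 1/2 + 1/3 not 2/5?",
        "A common misconception is adding numerators and denominators directly; "
        "scaffold with a shared-denominator example and check understanding.",
        "1/2 + 1/3 = 5/6",
    )]
    print(build_sft_dataset(raw))
```

In this reading, the filtered prompt/response pairs would then be used to fine-tune a smaller model; the paper's actual filtering and tuning details are not reproduced here.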
What is the application?
Who is the user?
Why use AI?
Study design
