Date:
Publisher: arXiv
Scaling high-quality tutoring remains a major challenge in education. Due to
growing demand, many platforms employ novice tutors who, unlike experienced
educators, struggle to address student mistakes and thus fail to seize prime
learning opportunities. Our work explores the potential of large language
models (LLMs) to close the novice-expert knowledge gap in remediating math
mistakes. We contribute Bridge, a method that uses cognitive task analysis to
translate an expert's latent thought process into a decision-making model for
remediation. This involves an expert identifying (A) the student's error, (B) a
remediation strategy, and (C) their intention before generating a response. We
construct a dataset of 700 real tutoring conversations, annotated by experts
with their decisions. We evaluate state-of-the-art LLMs on our dataset and find
that the expert's decision-making model is critical for LLMs to close the gap:
responses from GPT4 with expert decisions (e.g., "simplify the problem") are
preferred 76% more often than responses generated without them. Additionally,
context-sensitive decisions are critical to closing pedagogical gaps:
replacing expert decisions with random ones lowers GPT4's response quality by
97%. Our work shows the potential of
embedding expert thought processes in LLM generations to enhance their
capability to bridge novice-expert knowledge gaps. Our dataset and code can be
found at: https://github.com/rosewang2008/bridge.
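
A minimal sketch of the decision-conditioned generation idea, assuming a
prompt-based setup: the three expert decisions (A-C) are made explicit before
the LLM writes its response. The class, function, prompt wording, and decision
values below are illustrative assumptions, not the authors' implementation
(see the linked repository for their actual code).

from dataclasses import dataclass

@dataclass
class ExpertDecision:
    """The three expert decisions elicited before a response is written.

    Field values here are illustrative; the paper's annotation scheme is
    in the linked repository.
    """
    error: str      # (A) what mistake the student made
    strategy: str   # (B) remediation strategy, e.g., "simplify the problem"
    intention: str  # (C) what the tutor wants the response to accomplish

def build_prompt(conversation: str, decision: ExpertDecision) -> str:
    """Assemble a decision-conditioned prompt for an LLM tutor.

    The core idea: the model generates its response *after* the
    pedagogical choices are made explicit, instead of making them
    implicitly during generation.
    """
    return (
        "You are an experienced math tutor. Continue the conversation.\n\n"
        f"Conversation so far:\n{conversation}\n\n"
        f"(A) Student error: {decision.error}\n"
        f"(B) Remediation strategy: {decision.strategy}\n"
        f"(C) Tutor intention: {decision.intention}\n\n"
        "Tutor response:"
    )

# Example usage with a made-up student turn:
decision = ExpertDecision(
    error="dropped the negative sign when distributing",
    strategy="simplify the problem",
    intention="have the student notice the sign error themselves",
)
print(build_prompt("Student: So 3 - (x - 2) = 3 - x - 2?", decision))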
What is the application?
What age group?
Why use AI?
Study design
