Date
Publisher
arXiv
Large language models (LLMs) have advanced virtual educators and learners,
bridging NLP with AI4Education. Existing work often lacks scalability and fails
to leverage diverse, large-scale course content, with limited frameworks for
assessing pedagogic quality. To this end, we propose WikiHowAgent, a
multi-agent workflow leveraging LLMs to simulate interactive teaching-learning
conversations. It integrates teacher and learner agents, an interaction
manager, and an evaluator to facilitate procedural learning and assess
pedagogic quality. We introduce a dataset of 114,296 teacher-learner
conversations grounded in 14,287 tutorials across 17 domains and 727 topics.
Our evaluation protocol combines computational and rubric-based metrics with
human judgment alignment. Results demonstrate the workflow's effectiveness in
diverse setups, offering insights into LLM capabilities across domains. Our
datasets and implementations are fully open-sourced.
What is the application?
Who age?
Why use AI?
Study design
