Date
Publisher
arXiv
Large Language Models (LLMs) have demonstrated remarkable capabilities in
mathematical problem-solving. However, the transition from providing answers to
generating high-quality educational questions presents significant challenges
that remain underexplored. To advance Educational Question Generation (EQG) and
facilitate LLMs in generating pedagogically valuable and educationally
effective questions, we introduce EQGBench, a comprehensive benchmark
specifically designed for evaluating LLMs' performance in Chinese EQG. EQGBench
establishes a five-dimensional evaluation framework supported by a dataset of
900 evaluation samples spanning three fundamental middle school disciplines:
mathematics, physics, and chemistry. The dataset incorporates user queries with
varying knowledge points, difficulty gradients, and question type
specifications to simulate realistic educational scenarios. Through systematic
evaluation of 46 mainstream large models, we reveal significant room for
development in generating questions that reflect educational value and foster
students' comprehensive abilities.
What is the application?
Who is the user?
Who age?
Study design
