Date
Publisher
arXiv
Large language models (LLMs) have demonstrated potential in educational
applications, yet their capacity to accurately assess the cognitive alignment
of reading materials with students' developmental stages remains insufficiently
explored. This gap is particularly critical given the foundational educational
principle of the Zone of Proximal Development (ZPD), which emphasizes the need
to match learning resources with Students' Cognitive Abilities (SCA). Despite
the importance of this alignment, there is a notable absence of comprehensive
studies investigating LLMs' ability to evaluate reading comprehension
difficulty across different student age groups, especially in the context of
Chinese language education. To fill this gap, we introduce ZPD-SCA, a novel
benchmark specifically designed to assess stage-level Chinese reading
comprehension difficulty. The benchmark is annotated by 60 Special Grade
teachers, a group that represents the top 0.15% of all in-service teachers
nationwide. Experimental results reveal that LLMs perform poorly in zero-shot
learning scenarios, with Qwen-max and GLM even falling below the probability of
random guessing. When provided with in-context examples, LLMs performance
improves substantially, with some models achieving nearly double the accuracy
of their zero-shot baselines. These results indicate that LLMs possess an
emerging ability to assess reading difficulty, while also exposing limitations
in their current training for educationally aligned judgment. Notably, even the
best-performing models display systematic directional biases, suggesting
difficulties in accurately aligning material difficulty with SCA. Furthermore,
significant variations in model performance across different genres underscore
the complexity of the task. We envision that ZPD-SCA can provide a foundation for
evaluating and improving LLMs in cognitively aligned educational applications.
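
A minimal sketch of the zero-shot vs. in-context evaluation setup described in the abstract. The stage labels, prompt wording, and example passages below are hypothetical illustrations, not taken from the ZPD-SCA paper; the actual model call is left abstract.

    # Hypothetical stage label set; the benchmark's real labels are defined by its authors.
    STAGES = ["lower primary", "upper primary", "junior high", "senior high"]

    def zero_shot_prompt(passage: str) -> str:
        """Ask the model to pick a stage with no labelled examples (zero-shot)."""
        return (
            "Which student stage is this Chinese reading passage best suited for? "
            f"Choose one of: {', '.join(STAGES)}.\n\n"
            f"Passage:\n{passage}\n\nAnswer:"
        )

    def in_context_prompt(passage: str, examples: list[tuple[str, str]]) -> str:
        """Prepend labelled (passage, stage) demonstrations before the query (in-context)."""
        demos = "\n\n".join(f"Passage:\n{p}\nAnswer: {label}" for p, label in examples)
        return demos + "\n\n" + zero_shot_prompt(passage)

    if __name__ == "__main__":
        demos = [("(short illustrative passage)", "lower primary")]
        print(zero_shot_prompt("(target passage)"))
        print("---")
        print(in_context_prompt("(target passage)", demos))

Accuracy under each prompt style would then be scored against the Special Grade teachers' stage annotations, which is how the zero-shot and in-context results quoted above could be compared.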
What is the application?
Who is the user?
Why use AI?
Study design
