Publisher: arXiv
This study introduces an evaluation benchmark for middle school algebra for use
in artificial intelligence (AI) based educational platforms. The goal is to
support the design of AI systems that can enhance learners' conceptual
understanding of algebra by taking into account their current level of algebra
comprehension. The dataset comprises 55 common algebra misconceptions and
errors, along with 220 diagnostic examples, identified in previous
peer-reviewed studies. We provide an example application using a large language
model, observing precision and recall scores that vary by topic and
experimental setup, reaching 83.9% when educator feedback is incorporated and
testing is restricted by topic. We found that topics such as ratios and
proportions prove as difficult for LLMs as they are for students. We included a
human assessment of LLM results and feedback from five middle school math
educators on the clarity and prevalence of the misconceptions in the dataset
and on the potential use of AI in conjunction with the dataset. Most educators
(80% or more) indicated that they encounter these misconceptions among their
students, suggesting the relevance of the dataset to teaching middle school
algebra. Despite varying familiarity with AI tools, four out of five educators
expressed interest in using the dataset with AI to diagnose student
misconceptions or to train teachers. The results emphasize the importance of
topic-constrained testing, the need for multimodal approaches, and the
relevance of human expertise for gaining practical insights when using AI for
human learning.
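For readers reproducing this kind of evaluation, per-topic precision and recall can be computed as in the sketch below. This is hypothetical illustration code, not the paper's implementation: the multi-label framing (each diagnostic example may be tagged with a set of misconception IDs) and all names are assumptions.

```python
from collections import defaultdict

def per_topic_scores(records):
    """Compute (precision, recall) per topic.

    records: list of (topic, gold_set, predicted_set), where gold_set and
    predicted_set hold misconception IDs for one diagnostic example.
    This framing is an assumption for illustration.
    """
    tp = defaultdict(int)  # predicted misconceptions that match the gold labels
    fp = defaultdict(int)  # predicted misconceptions absent from the gold labels
    fn = defaultdict(int)  # gold misconceptions the model failed to predict
    for topic, gold, pred in records:
        tp[topic] += len(gold & pred)
        fp[topic] += len(pred - gold)
        fn[topic] += len(gold - pred)
    scores = {}
    for topic in set(tp) | set(fp) | set(fn):
        p = tp[topic] / (tp[topic] + fp[topic]) if tp[topic] + fp[topic] else 0.0
        r = tp[topic] / (tp[topic] + fn[topic]) if tp[topic] + fn[topic] else 0.0
        scores[topic] = (p, r)
    return scores

# Toy usage with invented topic and misconception labels:
records = [
    ("ratios", {"M1"}, {"M1", "M2"}),   # one correct, one spurious prediction
    ("ratios", {"M3"}, set()),          # missed misconception
    ("equations", {"M4"}, {"M4"}),      # exact match
]
scores = per_topic_scores(records)
```

Restricting the candidate misconceptions by topic, as the abstract describes, would shrink the prediction space and tends to reduce the false-positive count in this computation.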
What is the application?
Who is the user?
What age is the user?
Why use AI?
Study design