Date
Publisher: arXiv
A key ethical challenge in Automated Essay Scoring (AES) is ensuring that
scores are only released when they meet high reliability standards. Confidence
modelling addresses this by assigning a reliability estimate, in the
form of a confidence score, to each automated score. In this study, we frame
confidence estimation as a classification task: predicting whether an
AES-generated score correctly places a candidate in the appropriate CEFR level.
While this is a binary decision, we leverage the inherent granularity of the
scoring domain in two ways. First, we reformulate the task as an n-ary
classification problem using score binning. Second, we introduce a set of novel
Kernel Weighted Ordinal Categorical Cross Entropy (KWOCCE) loss functions that
incorporate the ordinal structure of CEFR labels. Our best-performing model
achieves an F1 score of 0.97, and enables the system to release 47% of scores
with 100% CEFR agreement and 99% of scores with at least 95% CEFR agreement,
compared to approximately 92% CEFR agreement from the standalone AES model
when all of its predicted scores are released.
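The abstract does not give the exact KWOCCE formulation, so the sketch below illustrates one plausible reading: a kernel-weighted ordinal cross entropy in which a Gaussian kernel over class distances turns the one-hot CEFR target into a soft target, so probability mass placed on ordinally nearby levels is penalised less than mass on distant ones. The function name `kwocce_loss`, the `sigma` bandwidth, and the Gaussian kernel choice are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def kwocce_loss(logits, targets, num_classes, sigma=1.0):
    """Illustrative kernel-weighted ordinal categorical cross entropy.

    Replaces the one-hot target with a Gaussian-kernel soft target
    centred on the true class index, so errors on ordinally close
    classes (adjacent CEFR bins) incur a smaller penalty than errors
    on distant ones. This is a sketch of the idea, not the paper's
    exact loss.
    """
    # Class indices 0..C-1 on the same device as the logits.
    classes = torch.arange(num_classes, dtype=torch.float32,
                           device=logits.device)
    # Signed ordinal distance from each class to the true class:
    # shape (batch, num_classes).
    dist = classes.unsqueeze(0) - targets.float().unsqueeze(1)
    # Gaussian kernel weights, then normalise into a soft target
    # distribution per example.
    weights = torch.exp(-dist.pow(2) / (2 * sigma ** 2))
    soft_targets = weights / weights.sum(dim=1, keepdim=True)
    # Standard soft-label cross entropy against the kernel targets.
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

# Hypothetical usage:
# logits = confidence_head(essay_features)   # (batch, num_bins)
# loss = kwocce_loss(logits, cefr_bin_labels, num_classes=6)
```

At inference time, the confidence model's probability for the predicted bin could be thresholded so that only sufficiently confident scores are released, which is one way to realise the selective-release behaviour (47% of scores at 100% CEFR agreement) described in the abstract.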
What is the application?
Who is the user?
What age is the user?
Why use AI?
Study design
