Publisher: arXiv
We present Team BD's submission to the BEA 2025 Shared Task on Pedagogical
Ability Assessment of AI-powered Tutors, under Track 1 (Mistake Identification)
and Track 2 (Mistake Location). Both tracks involve three-class classification
of tutor responses in educational dialogues: determining whether a tutor correctly
recognizes a student's mistake (Track 1) and whether the tutor pinpoints the
mistake's location (Track 2). Our system is built on MPNet, a Transformer-based
language model that combines the pre-training advantages of BERT and XLNet. We
fine-tuned MPNet on the task data using a class-weighted cross-entropy loss to
handle class imbalance, and leveraged grouped cross-validation (10 folds) to
maximize the use of limited data while avoiding dialogue overlap between
training and validation. We then performed a hard-voting ensemble of the best
models from each fold, which improves robustness and generalization by
combining multiple classifiers. Our approach achieved strong results on both
tracks, with exact-match macro-F1 scores of approximately 0.7110 for Mistake
Identification and 0.5543 for Mistake Location on the official test set. We
provide a comprehensive analysis of our system's performance, with confusion
matrices and t-SNE visualizations to interpret classifier behavior, as well as
a taxonomy of common errors with examples. We hope our ensemble-based approach
and findings provide useful insights for designing reliable tutor response
evaluation systems in educational dialogue settings.
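
The three ingredients named above (class weighting, dialogue-grouped folds, and hard voting) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the inverse-frequency weighting formula, the greedy fold assignment, the tie-breaking rule, and the label names `yes`, `partial`, and `no` are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights, n_samples / (n_classes * class_count).

    Assumption: the abstract does not state which weighting scheme was used.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def grouped_folds(dialogue_ids, n_folds):
    """Return one fold index per example, keeping each dialogue in a single fold.

    Greedy assignment (an assumption): largest dialogues first, each placed
    in the currently least-loaded fold.
    """
    sizes = Counter(dialogue_ids)
    load = [0] * n_folds
    fold_of = {}
    for dialogue, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        fold = load.index(min(load))
        fold_of[dialogue] = fold
        load[fold] += size
    return [fold_of[d] for d in dialogue_ids]


def hard_vote(predictions):
    """Majority vote per example across models.

    `predictions` holds one list of labels per model. Ties go to the
    earliest model's label among the tied classes (an assumption).
    """
    voted = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        voted.append(next(v for v in votes if counts[v] == top))
    return voted


if __name__ == "__main__":
    # Hypothetical labels and dialogue ids, for illustration only.
    print(class_weights(["yes", "yes", "yes", "no"]))
    print(grouped_folds(["a", "a", "b", "c", "c", "c"], n_folds=2))
    print(hard_vote([["yes", "no", "no"], ["yes", "yes", "no"], ["no", "yes", "no"]]))
```

In a setup like the one described, the weights would be passed to the cross-entropy loss, the fold indices would define the 10 train/validation splits, and `hard_vote` would combine the per-fold models' test predictions.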
