Date:
Publisher: arXiv

Abstract:
Academic programs increasingly recognize the critical role that personal and
professional skills play alongside technical expertise in preparing students
for success in diverse career paths. With this
growing demand comes the need for scalable systems to measure, evaluate, and
develop these skills. Situational Judgment Tests (SJTs) offer one potential
avenue for measuring these skills in a standardized and reliable way, but
open-response SJTs have traditionally relied on trained human raters for
evaluation, presenting operational challenges for delivering SJTs at scale.
Past attempts to develop NLP-based scoring systems for SJTs have fallen short
because of concerns about their construct validity. In this article, we explore
a novel approach to extracting construct-relevant features from SJT responses
using large language models (LLMs). We use the Casper SJT to demonstrate the
efficacy of this approach. This study sets the foundation for future
developments in automated scoring for personal and professional skills.
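
The abstract does not detail the extraction pipeline, but the general pattern of prompting an LLM to rate a free-text response on named constructs can be sketched as follows. This is a minimal illustration, assuming the OpenAI Python client; the construct list, prompt wording, rating scale, and model name are hypothetical placeholders, not the paper's actual method.

```python
# Minimal sketch of LLM-based feature extraction from one open-ended SJT
# response. Everything here is illustrative: the construct list, prompt
# wording, rating scale, and model name are hypothetical placeholders,
# not the setup described in the paper.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CONSTRUCTS = ["empathy", "ethics", "communication"]  # hypothetical constructs

def extract_features(scenario: str, response_text: str) -> dict:
    """Ask an LLM to rate one SJT response on each construct (0-4)."""
    prompt = (
        "You are rating an open-ended situational judgment test response.\n"
        f"Scenario: {scenario}\n"
        f"Response: {response_text}\n"
        f"For each construct in {CONSTRUCTS}, return a JSON object that maps "
        "the construct name to an integer rating from 0 (absent) to 4 (strong)."
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request parseable JSON
        temperature=0,  # keep ratings as stable as possible across calls
    )
    return json.loads(completion.choices[0].message.content)
```

In a setup like this, the per-construct ratings would serve as construct-relevant features feeding a downstream scoring model, rather than as final scores themselves.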
What is the application?
Who is the user?
What age?
Why use AI?
Study design
