Date
Publisher
arXiv
Practice tests for high-stakes assessment are intended to build test
familiarity and reduce construct-irrelevant variance that can interfere with
valid score interpretation. Generative AI-driven automated item generation
(AIG) scales the creation of large item banks and multiple practice tests,
enabling repeated practice opportunities. We conducted a large-scale
observational study (N = 25,969) using the Duolingo English Test (DET) -- a
digital, high-stakes, computer-adaptive English language proficiency test --
to examine how increased access to repeated test practice relates to official
DET scores, test-taker affect (e.g., confidence), and score sharing for
university admissions. To our knowledge, this is the first large-scale study
exploring the use of AIG-enabled practice tests in high-stakes language
assessment. Results showed that taking 1-3 practice tests was associated with
better performance (scores), positive affect (e.g., confidence) toward the
official DET, and an increased likelihood of sharing scores for university
admissions among those who also expressed positive affect. Taking more than 3
practice tests was related to lower performance, potentially reflecting
washback -- i.e., using the practice test for purposes other than test
familiarity, such as language learning or developing test-taking strategies.
These findings can inform best practices for AI-supported test readiness. They
also raise new questions about test-taker preparation behaviors and their
relationships to test-taker performance, affect, and behavioral outcomes.
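
To make the reported associations concrete, the sketch below shows one way such a comparison could be run: test takers are binned by the number of practice tests taken (0, 1-3, more than 3), mean official scores are compared across bins, and a simple regression is fit with the no-practice group as the reference. The data frame, column names (n_practice_tests, official_score), and the OLS specification are hypothetical assumptions for illustration and are not the study's actual analysis.

    import pandas as pd
    import statsmodels.formula.api as smf

    def summarize_practice_effect(df: pd.DataFrame) -> None:
        """Compare official scores across practice-test-count bins (hypothetical sketch)."""
        # Bin the number of practice tests taken: 0, 1-3, or more than 3.
        df = df.assign(
            practice_bin=pd.cut(
                df["n_practice_tests"],
                bins=[-1, 0, 3, float("inf")],
                labels=["0", "1-3", ">3"],
            )
        )

        # Descriptive comparison: mean official score and group size per bin.
        print(df.groupby("practice_bin", observed=True)["official_score"].agg(["mean", "count"]))

        # Simple OLS with the "0 practice tests" group as the reference level.
        model = smf.ols(
            "official_score ~ C(practice_bin, Treatment(reference='0'))", data=df
        ).fit()
        print(model.summary())

Applied to a frame with one row per test taker, e.g. summarize_practice_effect(scores_df), this would surface the same kind of bin-level contrasts the abstract summarizes in prose.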
What is the application?
Who is the user?
What age?
Why use AI?
Study design
