Turning Language Model Training From Black Box Into A Sandbox

Authors

Nicolas Pope, Matti Tedre

Date

01/2026

Publisher

arXiv

Link

https://arxiv.org/pdf/2601.21631v1

Most classroom engagements with generative AI focus on prompting pre-trained models, leaving the role of training data and model mechanics opaque. We developed a browser-based tool that allows students to train a small transformer language model entirely on their own device, making the training process visible. In a CS1 course, 162 students completed pre- and post-test explanations of why language models sometimes produce incorrect or strange output. After a brief hands-on training activity, students' explanations shifted significantly from anthropomorphic and misconceived accounts toward data- and model-based reasoning. The results suggest that enabling learners to directly observe training can support conceptual understanding of the data-driven nature of language models and model training, even within a short intervention. For K-12 AI literacy and AI education research, the study findings suggest that enabling students to train, and not only prompt, language models can shift how they think about AI.

What is the application?

Learning – Student Support

Who is the user?

Student

What age?

Post-Secondary

Why use AI?

Outcomes – Durable Skills, Other

Study design

Descriptive – Product Development, Quantitative – Others