Date
Publisher
arXiv
It is astonishing how rapidly general-purpose AI has crossed familiar
thresholds in introductory physics. Comparing outputs from successive models,
GPT-5 Thinking moves far beyond the plug-and-chug tendencies seen earlier: on a
classic elevator problem it works symbolically, notes when variables cancel,
and verifies results; attempts to prompt novice-like behavior mainly affect
tone, not method. On representation translation, the model scores 24/26 (92.3%)
on TUG-Kv4.0. In a card-sorting proxy using two of my comprehensive finals (60
items), its categories reflect solution method rather than surface features.
Solving those same exams, it attains 27/30 and 25/30, with most misses in
ruler-based ray tracing and circuit interpretation. On epistemology, five
independent CLASS runs yield 100% favorable, indicating a simulated
expert-like stance. Framed as a "boiling frog" problem, the paper argues for a
decisive jump: retire credit-bearing unsupervised closed-response online
assessments; grade process evidence; use paper and whiteboarding; shift weight to
modeling, data, and authentic labs; require transparent, citable AI use;
rebuild problem types; and lean on research-based instruction and peer
learning. The opportunity is to foreground what AI cannot substitute for:
modeling the world, arguing from evidence, and making principled
approximations.
