Generative artificial intelligence (AI) has the potential to reshape the K-12 tutoring landscape with promises of serving more students at lower cost. But until recently, evidence on whether AI-enabled tutoring can actually improve student learning has been limited. Two new randomized controlled trials find that AI embedded in live, chat-based math tutoring can improve student academic outcomes, raising questions about the tradeoffs between cost and the value of personal connections provided by human tutors.
One study, conducted by researchers from Google and Eedi Labs, evaluated LearnLM, a generative AI tutoring system that drafts responses to students’ questions for review by a supervising human tutor. Tutors could approve the AI’s responses as written or edit them before sending them to students. Between May and June 2025, 165 students ages 13 to 15 participated in the study. Within each tutoring session, students were randomly assigned to one of three conditions: chatting with a human tutor, chatting with LearnLM, or receiving static, pre-written hints from the Eedi Labs platform.
The results suggest that AI can function as a reliable instructional tool on its own. Supervising tutors approved 76.4 percent of LearnLM’s responses with few or no edits, and LearnLM was just as effective as human tutors in helping students correct their mistakes. More notably, students who interacted with LearnLM performed better on subsequent, more challenging topics: they had a 66 percent success rate, compared with 61 percent for students tutored by humans alone and 56 percent for those who received static hints.
A second study, conducted by researchers at Stanford University, examined a different model: Tutor CoPilot, an AI tool designed to provide guidance to tutors during chat-based tutoring sessions. Unlike LearnLM, which gives the supervising tutor a single suggested response, Tutor CoPilot offers tutors three suggested responses that they can choose from, edit, or regenerate. In a study conducted between March and May 2024, 1,000 elementary school students were randomly assigned to chat-based sessions with either a human tutor alone or a human tutor using Tutor CoPilot.
Students in the Tutor CoPilot condition were four percentage points more likely to achieve topic mastery than students assigned to human tutors alone, with the largest gains (up to nine percentage points) among students assigned to lower-rated and less-experienced tutors. The researchers suggest that these improvements were likely driven by higher-quality instructional practices: tutors using CoPilot were 10 percentage points more likely to prompt students to explain their thinking, while tutors in the control condition relied more often on generic encouragement.
...
