Understanding the Evidence Base on AI in K-12 Education

AI tools are arriving in schools faster than research can evaluate them. Teachers are experimenting with new tools and districts are writing policies, all while students are already using AI both inside and outside the classroom.

But for many education leaders, a basic question remains: What does rigorous research actually say about how AI affects teaching and learning?

To help answer that question, we released a new report: The Evidence Base on AI in K-12: A 2026 Review. The report reviews the current research, focusing specifically on studies that convincingly estimate causal impact, meaning studies that can tell us whether an AI tool changed outcomes for students or educators.

The Research Base Is Growing Quickly – But Rigorous Evidence Is Still Thin

Interest in AI and education has expanded rapidly. Our new report analyzed the more than 800 academic papers related to AI and K-12 education in the AI Hub Research Repository as of October 2025. The number of publications is growing dramatically: in just the few months since, the Repository has grown to more than 1,100 papers.

Figure 1: Cumulative number of papers included in the Research Repository

However, most of this research does not evaluate impact. After reviewing the full repository, we identified only 20 high-quality causal studies that rigorously examine how AI tools affect students or educators. These studies provide early signals about how current AI tools are shaping learning and teaching, while also highlighting how much we still do not know.

Figure 2: Percentage of Research Repository and causal impact papers by study design

Note: Technical/Computational is defined as research focusing on algorithm development, model benchmarking, creation of new datasets, or other computer science–related outcomes. The methodology generally involves computational experiments, such as architecture descriptions, training procedures, dataset creation, performance evaluations, or ablation studies. Some examples of papers in this category are Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms and Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education. Other quantitative methods include mixed-methods studies, observational studies, and case studies with quantitative components, among others. Some causal impact papers also use descriptive, technical/computational, or other quantitative methods alongside their causal methods, which is why those study designs also appear among the causal impact papers.

Who the Research Focuses On

Most research on AI in education focuses on students as users of AI tools. Fewer studies examine how educators use AI in their work.

Figure 3: Percentage of Research Repository and causal impact papers by the user of the AI tool

Note: Studies can focus on multiple users, such as both students and educators. Other Users include school leaders and parents/caregivers.

The research also tends to concentrate on certain subjects – especially math, where the largest share of causal impact studies currently exists.

Figure 4: Percentage of Research Repository and causal impact papers by study outcome

Note: Literacy includes learning in reading, writing, or language arts skills. Other academic outcomes include learning in science, programming, language, social studies, and other subjects (besides literacy and math). Social-emotional includes improving skills like self-awareness, empathy, and self-regulation. Studies can focus on multiple outcomes, such as both math and other academic outcomes.

What Early Evidence Suggests

Across the causal studies reviewed in the report, several themes begin to emerge.

  1. Student performance often improves with access to AI tools, but once the tools are removed, results are mixed

    Many studies find that students perform better on tasks like math practice, programming assignments, or writing when they can use AI tools during the activity.

    Evidence across the causal studies is mixed when students complete assessments without AI support. In some cases, performance improves; in others, it remains unchanged or declines. This distinction highlights an important question for educators:

    Are AI tools helping students complete tasks, or helping them develop durable learning and skills?

  2. Tool design matters

    Not all AI tools function the same way in learning environments.

    Across the causal studies reviewed, evidence suggests that AI tools designed with pedagogical guardrails – such as tutoring systems that give hints or guide reasoning – show more promising outcomes than general-purpose chatbots that provide answers directly. Learning science provides one way to interpret these findings.

    Cognitive Load Theory (Sweller, 1988)
    Description: Manages the limited capacity of working memory by balancing intrinsic, extraneous, and germane (productive) loads.
    AI Opportunity and Risk: AI can reduce extraneous load by efficiently retrieving and organizing information, potentially freeing cognitive resources for deeper learning, but it can also reduce germane load – the productive struggle essential for learning.

    Vygotsky’s Zone of Proximal Development (Vygotsky, 1978)
    Description: The optimal learning zone between what a learner can do independently and what they can achieve with appropriate support.
    AI Opportunity and Risk: The most effective AI tools would provide scaffolds within this zone and gradually release responsibility to the learner to prevent student dependency.

    Transfer of learning (Barnett & Ceci, 2002)
    Description: The process of applying knowledge gained in one context to new situations, which often requires explicit instructional support connecting the contexts.
    AI Opportunity and Risk: One key question is whether practice with AI tools develops durable knowledge and skills that students can apply in new contexts, or whether it creates tool-dependent performance.

    Metacognition
    Description: The ability of students to monitor their understanding, identify gaps, select appropriate strategies, and adjust their approach based on feedback.
    AI Opportunity and Risk: Metacognition is difficult to measure, and AI could potentially assess it at scale. At the same time, when AI tools perform complete tasks for students, opportunities to develop metacognitive skills may be reduced.

    The expertise reversal effect (Kalyuga, 2007)
    Description: The phenomenon where instructional techniques effective for novices (such as worked examples) become ineffective or even counterproductive for more advanced learners, who may benefit more from independent problem solving.
    AI Opportunity and Risk: Effective AI tools would adapt their support level to learner expertise.

    Desirable difficulties (Bjork, 1994; Bjork & Bjork, 2011)
    Description: Challenges during practice that produce better long-term retention and transfer, even though they feel less effective and produce lower immediate performance.
    AI Opportunity and Risk: AI tools would ideally introduce appropriate desirable difficulties, even if users prefer easier practice sessions.

    Tools that scaffold reasoning may help support learning, while tools that simply generate answers may reduce the cognitive effort that supports durable skill development.
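
    To make this design distinction concrete, here is a minimal sketch in Python of how a scaffolding tutor and a direct-answer chatbot might differ only in their system prompt. The prompt text, function name, and message format below are illustrative assumptions, not drawn from the report or from any specific tool it reviews.

```python
# A hypothetical sketch: two system prompts that would produce very
# different tool behavior from the same underlying model. Nothing here
# is taken from an actual product; it only illustrates the design space.

DIRECT_ANSWER_PROMPT = (
    "You are a helpful assistant. Answer the student's question "
    "directly and completely."
)

GUARDRAIL_TUTOR_PROMPT = (
    "You are a math tutor. Never state the final answer. "
    "Ask what the student has tried, give one hint at a time, "
    "and prompt the student to check their own work."
)

def build_messages(system_prompt: str, student_question: str) -> list[dict]:
    """Assemble a chat transcript in the common role/content format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": student_question},
    ]

# The same student question flows through both designs; only the
# system prompt changes the pedagogy.
question = "What is x if 3x + 5 = 20?"
print(build_messages(GUARDRAIL_TUTOR_PROMPT, question))
```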

  3. AI may meaningfully support educators

    Several studies in the report focus on educator-facing tools. These studies suggest that AI can help teachers:

    • Spend less time on lesson preparation
    • Ask students more guiding questions, supported by real-time, message-based suggestions delivered to teachers
    • Improve instructional quality through automated insights about classroom interactions or student progress

    Across these early causal studies, the time savings on tasks such as lesson preparation do not appear to come at the cost of instructional quality.

Important Questions Remain

While these early findings are informative, the evidence base has major gaps.

For example:

  • There are no high-quality causal studies of student AI use conducted in U.S. K-12 classrooms.
  • Most studies examine short-term outcomes rather than long-term learning.
  • Very little research examines impacts on equity, student wellness, or social development.

As AI tools are rapidly integrated into education, answering these questions becomes increasingly important.

Why This Matters

Education system leaders are tasked with leading policy, procurement, and pedagogical decisions on the use of AI in their schools, districts, and states. Most discussions about AI in education focus on new tools, predictions about the future, or opinions about what schools should do next.

While the research is early, it’s important that these education system decisions are grounded in evidence: What does the current causal evidence actually show?

By synthesizing the strongest available studies, the report aims to give educators, school leaders, policymakers, and researchers a clearer starting point for making decisions in a rapidly evolving landscape.

Read the full report: The Evidence Base on AI in K-12: A 2026 Review

The full report includes:

  • A map of the current research landscape
  • Analysis of over 800 AI-in-education papers
  • Key findings from the 20 high-quality causal studies
  • Implications for educators and policymakers

As AI continues to evolve, the evidence base will evolve with it. Stay grounded in what research currently shows – and where more evidence is needed. Use evidence of impact to inform policy and buying decisions. Stay current on new research by following the Research Repository.