Date
Publisher
arXiv
The integration of large language models (LLMs) into educational tools has
the potential to substantially impact how teachers plan instruction, support
diverse learners, and engage in professional reflection. Yet little is known
about how educators actually use these tools in practice and how their
interactions with AI can be meaningfully studied at scale. This paper presents
a human-AI collaborative methodology for large-scale qualitative analysis of
over 140,000 educator-AI messages drawn from a generative AI platform used by
K-12 teachers. Through a four-phase coding pipeline, we combined inductive
theme discovery, codebook development, structured annotation, and model
benchmarking to examine patterns of educator engagement and evaluate the
performance of LLMs in qualitative coding tasks. We developed a hierarchical
codebook aligned with established teacher evaluation frameworks, capturing
educators' instructional goals, contextual needs, and pedagogical strategies.
Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably
support theme identification, extend human recognition in complex scenarios,
and outperform open-weight models in both accuracy and structural reliability.
The analysis also reveals substantive patterns in how educators query AI to
enhance instructional practices (79.7 percent of total conversations), create
or adapt content (76.1 percent), support assessment and feedback loops (46.9
percent), attend to student needs for tailored instruction (43.3 percent), and
assist with other professional responsibilities (34.2 percent), highlighting
emerging AI-related competencies that have direct implications for teacher
preparation and professional development. This study offers a scalable,
transparent model for AI-augmented qualitative research and provides
foundational insights into the evolving role of generative AI in educational
practice.
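
The reported category shares sum to well over 100 percent, which suggests that a single conversation can carry more than one top-level code. A minimal sketch of how such multi-label prevalence figures could be computed is given below. The function name, the code labels, and the toy data are all hypothetical illustrations and are not taken from the study's codebook or dataset.

```python
from collections import Counter


def theme_prevalence(conversation_codes):
    """Return the percentage of conversations that carry each code.

    conversation_codes: an iterable of code collections, one per conversation.
    A conversation may carry several codes, so the shares can sum to > 100.
    """
    conversations = list(conversation_codes)
    total = len(conversations)
    if total == 0:
        return {}
    counts = Counter()
    for codes in conversations:
        # Count each code at most once per conversation.
        counts.update(set(codes))
    return {code: 100.0 * n / total for code, n in counts.items()}


# Toy, made-up annotations (hypothetical labels, not the paper's data).
toy_annotations = [
    {"instructional_practice", "content_creation"},
    {"instructional_practice"},
    {"content_creation", "assessment_feedback"},
    {"instructional_practice", "content_creation"},
]

shares = theme_prevalence(toy_annotations)
for code, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{code}: {pct:.1f} percent of conversations")
```

Because each conversation contributes to every code it carries, the denominator is the number of conversations rather than the number of code assignments, which is why overlapping categories can each exceed 50 percent.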
What is the application?
Who is the user?
Why use AI?
Study design
