Publisher: arXiv
Evaluating teaching effectiveness at scale remains a persistent challenge for
large universities, particularly within engineering programs that enroll tens
of thousands of students. Traditional methods, such as manual review of student
evaluations, are often impractical, leading to overlooked insights and
inconsistent data use. This article presents a scalable, AI-supported framework
for synthesizing qualitative student feedback using large language models. The
system employs hierarchical summarization, anonymization, and exception
handling to extract actionable themes from open-ended comments while upholding
ethical safeguards. Visual analytics contextualize numeric scores through
percentile-based comparisons, historical trends, and adjustments for instructional load. The
approach supports meaningful evaluation and aligns with best practices in
qualitative analysis and educational assessment, incorporating student, peer,
and self-reflective inputs without automating personnel decisions. We report on
its successful deployment across a large college of engineering. Preliminary
validation through comparisons with human reviewers, faculty feedback, and
longitudinal analysis suggests that LLM-generated summaries can reliably
support formative evaluation and professional development. This work
demonstrates how AI systems, when designed with transparency and shared
governance, can promote teaching excellence and continuous improvement at scale
within academic institutions.
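
To make the pipeline concrete, the sketch below illustrates one plausible reading of the hierarchical summarization, anonymization, and percentile-based contextualization described above. It is a minimal illustration, not the paper's actual implementation: the function names (`llm_summarize`, `anonymize`, `hierarchical_summary`, `percentile_context`), the prompt wording, and the chunk size are all assumptions, and the LLM call is left as a placeholder to be wired to whatever provider an institution has approved.

```python
import re

def llm_summarize(text: str, instruction: str) -> str:
    """Placeholder for a chat-completion call; hypothetical stand-in,
    not the system's actual API."""
    raise NotImplementedError("Connect this to your approved LLM provider.")

def anonymize(comment: str) -> str:
    """Strip obvious personal identifiers (instructor names after
    honorifics, email addresses) before any text reaches the model."""
    comment = re.sub(r"\b(Dr|Prof|Professor|Mr|Ms|Mrs)\.?\s+[A-Z][a-z]+",
                     "[INSTRUCTOR]", comment)
    comment = re.sub(r"\S+@\S+", "[EMAIL]", comment)
    return comment

def hierarchical_summary(comments: list[str], chunk_size: int = 40) -> str:
    """Map-reduce style summarization: summarize fixed-size chunks of
    anonymized comments, then merge the chunk summaries into themes."""
    cleaned = [anonymize(c) for c in comments]
    chunk_summaries = []
    for i in range(0, len(cleaned), chunk_size):
        chunk = "\n".join(cleaned[i:i + chunk_size])
        chunk_summaries.append(
            llm_summarize(chunk,
                          "List the main teaching-related themes in these comments.")
        )
    return llm_summarize(
        "\n".join(chunk_summaries),
        "Merge these partial summaries into 3-5 actionable themes.",
    )

def percentile_context(score: float, peer_scores: list[float]) -> float:
    """Report where a numeric evaluation score falls relative to peer
    scores, rather than presenting the raw mean in isolation."""
    below = sum(1 for s in peer_scores if s < score)
    return 100.0 * below / len(peer_scores)
```

Under this reading, the two-level (map-reduce) structure keeps each prompt within context limits at the scale of a large college, and any chunk that fails anonymization or summarization can be routed to the exception-handling path rather than silently dropped.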
