Date
Publisher
arXiv
Classroom behavior monitoring is a critical aspect of educational research,
with significant implications for student engagement and learning outcomes.
Recent advancements in Visual Question Answering (VQA) models offer promising
tools for automatically analyzing complex classroom interactions from video
recordings. In this paper, we investigate the applicability of several
state-of-the-art open-source VQA models, including LLaMA2, LLaMA3, QWEN3, and
NVILA, in the context of classroom behavior analysis. To facilitate rigorous
evaluation, we introduce the BAV-Classroom-VQA dataset, derived from real-world
classroom video recordings at the Banking Academy of Vietnam. We present the
methodology for data collection and annotation, and benchmark the performance of
the selected VQA models on this dataset. Our initial experimental results
demonstrate that all four models achieve promising performance levels in
answering behavior-related visual questions, showcasing their potential for
future classroom analytics and intervention systems.
What is the application?
Who is the user?
What age group?
Why use AI?
Study design
