Date:
Publisher: arXiv
Large language models (LLMs) offer promise in generating educational content,
providing instructor feedback, and reducing teacher workload on assessments.
While prior studies have focused on LLM-powered learning analytics, limited
research has examined how effective LLMs are in bilingual contexts. In
this paper, we study the effectiveness of multilingual large language models
(MLLMs) across monolingual (English-only, Spanish-only) and bilingual
(Spanglish) student writing. We present a learning analytics use case that
details LLM performance in assessing acceptable and unacceptable explanations
of Science and Social Science concepts (a grading sketch follows the abstract).
Our findings reveal a significant bias
in the grading performance of pre-trained models for bilingual writing compared
to English-only and Spanish-only writing. Following this, we fine-tune
open-source MLLMs, including Llama 3.1 and Mistral NeMo, using synthetic
datasets generated in English, Spanish, and Spanglish (see the fine-tuning
sketch below). Our experiments indicate that the
models perform significantly better for all three languages after fine-tuning
with bilingual data. This study highlights the potential of enhancing MLLM
effectiveness to support authentic language practices amongst bilingual
learners. It also aims to illustrate the value of incorporating non-English
languages into the design and implementation of language models in education.
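
As a concrete illustration of the grading step described in the abstract, the sketch below prompts a pre-trained instruction-tuned MLLM for a binary acceptable/unacceptable judgment on a code-switched explanation. This is a minimal sketch, not the paper's actual setup: the checkpoint name, prompt wording, and the Spanglish example are all assumptions.

```python
# Hypothetical zero-shot grading of a student explanation with a pre-trained MLLM.
# The model checkpoint and prompt wording are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)

def grade(concept: str, explanation: str) -> str:
    """Ask the model for a one-word acceptable/unacceptable judgment."""
    messages = [
        {"role": "system",
         "content": "You grade student explanations. Reply with exactly one "
                    "word: 'acceptable' or 'unacceptable'."},
        {"role": "user",
         "content": f"Concept: {concept}\nStudent explanation: {explanation}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens after the prompt.
    return tokenizer.decode(out[0, inputs.shape[-1]:],
                            skip_special_tokens=True).strip()

# Hypothetical Spanglish (code-switched) student writing.
print(grade("photosynthesis",
            "Las plantas usan sunlight para convertir agua y CO2 en glucosa."))
```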
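The fine-tuning step could look roughly like the LoRA sketch below, built on Hugging Face's trl and peft. The toy synthetic rows, hyperparameters, and checkpoint names are assumptions, and exact SFTTrainer arguments vary across trl versions; this is a sketch of the approach, not the paper's configuration.

```python
# Minimal LoRA fine-tuning sketch with TRL's SFTTrainer on toy stand-ins for
# the synthetic English/Spanish/Spanglish grading data (all rows hypothetical).
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

rows = [
    {"text": "Concept: gravity\nExplanation: Objects fall porque la Tierra "
             "los atrae.\nLabel: acceptable"},
    {"text": "Concept: democracy\nExplanation: Es cuando un rey decide todo."
             "\nLabel: unacceptable"},
]
train_ds = Dataset.from_list(rows)

trainer = SFTTrainer(
    # Assumed checkpoints; "mistralai/Mistral-Nemo-Instruct-2407" is the
    # other model family named in the abstract.
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    train_dataset=train_ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="mllm-spanglish-grader",
                   num_train_epochs=3,
                   per_device_train_batch_size=2),
)
trainer.train()
```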
What is the application?
Who is the user?
What is the user's age?
Why use AI?
Study design
