Date:
Publisher: arXiv
This paper addresses the critical need for scalable and high-quality
educational assessment tools within the Malaysian education system. It
highlights the potential of Generative AI (GenAI) while acknowledging the
significant challenges of ensuring factual accuracy and curriculum alignment,
especially for low-resource languages like Bahasa Melayu. This research
introduces and compares four incremental pipelines for generating Form 1
Mathematics multiple-choice questions (MCQs) in Bahasa Melayu using OpenAI's
GPT-4o. The methods range from non-grounded prompting (structured and basic) to
Retrieval-Augmented Generation (RAG) approaches (one using the LangChain
framework, one implemented manually). The system is grounded in official
curriculum documents, including teacher-prepared notes and the yearly teaching
plan (RPT). A dual-pronged automated evaluation framework is employed to assess
the generated questions. Curriculum alignment is measured using Semantic
Textual Similarity (STS) against the RPT, while contextual validity is verified
through a novel RAG-based Question-Answering (RAG-QA) method. The results
demonstrate that RAG-based pipelines significantly outperform non-grounded
prompting methods, producing questions with higher curriculum alignment and
factual validity. The study further analyzes the trade-offs between the ease of
implementation of framework-based RAG and the fine-grained control offered by a
manual pipeline. This work presents a validated methodology for generating
curriculum-specific educational content in a low-resource language, introduces
a symbiotic RAG-QA evaluation technique, and provides actionable insights for
the development and deployment of practical EdTech solutions in Malaysia and
similar regions.
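The abstract gives no implementation details, but a minimal sketch of what the manually implemented RAG variant could look like is shown below. It assumes the OpenAI Python SDK, the text-embedding-3-small embedding model, and illustrative curriculum chunks; the prompt wording, chunk granularity, and retrieval depth are all assumptions, not the paper's actual setup.

```python
# Hedged sketch of a manual RAG pipeline for Bahasa Melayu MCQ generation.
# Model names, chunking, and prompt wording are assumptions, not the paper's exact method.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts; the embedding model choice is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k curriculum chunks most similar to the query (cosine similarity)."""
    q = embed([query])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

def generate_mcq(topic: str, chunks: list[str], chunk_vecs: np.ndarray) -> str:
    """Ground GPT-4o in retrieved curriculum context and request one MCQ in Bahasa Melayu."""
    context = "\n\n".join(retrieve(topic, chunks, chunk_vecs))
    prompt = (
        "Anda ialah penjana soalan Matematik Tingkatan 1.\n"
        f"Konteks kurikulum:\n{context}\n\n"
        f"Hasilkan satu soalan aneka pilihan (A-D) tentang '{topic}' "
        "dalam Bahasa Melayu, beserta jawapan yang betul."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Usage with hypothetical data (real chunks would come from teacher notes and the RPT):
# chunks = ["Nombor nisbah ...", "Faktor dan gandaan ..."]
# chunk_vecs = embed(chunks)
# print(generate_mcq("nombor nisbah", chunks, chunk_vecs))
```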
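The STS-based curriculum-alignment check could be approximated as follows: each generated question is compared against RPT learning-standard descriptions and scored by its best match. The paper does not name the embedding model used, so a multilingual sentence-transformers model is assumed, and the RPT entries shown are illustrative only.

```python
# Hedged sketch of an STS curriculum-alignment score against RPT learning standards.
# The multilingual model and the RPT entries are assumptions; the abstract does not specify them.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def alignment_score(question: str, rpt_standards: list[str]) -> float:
    """Return the highest cosine similarity between the question and any RPT entry."""
    q_emb = model.encode(question, convert_to_tensor=True)
    rpt_emb = model.encode(rpt_standards, convert_to_tensor=True)
    return float(util.cos_sim(q_emb, rpt_emb).max())

# Illustrative RPT entries (not the actual document):
rpt = [
    "Murid dapat mengenal pasti nombor nisbah dan melakukan operasi asas.",
    "Murid dapat menentukan faktor, gandaan, FSTB dan GSTK bagi nombor bulat.",
]
print(alignment_score("Antara berikut, yang manakah nombor nisbah?", rpt))
```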
What is the application?
Who is the user?
What age is the user?
Why use AI?
Study design
