Publisher: arXiv
Generative Artificial Intelligence (GenAI) holds the potential to advance
existing educational technologies with capabilities to automatically generate
personalised scaffolds that support students' self-regulated learning (SRL).
While advancements in large language models (LLMs) promise improvements in the
adaptability and quality of educational technologies for SRL, there remain
concerns about hallucinations in content generated by LLMs, which can
compromise both the learning experience and ethical standards. To address these
challenges, we proposed GenAI-enabled approaches for evaluating personalised
SRL scaffolds before they are presented to students, aiming to reduce
hallucinations and improve the overall quality of LLM-generated personalised
scaffolds. Specifically, two approaches were investigated. The first developed
a multi-agent system for reliability evaluation, assessing the extent to which
LLM-generated scaffolds accurately target relevant SRL processes. The second
utilised the "LLM-as-a-Judge" technique for quality evaluation, assessing
LLM-generated scaffolds for their helpfulness in supporting students. We
constructed evaluation datasets and compared our results against single-agent
LLM systems and machine learning baselines.
Our findings indicate that the reliability evaluation approach is highly
effective and outperforms the baselines, showing almost perfect alignment with
human experts' evaluations. Moreover, both proposed evaluation approaches can
be harnessed to effectively reduce hallucinations. Additionally, we identified
and discussed bias-related limitations of the "LLM-as-a-Judge" technique in evaluating
LLM-generated scaffolds. We suggest incorporating these approaches into
GenAI-powered personalised SRL scaffolding systems to mitigate hallucination
issues and improve the overall scaffolding quality.
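
To make the first approach concrete, here is a minimal Python sketch of a multi-agent reliability check, assuming a generic chat-completion client (`call_llm`), a simplified SRL label set, and majority voting as the aggregation rule. These names and choices are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: several evaluator "agents" (independent LLM calls with
# different role prompts) each label the SRL process a scaffold targets; a
# majority vote decides whether the scaffold reliably targets the intended
# process. `call_llm` is a stand-in for any chat-completion client.

from collections import Counter
from typing import Callable, List

SRL_PROCESSES = ["planning", "monitoring", "evaluation"]  # assumed label set

def evaluate_reliability(
    scaffold: str,
    intended_process: str,
    call_llm: Callable[[str], str],
    agent_roles: List[str],
) -> bool:
    """Return True if a majority of agents agree the scaffold targets the intended SRL process."""
    votes = []
    for role in agent_roles:
        prompt = (
            f"You are {role}. Which SRL process does this scaffold target?\n"
            f"Answer with exactly one of {SRL_PROCESSES}.\n\nScaffold: {scaffold}"
        )
        answer = call_llm(prompt).strip().lower()
        votes.append(answer if answer in SRL_PROCESSES else "unknown")
    majority, count = Counter(votes).most_common(1)[0]
    return majority == intended_process and count > len(agent_roles) / 2

if __name__ == "__main__":
    # Demo with a mock LLM that always answers "planning".
    mock_llm = lambda prompt: "planning"
    ok = evaluate_reliability(
        "Before you start, list the steps you will take this week.",
        "planning",
        mock_llm,
        ["an SRL researcher", "a learning scientist", "a teacher"],
    )
    print("reliable:", ok)
```

A mock client is used here so the sketch runs standalone; in practice each agent role would call a real LLM endpoint.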
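Similarly, a minimal sketch of the second approach: an "LLM-as-a-Judge" quality gate that rates a scaffold's helpfulness and withholds low-rated scaffolds before they reach students. The rubric wording, the 1-5 scale, and the threshold are assumptions for illustration.

```python
# Hypothetical sketch of an "LLM-as-a-Judge" quality check: a judge prompt
# rates a scaffold's helpfulness on a fixed scale, and scaffolds below a
# threshold are held back rather than shown to students.

import re
from typing import Callable

JUDGE_PROMPT = (
    "Rate how helpful the following scaffold would be for supporting a "
    "student's self-regulated learning, on a scale from 1 (not helpful) to 5 "
    "(very helpful). Reply with the number only.\n\nScaffold: {scaffold}"
)

def judge_quality(scaffold: str, call_llm: Callable[[str], str]) -> int:
    """Return the judge's 1-5 helpfulness rating, or 0 if no rating can be parsed."""
    reply = call_llm(JUDGE_PROMPT.format(scaffold=scaffold))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else 0

def should_present(scaffold: str, call_llm: Callable[[str], str], threshold: int = 4) -> bool:
    """Gate a scaffold: present it only if the judged helpfulness meets the threshold."""
    return judge_quality(scaffold, call_llm) >= threshold

if __name__ == "__main__":
    # Demo with a mock judge that always rates 4.
    mock_judge = lambda prompt: "4"
    print(should_present("Try summarising what you learned so far.", mock_judge))
```

Note that a single-judge gate like this is exactly where the bias limitations discussed above can surface, which motivates comparing it against the multi-agent approach.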
What is the application?
Who is the user?
What age?
Why use AI?
Study design
