Date
Publisher
arXiv
The automatic generation of hints by Large Language Models (LLMs) within
Intelligent Tutoring Systems (ITSs) has shown potential to enhance student
learning. However, generating pedagogically sound hints that address student
misconceptions and adhere to specific educational objectives remains
challenging. This work explores using LLMs (GPT-4o and Llama-3-8B-Instruct) as
teachers to generate effective hints for students simulated by LLMs
(GPT-3.5-turbo, Llama-3-8B-Instruct, or Mistral-7B-Instruct-v0.3) tackling math
exercises written for human high-school students and designed using cognitive
science principles. We study three dimensions: 1)
identifying error patterns made by simulated students on secondary-level math
exercises; 2) developing various prompts for GPT-4o as a teacher and evaluating
their effectiveness in generating hints that enable simulated students to
self-correct; and 3) testing the best-performing prompts, based on their
ability to produce relevant hints and facilitate error correction, with
Llama-3-8B-Instruct as the teacher, allowing for a performance comparison with
GPT-4o. The results show that model errors increase with higher temperature
settings. Notably, when hints are generated by GPT-4o, the most effective
prompts include those tailored to specific errors as well as those
providing general hints based on common mathematical errors. Interestingly,
Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o.
In addition, the problem-solving and response-revision capabilities of the LLMs as
students, particularly GPT-3.5-turbo, improved significantly after receiving
hints, especially at lower temperature settings. However, models like
Mistral-7B-Instruct demonstrated a decline in performance as the temperature
increased.
What is the application?
Who age?
Why use AI?
Study design