Date:
Publisher: arXiv
This study proposes a method for knowledge distillation (KD) of fine-tuned
Large Language Models (LLMs) into smaller, more efficient, and accurate neural
networks. We specifically target the challenge of deploying these models on
resource-constrained devices. Our methodology involves training the smaller
student model (a neural network) using the prediction probabilities (as soft
labels) of the LLM, which serves as a teacher model. This is achieved through a
specialized loss function tailored to learn from the LLM's output
probabilities, ensuring that the student model closely mimics the teacher's
performance. To validate the performance of the KD approach, we utilized a
large dataset, 7T, containing 6,684 student-written responses to science
questions and three mathematical reasoning datasets with student-written
responses graded by human experts. We compared scoring accuracy against the
state-of-the-art (SOTA) distilled model TinyBERT and against artificial neural
network (ANN) baselines.
Results show that the KD approach achieves 3% and 2% higher scoring accuracy
than the ANN and TinyBERT models, respectively, and accuracy comparable to the
teacher model. Furthermore, the student model has only 0.03M parameters, about
4,000 times fewer than the teacher model, and runs roughly 10x faster at
inference than TinyBERT. The significance of this research lies in its potential to make
advanced AI technologies accessible in typical educational settings,
particularly for automatic scoring.
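
The distillation setup described above (a small student trained on the teacher LLM's prediction probabilities through a specialized loss) is commonly implemented as a temperature-softened KL-divergence term, optionally blended with a cross-entropy term on the human-assigned scores. Below is a minimal PyTorch sketch of such a soft-label loss; the temperature, the blending weight alpha, and the hard-label term are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_probs, hard_labels,
                                 temperature=2.0, alpha=0.5):
    """Sketch of a KD loss for automatic scoring.

    student_logits: (batch, num_score_levels) raw outputs of the small student network
    teacher_probs:  (batch, num_score_levels) prediction probabilities from the LLM teacher,
                    used directly as soft labels (per the abstract)
    hard_labels:    (batch,) expert-assigned score labels
    temperature, alpha: assumed hyperparameters, not values reported in the paper
    """
    # Soften the student's distribution; F.kl_div expects log-probabilities as input.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence pulls the student toward the teacher's soft labels.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd_term = F.kl_div(log_student, teacher_probs, reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the human-graded scores.
    ce_term = F.cross_entropy(student_logits, hard_labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example usage with random tensors (5 score levels, batch of 4).
if __name__ == "__main__":
    student_logits = torch.randn(4, 5)
    teacher_probs = F.softmax(torch.randn(4, 5), dim=-1)
    hard_labels = torch.randint(0, 5, (4,))
    print(soft_label_distillation_loss(student_logits, teacher_probs, hard_labels))
```

Mixing the soft-label term with a hard-label term is one common design choice; the paper's specialized loss may weight or formulate these components differently.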
What is the application?
What age group?
Why use AI?
Study design
