Date
Publisher
arXiv
Assessment of children's speaking fluency in education is well researched for
majority languages, but remains highly challenging for low-resource languages.
This paper proposes a system to automatically assess fluency by combining a
fine-tuned multilingual ASR model, an objective metrics extraction stage, and a
generative pre-trained transformer (GPT) network. The objective metrics include
phonetic and word error rates, speech rate, and speech-pause duration ratio.
These are interpreted by a GPT-based classifier guided by a small set of
human-evaluated ground truth examples, to score fluency. We evaluate the
proposed system on a dataset of children's speech in two low-resource
languages, Tamil and Malay, and compare its classification performance against
Random Forest and XGBoost classifiers, as well as against ChatGPT-4o predicting
fluency directly from speech input. Results demonstrate that the proposed
approach achieves significantly higher accuracy than multimodal GPT or the
other methods.
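
The objective metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: error rate is taken as Levenshtein distance divided by reference length (applied to words for WER, and equally applicable to phone sequences), speech rate as recognized words per minute, and speech-pause duration ratio as total speech time divided by total pause time. The names `Segment`, `error_rate`, and `fluency_metrics` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition).

    Works for word lists (WER) or phone lists (phonetic error rate).
    """
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


@dataclass
class Segment:
    """A voiced stretch of audio, in seconds (hypothetical helper type)."""
    start: float
    end: float


def fluency_metrics(ref_words, hyp_words, speech_segments, total_duration):
    """Compute the assumed metric definitions for one utterance."""
    speech_time = sum(s.end - s.start for s in speech_segments)
    pause_time = total_duration - speech_time
    return {
        "wer": error_rate(ref_words, hyp_words),
        # words per minute over the whole recording (assumption)
        "speech_rate_wpm": len(hyp_words) / (total_duration / 60.0),
        # speech time over pause time (assumption); infinite if no pauses
        "speech_pause_ratio": (
            speech_time / pause_time if pause_time > 0 else float("inf")
        ),
    }


if __name__ == "__main__":
    metrics = fluency_metrics(
        ref_words="the cat sat on the mat".split(),
        hyp_words="the cat sat on mat".split(),
        speech_segments=[Segment(0.0, 2.0), Segment(3.0, 5.0)],
        total_duration=6.0,
    )
    print(metrics)
```

In a pipeline like the one described, a dictionary of this kind would be computed per recording from the ASR transcript and its timing information, then passed to the classifier together with the human-scored examples.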
What is the application?
What age group?
Why use AI?
Study design
