Date
Publisher
arXiv
Child literacy is a strong predictor of life outcomes at the subsequent
stages of an individual's life. This points to a need for targeted
interventions in vulnerable low and middle income populations to help bridge
the gap between literacy levels in these regions and high income ones. In this
effort, reading assessments provide an important tool to measure the
effectiveness of these programs and AI can be a reliable and economical tool to
support educators with this task. Developing accurate automatic reading
assessment systems for child speech in low-resource languages poses significant
challenges due to limited data and the unique acoustic properties of children's
voices. This study focuses on Xhosa, a language spoken in South Africa, to
advance child speech recognition capabilities. We present a novel dataset
composed of child speech samples in Xhosa. The dataset is available upon
request and contains ten words and letters, which are part of the Early Grade
Reading Assessment (EGRA) system. Each recording is labeled with an online and
cost-effective approach by multiple markers and a subsample is validated by an
independent EGRA reviewer. This dataset is evaluated with three fine-tuned
state-of-the-art end-to-end models: wav2vec 2.0, HuBERT, and Whisper. The
results indicate that the performance of these models can be significantly
influenced by the amount and balancing of the available training data, which is
fundamental for cost-effective large dataset collection. Furthermore, our
experiments indicate that the wav2vec 2.0 performance is improved by training
on multiple classes at a time, even when the number of available samples is
constrained.
What is the application?
Who is the user?
Who age?
Why use AI?