Date
Publisher
arXiv
The recent release of very large language models such as PaLM and GPT-4 has
made an unprecedented impact in the popular media and public consciousness,
giving rise to a mixture of excitement and fear as to their capabilities and
potential uses, and shining a light on natural language processing research
which had not previously received so much attention. The developments offer
great promise for education technology, and in this paper we look specifically
at the potential for incorporating large language models in AI-driven language
teaching and assessment systems. We consider several research areas and also
discuss the risks and ethical considerations surrounding generative AI in
education technology for language learners. Overall we find that larger
language models offer improvements over previous models in text generation,
opening up routes toward content generation which had not previously been
plausible. For text generation they must be prompted carefully and their
outputs may need to be reshaped before they are ready for use. For automated
grading and grammatical error correction, tasks whose progress is checked on
well-known benchmarks, early investigations indicate that large language models
on their own do not improve on state-of-the-art results according to standard
evaluation metrics. For grading it appears that linguistic features established
in the literature should still be used for best performance, and for error
correction it may be that the models can offer alternative feedback styles
which are not measured sensitively with existing methods. In all cases, there
is work to be done to experiment with the inclusion of large language models in
education technology for language learners, in order to properly understand and
report on their capacities and limitations, and to ensure that foreseeable
risks such as misinformation and harmful bias are mitigated.
What is the application?
Who age?
Study design
