Publisher: arXiv
Background: ChatGPT and similar generative AI models have recently attracted
hundreds of millions of users and become part of the public discourse. Many
believe that such models will disrupt society and significantly change
education and information generation. So far, this belief rests on either
anecdotal evidence or benchmarks published by the owners of the models; both
lack scientific rigour.
Objective: Through a large-scale study comparing human-written versus
ChatGPT-generated argumentative student essays, we systematically assess the
quality of the AI-generated content.
Methods: A large corpus of essays was rated using standard criteria by a
large number of human experts (teachers). We complement these ratings with an
analysis of the linguistic characteristics of the generated essays.
Results: Our results demonstrate that ChatGPT generates essays that are rated
higher for quality than human-written essays. The AI-generated essays also
differ linguistically from the human-written ones: for example, they contain
fewer discourse and epistemic markers, but more nominalizations and greater
lexical diversity.
Conclusions: Our results clearly demonstrate that models like ChatGPT
outperform humans in generating argumentative essays. Since the technology is
readily available for anyone to use, educators must act immediately. We must
re-invent homework and develop teaching concepts that use these AI models in
the same way that mathematics teaching uses the calculator: teach the general
concepts first and then use AI tools to free up time for other learning
objectives.
