Date
Publisher
arXiv
The potential of artificial intelligence (AI)-based large language models
(LLMs) holds considerable promise in revolutionizing education, research, and
practice. However, distinguishing between human-written and AI-generated text
has become a significant task. This paper presents a comparative study,
introducing a novel dataset of human-written and LLM-generated texts in
different genres: essays, stories, poetry, and Python code. We employ several
machine learning models to classify the texts. Results demonstrate the efficacy
of these models in discerning between human and AI-generated text, despite the
dataset's limited sample size. However, the task becomes more challenging when
classifying GPT-generated text, particularly in story writing. The results
indicate that the models exhibit superior performance in binary classification
tasks, such as distinguishing human-generated text from a specific LLM,
compared to the more complex multiclass tasks that involve discerning among
human-generated and multiple LLMs. Our findings provide insightful implications
for AI text detection while our dataset paves the way for future research in
this evolving area.
What is the application?
Who is the user?
Who age?
Why use AI?
Study design