Date:
Publisher: arXiv
Large Language Models (LLMs) are increasingly used by teenagers and young
adults in everyday life, ranging from emotional support and creative expression
to educational assistance. However, their unique vulnerabilities and risk
profiles remain under-examined in current safety benchmarks and moderation
systems, leaving this population disproportionately exposed to harm. In this
work, we present Youth AI Risk (YAIR), the first benchmark dataset designed to
evaluate and improve the safety of youth LLM interactions. YAIR consists of
12,449 annotated conversation snippets spanning 78 fine-grained risk types,
grounded in a taxonomy of youth-specific harms such as grooming, boundary
violation, identity confusion, and emotional overreliance. We systematically
evaluate widely adopted moderation models on YAIR and find that existing
approaches substantially underperform in detecting youth-centered risks, often
missing contextually subtle yet developmentally harmful interactions. To
address these gaps, we introduce YouthSafe, a real-time risk detection model
optimized for youth GenAI contexts. YouthSafe significantly outperforms prior
systems across multiple metrics on risk detection and classification, offering
a concrete step toward safer and more developmentally appropriate AI
interactions for young users.
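The abstract reports detection and classification metrics for moderation models on the benchmark. As a rough illustration of what such an evaluation might look like, the sketch below scores a placeholder moderation function on a handful of labeled snippets; the snippet texts, risk labels, and the `moderate()` stub are all hypothetical and not taken from the paper or its released artifacts.

```python
# Minimal sketch (not the authors' code): scoring a moderation model on
# labeled conversation snippets, in the spirit of a benchmark evaluation.
# All snippets, labels, and the moderate() stub below are hypothetical.
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical gold annotations: one risk type per snippet ("none" = safe).
gold = ["grooming", "none", "emotional_overreliance", "boundary_violation", "none"]

def moderate(snippet: str) -> str:
    """Stand-in for a real moderation model; returns a predicted risk type."""
    return "none"  # trivial baseline that flags nothing

snippets = [
    "You can trust me, don't tell your parents we talk every night.",
    "Can you help me study for my biology quiz tomorrow?",
    "I only feel okay when I'm talking to you, nothing else matters.",
    "Send me a photo so I know what you look like.",
    "What's a good topic for my history essay?",
]
pred = [moderate(s) for s in snippets]

# Binary risk detection: did the model flag a risky snippet at all?
det_gold = [int(g != "none") for g in gold]
det_pred = [int(p != "none") for p in pred]
p, r, f1, _ = precision_recall_fscore_support(
    det_gold, det_pred, average="binary", zero_division=0
)
print(f"detection       P={p:.2f} R={r:.2f} F1={f1:.2f}")

# Fine-grained classification: macro-averaged over risk types.
p, r, f1, _ = precision_recall_fscore_support(
    gold, pred, average="macro", zero_division=0
)
print(f"classification  macro-P={p:.2f} macro-R={r:.2f} macro-F1={f1:.2f}")
```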
What is the application?
Who is the user?
What age?
Why use AI?
Study design
