Search and Filter

Submit a research study

Contribute to the repository:

Add a paper

First-Person Fairness in Chatbots

Authors
Tyna Eloundou, Anna-Luisa Brakman, Alex Beutel, David G. Robinson, Pamela Mishkin, Meghan Shah, Keren Gu-Lemberg, Johannes Heidecke, Lilian Weng, Adam Tauman Kalai
Date
Publisher
arXiv

Chatbots like ChatGPT are used by hundreds of millions of people for diverse purposes, ranging from r´ esum´ e writing to entertainment. These real-world applications are different from the institutional uses, such as r´ esum´ e screening or credit scoring, which have been the focus of much of AI research on bias and fairness. Ensuring equitable treatment for all users in these first-person contexts is critical. In this work, we study “first-person fairness,” which means fairness toward the user who is interacting with a chatbot. This includes providing high-quality responses to all users regardless of their identity or background, and avoiding harmful stereotypes. We propose a scalable, privacy-preserving method for evaluating one aspect of first-person fairness across a large, heterogeneous corpus of real-world chatbot interactions. Specifically, we assess potential bias linked to users’ names, which can serve as proxies for demographic attributes like gender or race, in chatbot systems such as ChatGPT, which provide mechanisms for storing and using user names. Our method leverages a second language model to privately analyze name-sensitivity in the chatbot’s responses. We verify the validity of these annotations through independent human evaluation. Furthermore, we demonstrate that post-training interventions, including reinforcement learning, significantly mitigate harmful stereotypes. Our approach not only provides quantitative bias measurements but also yields succinct descriptions of subtle response differences across sixty-six distinct tasks. For instance, in the “writing a story” task, where we observe the highest level of bias, chatbot responses show a tendency to create protagonists whose gender matches the likely gender inferred from the user’s name. Moreover, a general pattern emerges where users with female-associated names receive responses with friendlier and simpler language slightly more often on average than users with male-associated names. Finally, we provide the system messages required for external researchers to replicate this work and further investigate ChatGPT’s behavior with hypothetical user profiles, fostering continued research on bias in chatbot interactions.

What is the application?
Who is the user?
Who age?