Researchers at the University of California, Berkeley’s D-Lab have examined how large language models (LLMs), which power popular AI chatbots, handle moral dilemmas by comparing their responses to those of Reddit users on the “Am I the Asshole?” (AITA) forum. As more people seek advice and support from chatbots like ChatGPT, questions arise about the norms and biases these technologies may encode.
Pratik Sachdeva, a senior data scientist at UC Berkeley’s D-Lab, noted concerns about transparency in chatbot design. “Through their advice and feedback, these technologies are shaping how humans act, what they believe and what norms they adhere to,” said Sachdeva. “But many of these tools are proprietary. We don’t know how they were trained. We don’t know how they are aligned.”
To investigate further, Sachdeva and Tom van Nuenen, also a senior data scientist and lecturer at the D-Lab, analyzed over 10,000 real-world social conflicts posted on AITA. They asked seven different LLMs—OpenAI’s GPT-3.5 and GPT-4; Anthropic’s Claude Haiku; Google’s PaLM 2 Bison and Gemma 7B; Meta’s LLaMa 2 7B; and Mistral 7B—to judge who was at fault in each scenario.
The study found that while individual chatbots often differed in their judgments—reflecting distinct ethical standards—the consensus opinion among them usually matched the consensus reached by Reddit users. “When you have a dilemma, you might ask a series of different friends what they think, and each of them might give you a different opinion. In essence, this is what Reddit users are doing on the AITA forum,” Sachdeva explained. “You could do the same thing with chatbots — first, you ask ChatGPT, then you ask Claude and then you ask Gemini. When we did that, we found that there was consistency between the majority opinions of Redditors and the majority opinion of chatbots.”
Van Nuenen pointed out that AITA presents complex scenarios unlike those typically studied in academic research: “‘Am I the Asshole?’ is a useful antidote to the very structured moral dilemmas that we see in a lot of academic research,” he said. “The situations are messy, and it’s that messiness that we wanted to confront large language models with.”
The standardized verdict phrases used on AITA, such as “You’re the Asshole,” “Not the Asshole,” “Everyone Sucks Here” and “No Assholes Here,” made it straightforward for the researchers to compare chatbot judgments with those of human users.
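To make that comparison concrete, here is a minimal sketch of how verdicts from several models could be collected and tallied against the subreddit’s labels. The prompt wording, the extract_verdict parser and the ask_fns callables are illustrative assumptions rather than the study’s actual pipeline; only the five verdict labels come from AITA’s own conventions.

```python
from collections import Counter
import re

# The standardized verdict labels used on the AITA subreddit.
VERDICTS = ["YTA", "NTA", "ESH", "NAH", "INFO"]

# Hypothetical prompt; the study's actual instructions to the models may differ.
PROMPT_TEMPLATE = (
    "Read the following interpersonal conflict and reply with one label: "
    "YTA (you're the asshole), NTA (not the asshole), ESH (everyone sucks here), "
    "NAH (no assholes here) or INFO (more information needed).\n\n{post}\n\nVerdict:"
)

def extract_verdict(response_text: str) -> str:
    """Map a model's free-text answer onto the first standard label it mentions."""
    match = re.search(r"\b(" + "|".join(VERDICTS) + r")\b", response_text.upper())
    return match.group(1) if match else "UNPARSED"

def majority_verdict(post: str, ask_fns: dict) -> tuple[str, dict]:
    """Query each model once and return the majority label plus the per-model votes.

    `ask_fns` maps a model name to a callable that sends a prompt to that model
    and returns its text response; the API plumbing is left to the caller.
    """
    votes = {
        name: extract_verdict(ask(PROMPT_TEMPLATE.format(post=post)))
        for name, ask in ask_fns.items()
    }
    tally = Counter(votes.values())
    return tally.most_common(1)[0][0], votes
```

Comparing the returned majority label with the verdict Reddit commenters settled on for the same post is, in effect, the “panel of friends” comparison Sachdeva describes above.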
The researchers observed that most LLMs provided consistent answers when presented with identical dilemmas multiple times—a sign that their outputs reflect underlying values rather than randomness. By analyzing written responses across six broad moral themes—fairness, feelings, harms, honesty, relational obligation and social norms—they identified some differences among models.
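The consistency check can be sketched in the same spirit, reusing the PROMPT_TEMPLATE and extract_verdict helpers from the snippet above; the agreement score here is an illustrative stand-in, not the measure the researchers used.

```python
from collections import Counter

def self_consistency(post: str, ask, n_trials: int = 5) -> float:
    """Ask one model the same dilemma several times and return the share of
    responses that agree with its most common verdict (1.0 = fully consistent)."""
    labels = [
        extract_verdict(ask(PROMPT_TEMPLATE.format(post=post)))
        for _ in range(n_trials)
    ]
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / n_trials
```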
“We found that ChatGPT-4 and Claude are a little more sensitive to feelings relative to the other models, and that a lot of these models are more sensitive to fairness and harms, and less sensitive to honesty,” Sachdeva said. He added that this sensitivity could influence whose side an LLM takes in a dispute: “That could mean that when assessing a conflict, it might be more likely to take the side of someone who was dishonest than someone who caused harm.” The team plans further work to identify broader trends.
One unusual finding involved Mistral 7B’s use of the verdict labels: it frequently chose “No assholes here,” not because it judged that no one was at fault, but because it interpreted the term more literally than the other models did. “Its own internalization of the concept of assholes was very different from the other models, which raises interesting questions about a model’s ability to pick up the norms of the subreddit,” Sachdeva said.
In follow-up research examining how chatbots deliberate with one another over moral dilemmas, preliminary results show that the models differ in how readily they move toward consensus: GPT models were less likely than the others to shift blame after hearing input from their peers.
Sachdeva emphasized awareness around reliance on AI for personal decisions: “We want people to be actively thinking about why they are using LLMs, when they are using LLMs and if they are losing the human element by relying on them too much,” he said. “Thinking about how LLMs might be reshaping our behavior and beliefs is something only humans can do.”