UC Berkeley study examines moral judgments made by leading AI chatbots

Pratik Sachdeva, senior data scientist at UC Berkeley’s D-Lab - UC Berkeley

As more people seek advice and emotional support from AI chatbots, concerns are growing about the influence these technologies have on human behavior. Chatbots, such as ChatGPT and others, are available at all times and can provide responses that users find thoughtful or validating. However, their advice may be shaped by the data used to train them, which could reflect norms and biases different from those of individual communities.

Pratik Sachdeva, a senior data scientist at UC Berkeley’s D-Lab, highlighted the uncertainty surrounding chatbot training and alignment: “Through their advice and feedback, these technologies are shaping how humans act, what they believe and what norms they adhere to,” said Sachdeva. “But many of these tools are proprietary. We don’t know how they were trained. We don’t know how they are aligned.”

To investigate the moral reasoning encoded in popular AI chatbots, Sachdeva and Tom van Nuenen, also a senior data scientist and lecturer at the D-Lab, turned to Reddit’s “Am I the Asshole?” (AITA) forum for their research. Their recent pre-print study involved presenting seven large language models (LLMs) with over 10,000 real-world social conflicts posted on AITA. The researchers asked each model to decide who was at fault in each scenario and compared their responses with those given by Reddit users.

The study found significant differences among chatbots in how they judged moral dilemmas on Reddit. Each LLM appeared to embody its own ethical standards. Despite these differences between models, when comparing consensus opinions among chatbots with those of Reddit users—often called Redditors—the researchers found broad agreement.

“When you have a dilemma, you might ask a series of different friends what they think, and each of them might give you a different opinion. In essence, this is what Reddit users are doing on the AITA forum,” Sachdeva explained. “You could do the same thing with chatbots — first, you ask ChatGPT, then you ask Claude and then you ask Gemini. When we did that, we found that there was consistency between the majority opinions of Redditors and the majority opinion of chatbots.”
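The comparison Sachdeva describes amounts to taking the majority opinion across chatbots and checking it against the forum's top-voted verdict. A minimal sketch, with illustrative verdicts (the model names match the study's lineup, but the labels here are invented for the example):

```python
from collections import Counter

# Hypothetical per-model verdicts for one AITA scenario, using the forum's
# standard labels ("YTA" = "You're the Asshole", "NTA" = "Not the Asshole").
model_verdicts = {
    "gpt-4": "NTA",
    "gpt-3.5": "NTA",
    "claude-haiku": "YTA",
    "palm-2-bison": "NTA",
    "gemma-7b": "NTA",
    "llama-2-7b": "YTA",
    "mistral-7b": "NTA",
}

def majority_verdict(verdicts):
    """Return the most common label across models (the 'consensus')."""
    counts = Counter(verdicts)
    return counts.most_common(1)[0][0]

consensus = majority_verdict(model_verdicts.values())
redditor_verdict = "NTA"  # the top-voted label on the original post (illustrative)
agrees = consensus == redditor_verdict
```

Individual models can disagree (as Claude Haiku and LLaMa 2 do here) while the majority opinion still matches the Redditors' verdict, which is the pattern the study reports.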

On the AITA forum, where everyday interpersonal conflicts are discussed—ranging from broken promises to privacy violations—Redditors use standardized phrases like “You’re the Asshole” or “Not the Asshole” to express their judgments; whichever response receives the most upvotes becomes the final verdict.

“‘Am I the Asshole?’ is a useful antidote to the very structured moral dilemmas that we see in a lot of academic research,” Van Nuenen noted. “The situations are messy, and it’s that messiness that we wanted to confront large language models with.” He added that standardized phrasing made it easier for researchers to evaluate chatbot responses alongside human ones.

The seven LLMs included OpenAI’s GPT-3.5 and GPT-4; Claude Haiku; Google’s PaLM 2 Bison and Gemma 7B; Meta’s LLaMa 2 7B; and Mistral 7B. For every AITA scenario analyzed, each model gave both a standard response phrase and an explanation of its reasoning.

While individual models often disagreed with one another regarding blame assignment in specific scenarios—a sign that each encodes distinct values—they were generally self-consistent when presented with repeated versions of identical dilemmas.
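The self-consistency check described above can be sketched as a simple agreement rate: present the same dilemma repeatedly and measure how often a model repeats its modal answer. The labels below are illustrative, not from the study:

```python
from collections import Counter

def self_consistency(labels):
    """Fraction of repeated runs that agree with the model's most common answer."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Five hypothetical reruns of one dilemma for a single model.
runs = ["NTA", "NTA", "NTA", "YTA", "NTA"]
rate = self_consistency(runs)  # 4 of 5 runs match the modal answer
```

A rate near 1.0 indicates a model that answers the same dilemma the same way each time, even when it disagrees with other models about that dilemma.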

To better understand these differences in moral reasoning among models, Sachdeva and Van Nuenen analyzed the written explanations for evidence of sensitivity to six broad themes: fairness, feelings, harms, honesty, relational obligation (such as loyalty) and social norms.

“We found that ChatGPT-4 and Claude are a little more sensitive to feelings relative to other models,” Sachdeva reported. He also noted that most models showed higher sensitivity toward fairness and harm but lower sensitivity toward honesty: “That could mean that when assessing a conflict it might be more likely [for some LLMs] to take the side of someone who was dishonest than someone who caused harm.” Future work aims to identify clearer trends within these distinctions.

An unusual pattern emerged for Mistral 7B: it frequently used the verdict “No assholes here,” not because it believed no one was culpable but because it interpreted the term literally rather than contextually, as the other models did.
“Its own internalization of the concept of ‘assholes’ was very different…which raises interesting questions about a model’s ability to pick up subreddit norms,” said Sachdeva.

Follow-up studies by Sachdeva and Van Nuenen focus on how multiple chatbots deliberate together over similar issues. Preliminary findings suggest that some LLMs are less willing than others to change their position when challenged by peers, and that models draw on divergent value frameworks as they argue.

As research continues into transparency around the design, training and alignment practices behind major AI systems, work that both scientists advocate for, they hope increased awareness will also encourage responsible use among end users:

“We want people actively thinking about why they’re using LLMs…and whether they’re losing the human element by relying on them too much,” said Sachdeva. “Thinking about how LLMs might be reshaping our behavior and beliefs is something only humans can do.”


