UC Riverside researchers develop method for erasing private data from AI without source datasets

Ümit Yiğit Başaran, doctoral student in electrical and computer engineering at UC Riverside - UC Riverside

A team of computer scientists at the University of California, Riverside has introduced a method for removing private and copyrighted information from artificial intelligence models without needing access to the original training data. The research was presented in July at the International Conference on Machine Learning in Vancouver, Canada.

This new approach responds to growing concerns about personal and copyrighted materials remaining accessible in AI models even after efforts by creators to restrict or delete such content. Traditional methods require retraining AI models with the original datasets, which is often costly and consumes significant energy. The UC Riverside method allows targeted information to be erased while preserving the functionality of the remaining model.

“In real-world situations, you can’t always go back and get the original data,” said Ümit Yiğit Başaran, a doctoral student in electrical and computer engineering at UC Riverside and lead author of the study. “We’ve created a certified framework that works even when that data is no longer available.”

The need for this type of technology has grown as tech companies face privacy regulations such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), both designed to protect personal data used in large-scale machine learning systems.

Legal actions have also highlighted these issues; for example, The New York Times is currently suing OpenAI and Microsoft over alleged unauthorized use of its articles to train language models such as GPT.

AI models generate responses by predicting word patterns based on large collections of online texts. This sometimes leads to near-verbatim reproductions of original content, potentially allowing users to bypass paywalls or copyright protections.

The UC Riverside team (Başaran, professor Amit Roy-Chowdhury, and assistant professor Başak Güler) developed what they describe as a “source-free certified unlearning” technique. It uses a surrogate dataset that statistically resembles the original data, adjusts the model’s parameters, and adds controlled random noise so that the targeted information is deleted and cannot be reconstructed later.

Their system builds upon existing optimization techniques in AI that estimate how a model would change if retrained from scratch. They enhanced this process with new noise-calibration mechanisms to account for differences between surrogate and original datasets.
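The idea described above can be illustrated with a toy example. The sketch below is not the authors' algorithm. It assumes a simple ridge regression model, for which a single Newton-style step reproduces retraining exactly when the true curvature (Hessian) is known, and it estimates that curvature from surrogate data instead of the original training set. The function names, the parameter names, and the noise scale `sigma` are hypothetical, and `sigma` is an uncalibrated placeholder, whereas the paper reportedly calibrates the noise to the gap between surrogate and original data.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: argmin 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def unlearn_with_surrogate(w, X_forget, y_forget, X_surrogate, n_train, lam,
                           sigma=0.0, rng=None):
    """One Newton-style step that removes the forget samples' influence from w.

    The Hessian of the full training loss is estimated from surrogate data
    only; the original training set is never accessed.
    """
    d = w.shape[0]
    m = X_surrogate.shape[0]
    # Surrogate estimate of X_train^T X_train, rescaled to the training-set size.
    H_full_est = (n_train / m) * (X_surrogate.T @ X_surrogate)
    # Hessian of the loss on the retained data (training data minus forget set).
    H_retain_est = H_full_est - X_forget.T @ X_forget + lam * np.eye(d)
    # At the trained optimum the full gradient is zero, so the retained-data
    # gradient equals minus the forget-set gradient.
    grad_retain = -X_forget.T @ (X_forget @ w - y_forget)
    w_new = w - np.linalg.solve(H_retain_est, grad_retain)
    if sigma > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        # Placeholder Gaussian noise; a certified method would calibrate sigma.
        w_new = w_new + sigma * rng.standard_normal(d)
    return w_new


# Synthetic demo: the surrogate shares the training distribution but not the samples.
rng = np.random.default_rng(0)
n, m, d, lam = 2000, 2000, 5, 1.0
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)
X_sur = rng.standard_normal((m, d))

w_full = fit_ridge(X, y, lam)                    # model trained on everything
X_f, y_f = X[:100], y[:100]                      # samples to be forgotten
w_retrained = fit_ridge(X[100:], y[100:], lam)   # gold standard: retrain from scratch
w_unlearned = unlearn_with_surrogate(w_full, X_f, y_f, X_sur, n, lam)

print("distance to retrained model before unlearning:",
      np.linalg.norm(w_full - w_retrained))
print("distance to retrained model after unlearning: ",
      np.linalg.norm(w_unlearned - w_retrained))
```

In this toy setting the leftover gap between the unlearned and retrained weights comes only from the mismatch between the surrogate and the original data. That mismatch is what the added noise is meant to mask in a certified scheme.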

Testing on synthetic and real-world datasets showed their method achieved privacy protection similar to full retraining but with significantly less computing power required.

The technique currently works for simpler AI models, which remain widely used, and could eventually be extended to more complex systems like ChatGPT. Roy-Chowdhury noted that its potential impact extends beyond regulatory compliance: media organizations, healthcare providers, and other entities handling sensitive information embedded in AI could benefit from this tool. It may also give individuals greater control over having their personal or copyrighted content removed from AI systems.

“People deserve to know their data can be erased from machine learning models—not just in theory, but in provable, practical ways,” Güler said.

The research paper is titled “A Certified Unlearning Approach without Access to Source Data.” The project included collaboration with Sk Miraj Ahmed from Brookhaven National Laboratory in Upton, NY. Both Roy-Chowdhury and Güler hold faculty appointments in UC Riverside’s Department of Electrical and Computer Engineering as well as secondary appointments in Computer Science and Engineering.


