UC Riverside researchers develop method for erasing private data from AI without source datasets

Ümit Yiğit Başaran, doctoral student in electrical and computer engineering at UC Riverside - UC Riverside

A team of computer scientists at the University of California, Riverside has introduced a method for removing private and copyrighted information from artificial intelligence models without needing access to the original training data. The research was presented in July at the International Conference on Machine Learning in Vancouver, Canada.

The new approach responds to growing concerns that personal and copyrighted material remains accessible in AI models even after creators try to restrict or delete it. Traditional methods require retraining a model on the original dataset, which is often costly and consumes significant energy. The UC Riverside method allows targeted information to be erased while preserving the rest of the model's functionality.

“In real-world situations, you can’t always go back and get the original data,” said Ümit Yiğit Başaran, a doctoral student in electrical and computer engineering at UC Riverside and lead author of the study. “We’ve created a certified framework that works even when that data is no longer available.”

The need for this type of technology has grown as tech companies face privacy regulations such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), both of which protect personal data, including data used in large-scale machine learning systems.

Legal actions have also highlighted these issues; for example, The New York Times is currently suing OpenAI and Microsoft over alleged unauthorized use of its articles to train language models such as GPT.

AI models generate responses by predicting word patterns based on large collections of online texts. This sometimes leads to near-verbatim reproductions of original content, potentially allowing users to bypass paywalls or copyright protections.

The UC Riverside team (Başaran, professor Amit Roy-Chowdhury, and assistant professor Başak Güler) developed what they describe as a “source-free certified unlearning” technique. It uses a surrogate dataset that statistically resembles the original data, adjusts the model's parameters, and adds controlled random noise so that the targeted information is deleted and cannot be reconstructed later.

Their system builds upon existing optimization techniques in AI that estimate how a model would change if retrained from scratch. They enhanced this process with new noise-calibration mechanisms to account for differences between surrogate and original datasets.
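The general idea can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a ridge regression model, a single Newton-style correction whose curvature (Hessian) is estimated from surrogate data instead of the unavailable original data, and Gaussian noise added to the result. It also assumes the samples to be forgotten are themselves available. The function names and the noise scale `sigma` are hypothetical; the paper's contribution is deriving how much noise is needed, which is not reproduced here.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form minimizer of 0.5*||X@theta - y||^2 + 0.5*lam*||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def unlearn_with_surrogate(theta, X_forget, y_forget, X_surr, n_retain, lam, sigma, rng):
    """Hypothetical helper: approximate retraining without the retained data.

    theta    : parameters trained on the full original dataset
    X_surr   : surrogate inputs standing in for the retained original inputs
    n_retain : number of original samples that remain after forgetting
    sigma    : noise scale (an assumption here, not a certified value)
    """
    d = theta.shape[0]
    # Loss gradient contributed by the samples to forget, at the trained parameters.
    grad_forget = X_forget.T @ (X_forget @ theta - y_forget)
    # Curvature of the retained objective, estimated from the surrogate data
    # and rescaled to the size of the retained set.
    H_hat = (n_retain / X_surr.shape[0]) * (X_surr.T @ X_surr) + lam * np.eye(d)
    # Newton-style step toward the parameters that retraining would produce.
    theta_new = theta + np.linalg.solve(H_hat, grad_forget)
    # Random noise masks the residual gap left by the surrogate approximation.
    return theta_new + sigma * rng.standard_normal(d)

# Synthetic demonstration data (all values are made up for illustration).
rng = np.random.default_rng(0)
n, d, m, lam = 500, 5, 20, 1.0
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
X_forget, y_forget = X[:m], y[:m]      # samples to erase
X_retain, y_retain = X[m:], y[m:]      # treated as unavailable at unlearning time
X_surr = rng.standard_normal((1000, d))  # surrogate drawn from a similar distribution

theta = fit_ridge(X, y, lam)
theta_retrain = fit_ridge(X_retain, y_retain, lam)  # reference only

theta_surr = unlearn_with_surrogate(theta, X_forget, y_forget, X_surr, n - m, lam, 0.0, rng)
theta_noisy = unlearn_with_surrogate(theta, X_forget, y_forget, X_surr, n - m, lam, 0.01, rng)

err_before = np.linalg.norm(theta - theta_retrain)
err_surrogate = np.linalg.norm(theta_surr - theta_retrain)
print("distance to retrained model before unlearning:", err_before)
print("distance after surrogate-based update:", err_surrogate)
```

In this toy setting the update would match full retraining exactly if the true retained data were used for the Hessian. With a surrogate, a gap remains, and the added noise is what a certified method would calibrate to cover that gap.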

In tests on synthetic and real-world datasets, the method achieved privacy protection similar to that of full retraining while requiring significantly less computing power.

The technique currently works for simpler AI models, which are still widely used, and could eventually apply to more complex systems like ChatGPT. Roy-Chowdhury noted that its potential impact extends beyond regulatory compliance: media organizations, healthcare providers, and other entities that handle sensitive information embedded in AI models could benefit from the tool. It may also give individuals greater control over having their personal or copyrighted content removed from AI systems.

“People deserve to know their data can be erased from machine learning models—not just in theory, but in provable, practical ways,” Güler said.

The research paper is titled “A Certified Unlearning Approach without Access to Source Data.” The project included collaboration with Sk Miraj Ahmed from Brookhaven National Laboratory in Upton, NY. Both Roy-Chowdhury and Güler hold faculty appointments in UC Riverside’s Department of Electrical and Computer Engineering as well as secondary appointments in Computer Science and Engineering.


