Researchers at the University of California, Santa Cruz have demonstrated that misleading text in the physical world can be used to hijack AI-enabled robots and autonomous systems. The study, led by Professor Alvaro Cardenas and Assistant Professor Cihang Xie from the Computer Science and Engineering department, is the first academic exploration of what they call “environmental indirect prompt injection attacks” on embodied AI systems.
“Every new technology brings new vulnerabilities,” said Cardenas. “Our role as researchers is to anticipate how these systems can fail or be misused—and to design defenses before those weaknesses are exploited.”
Embodied AI refers to physical technologies such as self-driving cars and delivery robots that are operated by artificial intelligence. These systems often use large visual-language models (LVLMs), which process both images and text to interpret their surroundings. According to Cardenas, “I expect vision-language models to play a major role in future embodied AI systems. Robots designed to interact naturally with people will rely on them, and as these systems move into real-world deployment, security has to be a core consideration.”
The idea for the research originated in an advanced security course taught by Cardenas, where graduate student Maciej Buszko proposed investigating prompt injection attacks—vulnerabilities known from chatbots like ChatGPT—in the context of embodied AI. Traditionally, these attacks involve manipulating digital text inputs to override an AI’s intended behavior. The UC Santa Cruz team extended this concept into the physical world.
The research group developed a set of attacks called CHAI (command hijacking against embodied AI), which were tested on three applications: autonomous driving, drones performing emergency landings, and drones conducting search missions. The team included Ph.D. students Luis Burbano (first author), Diego Ortiz, Siwei Yang, Haoqin Tu; Johns Hopkins professor Yinzhi Cao; and graduate student Qi Sun.
CHAI operates in two stages: it uses generative AI to optimize attack instructions so they are likely followed by the robot, then adjusts how the text appears—its placement, color, and size—to maximize effectiveness. The system was trained for English, Chinese, Spanish, and Spanglish.
The researchers found high success rates for their attacks: up to 95.5% for aerial object tracking tasks with drones, 81.8% for driverless cars, and 68.1% for drone landings. Tests were conducted using GPT4o—a recent public model from OpenAI—and Intern VL, an open-source alternative that runs locally rather than in the cloud.
In practical tests using a small robotic car inside UC Santa Cruz’s Baskin Engineering 2 building, printed images containing attack instructions successfully overrode the car’s navigation system even under different lighting conditions.
“We found that we can actually create an attack that works in the physical world, so it could be a real threat to embodied AI,” said Burbano. “We need new defenses against these attacks.”
Future work will include further experiments under varying weather conditions and comparisons between prompt-injection attacks and more traditional adversarial techniques such as adding visual noise or blurring images.
“We are trying to dig in a little deeper to see what are the pros and cons of these attacks, analyzing which ones are more effective in terms of taking control of the embodied AI or in terms of being undetectable by humans,” said Cardenas.
Cardenas’ group also plans to investigate potential defenses against such threats—such as authenticating text-based instructions perceived by robots or ensuring commands align with mission objectives and safety requirements.



