A recent study from researchers at the University of California San Diego and the University of Chicago highlights the difficulties visual artists face in protecting their work from generative AI tools that use online data for training. The findings will be presented at the 2025 Internet Measurement Conference in Madison, Wisconsin.
The researchers found that while there are ways to prevent AI crawlers—programs that collect internet data for AI models—from accessing artwork, most artists lack either the technical expertise or the control over their web hosting service needed to implement these protections. Many content management platforms do not let users modify key files such as robots.txt, which can instruct certain web crawlers to stay away.
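For artists who do have access to their site's robots.txt, blocking a crawler takes only a few lines. The sketch below is illustrative, not exhaustive: the user-agent strings shown (GPTBot and CCBot, the crawlers operated by OpenAI and Common Crawl) are two commonly documented examples, and each crawler operator publishes its own name to target.

```
# robots.txt — served at the site root, e.g. https://example.com/robots.txt
# Illustrative sketch: deny two AI-related crawlers, allow everything else.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Note that a file like this only expresses a request; as the study emphasizes, whether a crawler honors it is up to the bot's operator.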
“At the core of the conflict in this paper is the notion that content creators now wish to control how their content is used, not simply if it is accessible. While such rights are typically explicit in copyright law, they are not readily expressible, let alone enforceable in today’s Internet. Instead, a series of ad hoc controls have emerged based on repurposing existing web norms and firewall capabilities, none of which match the specificity, usability, or level of enforcement that is, in fact, desired by content creators,” the researchers wrote.
The study surveyed over 200 visual artists about their awareness and use of tools designed to deter AI crawlers. Almost 80% reported trying to stop their work from being included in datasets for generative AI. About two-thirds used Glaze—a tool developed at the University of Chicago that masks artworks from AI systems—and more than half limited what they shared online or posted only low-resolution images.
Despite strong interest in blocking unwanted scraping (96% wanted access to such tools), many artists were unfamiliar with basic options like robots.txt files. More than three-quarters of professional artist websites reviewed by researchers were hosted on third-party platforms where artists could not edit robots.txt settings themselves.
Some website hosts do offer solutions; Squarespace, for example, lets users block AI-related crawlers through its interface. However, only 17% of artists using Squarespace had enabled this option, possibly due to lack of awareness.
Robots.txt files are a common way to restrict crawler access, but they rely on voluntary compliance by the companies operating those bots. The researchers found that large corporations generally respect these rules—with one exception: Bytespider, run by ByteDance (owner of TikTok), did not appear to follow robots.txt restrictions.
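The compliance question above is separate from what robots.txt actually permits. How a well-behaved bot is supposed to interpret the file can be sketched with Python's standard-library parser; the robots.txt content and URLs below are hypothetical examples, not taken from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is denied, all other agents allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed("GPTBot", "https://example.com/portfolio/art.png"))       # → False
print(is_allowed("Mozilla/5.0", "https://example.com/portfolio/art.png"))  # → True
```

Nothing in this check is enforced server-side: a crawler that skips the lookup, as Bytespider appeared to, fetches the content anyway. That asymmetry is why the study's authors treat robots.txt as a norm rather than an access control.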
Another recent development is Cloudflare’s “block AI bots” feature. Currently enabled by just 5.7% of Cloudflare sites according to the study’s authors, it could become more widely used as awareness grows.
“While it is an ‘encouraging new option’, we hope that providers become more transparent with the operation and coverage of their tools (for example by providing the list of AI bots that are blocked),” said Elisa Luo, a co-author and Ph.D. student at UC San Diego.
The legal landscape around these issues continues to shift. In Europe, new laws require companies developing AI models to obtain authorization from copyright holders before using protected works for training purposes. In contrast, U.S.-based companies face ongoing lawsuits over whether scraping publicly available content for model training constitutes fair use under copyright law.
“There is reason to believe that confusion around the availability of legal remedies will only further focus attention on technical access controls,” wrote the research team. “To the extent that any U.S. court finds an affirmative ‘fair use’ defense for AI model builders, this weakening of remedies on use will inevitably create an even stronger demand to enforce controls on access.”
The research received funding from NSF grant SaTC-2241303 and support from the Office of Naval Research project #N00014-24-1-2669.

