Are Existing Image Protections Already Obsolete?

Our research shows that off-the-shelf image-to-image models can erase state-of-the-art image protections, including invisible watermarks, style-mimicry protections, and deepfake shields.


Primer: How Image Protections Work

Schemes such as watermarks, style cloaks, and anti-deepfake shields all rely on adding imperceptible noise to an image. Here are two distinct examples from our paper:

Example 1: UnGANable (Deepfake Protection)

USENIX Sec'23

Adds an invisible cloak to a person's face. If an attacker tries to use a GAN to manipulate the face (e.g., for a deepfake), the hidden noise disrupts the GAN-inversion step and produces garbage output.

[Figure: original face image + protective noise = protected image]

Example 2: SIREN (Style/Personalization Tracing)

IEEE S&P'25

Embeds a hidden 'coating' into an artist's drawings or a specific dataset (e.g., Pokémon). If an unauthorized user trains an AI model to mimic this art style, the generated images carry a traceable fingerprint.

[Figure: original artwork + protective noise = protected image]
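Both examples share the same underlying mechanism: the protection is a small, norm-bounded perturbation added to the pixels. A minimal NumPy sketch of that idea follows; the budget `eps` and the random perturbation are illustrative stand-ins for the carefully optimized noise that schemes like UnGANable and SIREN actually compute.

```python
import numpy as np

def protect(image: np.ndarray, eps: float = 4 / 255, seed: int = 0) -> np.ndarray:
    """Add an imperceptible, L-infinity-bounded perturbation to an image in [0, 1].

    Real schemes optimize this perturbation against a target model; random
    noise is used here purely to illustrate the imperceptibility budget.
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=image.shape)
    return np.clip(image + delta, 0.0, 1.0)

clean = np.full((64, 64, 3), 0.5)  # toy grey image with values in [0, 1]
protected = protect(clean)
# The perturbation never exceeds the imperceptibility budget:
print(np.abs(protected - clean).max() <= 4 / 255)  # True
```

The key property is that the protected image looks identical to a human but carries structure that a downstream model (a GAN inverter, a style-mimicry trainer) is sensitive to.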

The Attack: Off-the-Shelf Image-to-Image Models Are Powerful Erasers

You don't need a specialized counter-attack to defeat these sophisticated defenses: an off-the-shelf image-to-image model, guided by a simple prompt, can erase the protections.

Defeating UnGANable

The AI cleanly removes the protective facial noise. The attacker can now successfully invert the image and create a malicious deepfake manipulation.

[Figure: protected face image → off-the-shelf image-to-image model (e.g., FLUX, GPT-4o) prompted with "Denoise the image" → protection destroyed]

Defeating SIREN

The AI strips away the traceable 'coating' from the artwork. The attacker can now freely train a model to mimic the artist's style without any detectable fingerprint.

[Figure: protected artwork → off-the-shelf image-to-image model (e.g., FLUX, GPT-4o) prompted with "Denoise the image" → protection destroyed]
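Both attacks reduce to the same operation: pass the protected image through a strong denoiser. As a toy stand-in for a generative model (the real attack prompts FLUX or GPT-4o with "Denoise the image"), even a classical Gaussian filter shows why this works: the high-frequency protective perturbation is far easier to remove than the low-frequency image content.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
# Smooth toy "image": low-frequency content that survives denoising.
x = np.linspace(0, np.pi, 128)
clean = np.outer(np.sin(x), np.sin(x))
# High-frequency perturbation, standing in for UnGANable/SIREN noise.
protected = clean + rng.uniform(-0.05, 0.05, size=clean.shape)
# Off-the-shelf denoiser, standing in for an image-to-image model.
denoised = gaussian_filter(protected, sigma=1.5)

residual_before = np.abs(protected - clean).mean()
residual_after = np.abs(denoised - clean).mean()
print(residual_after < residual_before)  # True: the protection is largely erased
```

A modern generative model does far better than this filter: it regenerates plausible image detail instead of blurring it, which is why the attack preserves quality while destroying the embedded protection.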

Comprehensive Evaluation

We tested our simple generative denoising approach against state-of-the-art protections and compared it against highly specialized attack algorithms published in top security venues.
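Evaluations of this kind typically check two things: the attacked image stays visually close to the original, and the protection's detector or tracer fails. PSNR is a standard proxy for the first part; the helper below is an illustrative implementation, not the paper's exact evaluation code.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(peak**2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 0.1)   # uniform 0.1 error -> MSE = 0.01
print(round(psnr(a, b), 1))  # 20.0
```

Higher PSNR after the attack means the eraser kept the image faithful while stripping the protection; an attack that merely destroyed the image would score poorly.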

What Does This Mean?

Our findings carry significant implications for everyday users trying to protect their content, as well as researchers designing the next generation of defenses.

For Creators & Everyday Users

The Illusion of Safety: Invisible protections, whether used to stop deepfakes, prevent art-style mimicry, or watermark AI images, offer a false sense of security.

Because malicious actors can easily remove these defensive shields using widely available AI tools (like FLUX or GPT-4o) without needing specialized hacking skills, you should not rely solely on perturbation-based defenses to protect your digital content.

For the AI Security Community

  • Evaluate against SOTA Models: Defenses must be benchmarked against the latest generative AI tools. Newer foundation models naturally denoise protections far better than older models (like SD 1.5).
  • Simplicity Beats Complexity: Simple prompting of off-the-shelf denoisers matches or outperforms complex, specialized attack algorithms.
  • A Universal Threat Vector: Foundation models act as a convergent threat. Distinct security problems (watermarking, unlearnability, style protection) all fall to the exact same denoising vulnerability.

About the Research Team

We are a collaborative group of security, privacy, and machine learning researchers from Virginia Tech, UT San Antonio, and IIT Kharagpur.
