Are Existing Image Protections Already Obsolete?

Our research shows that off-the-shelf image-to-image models can erase state-of-the-art image protections, including invisible watermarks, style-mimicry protections, and deepfake shields.


Primer: How Image Protections Work

Schemes such as watermarks, style cloaks, and anti-deepfake shields all rely on adding imperceptible noise to an image. Here are two distinct examples from our paper:

Example 1: UnGANable (Deepfake Protection)

USENIX Sec'23

Adds an invisible cloak to a person's face. If an attacker tries to use a GAN to manipulate the face (e.g., for a deepfake), the hidden noise disrupts the GAN-inversion step and produces garbage output.

[Figure: original face image + protective noise = protected image]

Example 2: SIREN (Style/Personalization Tracing)

IEEE S&P'25

Embeds a hidden 'coating' into an artist's drawings or a specific dataset (e.g., Pokémon). If an unauthorized user trains an AI model to mimic this art style, the generated images carry a traceable fingerprint.

[Figure: original artwork + protective noise = protected image]
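Both examples share the same underlying mechanism: the protection is a small, norm-bounded perturbation added to the pixels. A minimal NumPy sketch of that idea follows; the budget `eps` and the random perturbation are illustrative stand-ins for the carefully optimized noise that schemes like UnGANable and SIREN actually compute.

```python
import numpy as np

def protect(image: np.ndarray, eps: float = 4 / 255, seed: int = 0) -> np.ndarray:
    """Add an imperceptible, L-infinity-bounded perturbation to an image in [0, 1].

    Real schemes optimize this perturbation against a target model; random
    noise is used here purely to illustrate the imperceptibility budget.
    """
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-eps, eps, size=image.shape)
    return np.clip(image + delta, 0.0, 1.0)

clean = np.full((64, 64, 3), 0.5)  # toy grey image with values in [0, 1]
protected = protect(clean)
# The perturbation never exceeds the imperceptibility budget:
print(np.abs(protected - clean).max() <= 4 / 255)  # True
```

The key property is that the protected image looks identical to a human but carries structure that a downstream model (a GAN inverter, a style-mimicry trainer) is sensitive to.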

The Attack: Off-the-Shelf Image-to-Image Models Are Powerful Erasers

You don't need a specialized counter-attack to defeat these sophisticated defenses: an off-the-shelf image-to-image model, guided by a simple prompt, can erase the protections.

Defeating UnGANable

The AI cleanly removes the protective facial noise. The attacker can now successfully invert the image and create a malicious deepfake manipulation.

[Figure: protected face image → off-the-shelf image-to-image model (e.g., FLUX, GPT-4o) prompted with "Denoise the image" → protection destroyed]

Defeating SIREN

The AI strips away the traceable 'coating' from the artwork. The attacker can now freely train a model to mimic the artist's style without any detectable fingerprint.

[Figure: protected artwork → off-the-shelf image-to-image model (e.g., FLUX, GPT-4o) prompted with "Denoise the image" → protection destroyed]
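Both attacks reduce to the same operation: pass the protected image through a strong denoiser. As a toy stand-in for a generative model (the real attack prompts FLUX or GPT-4o with "Denoise the image"), even a classical Gaussian filter shows why this works: the high-frequency protective perturbation is far easier to remove than the low-frequency image content.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
# Smooth toy "image": low-frequency content that survives denoising.
x = np.linspace(0, np.pi, 128)
clean = np.outer(np.sin(x), np.sin(x))
# High-frequency perturbation, standing in for UnGANable/SIREN noise.
protected = clean + rng.uniform(-0.05, 0.05, size=clean.shape)
# Off-the-shelf denoiser, standing in for an image-to-image model.
denoised = gaussian_filter(protected, sigma=1.5)

residual_before = np.abs(protected - clean).mean()
residual_after = np.abs(denoised - clean).mean()
print(residual_after < residual_before)  # True: the protection is largely erased
```

A modern generative model does far better than this filter: it regenerates plausible image detail instead of blurring it, which is why the attack preserves quality while destroying the embedded protection.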

Comprehensive Evaluation

We tested our simple generative denoising approach against state-of-the-art protections and compared it against highly specialized attack algorithms published in top security venues.
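Evaluations of this kind typically check two things: the attacked image stays visually close to the original, and the protection's detector or tracer fails. PSNR is a standard proxy for the first part; the helper below is an illustrative implementation, not the paper's exact evaluation code.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(peak**2 / mse)

a = np.zeros((8, 8))
b = np.full((8, 8), 0.1)   # uniform 0.1 error -> MSE = 0.01
print(round(psnr(a, b), 1))  # 20.0
```

Higher PSNR after the attack means the eraser kept the image faithful while stripping the protection; an attack that merely destroyed the image would score poorly.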

What Does This Mean?

Our findings carry significant implications for everyday users trying to protect their content, as well as researchers designing the next generation of defenses.

For Creators & Everyday Users

The Illusion of Safety: Invisible protections, whether used to stop deepfakes, prevent art-style mimicry, or watermark AI images, offer a false sense of security.

Because malicious actors can easily remove these defensive shields using widely available AI tools (like FLUX or GPT-4o) without needing specialized hacking skills, you should not rely solely on perturbation-based defenses to protect your digital content.

For the AI Security Community

  • Evaluate against SOTA Models: Defenses must be benchmarked against the latest generative AI tools. Newer foundation models naturally denoise protections far better than older models (like SD 1.5).
  • Simplicity Beats Complexity: Simple prompting of off-the-shelf denoisers matches or outperforms complex, specialized attack algorithms.
  • A Universal Threat Vector: Foundation models act as a convergent threat. Distinct security problems (watermarking, unlearnability, style protection) all fall to the exact same denoising vulnerability.

About the Research Team

We are a collaborative group of security, privacy, and machine learning researchers from Virginia Tech, UT San Antonio, and IIT Kharagpur.
