Our research shows that off-the-shelf image-to-image models can erase state-of-the-art image protections, including invisible watermarks, style-mimicry protections, and deepfake shields.
Schemes such as watermarks, style cloaks, and anti-deepfake shields rely on adding imperceptible noise to an image. Here are two distinct examples from our paper:
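To make the "imperceptible noise" idea concrete, here is a toy numpy sketch of a perturbation-based protection. The `protect` function, the random perturbation, and the `eps` budget are illustrative assumptions, not any specific scheme; real protections optimize the perturbation against a target model rather than drawing it at random.

```python
import numpy as np

def protect(image, rng, eps=4.0):
    # Toy stand-in for a perturbation-based protection: add a small,
    # bounded perturbation (random here; real schemes optimize it)
    # so no pixel moves by more than eps out of 255.
    delta = rng.uniform(-eps, eps, size=image.shape)
    return np.clip(image + delta, 0, 255)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
protected = protect(image, rng)

# The change is tiny relative to the 0-255 pixel range.
assert np.max(np.abs(protected - image)) <= 4.0
```

The small budget is exactly what makes these schemes invisible to humans, and, as the attack below shows, also what makes them fragile.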
One defense adds an invisible cloak to a person's face: if an attacker runs the face through a GAN to manipulate it (a deepfake), the hidden noise disrupts the model's inversion step and produces garbage output.
The other embeds a hidden 'coating' into an artist's drawings or a specific dataset (e.g., Pokemon): if an unauthorized user trains an AI to mimic this art style, the generated images carry a traceable fingerprint.
You don't need a specialized counter-attack to defeat these sophisticated defenses: an off-the-shelf image-to-image AI guided by a simple prompt can erase the protections.
The AI cleanly removes the protective facial noise. The attacker can now successfully invert the image and create a malicious deepfake manipulation.
The AI strips away the traceable 'coating' from the artwork. The attacker can now freely train a model to mimic the artist's style without any detectable fingerprint.
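The intuition behind the attack can be sketched in a few lines of numpy. The paper's attack uses an off-the-shelf generative model to regenerate the image; here a crude box blur stands in for that learned denoiser, purely to illustrate how regeneration preserves the image content while suppressing the high-frequency protective perturbation. All names and parameters below are illustrative assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    # Crude stand-in for a learned denoiser: a k x k box filter.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
# Smooth "clean" content (a gradient) plus a small protective perturbation.
clean = np.add.outer(np.arange(64.0), np.arange(64.0))
protected = clean + rng.uniform(-4.0, 4.0, size=clean.shape)

# Regenerating the image keeps the smooth content but strips the perturbation.
purified = box_blur(protected)

assert np.linalg.norm(purified - clean) < np.linalg.norm(protected - clean)
```

A real image-to-image model does this far better: it regenerates plausible image detail instead of blurring it away, which is why the attack leaves quality intact while the protection disappears.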
We tested our simple generative denoising approach against state-of-the-art protections and compared it against highly specialized attack algorithms published in top security venues.
Our attack degraded these state-of-the-art protection schemes:
In-processing latent space watermark.
Robust post-processing watermark. We also show that VINE is highly vulnerable to a simple cropping attack!
Verifying unauthorized data usage in personalized models.
Mitigating Deepfake manipulations.
Preventing art style imitation.
Invisible fingerprinting for diffusion images.
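The cropping attack mentioned above for VINE can be sketched as a simple geometric desynchronization. This numpy version (the `crop_attack` name and the 80% crop fraction are illustrative assumptions) keeps the central region and rescales it back to the original size with nearest-neighbour sampling, so any pixel-aligned watermark pattern no longer lines up with the decoder's expectations.

```python
import numpy as np

def crop_attack(img, frac=0.8):
    # Keep the central `frac` of the image, then rescale back to the
    # original size with nearest-neighbour sampling, desynchronising
    # any pixel-aligned watermark pattern.
    h, w = img.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]
    ys = np.arange(h) * ch // h  # map output rows back into the crop
    xs = np.arange(w) * cw // w  # map output cols back into the crop
    return crop[np.ix_(ys, xs)]

rng = np.random.default_rng(0)
watermarked = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
attacked = crop_attack(watermarked)

assert attacked.shape == watermarked.shape  # same size, shifted content
```

That such a basic transform already threatens a robust watermark underlines how much stronger the generative regeneration attack is.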
Our simple, off-the-shelf approach outperformed these custom-built attacks:
Defeating perturbation-based image copyright protection.
Universal attack on defensive watermarking.
Removing protections against generative AI.
Removing invisible protection against unauthorized usage.
Our findings carry significant implications both for everyday users trying to protect their content and for researchers designing the next generation of defenses.
The Illusion of Safety: Current invisible protections, whether used to stop deepfakes, prevent art-style mimicry, or watermark AI images, offer a false sense of security.
Malicious actors can remove these defensive shields with widely available AI tools (such as FLUX or GPT-4o) and no specialized hacking skills. You should not rely solely on perturbation-based defenses to protect your digital content.
We are a collaborative group of security, privacy, and machine learning researchers from Virginia Tech, UT San Antonio, and IIT Kharagpur.