Efficient Training Image Extraction from Diffusion Models | Ryan Webster | @GoogleTechTalks
Uploaded December 2023; updated October 2024.
A Google TechTalk, presented by Ryan Webster, 2023-09-13
Abstract: A recent demonstration by Carlini et al. showed that highly duplicated training images can be copied by diffusion models during generation, which is problematic in terms of data privacy and copyright. Known as an extraction attack, this method reconstructs training images using only a model's generated samples. Whereas the original work requires on the order of GPU-years to perform, we provide a pipeline that runs in GPU-days and extracts a similar number of images. We first de-duplicate the public dataset LAION-2B and demonstrate a high level of duplication. We then provide whitebox and blackbox extraction attacks on par with the original attack, whilst requiring significantly fewer network evaluations. As we can evaluate more samples, we expose the phenomenon of template copies, wherein a diffusion model copies a fixed image region and varies another. We demonstrate that new diffusion models trained on deduplicated data do not generate exact copies as in Carlini et al., but do generate templates. We conclude with several insights into copied images from a data perspective.
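The de-duplication step described in the abstract can be illustrated with a minimal sketch. This is not the talk's actual pipeline: real systems compare learned image embeddings (e.g., from CLIP) at LAION-2B scale with approximate nearest-neighbor search, whereas this toy example uses small hand-made vectors and an exhaustive pairwise cosine-similarity check; the function name and threshold are illustrative assumptions.

```python
import numpy as np

def find_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return index pairs whose cosine similarity exceeds `threshold`.

    Hypothetical helper for illustration only; a real deduplication
    pipeline would use approximate nearest-neighbor search, not an
    O(n^2) scan.
    """
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    pairs = []
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

# Two nearly identical embeddings and one distinct embedding.
emb = np.array([[1.0, 0.0, 0.0],
                [0.99, 0.01, 0.0],
                [0.0, 1.0, 0.0]])
print(find_duplicates(emb))  # → [(0, 1)]
```

Images whose embeddings land in the same high-similarity cluster would be treated as duplicates, which is the property the extraction attack exploits: heavily duplicated images are the ones the model is most likely to copy.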

