Challenges in Augmenting Large Language Models with Private Data @GoogleTechTalks

Google TechTalks | Uploaded May 2024 | Updated October 2024
A Google TechTalk, presented by Ashwinee Panda, 2024-05-01
ABSTRACT: LLMs are making first contact with more data than ever before, opening up new attack vectors against LLM systems. We propose a new practical data extraction attack that we call "neural phishing" (ICLR 2024). This attack enables an adversary to target and extract PII from a model trained on user data without needing specific knowledge of the PII they wish to extract. Our attack is made possible by the few-shot learning capability of LLMs, but this capability also enables defenses. We propose Differentially Private In-Context Learning (ICLR 2024), a framework for coordinating independent LLM agents to answer user queries under DP. We first introduce new methods for obtaining consensus across potentially disagreeing LLM agents, and then explore the privacy-utility tradeoff of different DP mechanisms as applied to these new methods. We anticipate that further LLM improvements will continue to unlock both stronger adversaries and more robust systems.
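To make the idea of reaching consensus across independent agents under DP more concrete, here is a minimal sketch. It is not taken from the talk or the paper. It assumes each agent sees a disjoint subset of the private data and answers with one label from a fixed set, so that changing one private record changes at most one vote. The answer is then released by adding Laplace noise to the vote histogram and taking the argmax. The function name `noisy_majority_vote` and its parameters are hypothetical, and the agents' LLM calls are replaced by a plain list of integer votes.

```python
import numpy as np


def noisy_majority_vote(votes, num_labels, epsilon, rng=None):
    """Release the most-voted label under epsilon-DP (illustrative sketch).

    votes      : one integer label in [0, num_labels) per agent.
    num_labels : size of the fixed answer set (an assumption of this sketch).
    epsilon    : privacy budget spent on this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng

    # Histogram of agent votes over the label set.
    counts = np.bincount(np.asarray(votes), minlength=num_labels).astype(float)

    # Assumption: one private record influences at most one agent, so
    # replacing that record moves one vote between two bins. The histogram
    # therefore has L1 sensitivity 2, and Laplace noise with scale
    # 2 / epsilon makes the noisy histogram epsilon-DP.
    noisy_counts = counts + rng.laplace(scale=2.0 / epsilon, size=num_labels)

    # Taking the argmax is post-processing and costs no extra privacy.
    return int(np.argmax(noisy_counts))


if __name__ == "__main__":
    # Ten hypothetical agents voting over three candidate answers.
    agent_votes = [1, 1, 1, 0, 2, 1, 1, 1, 1, 1]
    answer = noisy_majority_vote(agent_votes, num_labels=3, epsilon=1.0)
    print("released label:", answer)
```

With a small `epsilon` the released label can differ from the true majority, which is one simple way to see the privacy-utility tradeoff the abstract mentions. Free-form text answers would need a different consensus step than a label histogram.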

Speaker: Ashwinee Panda (Princeton University)
