The Data Minimization Principle in Machine Learning @GoogleTechTalks

Google TechTalks | The Data Minimization Principle in Machine Learning | Uploaded May 2024 | Updated October 2024
A Google TechTalk, presented by Ferdinando Fioretto, 2024-04-10
ABSTRACT: The principle of data minimization aims to reduce the amount of data collected and retained to minimize the potential for misuse, unauthorized access, or data breaches. While endorsed by various global data protection regulations, its practical implementation in machine learning remains elusive due to the lack of a clear formulation.

We begin the talk by reviewing the principle of data minimization as presented in several data protection regulations and examining the challenges in formalizing this principle for machine learning tasks. We then propose an optimization-based formalization that attempts to closely follow the legal language of this principle. However, our empirical analysis reveals a potentially overlooked gap between the privacy expectations and actual benefits of data minimization, highlighting the need for approaches that address privacy in a more holistic framework.
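As a rough illustration of what an optimization-based formalization could look like, the block below writes data minimization as a constrained problem: keep as few data entries as possible while the model trained on the reduced data loses at most a tolerance $\alpha$ in performance. All symbols here are assumptions introduced for the sketch and are not taken from the talk: a feature matrix $X$ with $n$ rows and $p$ columns, labels $Y$, a binary mask $B$ marking which entries are retained, a loss $\ell$, a model $f_\theta$, and the full-data objective $J$.

```latex
\begin{aligned}
\min_{B \in \{0,1\}^{n \times p}} \quad & \|B\|_1 \\
\text{s.t.} \quad & J(\hat\theta; X, Y) - J(\theta^\star; X, Y) \le \alpha, \\
& \hat\theta \in \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i \odot B_i),\, y_i\big), \\
& \theta^\star \in \arg\min_{\theta} J(\theta; X, Y), \qquad
  J(\theta; X, Y) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big).
\end{aligned}
```

Under this reading, the objective counts retained entries, and the constraint captures the idea that collection should be limited to what the task actually needs. Note that nothing in the problem measures privacy leakage directly, which is consistent with the gap between expectations and actual benefits that the abstract mentions.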

Next, we shift gears and discuss the application of data minimization in inference tasks. In high-stakes domains such as law, recruitment, and healthcare, learning models frequently rely on sensitive user data for inference, necessitating the complete set of features. This not only poses significant privacy risks for individuals but also demands substantial human effort from organizations to verify information accuracy. We ask whether it is necessary to require all input features for a model to produce accurate or nearly accurate predictions during inference. We present a sequential algorithm to identify the minimal set of attributes that each individual should reveal, and an empirical assessment showing that individuals often need to disclose only a very small subset of their features without compromising decision-making accuracy.

Finally, I will conclude with a call for action and collaboration, seeking additional efforts to formalize legal privacy principles in a way that makes them actionable and deployable.

Speaker: Ferdinando Fioretto (University of Virginia)

2022 Blockly Developers Summit: Serialization

Chris Nunes, Scott Clark & BC Biermann | IMMUSE Founders | web3 talks | June 9th 2022 | Raphael Hyde

Improved Feature Importance Computation for Tree Models Based on the Banzhaf Value

Academic Keynote: Differentially Private Covariance-Adaptive Mean Estimation, Adam Smith (BU)
A Google TechTalk, presented by Adam Smith, 2021/11/9
ABSTRACT: Covariance-adaptive mean estimation is a fundamental problem in statistics, where we are given $n$ i.i.d. samples from a $d$-dimensional distribution with mean $\mu$ and covariance $\Sigma$ and the goal is to find an estimator $\hat\mu$ with small error $\|\hat\mu-\mu\|_{\Sigma}\leq \alpha$, where $\|\cdot\|_{\Sigma}$ denotes the Mahalanobis distance. (We call this covariance-adaptive since the accuracy metric depends on the data distribution.) It is known that the empirical mean of the dataset achieves this guarantee if we are given at least $n=\Omega(d/\alpha^2)$ samples. Unfortunately, the empirical mean and other statistical estimators can reveal sensitive information about the samples of the training dataset. To protect the privacy of the individuals who participate in the dataset, we study statistical estimators which satisfy differential privacy, a condition that has become a standard criterion for individual privacy in statistics and machine learning. We present two new differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions with unknown covariance whose sample complexity is optimal up to logarithmic factors and matches the non-private one in many parameter regimes. Previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require $\Omega(d^{3/2})$ samples. Based on the paper https://arxiv.org/pdf/2106.13329.pdf, which will appear as a spotlight paper at NeurIPS 2021 and is joint work with Gavin Brown, Marco Gaboardi, Jonathan Ullman, and Lydia Zakynthinou.
About the Speaker: Adam Smith, Boston University. Adam Smith is a professor of computer science at Boston University. From 2007 to 2017, he served on the faculty of the Computer Science and Engineering Department at Penn State. His research interests lie in data privacy and cryptography, and their connections to machine learning, statistics, information theory, and quantum computing. He obtained his Ph.D. from MIT in 2004 and has held postdoc and visiting positions at the Weizmann Institute of Science, UCLA, Boston University and Harvard. He received a Presidential Early Career Award for Scientists and Engineers (PECASE) in 2009; a Theory of Cryptography Test of Time award in 2016; the Eurocrypt 2019 Test of Time award; and the 2017 Gödel Prize.
For more information about the workshop: https://events.withgoogle.com/2021-workshop-on-federated-learning-and-analytics/#content

Tree Learning: Optimal Algorithms and Sample Complexity

2023 Blockly Developer Summit Day 1-8: Blocks in Docs

Shiva Rajaraman | VP of Product at OpenSea | web3 talks | April 21st 2022 | MC: Raphael Hyde

Steven Goldfeder | CEO Offchain Labs / Arbitrum | web3 talks | Aug 24 2023 | MC: Marlon Ruiz

Fast Neural Kernel Embeddings for General Activations

Day 1 Lightning Talks: Privacy & Security