A Nearly Tight Analysis of Greedy k-means++ @GoogleTechTalks

Google TechTalks | Uploaded April 2023
A Google TechTalk, presented by Václav Rozhoň, 2023-04-13
Abstract: The famous k-means++ algorithm of Arthur and Vassilvitskii is the most popular practical algorithm for solving the k-means problem. The algorithm is very simple and computes the k output centers as follows: it samples the first center as a uniformly random point in the dataset, and each of the remaining k−1 centers is then sampled from the dataset with probability proportional to its squared distance to the closest center chosen so far. Amazingly, the k-means++ algorithm is known to return a Θ(log k)-approximate solution in expectation.
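The seeding procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's or any library's implementation; the function names `sq_dist` and `kmeans_pp_seeding`, the tuple representation of points, and the fallback for the degenerate case where every point already coincides with a center are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeding(points, k, rng=None):
    """Pick k centers from `points` by D^2 sampling (hypothetical helper)."""
    rng = rng or random.Random()
    # First center: a uniformly random point of the dataset.
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption: if all points coincide with a center, fall back
            # to a uniform sample, since the weights are all zero.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Update each point's distance to its closest center.
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers
```

A call such as `kmeans_pp_seeding(points, 3, random.Random(0))` returns three points of the dataset. An already-chosen point has weight zero, so it is not sampled again as long as other points remain at positive distance.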
In their seminal work, Arthur and Vassilvitskii asked about the guarantees of the following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in, e.g., the popular scikit-learn library. We analyze greedy k-means++: we prove that it is an O(ℓ^3 log^3 k)-approximation algorithm and provide a near-matching lower bound.
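The greedy variant can be sketched in the same style. This is an illustration under stated assumptions rather than the analyzed algorithm verbatim or scikit-learn's code: the names `greedy_kmeans_pp_seeding` and `ell` are hypothetical, the first center is drawn as a single uniform sample, and the cost is the sum of squared distances from each point to its closest center.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_kmeans_pp_seeding(points, k, ell, rng=None):
    """Greedy D^2 seeding: ell candidates per step, keep the cheapest."""
    rng = rng or random.Random()
    n = len(points)
    # Assumption: the first center is a single uniformly random point.
    centers = [rng.choice(points)]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Degenerate case (assumption): all weights are zero.
            candidates = [rng.randrange(n)]
        else:
            # Draw ell candidate indices, each proportional to d2.
            candidates = rng.choices(range(n), weights=d2, k=ell)
        best_idx, best_d2, best_cost = None, None, None
        for idx in candidates:
            # Distances if points[idx] were added as the next center.
            new_d2 = [min(d, sq_dist(p, points[idx]))
                      for d, p in zip(d2, points)]
            new_cost = sum(new_d2)
            if best_cost is None or new_cost < best_cost:
                best_idx, best_d2, best_cost = idx, new_d2, new_cost
        centers.append(points[best_idx])
        d2 = best_d2
    return centers
```

With `ell=1` the sketch reduces to plain k-means++ seeding. Each step costs roughly ℓ times as much as a plain step, because the new cost is evaluated once per candidate.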

Joint work with Christoph Grunau, Ahmet Alper Özüdoğru, and Jakub Tětek.
arXiv: arxiv.org/abs/2207.07949

Bio: Václav Rozhoň is a PhD student at ETH Zurich, advised by Mohsen Ghaffari. He works mostly on distributed and parallel algorithms; he also creates YouTube videos about algorithms (channel name: polylog). He has a young child and thus no hobbies.

A Google Talk Series on Algorithms, Theory, and Optimization
