Google TechTalks
ABSTRACT: In the rapidly evolving field of artificial intelligence, the commitment to data privacy and intellectual property protection during Machine Learning operations has become a foundational necessity for society and businesses handling sensitive data. This is especially critical in sectors such as healthcare and finance, where ensuring confidentiality and safeguarding proprietary information are not just ethical imperatives but essential business requirements.
This presentation examines the role of Fully Homomorphic Encryption (FHE), as implemented in the open-source library Concrete ML, in advancing secure and privacy-preserving ML applications.
We begin with an overview of Concrete ML, emphasizing how practical FHE for ML was made possible. This sets the stage for discussing how FHE is applied to ML inference, demonstrating its capability to perform secure inference on encrypted data across various models. After inference, we turn to another important FHE application, FHE training, and show how encrypted data from multiple sources can be used for training without compromising individual users' privacy.
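For concreteness, here is a minimal sketch of encrypted inference with Concrete ML's scikit-learn-style interface. Method names follow the library's documented API, but exact signatures vary by version, so treat this as illustrative rather than definitive:

```python
# Minimal sketch of FHE inference with Concrete ML (illustrative; the
# library's API details may differ across versions).
from concrete.ml.sklearn import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)            # training happens in the clear here
model.compile(X_train)                 # compile the model into an FHE circuit
y_pred = model.predict(X_test, fhe="execute")  # inference runs on encrypted data
```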
FHE has strong synergies with other technologies, in particular Federated Learning: we show how this integration strengthens the privacy-preserving properties of ML models across the full pipeline, from training to inference.
Finally, we address the application of FHE in generative AI and the development of Hybrid FHE models (which are the subject of our RSA 2024 presentation). This approach represents a strategic balance between intellectual property protection, user privacy and computational performance, offering solutions to the challenges of securing one of the most important AI applications of our times.
SPEAKERS:
Jordan Frery, Concrete ML Tech Lead and Researcher at Zama
Benoit Chevallier-Mames, VP Cloud and ML at Zama
DATE:
May 8, 2024
ABSTRACT: In 1989, a Chinese engineer made an audacious prediction: within his lifetime, Chinese would surpass English as the fastest computational language in the world, where "with leisurely keystrokes, users will be able to reach or greatly outpace the speed of human speech." Outlandish as this prognostication was, it has largely come true, with the fastest Chinese inputters reaching mind-boggling speeds of 221.9 characters per minute—or 3.7 Chinese characters every second. How is this possible? How did a script so long disparaged as cumbersome and hopelessly complex suddenly rival—exceed, even—computational typing speeds clocked in other parts of the world?
Previewing his new book, The Chinese Computer: A Global History of the Information Age (forthcoming in May 2024), Prof. Thomas Mullaney charts out the history of computing in China, a terrain that remains unmapped despite China’s present-day status as a global IT powerhouse.
ABOUT THE SPEAKER: Thomas S. Mullaney is Professor of History and Professor of East Asian Languages and Cultures at Stanford University. He is also the Kluge Chair in Technology and Society at the Library of Congress, and a Guggenheim Fellow. He is the author or lead editor of 7 books, including The Chinese Typewriter (winner of the Fairbank prize), Your Computer is on Fire, Coming to Terms with the Nation: Ethnic Classification in Modern China, and the forthcoming The Chinese Computer—the first comprehensive history of Chinese-language computing. His writings have appeared in the Journal of Asian Studies, Technology & Culture, Aeon, Foreign Affairs, and Foreign Policy, and his work has been featured in the LA Times, The Atlantic, the BBC, and in invited lectures at Google, Microsoft, Adobe, and more. He holds a PhD from Columbia University.
Host: Shumin Zhai.
ABSTRACT: Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.
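To make the idea concrete, here is a toy sketch (our illustration, not the authors' code) of a KAN-style layer in which every edge carries its own learnable univariate function. Real KANs parametrize these functions with B-splines; this sketch uses a simple piecewise-linear interpolant:

```python
import numpy as np

class KANLayer:
    """Toy KAN layer: each edge (i, j) carries a learnable univariate
    function phi_ij, here a piecewise-linear spline on a fixed grid
    (real KANs use B-splines plus a base activation)."""
    def __init__(self, n_in, n_out, grid_pts=8, rng=np.random.default_rng(0)):
        self.grid = np.linspace(-2.0, 2.0, grid_pts)           # shared knots
        # learnable spline values, one curve per edge
        self.vals = rng.normal(size=(n_in, n_out, grid_pts)) * 0.1

    def __call__(self, x):
        # x: (n_in,) -> out: (n_out,); node j simply sums its incoming edges
        out = np.zeros(self.vals.shape[1])
        for i, xi in enumerate(x):
            for j in range(self.vals.shape[1]):
                out[j] += np.interp(xi, self.grid, self.vals[i, j])
        return out

layer = KANLayer(n_in=3, n_out=2)
print(layer(np.array([0.5, -1.0, 1.5])))   # a 2-dimensional output
```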
ABOUT THE SPEAKER: Ziming Liu is a fourth-year PhD student at MIT & IAIFI, advised by Prof. Max Tegmark. His research interests lie at the intersection of AI and physics (science in general):
Physics of AI: “AI as simple as physics”
Physics for AI: “AI as natural as physics”
AI for physics: “AI as powerful as physicists”
He publishes papers both in top physics journals and at AI conferences. He serves as a reviewer for Physical Review journals, NeurIPS, ICLR, IEEE, etc. He co-organized the AI4Science workshops. His research has a strong interdisciplinary nature, e.g., Kolmogorov-Arnold networks (Math for AI), Poisson Flow Generative Models (Physics for AI), Brain-inspired modular training (Neuroscience for AI), understanding Grokking (physics of AI), and conservation laws and symmetries (AI for physics).
Google Algorithms Seminar. ABSTRACT: I introduce a framework for studying transient matching in decentralized markets where workers learn about their preferences through their experiences. Limits on the number of available positions force workers to compete over matches. Each capacity-constrained firm employs workers whose match value exceeds a threshold. Since employment offers both payoff and information benefits, workers effectively face a multi-armed bandit problem: to them, each firm acts as an arm whose probability of "success" is driven by market competition. In such markets, aggregate demand for firms satisfies the gross substitutes condition, which ensures equilibrium existence. The resulting search patterns match a variety of stylized facts from labor market data: high-quality workers search less, and tenure increases with age. In general, equilibria are inefficient because competition depresses the level of search. Natural interventions designed to improve efficiency are effective in uncongested markets, but can fail when congestion is severe. From a market design perspective, the utilization of headhunters has differential effects depending on workers' quality, unambiguously improving both outcomes for low-quality workers and overall efficiency. Reducing congestion through unemployment benefits can depress search and may ultimately reduce match efficiency.
About the Speaker: Andrew Ferdowsian completed his Ph.D. in economics at Princeton University in May 2023. While at Princeton he earned the Goldfeld Fellowship, alongside multiple Dietrich Economic Theory Center Grants. This past year he served as a tenure-track lecturer (assistant professor) at the University of Exeter, and this fall he will be joining the University of Notre Dame as a tenure-track assistant professor. His research focuses on improving marketplaces, such as the marketplace for public housing in Singapore and the labor market for medical residents. His research can be found at ferdowsian.net/pages/research/.
ABSTRACT: Why some software products succeed and others fail is rarely clear, even with hindsight, and success involves many factors. But one factor is always necessary: a compelling usage scenario. This scenario typically doesn’t let users do something they couldn’t do before, but makes doing something they already want to do easier.
I’ll give several examples of this, and then generalize this idea to thinking of software products in terms of “concepts”—small services that provide coherent value and that, while being composable, have no mutual dependencies. I’ll illustrate the use of concepts in clarifying software design, and will tell you about some recent work in which concepts, due to their modular qualities, enable LLM-based generation of an entire app.
About the Speaker: Daniel Jackson is professor of computer science at MIT, and associate director of CSAIL. For his research in software, he won the ACM SIGSOFT Impact Award, the ACM SIGSOFT Outstanding Research Award and was made an ACM Fellow. He is the lead designer of the Alloy modeling language, and author of Software Abstractions. He chaired a National Academies study on software dependability, and has collaborated on software projects with NASA on air-traffic control, with Massachusetts General Hospital on proton therapy, and with Toyota on autonomous cars. His book, Essence of Software: Why Concepts Matter for Great Design, was recently published by Princeton University Press. Jackson is also a keen photographer.
Google Algorithms Seminar. ABSTRACT: This talk will focus on two very related computational problems. The first is Attention, the task at the core of Transformer and other Large Language Models. The second is Kernel Density Estimation, a statistical task which has seen applications from machine learning to computational physics.
Both of these problems have straightforward quadratic-time algorithms, and I'll discuss a recent line of work investigating when faster algorithms are possible. I'll survey two algorithmic techniques which lead to almost linear-time algorithms in different parameter regimes: the Fast Multipole Method, and the polynomial method. I'll then overview our new fine-grained hardness results, which prove that these techniques are essentially optimal in the parameter regimes where they apply, and highlight the situations where improvements may be possible.
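As a point of reference, the quadratic-time baselines mentioned above are simple to state. The sketch below (our illustration) computes softmax attention and a Gaussian kernel density estimate by materializing the full pairwise score matrix:

```python
import numpy as np

def attention(Q, K, V):
    """Straightforward O(n^2) softmax attention: form the full n x n
    score matrix, then row-normalize."""
    S = Q @ K.T / np.sqrt(Q.shape[1])
    W = np.exp(S - S.max(axis=1, keepdims=True))   # numerically stable softmax
    W /= W.sum(axis=1, keepdims=True)
    return W @ V

def gaussian_kde(queries, data, bandwidth=1.0):
    """Straightforward O(n*m) kernel density estimation with a Gaussian kernel."""
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2)).mean(axis=1)

n, d = 256, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, n, d))
print(attention(Q, K, V).shape)    # (256, 16)
print(gaussian_kde(Q, K).shape)    # (256,)
```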
Based on joint work with Amol Aggarwal (in CCC 2022), Zhao Song (in NeurIPS 2023), and Yunfeng Guan (to appear in CCC 2024).
About the Speaker: Josh Alman is an Assistant Professor of Computer Science at Columbia University. He works on algorithm design and complexity theory, focusing especially on algebraic tools for speeding up algorithms.
ABSTRACT: We investigate the privacy properties of two proposed methods for attribution reporting through the lens of label inference measures. A label inference measure quantifies the relationship between an adversary’s prior and posterior knowledge about the private labels after a private data release. In this talk, I will discuss new ways that we model our adversary, and present two measures that capture different notions of label inference success. I will also present some empirical and theoretical findings that help guide the discussion around risks and accuracy tradeoffs of potential attribution reporting methods.
This is based on research with Andres Muñoz Medina, Travis Dick, Robert Busa-Fekete, Claudio Gentile, and Adam Smith during my Google PhD research internship and subsequent Student Researcher position.
Speaker: Marika Swanberg (Boston University)
ABSTRACT: The rise of Generative Artificial Intelligence (GenAI) models is revolutionizing the creative domain. By using models like GitHub Copilot, OpenAI GPT, Stable Diffusion, Midjourney, or DeviantArt, non-professional users can generate high-quality content such as text, images, music, or code. These powerful tools facilitate previously unimaginable forms of human creativity on a large scale, disrupting the professional creative sectors. This article proposes a novel approach that leverages the capacity of GenAI to assist in copyright legal disputes.
GenAI models are trained on examples, generalizing expressive patterns and applying these learnings to perform different tasks, such as auto-completing sentences or generating visual outputs in response to a textual prompt. These models are designed to grasp complex probability distributions from training samples by identifying recurring relationships between input and output data.
Similarly, humans learn from a corpus of preexisting materials, memorize impressions, learn styles, extract themes from text, generalize principles from new materials, and engage in deconstructing and reconstructing processes. Unlike human learning, which occurs within the confines of the human mind, GenAI learning involves digital replication. Consequently, GenAI has sparked numerous class actions alleging copyright infringement. These claims assert that the models infringe copyright, either because they are trained on copyrighted materials without authorization, because they generate derivative works of those materials, or both.
While copyright law prohibits the unauthorized copying of protected expressions, it permits the extrapolation and learning of ideas. For a work to be copyrighted, it must be original, meaning the author must originate it. As a result, the law does not protect expressions that are generic and, therefore, cannot be attributed to any particular author, such as ideas, scènes à faire, or conventional programming standards.
For centuries, courts have struggled to consistently differentiate between original expressions and generic ones, resulting in systematic over-protection of copyrighted works. GenAI presents an unprecedented opportunity to inform and improve this legal analysis. By learning from data at various levels of granularity, GenAI systems are revealing the shared patterns in preexisting works that were previously difficult to measure accurately.
In this article, we propose a novel approach for measuring originality to assist in copyright legal disputes. We harness the powerful learning capacity of GenAI to gain more nuanced insights into the genericity of expressions on a significantly larger scale. Based on interdisciplinary research in computer science and law, we propose employing data-driven bias—a fundamental aspect of inductive machine learning—to assess the genericity of expressive compositions in preexisting works.
During learning, GenAI models distill and rank expressive compositions based on their prevalence in the models’ datasets. The more frequently these expressive compositions appear in the GenAI models’ datasets (indicating their “generic” nature), the more likely GenAI models are to utilize them when generating new works. Conversely, the more rarely expressive compositions appear in the GenAI models’ datasets (indicating their “original” nature), the less likely GenAI models are to utilize them.
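One hypothetical way to operationalize this intuition is to use a pretrained language model's likelihood as a genericity proxy: text the model finds highly probable is, on this view, more generic. The sketch below illustrates the idea only and is not the article's actual methodology:

```python
# Hypothetical sketch of the genericity intuition above: score a passage by
# its likelihood under a pretrained language model. High likelihood suggests
# a composition common in the training data ("generic"); low likelihood
# suggests a rarer, more "original" composition. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def genericity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss   # mean per-token negative log-likelihood
    return -loss.item()                      # higher = more generic

print(genericity("to be or not to be"))
print(genericity("the quantized marmoset swallowed my abacus"))
```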
Leveraging the capacity of GenAI to learn with greater nuance and on a much grander scale could have groundbreaking implications for copyright law. It could assist courts in determining copyright scope, potentially leading to more efficient and equitable resolutions. Moreover, it has the potential to inform the Copyright Office’s registration practices and provide a valuable signal to facilitate market licensing transactions. Finally, by harnessing GenAI to measure originality at scale, our approach offers valuable insights to policymakers as they grapple with adapting copyright law to meet the new challenges of an era of “cheap creativity” enabled by GenAI.
ssrn.com/abstract=4530717
The speaker is Uri Hacohen (Tel Aviv University)
ABSTRACT: The principle of data minimization aims to reduce the amount of data collected and retained to minimize the potential for misuse, unauthorized access, or data breaches. While endorsed by various global data protection regulations, its practical implementation in machine learning remains elusive due to the lack of a clear formulation.
We begin the talk by reviewing the principle of data minimization as presented in several data protection regulations and examining the challenges in formalizing this principle for machine learning tasks. We then propose an optimization-based formalization that attempts to closely follow the legal language of this principle. However, our empirical analysis reveals a potentially overlooked gap between the privacy expectations and actual benefits of data minimization, highlighting the need for approaches that address privacy in a more holistic framework.
Next, we shift gears and discuss the application of data minimization in inference tasks. In high-stakes domains such as law, recruitment, and healthcare, learning models frequently rely on sensitive user data for inference, necessitating the complete set of features. This not only poses significant privacy risks for individuals but also demands substantial human effort from organizations to verify information accuracy. We ask whether it is necessary to require all input features for a model to produce accurate or nearly accurate predictions during inference. We present a sequential algorithm to identify the minimal set of attributes that each individual should reveal, and an empirical assessment showing that individuals often need to disclose only a very small subset of their features without compromising decision-making accuracy.
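As an illustration of the flavor of such a procedure (not the talk's exact algorithm), the sketch below reveals features greedily, imputing the undisclosed ones with training means, and stops once the model's prediction is confident:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def minimal_disclosure(model, x, feature_means, threshold=0.9):
    """Illustrative greedy sketch: reveal one feature at a time (most
    informative first), impute the rest with training means, and stop once
    the model's predicted probability exceeds a confidence threshold."""
    revealed, x_partial = [], feature_means.copy()
    order = np.argsort(-np.abs(model.coef_[0]))   # most informative first
    for i in order:
        x_partial[i] = x[i]
        revealed.append(i)
        proba = model.predict_proba(x_partial.reshape(1, -1))[0]
        if proba.max() >= threshold:
            break
    return revealed, int(np.argmax(proba))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # only 2 features matter
clf = LogisticRegression().fit(X, y)
revealed, pred = minimal_disclosure(clf, X[0], X.mean(axis=0))
print(f"revealed {len(revealed)} of 20 features, predicted class {pred}")
```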
Finally, I will conclude with a call for action and collaboration, seeking additional efforts in formalizing privacy legal principles in a way that they are actionable and deployable.
Speaker: Ferdinando Fioretto (University of Virginia)
ABSTRACT: LLMs are making first contact with more data than ever before, opening up new attack vectors against LLM systems. We propose a new practical data extraction attack that we call "neural phishing" (ICLR 2024). This attack enables an adversary to target and extract PII from a model trained on user data without needing specific knowledge of the PII they wish to extract. Our attack is made possible by the few-shot learning capability of LLMs, but this capability also enables defenses. We propose Differentially Private In-Context Learning (ICLR 2024), a framework for coordinating independent LLM agents to answer user queries under DP. We first introduce new methods for obtaining consensus across potentially disagreeing LLM agents, and then explore the privacy-utility tradeoff of different DP mechanisms as applied to these new methods. We anticipate that further LLM improvements will continue to unlock both stronger adversaries and more robust systems.
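To illustrate one standard consensus-under-DP primitive in this spirit (an assumption on our part, not necessarily the paper's mechanism), the sketch below releases the argmax of Laplace-noised vote counts across independent agents, i.e. report-noisy-max:

```python
import numpy as np

def dp_consensus(votes, candidates, epsilon=1.0, rng=np.random.default_rng(0)):
    """Report-noisy-max over agent votes: add Laplace noise to each vote
    count and release only the winning candidate. Laplace scale 2/epsilon
    is a conservative choice for epsilon-DP with sensitivity-1 counts."""
    counts = np.array([sum(v == c for v in votes) for c in candidates], float)
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=len(candidates))
    return candidates[int(np.argmax(noisy))]

agents = ["Paris", "Paris", "Lyon", "Paris", "Marseille"]  # one answer per agent
print(dp_consensus(agents, ["Paris", "Lyon", "Marseille"], epsilon=1.0))
```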
Speaker: Ashwinee Panda (Princeton University)
ABSTRACT: In this talk, I will give a brief tutorial on Oblivious RAM (ORAM). I will talk about how ORAM evolved from a theoretical concept to large-scale real-world deployment, and the various emerging demands and use cases of ORAM for both the blockchain community and traditional cloud service providers. In particular, I will talk about Signal's deployment of Path ORAM over their billion-sized database, and how ORAM allowed them to cut their 500 servers down to 6 servers.
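For intuition, the trivially oblivious baseline that ORAM schemes improve upon can be sketched in a few lines (our illustration): to read one slot, touch every slot with an identical access pattern, so the pattern reveals nothing about the index. Path ORAM brings the per-access cost from linear down to polylogarithmic:

```python
def oblivious_read(arr, i):
    """Trivial ORAM baseline: scan the whole array, selecting the wanted
    element without branching on the index, so every read has the same
    memory-access pattern (in real code the select must be constant-time)."""
    result = 0
    for j, v in enumerate(arr):
        match = int(j == i)
        result = match * v + (1 - match) * result
    return result

print(oblivious_read([10, 20, 30, 40], 2))   # 30, after touching all slots
```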
Finally, I will describe a new initiative to build an open-source Oblivious STL library, aiming to provide an oblivious counterpart of the standard STL library.
I will describe our initial efforts at building Oblivious STL. Specifically, I will focus on how techniques from external-memory algorithms allow us to achieve a 10-100x performance improvement over state-of-the-art implementations for hardware enclaves. In particular, while the literature on ORAM typically uses computational overhead as the performance metric, for hardware enclaves the number of page swaps is often the dominant metric. With the help of external-memory algorithms, we achieve an asymptotic improvement in the number of page swaps.
The speaker is Elaine Shi (Carnegie Mellon University)
ABSTRACT: Membership inference attacks (MIA) aim to detect if a particular data point was used in training a machine learning model. Recent strong attacks have high computational costs and inconsistent performance under varying conditions, rendering them unreliable for practical privacy risk assessment. I will present RMIA, a novel, efficient, and robust membership inference attack algorithm that accurately differentiates between population data and training data of a model, with minimal computational overhead. We achieve this through a robust statistical test that effectively leverages both reference models and reference data samples from the population. Our algorithm exhibits superior test power (true-positive rate) compared to all prior methods, even at extremely low false-positive rates (as low as 0). Under computation constraints, where only a limited number of pre-trained reference models (as few as 1) are available, and also when we vary other elements of the attack, our method performs exceptionally well, unlike some prior attacks that approach random guessing. I will argue that MIA tests, as privacy auditing tools, must be stress-tested under low computation budgets, few available reference models, and changes to data and models. A strong test is one that outperforms others in all of these practical scenarios, not only in ideal cases. RMIA lays the groundwork for practical yet accurate and reliable data privacy risk analysis of machine learning.
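As background for how such attacks are scored, a generic likelihood-ratio baseline (our sketch, not RMIA itself) compares the target model's confidence on a candidate point against reference models trained without it:

```python
import numpy as np

def membership_score(p_target, p_refs):
    """Generic likelihood-ratio membership signal: the target model's
    probability on the candidate point, relative to the average under
    reference models. Scores well above 1 suggest membership."""
    return p_target / np.mean(p_refs)

p_target = 0.92                 # target model's prob. of the true label on x
p_refs = [0.40, 0.55, 0.48]     # reference models' probs on the same x
print(membership_score(p_target, np.array(p_refs)))   # ~1.93 -> likely member
```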
Reza Shokri (National University of Singapore)
ABSTRACT: In this talk, we draw attention to a new set of inference-time privacy risks of using LLMs in interactive settings, where they are fed different information from multiple sources and expected to reason about what to share in their outputs. We discuss how existing evaluation frameworks don’t fully capture the nuances of such problems, and introduce future research directions for better auditing models for privacy risks, and providing better mitigations.
The speakers are Niloofar Mireshghallah & Hyunwoo Kim (University of Washington)
ABSTRACT: For Software Engineering practitioners, the past 10 years have seen an explosive rise in the adoption of continuous integration systems and automated software testing. Having sufficient test coverage is now considered key to maintaining enough control of large software systems to make changes quickly and reliably. Although we’ve started to write tests, there is still a lot to learn about how to test well - with 40 years of invention and innovation in test strategies and technologies, it’s hard to know what tools are appropriate when testing a given interface. Even among industry leaders, a lot of good testing is still a matter of “I’ll know it when I see it.” In this talk I’ll argue that long-ignored concepts from software design are essential in understanding how to test appropriately. Along the way I’ll tie together ideas and technologies from both design and testing: control, contracts, and design qualities help us understand how best to use unit tests, dynamic analysis, fuzzing, and property-based testing approaches.
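As one small example of the contract-driven testing styles mentioned at the end, here is a property-based test using the Hypothesis library: rather than hand-picking cases, we state properties the interface must satisfy and let the framework search for counterexamples:

```python
# Property-based test: state a contract for sorting ("idempotent, preserves
# the multiset of elements, output is ordered") and let Hypothesis generate
# many inputs trying to falsify it.
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_contract(xs):
    out = sorted(xs)
    assert sorted(out) == out                         # idempotent
    assert sorted(xs) == sorted(out)                  # same elements
    assert all(a <= b for a, b in zip(out, out[1:]))  # ordered

test_sort_contract()   # Hypothesis runs many generated cases
```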
Speaker: Titus Winters
Bio: Titus is a Senior Principal Scientist at Adobe, focusing on Developer Experience. He has served on the C++ standards committee, chairing the working group for the design and evolution of the C++ standard library. He has also served on the ACM/IEEE/AAAI CS2023 steering committee, helping set curriculum requirements for Computer Science undergraduate degrees, with a focus on the requirements for software engineering. Titus was a thought leader at Google for many years, focusing on C++, software engineering practice, technical debt, and culture. He is the lead author of the book Software Engineering at Google (O'Reilly, 2020).
Tune in for our chat with Charles Hoskinson. Charles has been in crypto for the last 12 years since the beginning of Bitcoin. Charles works with the Cardano community to help build the foundations for a future where everyone can contribute and have their voice heard. Thank you Charles for joining!
Today’s episode will dive deep on the Bitcoin halving, on-chain governance, why partner chains, and what the actual value of a chain could be.
Hosted by,
Marlon Ruiz
linkedin.com/in/ruizmarlon
Google web3 community lead | customer engineer
#crypto #web3 #blockchain #governance #IOG #cardano #google or #googletechtalks
Google Algorithms Seminar. ABSTRACT: Motivated by the growing interest in correlation-robust stochastic optimization, we investigate stochastic selection problems beyond independence. Specifically, we consider the instructive case of pairwise-independent priors and matroid constraints. We obtain essentially-optimal bounds for contention resolution and prophet inequalities. The impetus for our work comes from the recent work of Caragiannis et al., who derived a constant-approximation for the single-choice prophet inequality with pairwise-independent priors.
For general matroids, our results are tight and largely negative. For both contention resolution and prophet inequalities, our impossibility results hold for the full linear matroid over a finite field. We explicitly construct pairwise-independent distributions which rule out an Ω(1/rank)-balanced offline CRS and an Ω(1/log rank)-competitive prophet inequality against the (usual) oblivious adversary. For both results, we employ a generic approach for constructing pairwise-independent random vectors -- one which unifies and generalizes existing pairwise-independence constructions from the literature on universal hash functions and pseudorandomness. Specifically, our approach is based on our observation that random linear maps turn linear independence into stochastic independence.
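The observation in the last sentence has a classic concrete instance (our illustrative sketch): a random linear map over GF(2) stretches k truly random bits into 2^k - 1 unbiased bits that are pairwise independent, though far from mutually independent:

```python
import numpy as np

def pairwise_independent_bits(k, rng=np.random.default_rng()):
    """Classic construction: draw a uniform seed x in GF(2)^k; the bits
    <a, x> mod 2, over all 2^k - 1 distinct nonzero vectors a, are unbiased
    and pairwise independent (any two distinct nonzero a's are linearly
    independent over GF(2))."""
    x = rng.integers(0, 2, size=k)                         # k-bit random seed
    a = [(n >> np.arange(k)) & 1 for n in range(1, 2**k)]  # all nonzero a's
    return np.array([(v @ x) % 2 for v in a])

# Empirical check: any two output coordinates agree about half the time.
samples = np.array([pairwise_independent_bits(4) for _ in range(5000)])
print(samples.shape)                                     # (5000, 15)
print(np.mean(samples[:, 0] == samples[:, 1]))           # ~0.5
print(np.corrcoef(samples[:, 0], samples[:, 7])[0, 1])   # ~0.0
```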
We then examine the class of matroids which satisfy the so-called partition property -- these include most common matroids encountered in optimization. We obtain positive results for both online contention resolution and prophet inequalities with pairwise-independent priors on such matroids, approximately matching the corresponding guarantees for fully independent priors. These algorithmic results hold against the almighty adversary for both problems.
ABOUT THE SPEAKER: I am Neel, a third-year CS PhD student in the USC Theory Group, where I am advised by Shaddin Dughmi and David Kempe. I am broadly interested in Combinatorial Optimization in the presence of Uncertainty or Incentives, Algorithmic Contract Theory, and Computational Economics. Recently, I have also been working on designing (fast) Dynamic Algorithms for Graph Problems, and I have worked on designing AI/ML algorithms with ethical considerations in mind.
Before starting my Ph.D., I spent a wonderful one and a half years at the National University of Singapore as a Research Assistant working with Yair Zick and Reza Shokri. Before that, I completed my B.Stat at Indian Statistical Institute, Kolkata.
Project CARA: NASA's Orbital Collision Avoidance Technology. ABSTRACT: NASA's Conjunction Analysis and Risk Assessment (CARA) mission is tasked with detecting and advising on close approaches between Earth-orbiting satellites. Join us for a discussion of the core algorithms underpinning it, and why it's necessary and useful in the first place.
ABSTRACT: Often when we talk about libraries, frameworks and modules, we talk about their technical aspects, like the language they are created in. However, programming languages and code bases can also be viewed through a cognitive lens. In this talk, Felienne Hermans, author of The Programmer's Brain, will discuss how your brain sees code, and what the impact of that is for reading code and for learning new languages. Examining code from a cognitive perspective, rather than a technical one, can give you better insight into how people interact with code and how you learn.
About the Speaker: Felienne Hermans
Felienne is a professor of Computer Science Education at the Vrije Universiteit Amsterdam. She also works as a high-school CS teacher one day a week at Lyceum Kralingen in the Codasium program. Felienne is the creator of the Hedy programming language, a gradual and multi-lingual programming language designed for teaching. She is the author of “The Programmer’s Brain”, a book that helps programmers understand how their brain works and how to use it more effectively. In 2021, Felienne was awarded the Dutch Prize for ICT research. She also has a weekly column on BNR, a Dutch radio station.
A Google Algorithms Seminar. ABSTRACT: We discuss the use of optimal transport techniques to derive finite-time error bounds for reinforcement learning in mean-payoff Markov decision processes. The results are obtained as a special case of stochastic Krasnoselskii–Mann fixed point iterations for nonexpansive maps. We present sufficient conditions on the stochastic noise and stepsizes that guarantee almost sure convergence of the iterates towards a fixed point, as well as non-asymptotic error bounds and convergence rates. Our main results concern the case of a martingale difference noise with variances that can possibly grow unbounded. We also analyze the case of uniformly bounded variances, and how the results apply to Stochastic Gradient Descent in convex optimization.
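For intuition, a stochastic Krasnoselskii–Mann iteration of the kind analyzed here can be sketched in a few lines (our toy one-dimensional example with Robbins–Monro stepsizes):

```python
import numpy as np

# Stochastic Krasnoselskii-Mann iteration for a nonexpansive map T:
#   x_{k+1} = (1 - a_k) x_k + a_k (T(x_k) + noise),
# with stepsizes a_k = 1/(k+1) and zero-mean (martingale-difference) noise.
rng = np.random.default_rng(0)
T = lambda x: 0.5 * x + 1.0     # nonexpansive (in fact a contraction), fixed point 2

x = 10.0
for k in range(10000):
    a = 1.0 / (k + 1)
    noise = rng.normal(scale=1.0)      # zero-mean, bounded variance
    x = (1 - a) * x + a * (T(x) + noise)
print(x)                               # close to the fixed point 2.0
```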
ABOUT THE SPEAKER: Roberto Cominetti is a professor with the Faculty of Engineering and Sciences at Universidad Adolfo Ibáñez, Santiago, Chile. His research interests include convex analysis and game theory and their applications in transportation networks.
A Google Algorithms Seminar. ABSTRACT: One Tree to Rule Them All: Polylogarithmic Universal Steiner Trees and Strong Sparse Partition Hierarchies. This talk concerns universal Steiner trees (USTs). Informally, a UST is a single spanning tree that is a good approximation for every instance of Steiner tree. More formally, an α-approximate universal Steiner tree (UST) of a graph G for root vertex r is a spanning tree T such that, for any vertex subset S containing r, the cost of the minimal subtree of T connecting S is within an α factor of the cost of the minimum cost subgraph of G connecting S. α-approximate USTs immediately give α-approximations for well-studied variants of Steiner tree such as online or oblivious Steiner tree. Sublinear-approximate USTs were known, but neither the existence of, nor poly-time algorithms for computing, polylogarithmic-approximate USTs were previously known.
In this talk, I will discuss the first construction of polylogarithmic-approximate USTs, which (nearly) exponentially improves the approximation guarantees of previous constructions and (nearly) matches logarithmic lower bounds. The result is based on new constructions of polylogarithmic-quality graph hierarchies called strong sparse partition hierarchies, which may be interesting in their own right. Roughly, a strong sparse partition (i.e., each level of this hierarchy) is a vertex partition that provides deterministic guarantees on the number of parts that balls of appropriate radii intersect.
Joint work with Costas Busch, Da Qi Chen, Arnold Filtser, Daniel Hathcock and Rajmohan Rajaraman
Bio: Ellis Hershkowitz is an Assistant Professor in the Brown University Computer Science Department interested in graph, online and approximation algorithms as well as metric embeddings. He completed his PhD at Carnegie Mellon and did a postdoc at ETH Zurich.
A Google Algorithms Seminar. ABSTRACT: Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. Over the last few years, it has remained one of the central challenges in building more powerful GNNs. In this talk, I will discuss two recent papers on this phenomenon and provide some new insights.
The first work studies why oversmoothing happens at a relatively shallow depth in GNNs. By carefully analyzing the oversmoothing mechanisms in a stylized formulation, we distinguish between adverse mixing that homogenizes nodes across different classes and beneficial denoising within the same class. We quantify these two effects on random graphs sampled from the Contextual Stochastic Block Model (CSBM) and show that oversmoothing occurs once the mixing effect starts to dominate the denoising effect. We establish that the number of layers required for this transition is O(log N / log log N) for sufficiently dense graphs with N nodes. We also extend our analysis to study the effects of Personalized PageRank (PPR), or equivalently, the effects of initial residual connections on oversmoothing, and shed light on when and why they might not be an ideal solution to the problem.
In the second work, we study oversmoothing in attention-based GNNs, such as Graph Attention Networks (GATs) and transformers. Treating attention-based GNNs as dynamical systems, our study demonstrates that the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. From a technical point of view, the proposed framework significantly extends the existing results on oversmoothing, and can account for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.
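The phenomenon itself is easy to reproduce numerically. The toy sketch below (our illustration) applies a random-walk-normalized aggregation operator repeatedly and shows the cross-node feature variance collapsing with depth:

```python
import numpy as np

# Tiny illustration of oversmoothing: repeated application of a random-walk
# normalized aggregation P = D^{-1} A drives all node features toward the
# same vector (P^k -> rank-one), so cross-node variance decays with depth.
rng = np.random.default_rng(0)
n = 20
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)             # undirected
np.fill_diagonal(A, 1.0)           # self-loops (aperiodicity)
P = np.diag(1.0 / A.sum(1)) @ A    # row-stochastic aggregation operator

X = rng.normal(size=(n, 8))        # node features
for depth in [1, 2, 4, 8, 16, 32]:
    H = np.linalg.matrix_power(P, depth) @ X
    print(depth, H.std(axis=0).mean())   # variance across nodes shrinks
```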
The talk is based on the following papers: arxiv.org/abs/2212.10701, arxiv.org/abs/2305.16102. Joint works with Amir Ajorlou (MIT), Zhengdao Chen (NYU/Google), William Wang (MIT), Zihui Wu (Caltech) and Ali Jadbabaie (MIT).
ABOUT THE SPEAKER: Xinyi Wu is a fourth-year Ph.D. student in the Institute for Data, Systems, and Society (IDSS) at the Massachusetts Institute of Technology (MIT), advised by Professor Ali Jadbabaie. She is affiliated with the Laboratory for Information and Decision Systems (LIDS). She is a recipient of the MIT Michael Hammer Fellowship. She is interested in applied graph theory, dynamical systems, networks, and machine learning on graphs. Her work on oversmoothing in GNNs was selected as a Spotlight paper at NeurIPS 2023.
ABSTRACT: Software is a message from one developer to other developers across time. As such, developing software incurs a social debt to all those developers who will touch this software in the future---be that an older version of the original creator or someone who isn't even born yet. Understood this way, software development poses two challenges: (1) companies must learn to identify people who understand this idea, because being able to "grind leetcode" doesn't qualify; (2) colleges must create alternative programming curricula to turn students into apprentice developers, because the traditional curriculum doesn't.
This talk will present my answer to the second challenge. I have spent the last 25 years creating undergraduate programming courses that are all about software-as-a-message, and the talk will provide an overview of this alternative curriculum approach. The first challenge remains yours to overcome.
About the Speaker: Matthias Felleisen
Matthias Felleisen is Trustee Professor of Computer Science at Northeastern University. He is also a Fellow of the ACM, received the organization's Karl Karlstrom Award for his work on curriculum development, and was honored with the ACM SIGPLAN Lifetime Award for his research on programming languages.
Abstract: Cutting-edge diffusion models produce high-quality and customizable images, enabling their use for commercial art and graphic design purposes. First, I will discuss our study of various frameworks to detect replication. Then, I will show how we used these frameworks to identify memorization in the Stable Diffusion 1.4 model. In the second part, I will discuss various factors contributing to memorization in diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, I will show results on how the text conditioning of the model also plays an important role. Lastly, I will discuss several techniques we proposed based on these findings for reducing data replication at both training and inference time.
Abstract: The recent demonstration by Carlini et al. shows that highly duplicated training images can be copied by diffusion models during generation, which is problematic in terms of data privacy and copyright. Known as an extraction attack, this method reconstructs training images using only a model's generated samples. As the original work requires on the order of GPU-years to perform, we provide a pipeline that can run in GPU-days and extract a similar number of images. We first de-duplicate the public dataset LAION-2B and demonstrate a high level of image duplication. We then provide whitebox and blackbox extraction attacks on par with the original attack, while requiring significantly fewer network evaluations. As we can evaluate more samples, we expose the phenomenon of template copies, wherein a diffusion model copies a fixed image region and varies another. We demonstrate that new diffusion models that deduplicate their training set do not generate exact copies as in Carlini et al., but do generate templates. We conclude with several insights into copied images from a data perspective.
Google Algorithms Seminar - ABSTRACT: In predictions-to-decisions pipelines, statistical forecasts are useful insofar as they are a trustworthy guide to downstream rational decision making.
Towards this, we study the problem of making online predictions of an adversarially chosen high-dimensional state that are unbiased subject to an arbitrary collection of conditioning events, with the goal of tailoring these events to downstream decision makers who will use these predictions to inform their actions. We give efficient general algorithms for solving this problem, and a number of applications that stem from choosing an appropriate set of conditioning events, including:
(1) tractable algorithms with strong no-regret guarantees over large action spaces, (2) a high-dimensional best-in-class (omniprediction) result, (3) fairness guarantees of various flavors, and (4) a novel framework for uncertainty quantification in multiclass settings.
For example, we can efficiently produce predictions targeted at any polynomially many decision makers, offering each of them optimal swap regret if they simply best respond to our predictions. Generalizing this, in the online combinatorial optimization setting we obtain the first efficient algorithms that guarantee, for up to polynomially many decision makers, no regret on any polynomial number of subsequences that can depend on their actions as well as any external context. As an illustration, for the online routing problem this easily implies --- for the first time --- efficiently obtainable no-swap-regret guarantees over all exponentially many paths that make up an agent's action space (and this can be obtained for multiple agents at once); by contrast, prior no swap regret algorithms such as Blum-Mansour would be intractable in this setting as they need to enumerate over the exponentially large action space. Moreover, our results imply novel regret guarantees for extensive-form games.
Turning to uncertainty quantification in ML, we show how our methods let us estimate (in the online adversarial setting) multiclass/multilabel probability vectors in a transparent and trustworthy fashion: in particular, downstream prediction set algorithms (i.e., models that predict multiple labels at once rather than a single one) will be incentivized to simply use our estimated probabilities as if they were the true conditional class probabilities, and their predictions will be guaranteed to satisfy multigroup fairness and other "conditional coverage" guarantees. This gives a powerful new alternative to well-known set-valued prediction paradigms such as conformal and top-K prediction. Moreover, our predictions can be guaranteed to be "best-in-class" --- i.e. to beat any polynomial collection of other (e.g. NN-based) multiclass vector predictors, simultaneously as measured by all Lipschitz Bregman loss functions (including L2 loss, cross-entropy loss, etc.). This can be interpreted as a high-dimensional omniprediction result.
ABOUT THE SPEAKER: George Noarov is a CS PhD student at the University of Pennsylvania, advised by Michael Kearns and Aaron Roth. He is broadly interested in theoretical CS and ML, with particular focus on fair/robust/trustworthy ML, online learning, algorithmic game theory, and uncertainty quantification. He obtained his B.A. in Mathematics from Princeton University, where his advisors were Mark Braverman and Matt Weinberg. He has received several academic awards, and his research has been supported by an Amazon Gift for Research in Trustworthy AI.
Google Algorithms Seminar - ABSTRACT: Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is little mathematical work detailing their benefits and deficiencies as compared with other architectures. In this talk, I'll present both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. On the positive side, I'll present a sparse averaging task, where recurrent networks and feedforward networks all have complexity scaling polynomially in the input size, whereas transformers scale merely logarithmically in the input size. On the negative side, I'll present a triple detection task, where attention layers in turn have complexity scaling linearly in the input size. I'll discuss these results and some of our proof techniques, which emphasize the value of communication complexity in the analysis of transformers. Based on joint work with Daniel Hsu and Matus Telgarsky.
Bio: Clayton Sanford is an incoming 5th (and final) year PhD student at Columbia studying machine learning theory. His work focuses primarily on the representational properties and inductive biases of neural networks. He has additionally worked on learning combinatorial algorithms with transformers (as a Microsoft Research intern this summer) and on climate modeling with ML (as an Allen Institute for AI intern in summer 2022).
Happy to share our #web3talks fireside chat with our special guest Steven Goldfeder, CEO & Co-Founder of Offchain Labs and builder of the Arbitrum chain. Fun fact: Offchain Labs celebrated their 5th anniversary on the day of this chat. Arbitrum mainnet went live earlier this year and has had great success so far.
We dive deep on the Arbitrum chain, get his perspective on L2s, and how his team is working to enable developers and scale Arbitrum.
Thank you Steven for joining!
Hosted by,
Marlon Ruiz
linkedin.com/in/ruizmarlon
Google web3 community lead | customer engineer
#blockchain #dapps #crypto #defi #web3 #arbitrum #google or #googletechtalks
Google Algorithms Seminar. ABSTRACT: We initiate an investigation of private sampling from distributions. Given a dataset with n independent observations from an unknown distribution P, a sampling algorithm must output a single observation from a distribution that is close in total variation distance to P while satisfying differential privacy. Sampling abstracts the goal of generating small amounts of realistic-looking data.
We provide tight upper and lower bounds for the dataset size needed for this task for three natural families of distributions: arbitrary distributions on {1,…,k}, arbitrary product distributions on {0,1}^d, and product distributions on {0,1}^d with bias in each coordinate bounded away from 0 and 1. We demonstrate that, in some parameter regimes, private sampling requires asymptotically fewer observations than learning a description of P nonprivately; in other regimes, however, private sampling proves to be as difficult as private learning. Notably, for some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
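To fix ideas, a naive private sampler for distributions on {1,…,k} (a generic illustration, not the paper's algorithm) perturbs empirical counts with Laplace noise, renormalizes, and draws one sample; privacy of the released sample then follows from post-processing:

```python
import numpy as np

def private_sample(data, k, epsilon=1.0, rng=np.random.default_rng(0)):
    """Naive DP sampler on {0, ..., k-1}: Laplace-noise the empirical
    counts (one changed observation shifts two counts by 1, so L1
    sensitivity is 2), clip, renormalize, and sample once."""
    counts = np.bincount(data, minlength=k).astype(float)
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=k)
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum() if probs.sum() > 0 else np.full(k, 1.0 / k)
    return rng.choice(k, p=probs)

data = np.random.default_rng(1).choice(5, p=[.4, .3, .2, .05, .05], size=1000)
print(private_sample(data, k=5))
```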
This work appeared at NeurIPS and the full version can be found here: arxiv.org/abs/2211.08193
Bio: Marika is a rising fifth year PhD student at Boston University advised by Adam Smith. Her research spans differential privacy, cryptography, and their intersection with legal questions, and more recently she has become interested in practical implementations of differential privacy. Before interning at Google, she was a visiting assistant professor at Reed College, and she has also done research for Tumult Labs.
Google Algorithms Seminar ABSTRACT: Neighborhood and graph construction is fundamental in data analysis and machine learning. k-nearest neighbor (kNN) and epsilon-neighborhood methods are the most commonly used methods for this purpose due to their computational simplicity. However, the interpretation and the choice of the parameter k/epsilon, though they have received much attention over the years, still remain ad hoc.
In this talk, I will present an alternative view of neighborhoods, in which I demonstrate that neighborhood definitions are sparse signal approximation problems. Specifically, we will see that (1) kNN and epsilon-neighborhood approaches are sub-optimal thresholding-based representations; (2) an improved and efficient definition based on basis pursuit exists, namely, non-negative kernel regression (NNK); and (3) selecting orthogonal signals for sparse approximation corresponds to selecting neighbors that are not geometrically redundant. NNK neighborhoods are adaptive, sparse, and exhibit superior performance in graph-based signal processing and machine learning.
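To make the NNK definition concrete, the sketch below (our illustration, with a Gaussian kernel and hypothetical parameter choices) keeps, among a point's kNN candidates, only those with nonzero coefficients in a non-negative kernel regression:

```python
import numpy as np
from scipy.optimize import nnls

def nnk_neighbors(X, q, k=10, sigma=1.0):
    """NNK sketch: neighbors of q are the support of the non-negative
    coefficients reconstructing q from its kNN candidates in kernel space,
    i.e. argmin_{theta >= 0} theta^T K theta - 2 kq^T theta."""
    d2 = ((X - q) ** 2).sum(1)
    cand = np.argsort(d2)[1:k + 1]   # kNN candidates, excluding q itself
    K = np.exp(-((X[cand, None] - X[None, cand]) ** 2).sum(-1) / (2 * sigma**2))
    kq = np.exp(-d2[cand] / (2 * sigma**2))
    # Reduce to NNLS via a Cholesky factor C of K (K = C^T C, C^T b = kq).
    C = np.linalg.cholesky(K + 1e-8 * np.eye(k)).T
    b = np.linalg.solve(C.T, kq)
    theta, _ = nnls(C, b)
    keep = theta > 1e-8
    return cand[keep], theta[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
idx, w = nnk_neighbors(X, X[0], k=15)
print(len(idx), "NNK neighbors kept out of 15 kNN candidates")
```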
We will then discuss a k-means-like algorithm where we leverage the polytope geometry and sparse coding view of NNK for data summarization and outlier detection. I will conclude by discussing a graph framework for an empirical understanding of deep neural networks (DNNs). The developed graph metrics characterize the input-output geometry of the embedding spaces induced in DNNs and provide insights into the similarities and differences between models, their invariances, and their generalization and transfer learning performance.
Bio: Sarath Shekkizhar received his bachelor's (Electronics and Communication) and double master's (Electrical Engineering, Computer Science) degrees from the National Institute of Technology, Tiruchirappalli, India, and the University of Southern California (USC), Los Angeles, USA, respectively. He recently graduated from Antonio Ortega's group with his doctoral degree in Electrical and Computer Engineering at USC. He is the recipient of the IEEE best student paper award at ICIP 2020 and was named a Rising Star in Signal Processing at ICASSP 2023. His research interests include graph signal processing, non-parametric methods, and machine learning.