Getting AI to Forget About You: Data Deletion From Machine Learning Models
Simons Institute | 2019-05-06 | James Zou (Stanford University)
https://simons.berkeley.edu/talks/getting-ai-forget-about-you-data-deletion-machine-learning-models
Beyond Differential Privacy

The Enigma of LLMs: on Creativity, Compositionality, Pluralism, and Paradoxes
Simons Institute | 2024-10-16 | Yejin Choi (University of Washington / NVIDIA)
https://simons.berkeley.edu/talks/yejin-choi-university-washington-nvidia-2024-09-06
Special Year on Large Language Models and Transformers: Part 1 Boot Camp
About a year ago, I gave a talk at the Simons Institute workshop on "Large Language Models and Transformers" on "Possible Impossibilities and Impossible Possibilities". This talk is a follow-up to that one, sharing my own experiences encountering divisive reactions from the research community, and following up on some of the open research questions with potentially confusing findings and even more open research questions. More concretely, I will touch on creativity, compositionality, and pluralism of LLMs, followed by generative AI paradoxes. The underlying thesis of my talk will be that it is time for us to acknowledge that we don't know as much about LLMs as some might like to assume we do, and that "Science of LLMs" is an important research direction to complement "Engineering of LLMs".

Introduction to Transformers
Simons Institute | 2024-10-16 | Daniel Hsu (Columbia University)
https://simons.berkeley.edu/talks/daniel-hsu-columbia-university-2024-09-04
Special Year on Large Language Models and Transformers: Part 1 Boot Camp

Toward Understanding In-context Learning
Simons Institute | 2024-10-16 | Tengyu Ma (Stanford University)
https://simons.berkeley.edu/talks/tengyu-ma-stanford-university-2024-09-04
Special Year on Large Language Models and Transformers: Part 1 Boot Camp
I will introduce the in-context learning capability of large language models, the ability to learn to solve a downstream task simply by conditioning on a prompt consisting of input-output examples without any parameter updates. I will present a few papers that aim to theoretically explain the mechanisms of in-context learning on simplified data distributions.
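As a concrete illustration of the setting described above (an editorial sketch, not material from the talk): in-context learning is pure conditioning, where the prompt packs labeled input-output demonstrations plus a query and the model's parameters are never updated. The generate call below is a hypothetical stand-in for any frozen, pretrained LLM's sampling API.

    # Minimal sketch of in-context learning as prompt conditioning (illustrative only).
    # `generate` is a hypothetical stand-in for a frozen, pretrained LLM; no parameter updates occur.

    def build_icl_prompt(examples, query):
        """Format (input, output) demonstrations followed by a query input."""
        lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
        lines.append(f"Input: {query}\nOutput:")
        return "\n\n".join(lines)

    examples = [("2, 4", "6"), ("1, 7", "8"), ("5, 5", "10")]  # toy addition task
    prompt = build_icl_prompt(examples, "3, 9")
    # answer = generate(prompt)  # the frozen model is expected to continue with "12"
    print(prompt)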
LLM Safety, Alignment, and Generalization
Simons Institute | 2024-10-16 | Roger Grosse (University of Toronto)
https://simons.berkeley.edu/talks/roger-grosse-university-toronto-2024-09-05
Special Year on Large Language Models and Transformers: Part 1 Boot Camp
As LLM capabilities improve rapidly across a range of domains (including ones the designers didn’t intend), it becomes increasingly challenging to rule out catastrophic harms. I’ll argue for the need to make affirmative safety cases for LLMs. Once LLMs are capable of complex autonomous plans, understanding their motivational structures becomes increasingly central to safety. I’ll highlight the need for a science of LLM generalization so that we can understand how the training data affects a model’s beliefs and motivations.

On large language models and transformers: perspectives from physics, neuroscience, and theory
Simons Institute | 2024-10-16 | Surya Ganguli (Stanford University)
https://simons.berkeley.edu/talks/surya-ganguli-stanford-university-2024-09-05
Special Year on Large Language Models and Transformers: Part 1 Boot Camp

Training Large Language Models: Practices and Research Questions
Simons Institute | 2024-10-16 | Danqi Chen (Princeton University)
https://simons.berkeley.edu/talks/danqi-chen-princeton-university-2024-09-05
Special Year on Large Language Models and Transformers: Part 1 Boot Camp
In this tutorial, I will provide a comprehensive walk-through of the pipeline for training large language models, covering both pre-training and post-training phases. My goal is to discuss the best practices at each stage of training as known today—drawing from open models and public research papers—including data curation, training algorithms, and safety mitigations. The tutorial aims to serve as a starting point to facilitate discussions on the open research questions in training the next generation of large language models.

Iterative preference learning methods for large language model post training
Simons Institute | 2024-10-16 | Wei Xiong (UIUC)
https://simons.berkeley.edu/talks/wei-xiong-uiuc-2024-09-12
Emerging Generalization Settings
Reinforcement Learning from Human Feedback (RLHF) is the leading technique for aligning large language models (LLMs) with human preferences, and it has achieved tremendous success in applications such as ChatGPT, Gemini, and Claude. Despite these successes, our understanding of this new learning paradigm is still limited, especially within the open-source community. In this talk, we begin with a standard mathematical formulation, the reverse-KL regularized contextual bandit, and explore its learnability from a statistical efficiency standpoint. Our findings demonstrate that RLHF benefits from continuous online exploration through interactions with human evaluators. Drawing on these insights, we introduce a novel, provably efficient online iterative training framework. This framework leads to the development of innovative RLHF algorithms such as iterative direct preference learning. We will also discuss the practical experimental details of building a state-of-the-art chatbot within this framework using only open-source data, as demonstrated in our open-source project RLHFlow.
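For reference, the reverse-KL regularized contextual bandit objective named above can be written in its standard form (notation supplied here for illustration, not taken from the talk), together with its well-known closed-form maximizer:

    % Standard KL-regularized RLHF objective (notation introduced here for illustration)
    \[
    \max_{\pi}\;\; \mathbb{E}_{x \sim d_0,\, y \sim \pi(\cdot\mid x)}\bigl[r(x,y)\bigr]
      \;-\; \beta\, \mathbb{E}_{x \sim d_0}\Bigl[\mathrm{KL}\bigl(\pi(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\bigr)\Bigr]
    \]
    % whose maximizer is an exponential tilt of the reference (base) policy:
    \[
    \pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,\exp\!\bigl(r(x,y)/\beta\bigr)
    \]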
Know When You Know: Handling Adversarial Data by Abstaining
Simons Institute | 2024-10-16 | Surbhi Goel (University of Pennsylvania)
https://simons.berkeley.edu/talks/surbhi-goel-university-pennsylvania-2024-09-13
Emerging Generalization Settings
In this talk, we will focus on the problem of sequential prediction in the stochastic setting subject to adversarial interference (or distribution shift), where clean-label adversarial (or out-of-distribution) examples can be injected at any point. Traditional algorithms, designed for purely stochastic data, fail to generalize effectively in the presence of such injections, often leading to erroneous predictions. Conversely, approaches that assume fully adversarial data yield overly pessimistic bounds, offering limited practical utility. To overcome these limitations, we will introduce a new framework that allows the learner to abstain from making predictions at no cost on adversarial injections, thereby asking the learner to make predictions only when it is certain. We will design algorithms in this new model that retain the guarantees of the purely stochastic setting even in the presence of arbitrarily many adversarial examples. We will conclude with several exciting open questions that our new framework posits.

On the Curses of Future and History in Off-policy Evaluation in non-Markov Environments
Simons Institute | 2024-10-16 | Nan Jiang (University of Illinois Urbana-Champaign)
https://simons.berkeley.edu/talks/nan-jiang-university-illinois-urbana-champaign-2024-09-12
Emerging Generalization Settings
Coverage is a central concept in learning decision-making strategies from data, which characterizes how much the data---which is collected using a certain policy---tells us about a *different* policy. The mathematical characterization of coverage is extensively studied in Markov settings, e.g., offline RL in MDPs, where the basic definition takes the form of state density-ratio boundedness. However, real-world applications of RL, including RLHF in LLMs, often deal with non-Markov observations (e.g., sequences of tokens), and a direct reduction to the Markov case (by treating history as state) leads to exponentially large coverage coefficients, which is unsatisfactory in theory and also fails to explain practical successes.
In this work, we aim to develop better theoretical understanding of coverage in non-Markov environments, in a minimal setup of off-policy evaluation (OPE) in partially observable MDPs (POMDPs). We propose a novel framework called future-dependent value functions, and identify coverage assumptions tailored to the structure of POMDPs, namely belief coverage and outcome coverage. Under these coverage assumptions, we provide estimators that enable the first polynomial sample complexity (in the sense of no exponential dependence on horizon) guarantee for OPE in POMDPs.
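For reference, the density-ratio form of coverage mentioned above can be written as follows (standard notation supplied here, not taken from the talk); naively treating the full history as the state makes the analogous ratio blow up exponentially in the horizon.

    % Coverage coefficient in the Markov case: boundedness of the state-action density ratio
    \[
    C(\pi;\mu) \;=\; \max_{s,a}\; \frac{d^{\pi}(s,a)}{d^{\mu}(s,a)} \;<\; \infty
    \]
    % where d^\pi and d^\mu are the occupancy distributions of the target policy \pi and the
    % data-collection policy \mu. Replacing s with the full history h_t = (o_1, a_1, \dots, o_t)
    % generically makes this ratio exponential in the horizon.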
Neural Networks meet Nonparametric Regression: Generalization by Weight Decay and Large...
Simons Institute | 2024-10-16 | Yu-Xiang Wang (UC San Diego)
https://simons.berkeley.edu/talks/yu-xiang-wang-uc-san-diego-2024-09-13
Emerging Generalization Settings
How do overparameterized deep learning models avoid overfitting? Why do deep neural networks work better in practice than classical methods, e.g., kernels / splines? The talk covers a recent line of research that inspects DNNs from the classical non-parametric regression (or “curve fitting”) perspective, which reveals that the reason DNNs work better might be their adaptivity: when we tune their standard hyperparameters, they implicitly discover hidden sparsity and low-dimensional structures. I will go over theory and examples to illustrate this point. The results provide new insight into overparameterization, representation learning, and how neural networks generalize (often adaptively and nearly optimally) through optimization-algorithm-induced implicit bias such as Edge-of-Stability and Minima Stability.

Does generalization imply accuracy on the line? A new look at robust generalization
Simons Institute | 2024-10-16 | Sanmi Koyejo (Stanford University)
https://simons.berkeley.edu/talks/sanmi-koyejo-stanford-university-2024-09-11
Emerging Generalization Settings

Human and machine inductive biases for compositional linguistic generalization
Simons Institute | 2024-10-16 | Najoung Kim (Boston University)
https://simons.berkeley.edu/talks/najoung-kim-boston-university-2024-09-11
Emerging Generalization Settings
Compositionality is considered a central property of human language. One key benefit of compositionality is the generalization it enables---the production and comprehension of novel expressions analyzed as new compositions of familiar parts. Whether artificial neural networks (ANNs) can generalize in such a way, and if so, the conditions under which such a generalization is achieved, have been a longstanding line of inquiry in Cognitive Science. In this talk, I will discuss several semantic parsing tests we proposed to evaluate compositional linguistic generalization in ANNs, reviewing modeling results from the past few years and comparing them to human generalization patterns. In short, models can match human patterns in cases where only lexical substitution of known examples is required, but fail to do so when the generalization targets are structurally novel, unless the models are augmented with targeted structural scaffolding. In addition to this general picture, I will further highlight the difficulty of testing generalization in the current modeling landscape without open access to the training data, as well as the need and opportunity for better understanding structural generalization in humans.

Generalization in Robotic Behavior Cloning: Pitfalls and Promises
Simons Institute | 2024-10-16 | Max Simchowitz (Carnegie Mellon University)
https://simons.berkeley.edu/talks/max-simchowitz-carnegie-mellon-university-2024-09-13
Emerging Generalization Settings
As AI agents interact with their environments, they may fail to generalize because conditions at interaction time differ substantially from those represented in their training data. This talk will provide a perspective on these challenges from the world of robotics.
We will explore how robotic behavior cloning — teaching a robot to imitate from example demonstrations — in continuous state/action spaces differs from agents that take in and produce discrete tokens, such as LLMs. By combining ideas from control-theoretic stability, generative sampling oracles, and a couple of tricks from statistics, we introduce a framework for behavior cloning that enables an agent to imitate nearly arbitrary behavior with provable guarantees, even when the dynamics governing the agent's interaction with its environment are nonlinear and defy classical control-theoretic notions of stability. We conclude with some emerging empirical methodology that might lead to more generalizable, general-purpose robot agents.

Unexpected Test Losses from Generalization Theory?
Simons Institute | 2024-10-16 | Frederic Koehler (University of Chicago)
https://simons.berkeley.edu/talks/frederic-koehler-university-chicago-2024-09-13
Emerging Generalization Settings

Language Model Alignment: Theory & Algorithms
Simons Institute | 2024-10-16 | Ahmad Beirami (Google)
https://simons.berkeley.edu/talks/ahmad-beirami-google-2024-09-12
Emerging Generalization Settings
The goal of the language model alignment (post-training) process is to draw samples from an aligned distribution that improves a reward (e.g., making generations safer or more factual) but does not drift far from the base model. A simple baseline for this task is best-of-N, where N responses are drawn from the base model, ranked based on a reward, and the highest-ranking one is selected. More sophisticated techniques generally solve a KL-regularized reinforcement learning (RL) problem with the goal of maximizing expected reward subject to a KL divergence constraint between the aligned model and the base model. In this talk, we give an overview of language model alignment and build an understanding of key results in this space through simplified examples. We also present a new modular alignment technique, called controlled decoding, which solves the KL-regularized RL problem while keeping the base model frozen, by learning a prefix scorer, offering inference-time configurability. Finally, we also shed light on the remarkable performance of best-of-N in terms of achieving competitive or even better reward-KL tradeoffs when compared to state-of-the-art alignment baselines.
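A minimal sketch of the best-of-N baseline described above (illustrative only; generate and reward below are hypothetical placeholders for a base model's sampler and a learned reward model, not a specific library API):

    # Best-of-N sampling: draw N candidates from the frozen base model, keep the highest-reward one.
    # `generate` and `reward` are hypothetical placeholders supplied by the caller.

    def best_of_n(prompt, generate, reward, n=16):
        candidates = [generate(prompt) for _ in range(n)]           # sample from the base model
        scores = [reward(prompt, y) for y in candidates]            # score each response
        return candidates[max(range(n), key=lambda i: scores[i])]   # return the argmax response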
Controlling distribution shifts in language models: a data-centric approach.
Simons Institute | 2024-10-16 | Tatsunori Hashimoto (Stanford University)
https://simons.berkeley.edu/talks/tatsunori-hashimoto-stanford-university-2024-09-12
Emerging Generalization Settings
Language model pretraining has been a remarkably strong recipe for cross-task and cross-domain generalization in NLP. However, these gains have come at the expense of control: we rarely control the training data for language models, and gaps between pretraining and our target evaluation lead to distribution shifts. We present two complementary approaches to control this gap — algorithmically filtering data to focus training on the most benchmark-relevant parts of the distribution, as well as adapting to new domains by synthesizing domain-specific pretraining data at scale.

In-Context Principle Learning from Mistakes
Simons Institute | 2024-10-16 | Uri Alon (Google DeepMind)
https://simons.berkeley.edu/talks/uri-alon-google-deepmind-2024-09-09
Emerging Generalization Settings

When do dependencies in your data help?
Simons Institute | 2024-10-16 | Ankur Moitra (Massachusetts Institute of Technology)
https://simons.berkeley.edu/talks/ankur-moitra-massachusetts-institute-technology-2024-09-13
Emerging Generalization Settings
The problem of learning graphical models from iid data is widely studied, but unfortunately strong computational lower bounds are known when there are higher order dependencies. I will show how assuming the data is generated by a natural process called the Glauber dynamics allows us to circumvent these barriers, by harnessing the dependencies. This is based on joint work with Jason Gaitonde and Elchanan Mossel.
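As a quick illustration of the Glauber dynamics referred to above (a generic textbook Ising-model version under my own assumptions, not code from the talk): at each step one node is chosen at random and its value is resampled from its conditional distribution given its neighbors, so consecutive samples are dependent rather than iid.

    # Glauber dynamics on an Ising model over a graph (illustrative sketch, standard textbook form).
    import math, random

    def glauber_step(spins, neighbors, beta=1.0):
        """Resample one uniformly chosen spin from its conditional given its neighbors."""
        i = random.randrange(len(spins))
        field = sum(spins[j] for j in neighbors[i])           # local field from neighboring spins
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))  # P(spin_i = +1 | neighbors)
        spins[i] = 1 if random.random() < p_plus else -1
        return spins

    # Example: a 4-cycle; successive states form a dependent (non-iid) sequence.
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    spins = [random.choice([-1, 1]) for _ in range(4)]
    for _ in range(100):
        spins = glauber_step(spins, neighbors)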
Generalization in Diffusion Models Arises from Geometry-Adaptive Harmonic Representations
Simons Institute | 2024-10-16 | Zahra Kadkhodaie (New York University)
https://simons.berkeley.edu/talks/zahra-kadkhodaie-new-york-university-2024-09-10
Emerging Generalization Settings

Robust Generalization in the Era of LLMs: Jailbreaking Attacks and Defenses
Simons Institute | 2024-10-16 | Hamed Hassani (University of Pennsylvania)
https://simons.berkeley.edu/talks/hamed-hassani-university-pennsylvania-2024-09-11
Emerging Generalization Settings
Despite efforts to align large language models (LLMs) with human intentions, popular LLMs such as GPT, Llama, Claude, and Gemini are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. For this reason, interest has grown in improving the robustness of LLMs against such attacks. In this talk, we review the current state of the jailbreaking literature, including new questions about robust generalization, discussions of new black-box attacks on LLMs, defenses against jailbreaking attacks, and a new leaderboard to evaluate the robust generalization of production LLMs.

Generalization from the behavioral perspective
Simons Institute | 2024-10-16 | Max Raginsky (University of Illinois at Urbana-Champaign)
https://simons.berkeley.edu/talks/max-raginsky-university-illinois-urbana-champaign-2024-09-12
Emerging Generalization Settings

Adaptive Generalization: The Role of Dynamic Evaluation in Low-Resource Settings
Simons Institute | 2024-10-16 | Diyi Yang (Stanford University)
https://simons.berkeley.edu/talks/diyi-yang-stanford-university-2024-09-09
Emerging Generalization Settings

How well does diffusion model generate? - ...
Simons Institute | 2024-10-16 | Molei Tao (Georgia Tech), Yuqing Wang (Georgia Institute of Technology)
https://simons.berkeley.edu/talks/molei-tao-georgia-tech-2024-09-10
Emerging Generalization Settings

The Pitfalls of Next-token Prediction
Simons Institute | 2024-10-16 | Vaishnavh Nagarajan (Google)
https://simons.berkeley.edu/talks/vaishnavh-nagarajan-google-2024-09-11
Emerging Generalization Settings
Can a mere next-token predictor faithfully model human intelligence? We crystallize this intuitive concern and point out prevalent fallacies in this ongoing debate. Primarily, we argue that the two often-conflated phases of next-token prediction -- autoregressive inference and teacher-forced training -- must be treated distinctly. The popular criticism that errors can compound during autoregressive inference crucially assumes that teacher-forcing has learned an accurate next-token predictor. This assumption sidesteps a more deep-rooted problem we expose: in certain classes of tasks, teacher-forcing can simply fail to learn an accurate next-token predictor in the first place. We describe a general mechanism of how teacher-forcing can fail, and design a minimal planning task where both the Transformer and the Mamba architecture empirically fail in that manner -- remarkably, despite the task being straightforward to learn. We provide preliminary evidence that this failure can be resolved when training to predict multiple tokens in advance. We hope this finding can ground future debates and inspire explorations beyond the next-token prediction paradigm.
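To make the distinction above concrete (an illustrative sketch under my own notation, not from the talk): during teacher-forced training the model always conditions on the ground-truth prefix, whereas during autoregressive inference it conditions on its own previous outputs. The model object below is a hypothetical next-token predictor.

    # Teacher-forced training vs. autoregressive inference (illustrative sketch).
    # `model` is a hypothetical predictor: model(prefix) -> distribution over the next token.

    def teacher_forced_loss(model, target_tokens):
        """Training: every prediction conditions on the ground-truth prefix."""
        loss = 0.0
        for t in range(1, len(target_tokens)):
            dist = model(target_tokens[:t])          # ground-truth prefix, regardless of earlier mistakes
            loss += -dist.log_prob(target_tokens[t])
        return loss

    def autoregressive_generate(model, prompt_tokens, steps):
        """Inference: every prediction conditions on the model's own previous outputs."""
        tokens = list(prompt_tokens)
        for _ in range(steps):
            tokens.append(model(tokens).sample())    # errors here feed into later predictions
        return tokens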
When calibration goes awry: hallucination in language models
Simons Institute | 2024-10-16 | Adam Kalai (OpenAI)
https://simons.berkeley.edu/talks/adam-kalai-openai-2024-09-10
Emerging Generalization Settings
“Hallucinations” are a major problem for language models. We shed light on this phenomenon by showing that calibration, which is naturally encouraged during the pre-training of language models, leads to hallucinations. Moreover, the rate of hallucinations depends on the domain via the classic Good-Turing estimator. Interestingly, this estimate is large for facts like paper titles, which have been a notorious source of hallucinations. The analysis also suggests methods for mitigating hallucinations. This is joint work with Santosh Vempala and was done while the speaker was at Microsoft Research New England.
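For reference, the Good-Turing estimator mentioned above is the standard missing-mass estimate (definition given here for orientation, not taken from the talk):

    % Good-Turing estimate of the missing mass (probability of an unseen item on the next draw)
    \[
    \widehat{M}_0 \;=\; \frac{N_1}{N}
    \]
    % where N is the number of observed samples and N_1 is the number of distinct items that
    % appear exactly once. For example, if 200,000 of 1,000,000 observed facts occur exactly once,
    % the estimated probability that the next draw is a previously unseen fact is 0.2.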
Out-of-Distribution Generalization as Reasoning: Are LLMs Competitive?
Simons Institute | 2024-10-16 | Les Valiant (Harvard University)
https://simons.berkeley.edu/talks/les-valiant-harvard-university-2024-09-10
Emerging Generalization Settings
Humans can answer questions such as “Did Aristotle own a coffee making machine?” without having seen many or even any examples that showed Aristotle owning or missing a kitchen appliance. We suggest that humans answer such questions by chaining together beliefs that are each learned from examples in the PAC sense and are then chained in a way that is provably sound. Following the robust logic framework, soundness is interpreted here in the probabilistic sense that if the beliefs that are combined are each supported by the training data to a certain probability, such as 90%, then the conclusion of the chaining will provably also be supported by the training data to some other level, such as 80%. Just as PAC learning is principled, but for its success needs something from the world, namely that the concept is learnable from the available data, sound chaining as just described is also principled, but needs from the world that the component beliefs are learnable separately from the available data to sufficient accuracy. If the world is modular in that it abounds with rules that are separately learnable from data over different limited feature sets, then the chaining process will make predictions that are technically out-of-distribution but still principled. We shall discuss the power of the robust logic framework in this context, and its relevance to large language models.
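A crude way to see the kind of degradation quantified above (my own illustrative union-bound calculation, not the robust-logic guarantee itself): if the chaining is sound whenever every component belief is correct, and each of k chained beliefs errs with probability at most epsilon on the underlying distribution, then

    \[
    \Pr[\text{chained conclusion wrong}] \;\le\; \sum_{i=1}^{k} \Pr[\text{belief } i \text{ wrong}] \;\le\; k\varepsilon
    \]
    % so chaining two beliefs that are each 90% supported (epsilon = 0.1) yields a conclusion
    % supported with probability at least 80%, matching the 90% -> 80% example above.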
LLM Metacognition: Understanding and leveraging Thinking about Thinking
Simons Institute | 2024-10-16 | Sanjeev Arora (Princeton University)
https://simons.berkeley.edu/talks/sanjeev-arora-princeton-university-2024-09-09
Emerging Generalization Settings
The talk will present evidence that today’s large language models (LLMs) display somewhat deeper "understanding" than one would naively expect. This understanding has to do with their own "skills".
1. When asked to solve a task by combining a set of k simpler skills (“test of compositional capability”), they are able to do so despite not having seen those combinations of skills during their training.
2. They show the ability to reason about their own learning processes [Didolkar, Goyal et al'24], which is reminiscent of "metacognitive knowledge" [Flavell'76] in humans. For instance, given examples of an evaluation task, they can produce a catalog of suitably named skills that are relevant for solving each example of that task. Furthermore, this catalog of skills is meaningful, in the sense that incorporating it into training and reasoning pipelines improves performance (including of other unrelated LLMs) on that task. They can also generate powerful synthetic datasets for instruction-following [Kaur, Park et al'24] and MATH [Shah et al'24].
We discuss mechanisms by which such complex understanding could arise (including a theory by [Arora, Goyal'23] that tries to explain item 1 above).

Transformers, parallel computation, and logarithmic depth
Simons Institute | 2024-10-15 | Daniel Hsu (Columbia University)
https://simons.berkeley.edu/talks/daniel-hsu-columbia-university-2024-09-23
Transformers as a Computational Model
We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers. This is joint work with Clayton Sanford (Google) and Matus Telgarsky (NYU).

Limitations of attention mechanism, with implications in generalization and optimization
Simons Institute | 2024-10-15 | Bingbin Liu (Carnegie Mellon University)
https://simons.berkeley.edu/talks/bingbin-liu-carnegie-mellon-university-2024-09-23
Transformers as a Computational Model
We study Transformers' reasoning capabilities by formulating sequential reasoning tasks with finite-state automata. We show that o(T)-layer Transformers can simulate T steps of sequential reasoning, leveraging tools from Krohn-Rhodes theory and circuit complexity. However, our empirical findings reveal that models trained in practice often fail to discover such optimal constructions, due to optimization challenges and the richness of representation. Notably, Transformers struggle with out-of-distribution generalization on a simple task easily solved by RNNs. The two types of OOD failures highlight two inherent limitations of the Transformer architecture.

The Parallelism Tradeoff: Understanding Transformer Expressivity Through Circuit Complexity
Simons Institute | 2024-10-15 | Will Merrill (New York University)
https://simons.berkeley.edu/talks/will-merrill-new-york-university-2024-09-23
Transformers as a Computational Model
Despite their omnipresence in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question. We prove that transformers whose arithmetic precision is logarithmic in the number of input tokens (and whose feedforward nets are computable using space linear in their input) can be simulated by constant-depth logspace-uniform threshold circuits. This provides insight into the power of transformers using known results in complexity theory. For example, if L≠P (i.e., not all poly-time problems can be solved using logarithmic space), then transformers cannot even accurately solve linear equalities or check membership in an arbitrary context-free grammar with empty productions. Our result intuitively emerges from the transformer architecture's high parallelizability. We thus speculatively introduce the idea of a fundamental parallelism tradeoff: any model architecture as parallelizable as the transformer will obey limitations similar to it. Since parallelism is key to training models at massive scale, this suggests a potential inherent weakness of the scaling paradigm.

Talk By Misha Belkin (UCSD)
Simons Institute | 2024-10-15 | Misha Belkin (UCSD)
https://simons.berkeley.edu/talks/misha-belkin-ucsd-2024-09-23
Transformers as a Computational Model

Iterated Models: Expressive Power, Learning, and Chain of Thought
Simons Institute | 2024-10-15 | Nati Srebro (Toyota Technological Institute at Chicago)
https://simons.berkeley.edu/talks/nati-srerbo-toyota-technological-institute-chicago-2024-09-23
Transformers as a Computational Model
We consider sequence-to-sequence models that iterate the same function, from some base function class, at every step to obtain the next token. E.g., transformers use the same weights, and thus the same mapping, at each step to obtain the next token from the previous one. We discuss the computational/representational power of such models even with very simple base classes, and the sample and computational complexity of learning either end-to-end or with access to the entire "chain of thought".

The emergence of clusters in self-attention dynamics
Simons Institute | 2024-10-15 | Philippe Rigollet (MIT)
https://simons.berkeley.edu/talks/philippe-rigollet-mit-2024-09-24
Transformers as a Computational Model

Computational Benefits and Limitations of Transformers and State-Space Models
Simons Institute | 2024-10-15 | Eran Malach (Kempner Institute, Harvard University)
https://simons.berkeley.edu/talks/eran-malach-kempner-institute-harvard-university-2024-09-24
Transformers as a Computational Model
In this talk, we will discuss the mechanisms that enable retrieval, copying, and length generalization in language models, as well as how the choice of network architecture influences the model's success or failure in basic tasks. First, we will present theoretical and empirical evidence demonstrating that Transformers, the dominant architecture for sequence modeling, excel at copying and retrieval tasks, whereas LSTM and state-space models (e.g., Mamba) perform poorly on these same tasks. Next, we will show how the ability of Transformers to copy long sequences can be leveraged to achieve length generalization across various algorithmic and arithmetic tasks.

Transformer Expressivity and Formal Logic
Simons Institute | 2024-10-15 | David Chiang (University of Notre Dame)
https://simons.berkeley.edu/talks/david-chiang-university-notre-dame-2024-09-24
Transformers as a Computational Model
I will present several results (published and unpublished) relating transformers to first-order logic, linear temporal logic, and counting extensions of both. These results reveal abilities and limitations of transformers, as well as implications like the effect of depth on expressivity.

Interpretability Agents
Simons Institute | 2024-10-15 | Sarah Schwettmann (MIT)
https://simons.berkeley.edu/talks/sarah-schwettmann-mit-2024-09-24
Transformers as a Computational Model
Currently, answering a new question about a model requires an enormous amount of effort by experts. Researchers must formalize their question, formulate hypotheses about a model’s decision-making process, design datasets on which to evaluate model behavior, then use these datasets to refine and validate hypotheses. Consequently, intensive explanatory auditing is beyond the reach of most model users and providers, and applications of mechanistic interpretability are bottlenecked by the need for human labor. How can we usefully automate and scale model interpretation?
I will introduce Automated Interpretability Agents (AIAs) that, given a question about a model of interest, design and perform experiments on the model to answer the question. This paradigm encompasses both behavioral testing (as commonly applied in fairness and safety applications) and more basic, mechanistic research questions. AIAs are built from language models equipped with tools, and compose interpretability subroutines into Python programs. They operationalize hypotheses about models as code, and update those hypotheses after observing model behavior on inputs for which they make different predictions. AIAs are designed modularly such that their toolkit can evolve as bottom-up work introduces new interpretability techniques, and as users encounter new applications. I will present recent work showing that AIAs reach human-level performance on a variety of model understanding tasks. My hope is that this line of research helps lay the groundwork for a richer interface for interpretability: one that is iterative, modular, allows real-time testing of hypotheses, and scales to large and complex models.
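A schematic of the hypothesize-experiment-update loop described above (my own illustrative sketch; lm_agent, run_target_model, and the loop structure are hypothetical placeholders rather than the authors' system):

    # Schematic of an automated interpretability agent loop (illustrative only).
    # `lm_agent` proposes hypotheses and experiments as text/code; `run_target_model`
    # executes probe inputs on the model under study and returns its behavior.

    def interpretability_agent(question, lm_agent, run_target_model, max_rounds=5):
        hypothesis = lm_agent(f"Question about the model: {question}\nPropose an initial hypothesis.")
        for _ in range(max_rounds):
            experiment = lm_agent(
                f"Hypothesis: {hypothesis}\nWrite probe inputs that would distinguish it from alternatives."
            )
            observations = run_target_model(experiment)   # behavioral evidence from the target model
            hypothesis = lm_agent(
                f"Hypothesis: {hypothesis}\nObservations: {observations}\nRevise or confirm the hypothesis."
            )
        return hypothesis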
Language Generation in the Limit
Simons Institute | 2024-10-15 | Jon Kleinberg (Cornell University)
https://simons.berkeley.edu/talks/jon-kleinberg-cornell-university-2024-09-25
Transformers as a Computational Model
Although current large language models are complex, the most basic specifications of the underlying language generation problem itself are simple to state: given a finite set of training samples from an unknown language, produce valid new strings from the language that don't already appear in the training data. Here we ask what we can conclude about language generation using only this specification, without any further properties or distributional assumptions. In particular, we consider models in which an adversary enumerates the strings of an unknown target language that is known only to come from a possibly infinite list of candidates, and we show that it is possible to give certain non-trivial guarantees for language generation in this setting. The resulting guarantees contrast dramatically with negative results due to Gold and Angluin in a well-studied model of language learning where the goal is to identify an unknown language from samples; the difference between these results suggests that identifying a language is a fundamentally different problem than generating from it. (This is joint work with Sendhil Mullainathan.)

Do Large Language Models Perform Latent Reasoning? (Remote Talk)
Simons Institute | 2024-10-15 | Mor Geva (Tel Aviv University)
https://simons.berkeley.edu/talks/mor-geva-tel-aviv-university-2024-09-25
Transformers as a Computational Model

Language Acquisition in Language Models
Simons Institute | 2024-10-15 | Naomi Saphra (Kempner Institute at Harvard University)
https://simons.berkeley.edu/talks/naomi-saphra-kempner-institute-harvard-university-2024-09-25
Transformers as a Computational Model
Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain insights into model behavior may only be accessible by observing the trajectory of the training process. We present a case study of syntax acquisition in masked language models (MLMs) that demonstrates how analyzing the evolution of interpretable artifacts throughout training deepens our understanding of emergent behavior. In particular, we study Syntactic Attention Structure (SAS), a naturally emerging property of MLMs wherein specific Transformer heads tend to focus on specific syntactic relations. We identify a brief window in pretraining when models abruptly acquire SAS, concurrent with a steep drop in loss. This breakthrough precipitates the subsequent acquisition of linguistic capabilities. We then examine the causal role of SAS by manipulating SAS during training, and demonstrate that SAS is necessary for the development of grammatical capabilities. We further find that SAS competes with other beneficial traits during training, and that briefly suppressing SAS improves model quality. These findings offer an interpretation of a real-world example of both simplicity bias and breakthrough training dynamics.

What was Revolutionized by the Transformer Revolution?
Simons Institute | 2024-10-15 | Stella Biderman (EleutherAI)
https://simons.berkeley.edu/talks/stella-biderman-elutherai-2024-09-25
Transformers as a Computational Model
The advent of large scale generative artificial intelligence has rapidly changed the landscape of machine learning and even computer science as a whole. While this is often credited to the transformer architecture being special, Stella will argue that the transformer has actually been the vehicle for a paradigm shift in how machine learning processes are developed. She will then discuss how this paradigm shift has created new research questions, sharing results from several of her recent papers ranging in topic from memorization to mechanistic interpretability to evaluation science.

Learning to Reason with LLMs
Simons Institute | 2024-10-15 | Noam Brown (OpenAI)
https://simons.berkeley.edu/talks/noam-brown-openai-2024-09-26
Transformers as a Computational Model
Large language models (LLMs) have demonstrated remarkable capabilities in generating coherent text and completing various natural language tasks. Nevertheless, their ability to perform complex, general reasoning has remained limited. In this talk, I will describe OpenAI's new o1 model, an LLM trained via reinforcement learning to generate a hidden chain of thought before its response. We have found that the performance of o1 consistently improves with more reinforcement learning compute and with more inference compute. o1 surpasses previous state-of-the-art models in a variety of benchmarks that require reasoning, including mathematics competitions, programming contests, and advanced science question sets. I will discuss the implications of scaling this paradigm even further.

Using recurrence to achieve weak to strong generalization
Simons Institute | 2024-10-15 | Tom Goldstein (University of Maryland)
https://simons.berkeley.edu/talks/tom-goldstein-university-maryland-2024-09-26
Transformers as a Computational Model
Weak-to-strong generalization refers to the ability of a reasoning model to solve "harder" problems than those in its training set. I'll argue that recurrent architectures, in which networks can dynamically scale the level of computation used to solve a problem, are necessary to achieve dramatic weak-to-strong behavior. I'll present examples where recurrent networks exhibit weak-to-strong generalization for a range of simple reasoning problems. Then I'll show that transformer-based LLMs benefit from recurrence as well, boosting their performance on weak-to-strong arithmetic tasks.

Towards Understanding Modern Alchemy
Simons Institute | 2024-10-15 | Ekin Akyurek (MIT)
https://simons.berkeley.edu/talks/ekin-akyurek-mit-2024-09-26
Transformers as a Computational Model
Language models exhibit in-context learning (ICL), the ability to learn new tasks from just a few examples presented in the prompt. Prior work has studied ICL through the lens of simple learning problems like linear regression, but there remains a gap in understanding the rich language generation capabilities exhibited in real language models. In this talk, I will discuss a new model problem for understanding ICL — in-context learning of (formal) languages (ICLL). In ICLL, language models are presented with example strings from a probabilistic language and must generate additional strings from that same language. Focusing on regular languages sampled from random finite automata, we study the behavior of a variety of sequence models on the ICLL task. We show that Transformers significantly outperform recurrent and convolutional models on these tasks. Moreover, we find evidence that their ability to do so relies on specialized "n-gram heads" (higher-order variants of induction heads) that compute input-conditional next-token distributions. Finally, we show that hard-wiring these heads into neural models improves performance not just on formal language learning, but also on modeling of real natural-language text — improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
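To illustrate the kind of computation attributed to the "n-gram heads" above (a plain-Python analogue under my own assumptions, not the learned mechanism itself): match the most recent (n-1)-token suffix against earlier positions in the context and read off the empirical next-token distribution.

    # In-context n-gram statistic: the function such heads are described as approximating (illustrative analogue).
    from collections import Counter

    def in_context_ngram_distribution(tokens, n=2):
        """Empirical next-token distribution conditioned on the last (n-1) tokens, using only the current context."""
        suffix = tuple(tokens[-(n - 1):])
        counts = Counter()
        for i in range(len(tokens) - (n - 1)):
            if tuple(tokens[i:i + n - 1]) == suffix:
                counts[tokens[i + n - 1]] += 1
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()} if total else {}

    # Example: a string from a simple regular language presented in context.
    print(in_context_ngram_distribution(list("abababa"), n=2))  # -> {'b': 1.0}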
A Retrieval-based Language Model at Scale (Remote Talk)
Simons Institute | 2024-10-15 | Sewon Min (UC Berkeley & AI2)
https://simons.berkeley.edu/talks/sewon-min-uc-berkeley-ai2-2024-09-26
Transformers as a Computational Model
Retrieval-based LMs, which combine learned parameters with a datastore—a large collection of text documents—offer a compelling alternative to dense models, removing the need for remembering every detail from data and allowing for seamless updating. In this talk, I present two recent works on improving retrieval-based models in the context of LLMs. In the first work, we pre-train an LM to condition on retrieved documents, unlike previous approaches that use an LM trained with a standard objective as-is. Our model outperforms previous approaches, especially when the retrieval context is irrelevant and distracting. In the second work, we study the scaling properties of retrieval-based LMs. With a new datastore consisting of 1.4 trillion tokens, we show that the compute-optimal setup almost always includes retrieval across a range of downstream tasks. I will conclude by discussing open-ended questions—whether retrieval can bring the effect of training on data, how retrieval can handle data restrictions, and the potential for modular LMs to generalize this approach.
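A minimal sketch of the retrieve-then-condition setup described above (illustrative only; embed, generate, and the datastore layout are hypothetical placeholders rather than the systems from the talk):

    # Retrieval-based language modeling: retrieve from a datastore, then condition the LM on what was retrieved.
    # `embed` and `generate` are hypothetical placeholders for an encoder and a pretrained LM.

    def cosine_similarity(u, v):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
        return dot / norm if norm else 0.0

    def retrieve_then_generate(query, datastore, embed, generate, k=4):
        q = embed(query)
        ranked = sorted(datastore, key=lambda doc: -cosine_similarity(q, embed(doc)))  # rank documents
        context = "\n\n".join(ranked[:k])                   # top-k retrieved passages
        prompt = f"{context}\n\nQuestion: {query}\nAnswer:"
        return generate(prompt)                             # the LM conditions on the retrieved documents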
Understanding and Improving Efficient Language Models
Simons Institute | 2024-10-15 | Simran Arora (Stanford University)
https://simons.berkeley.edu/talks/simran-arora-stanford-university-2024-09-26
Transformers as a Computational Model
A key bottleneck in machine learning is compute: ML is succeeding at modeling text, code, and even DNA, but to reach the full potential, we need more resources than we currently have. The Transformer architecture, which powers industry models, requires compute and memory to grow with the input size, precluding dreams of modeling inputs containing millions of lines of code or the 3.2Bn nucleotide pairs in genome sequences. Candidate efficient language models (LMs), which seek to reduce the compute and memory requirements relative to Transformers, are emerging at a rapid pace. However, deviating from the Transformer orthodoxy is a major risk: models are served to millions of users and take billions of dollars to train, and we have limited insight into how alternatives will impact quality. In this talk, I first discuss our research on developing our understanding of why architectures work well, in order to distill them into their most efficient forms. Despite the breadth of skills required in language modeling (syntax, fact memorization, reasoning) and the breadth of efficient LMs being proposed, our ICLR 2024 work found that a single skill called associative recall (AR) surprisingly explains 80%+ of the language modeling quality difference between Transformers and a popular class of efficient LMs. An LM performs AR if it is able to recall and use information that it has seen earlier in the input. We will use AR to theoretically and empirically explain the tradeoffs across several classes of efficient LM architectures. I will then share how we use this theoretical analysis in our ICML 2024 work (Spotlight, top 3.5% of 10K papers) to develop new hardware-efficient ML architectures (BASED and JRT), which have expanded the Pareto frontier of the quality-efficiency tradeoff space beyond prior LMs.
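As a concrete instance of the associative recall (AR) skill defined above (a standard synthetic formulation, with a toy key-value vocabulary of my own choosing):

    # Associative recall (AR): after seeing key-value pairs in the input, recall the value for a repeated key.
    import random

    def make_ar_example(num_pairs=4, keys="abcdefgh", values="01234567"):
        pairs = random.sample(list(zip(keys, values)), num_pairs)  # e.g. [('c', '2'), ('a', '0'), ...]
        query_key, answer = random.choice(pairs)
        context = " ".join(f"{k} {v}" for k, v in pairs)
        return f"{context} {query_key}", answer                    # the model should emit `answer` next

    prompt, target = make_ar_example()
    print(prompt, "->", target)   # e.g. "c 2 a 0 f 5 b 1 a -> 0"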
On the Tradeoffs of State Space Models
Simons Institute | 2024-10-15 | Albert Gu (Carnegie Mellon University)
https://simons.berkeley.edu/talks/albert-gu-carnegie-mellon-university-2024-09-27
Transformers as a Computational Model
This talk will provide a high level overview of a recently popular subquadratic alternative to the Transformer, the state space model (SSM). We will discuss ways to think about the characteristics of these models and the fundamental tradeoffs between SSMs and Transformers.

Exact solutions to the geometric dynamics of signal propagation through transformers predict...
Simons Institute | 2024-10-15 | Surya Ganguli (Stanford University)
https://simons.berkeley.edu/talks/surya-ganguli-stanford-university-2024-09-27
Transformers as a Computational Model

Using Algorithms to Understand Transformers (and Using Transformers to Understand Algorithms)
Simons Institute | 2024-10-15 | Vatsal Sharan (University of Southern California)
https://simons.berkeley.edu/talks/vatsal-sharan-university-southern-california-2024-09-27
Transformers as a Computational Model
We will discuss how algorithmic tools and understanding borrowed from optimization theory, Fourier transforms, and Boolean function analysis can help understand the mechanisms employed by Transformers to solve basic computational tasks such as linear regression and addition. We will examine the role of the architecture and pre-training data in enabling Transformers to learn their employed mechanisms. Finally, we will discuss work on using Transformers themselves to discover and design data structures for tasks such as nearest neighbor search.

Associative memories as a building block in Transformers
Simons Institute | 2024-10-15 | Alberto Bietti (Flatiron Institute)
https://simons.berkeley.edu/talks/alberto-bietti-flatiron-institute-2024-09-27
Transformers as a Computational Model
Large language models based on transformers have achieved great empirical successes. However, as they are deployed more widely, there is a growing need to better understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. Through toy tasks for reasoning and factual recall, we highlight the role of weight matrices as associative memories, and provide theoretical results on how gradients enable their learning during training, and how over-parameterization affects their storage capacity.

Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...
Simons Institute | 2024-10-09 | Andrew Gordon Wilson (New York University)
https://simons.berkeley.edu/talks/andrew-gordon-wilson-new-york-university-2024-09-27
Transformers as a Computational Model
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts to develop alternatives have focused on a small number of hand-crafted structured matrices, and have neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, and Monarch, along with many novel structures. We develop a taxonomy of all such operators based on their computational and algebraic properties, which provides insights into their scaling laws. Combining these insights with empirical evaluation, we identify a subset of structures that achieve better performance than dense layers as a function of training compute. To further improve their compute efficiency, we develop a natural extension of these structures that converts them into a sparse mixture-of-experts layer. The resulting layer significantly outperforms dense layers in compute-optimal training efficiency for large language models.
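As a small illustration of the kind of einsum-expressible structure surveyed above (a toy NumPy example of my own, not the paper's framework or its specific parameterizations): a dense layer, a low-rank factorization, and a Kronecker-structured layer can each be written as an Einstein summation.

    # Structured alternatives to a dense linear layer, each expressed as an einsum (toy illustration).
    import numpy as np

    d_in, d_out, rank = 64, 64, 8
    x = np.random.randn(d_in)

    # Dense layer: y_j = sum_i W[j, i] * x[i]
    W = np.random.randn(d_out, d_in)
    y_dense = np.einsum("ji,i->j", W, x)

    # Low-rank layer: W = B @ A with A of shape (rank, d_in) and B of shape (d_out, rank)
    A = np.random.randn(rank, d_in)
    B = np.random.randn(d_out, rank)
    y_lowrank = np.einsum("jr,ri,i->j", B, A, x)

    # Kronecker-structured layer: with a row-major reshape of x to (8, 8), this equals np.kron(W1, W2) @ x
    W1 = np.random.randn(8, 8)
    W2 = np.random.randn(8, 8)
    y_kron = np.einsum("ac,bd,cd->ab", W1, W2, x.reshape(8, 8)).reshape(-1)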