Simons Institute | ML Efficiency for Large Models: From Data Efficiency to Faster Transformers
Vahab Mirrokni (Google Research, NYC)
https://simons.berkeley.edu/talks/vahab-mirrokni-google-research-nyc-2024-06-18
ML Efficiency for Large Models: From Data Efficiency to Faster Transformers
Scaling large models efficiently for faster training and inference is a fundamental challenge. In this talk, we present a number of algorithmic challenges and potential solutions, from theory to practice. First, we discuss data-efficiency and model-efficiency problems that can be formalized as subset selection. For model efficiency, we present sequential attention for feature selection and sparsification [ICLR'23, arXiv]. For data efficiency, we present a sensitivity-sampling technique that improves both the quality and the efficiency of the models [ICML'24]. Furthermore, we discuss the intrinsic quadratic complexity of attention models, as well as of token generation. We first discuss HyperAttention, a technique for developing linear-time attention algorithms under mild assumptions [ICLR'24]. We then present PolySketchFormer, a technique that bypasses the hardness results for achieving sub-quadratic attention by applying sketching to polynomial functions [ICML'24]. Finally, we show how to address the complexity of token generation via clustering techniques [arXiv].
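To illustrate the sub-quadratic idea behind polynomial attention: replacing the softmax score with a degree-p polynomial (q·k)^p makes the score an inner product of lifted feature vectors, so matrix associativity lets attention be computed without ever forming the n×n score matrix. The toy sketch below (not the PolySketchFormer implementation itself, which additionally sketches the polynomial feature map to keep its dimension small) uses an exact tensor-product feature map, here called `phi`, to show that the O(n·d^p) route matches the naive O(n²·d) computation.

```python
import numpy as np

def quadratic_poly_attention(Q, K, V, p=2):
    # Naive polynomial attention: materializes the n x n score matrix,
    # costing O(n^2 d) time and O(n^2) memory.
    S = (Q @ K.T) ** p
    return (S / S.sum(axis=1, keepdims=True)) @ V

def linear_poly_attention(Q, K, V, p=2):
    # Sub-quadratic route: lift each row to the degree-p tensor-product
    # features phi(x), so that phi(q) . phi(k) = (q . k)^p, then use
    # associativity: phi(Q) @ (phi(K).T @ V) costs O(n d^p), never n x n.
    def phi(X):
        feats = X
        for _ in range(p - 1):
            # Outer product with X raises the polynomial degree by one.
            feats = np.einsum('ni,nj->nij', feats, X).reshape(X.shape[0], -1)
        return feats

    Qp, Kp = phi(Q), phi(K)
    num = Qp @ (Kp.T @ V)            # numerator, d^p x d intermediate
    den = Qp @ Kp.sum(axis=0)        # matching row normalizer
    return num / den[:, None]

rng = np.random.default_rng(0)
n, d = 64, 8
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
assert np.allclose(quadratic_poly_attention(Q, K, V),
                   linear_poly_attention(Q, K, V))
```

With an even degree p the scores are nonnegative, so the row normalizer plays the role softmax plays in standard attention; sketching phi (as in the talk) would trade this exact d^p-dimensional lift for a low-dimensional approximation.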