A paper referenced in the talk, "Deep learning generalizes because the parameter-function map is biased towards simple functions": arxiv.org/abs/1805.08522
This was filmed as part of Redwood Research's Machine Learning for Alignment Bootcamp.

6: How to Build a Safe Advanced AGI?: Evan Hubinger 2023
AI Safety Talks | 2023-05-13
Part 6 of a series of talks in which researcher Evan Hubinger explores the problems of safety for artificial general intelligence.
This was recorded as part of the SERI ML Alignment Theory Scholars Program: serimats.org

5: Predictive Models: Evan Hubinger 2023
AI Safety Talks | 2023-05-13
Part 5 of a series of talks in which researcher Evan Hubinger explores the problems of safety for artificial general intelligence.
This was recorded as part of the SERI ML Alignment Theory Scholars Program: serimats.org

4: How Do We Become Confident in the Safety of an ML System?: Evan Hubinger 2023
AI Safety Talks | 2023-05-13
Part 4 of a series of talks in which researcher Evan Hubinger explores the problems of safety for artificial general intelligence.
This was recorded as part of the SERI ML Alignment Theory Scholars Program: serimats.org

3: How Likely is Deceptive Alignment?: Evan Hubinger 2023
AI Safety Talks | 2023-05-13
Part 3 of a series of talks from researcher Evan Hubinger.
This was recorded as part of the SERI ML Alignment Theory Scholars Program: serimats.org

1: AGI Safety: Evan Hubinger 2023
AI Safety Talks | 2023-05-13
Part 1 of a series of talks in which researcher Evan Hubinger explores the problems of safety for artificial general intelligence.
This was recorded as part of the SERI ML Alignment Theory Scholars Program: serimats.org

Concrete Open Problems in Mechanistic Interpretability: Neel Nanda at SERI MATS
AI Safety Talks | 2023-05-05
How can we look inside neural networks and figure out how they do what they do? This is likely to be very important for alignment and safety, but the research is at an early stage, with lots of opportunities for great work. Researcher Neel Nanda talks about some of them in this talk.