Welch Labs | The moment we stopped understanding AI [AlexNet] @WelchLabsVideo
Thanks to KiwiCo for sponsoring today's video! Go to kiwico.com/welchlabs and use code WELCHLABS for 50% off your first month of monthly lines and/or for 20% off your first Panda Crate.
Activation Atlas Posters!
welchlabs.com/resources/5gtnaauv6nb9lrhoz9cp604padxp5o
welchlabs.com/resources/activation-atlas-poster-mixed5b-13x19
welchlabs.com/resources/large-activation-atlas-poster-mixed4c-24x36
welchlabs.com/resources/activation-atlas-poster-mixed4c-13x19
Special thanks to the Patrons:
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti
Welch Labs
Ad free videos and exclusive perks: patreon.com/welchlabs
Watch on TikTok: tiktok.com/@welchlabs
Learn More or Contact: welchlabs.com
Instagram: instagram.com/welchlabs
X: twitter.com/welchlabs
References
AlexNet Paper
proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Original Activation Atlas article (great interactive atlas, explore here): https://distill.pub/2019/activation-atlas/
Carter, et al., "Activation Atlas", Distill, 2019.
Feature Visualization Article: https://distill.pub/2017/feature-visualization/
Olah, et al., "Feature Visualization", Distill, 2017.
Great LLM Explainability work: https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html
Templeton, et al., "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", Transformer Circuits Thread, 2024.
"Deep Visualization Toolbox" video by Jason Yosinski inspired many visuals:
youtube.com/watch?v=AgkfIQ4IGaM
Great LLM/GPT Intro paper
arxiv.org/pdf/2304.10557
3B1B's GPT videos are excellent, as always:
youtube.com/watch?v=eMlx5fFNoYc
youtube.com/watch?v=wjZofJX0v4M
Andrej Karpathy's walkthrough is amazing:
youtube.com/watch?v=kCc8FmEb1nY
Goodfellow’s Deep Learning Book
deeplearningbook.org
OpenAI’s 10,000 V100 GPU cluster (1+ exaflop): news.microsoft.com/source/features/innovation/openai-azure-supercomputer
GPT-3 size, etc.: Language Models are Few-Shot Learners, Brown et al., 2020.
Unique token count for ChatGPT: cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken
GPT-4 training size, etc. (speculative):
patmcguinness.substack.com/p/gpt-4-details-revealed
semianalysis.com/p/gpt-4-architecture-infrastructure
Historical Neural Network Videos
youtube.com/watch?v=FwFduRA_L6Q
youtube.com/watch?v=cNxadbrN_aI
Errata
1:40 should be: "word fragment is appended to the end of the original input". Thanks to Chris A for finding this one.