Gail Weiss: Thinking Like Transformers

  • Uploaded 24 Feb 2022
  • Paper presented by Gail Weiss to the Neural Sequence Model Theory Discord on the 24th of February 2022.
    Gail's references:
    On Transformers and their components:
    - Thinking Like Transformers (Weiss et al, 2021) arxiv.org/abs/2106.06981 (REPL here: github.com/tech-srl/RASP)
    - Attention is All You Need (Vaswani et al, 2017) arxiv.org/abs/1706.03762
    - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018) arxiv.org/abs/1810.04805
    - Improving Language Understanding by Generative Pre-Training (Radford et al, 2018) s3-us-west-2.amazonaws.com/op...
    - Are Transformers universal approximators of sequence-to-sequence functions? (Yun et al, 2019) arxiv.org/abs/1912.10077
    - Theoretical Limitations of Self-Attention in Neural Sequence Models (Hahn, 2019) arxiv.org/abs/1906.06755
    - On the Ability and Limitations of Transformers to Recognize Formal Languages (Bhattamishra et al, 2020) arxiv.org/abs/2009.11264
    - Attention is Turing-Complete (Perez et al, 2021) jmlr.org/papers/v22/20-302.html
    - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (Wei et al, 2021) arxiv.org/abs/2107.13163
    - Multilayer feedforward networks are universal approximators (Hornik et al, 1989) www.cs.cmu.edu/~epxing/Class/...
    - Deep Residual Learning for Image Recognition (He et al, 2016) www.cv-foundation.org/openacc...
    - Universal Transformers (Dehghani et al, 2018) arxiv.org/abs/1807.03819
    - Improving Transformer Models by Reordering their Sublayers (Press et al, 2019) arxiv.org/abs/1911.03864
    On RNNs:
    - Explaining Black Boxes on Sequential Data using Weighted Automata (Ayache et al, 2018) arxiv.org/abs/1810.05741
    - Extraction of rules from discrete-time recurrent neural networks (Omlin and Giles, 1996) www.semanticscholar.org/paper...
    - Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples (Weiss et al, 2017) arxiv.org/abs/1711.09576
    - Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al, 2018) arxiv.org/abs/1807.01406
    - On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018) aclanthology.org/P18-2117/
    - Sequential Neural Networks as Automata (Merrill, 2019) aclanthology.org/W19-3901.pdf
    - A Formal Hierarchy of RNN Architectures (Merrill et al, 2020) aclanthology.org/2020.acl-mai...
    - Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (Joulin and Mikolov, 2015) proceedings.neurips.cc/paper/...
    - Learning to Transduce with Unbounded Memory (Grefenstette et al, 2015) proceedings.neurips.cc/paper/...
    Paper mentioned in discussion at the end:
    - Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (Dong et al, 2021) icml.cc/virtual/2021/oral/9822
  • Science & Technology

Comments • 8

  • @swim3936 • 1 year ago • +2

    fantastic presentation!

  • @GodofStories • 1 year ago • +2

    This is great

  • @alexanderkyte4675 • 1 year ago • +7

    Could I please have the slides? They’re partially obscured by the listeners here. I’d like to use them for a reading group.

    • @formallanguagesandneuralne5578 • 1 year ago • +3

      Hey, I'm not managing to respond from my own account, so I'm posting from here - the slides are on my website, which is hosted on GitHub - gailweiss dot github dot io

  • @stevenshaw124 • 1 year ago • +2

    this was an excellent presentation! thank you!

  • @homeboundrecords6955 • 1 year ago • +1

    I'll bet this reply will not be read, but... isn't the "subject" = "I" and the "object" = "dog"?

    • @LGcommaI • 1 year ago • +1

      Yes, that's correct. The terminology is confusing though (IF one knows Latin): the 'subject' literally is 'that which is (thrown) UNDER', while the 'object' is 'that which is (thrown) on top'. Everyday sensibilities would thus expect the object to be the one that does something and the subject to be the one that has something done TO it. The standard convention, however, is the OPPOSITE.

    • @RaviAnnaswamy • 1 year ago • +1

      @@LGcommaI 'object' generally refers to inert things, and 'subject' is used in English for persons (the King asked his subjects to pay more tax during the drought years...). This could be the reason English grammar uses 'subject' for the actor and 'object' for the acted-upon (the victim).