Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • Published: 11 Sep 2024
  • Paper here: arxiv.org/abs/...
    The annotated S4: srush.github.i...
    Notes: drive.google.c...

Comments • 14

  • @gabrielmongaras
    @gabrielmongaras  9 months ago +10

    I forgot to mention that this model is trained like a normal transformer. Since everything is causal, you should be able to train with the same efficient parallel technique a transformer uses: a single forward pass for an entire sequence of data.
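To make the point above concrete, here is a minimal sketch (mine, not from the video or paper) of a diagonal SSM recurrence applied to a whole length-L sequence in one forward call; because the update is causal, y_t depends only on inputs up to t. The names `A_bar`, `Bx`, and `C` are illustrative stand-ins for the discretized parameters, and this sequential loop is just a reference implementation — in practice Mamba computes the same thing with a hardware-efficient scan.

```python
import numpy as np

def ssm_scan(A_bar, Bx, C):
    """Run h_t = A_bar_t * h_{t-1} + Bx_t over a full sequence.

    A_bar, Bx: (L, D, N) per-step diagonal transition and input terms.
    C: (L, N) output projection. Returns y of shape (L, D).
    """
    L, D, N = A_bar.shape
    h = np.zeros((D, N))
    ys = []
    for t in range(L):                 # causal: y_t sees only inputs up to t
        h = A_bar[t] * h + Bx[t]       # elementwise update per (channel, state)
        ys.append((h * C[t]).sum(-1))  # project state to output, shape (D,)
    return np.stack(ys)                # (L, D)
```

The entire sequence goes through in one call, which is what makes transformer-style teacher-forced training possible; at inference you can instead carry `h` forward one token at a time.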

  • @berkk1993
    @berkk1993 9 months ago +6

    I just opened your channel to ask you for a Mamba video, and here I see this video. You are awesome, dude. I can't express how much you contribute to my life. Thank you many times!!!

  • @AM-yk5yd
    @AM-yk5yd 9 months ago +8

    19:50 I think A is DxN because they use a diagonal matrix. They mention S4D, and that paper also has an example of a linear initialization: "A = -0.5 + 1j * np.pi * np.arange(N//2) # S4D-Lin initialization". It's structured, after all.
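Expanding the line quoted in the comment, here is the S4D-Lin initialization as given in the S4D paper, with a note (mine) on what each part does; `N` is the state size and the `N//2` entries are the seeds of complex-conjugate pairs.

```python
import numpy as np

N = 16                                      # state size (N//2 conjugate-pair seeds)
A = -0.5 + 1j * np.pi * np.arange(N // 2)   # S4D-Lin initialization
# Real part -1/2 gives every mode the same decay rate; the linearly spaced
# imaginary parts pi*n set each mode's oscillation frequency.
```

Because A is diagonal, storing just this length-N//2 complex vector (per channel) is enough, which is why the full A tensor can be D x N rather than a dense N x N matrix per channel.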

  • @Anonn724
    @Anonn724 9 months ago +4

    Please don't stop with these videos. They are extremely useful to go through with you. Much love

  • @orrimoch5226
    @orrimoch5226 8 months ago +1

    Wow Gabriel, great job!
    I like your calm attitude and simple way of explaining this complex subject!
    As an electrical engineer and a data scientist, I highly appreciate your content!

  • @marshallmcluhan33
    @marshallmcluhan33 9 months ago +3

    Thanks for the vid. I can't wait to see if it's overhyped or not, hehe. Tri Dao knows his attention mechanisms.

  • @MatterExplained
    @MatterExplained 9 months ago +3

    Thanks for doing this paper, I was a bit lost on state space models

    • @acasualviewer5861
      @acasualviewer5861 9 months ago +1

      I was a bit lost.. now I'm more lost. ;)

    • @MatterExplained
      @MatterExplained 9 months ago

      @acasualviewer5861 haha, I did watch some lectures by the first author though

  • @saculzemog
    @saculzemog 8 months ago +3

    24:28 shouldn't A, B, and C be LxN, not LxD?

  • @ml-ok3xq
    @ml-ok3xq 9 months ago +2

    I think it's independent because you can diagonalize the state transition matrix and then each value only interacts with itself.
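The decoupling this comment describes can be checked with a toy example (my sketch, not from the video): once the state transition matrix is diagonal, the recurrence h_t = A h_{t-1} + b splits into independent scalar recurrences, so each state coordinate only ever interacts with itself.

```python
import numpy as np

a = np.array([0.9, 0.5, -0.3])    # diagonal entries of A
b = np.array([1.0, 2.0, 3.0])

# Vectorized diagonal recurrence over 5 steps.
h = np.zeros(3)
for _ in range(5):
    h = a * h + b                 # elementwise: coordinate i never sees j

# Same result computed one coordinate at a time, proving independence.
h_indep = np.zeros(3)
for i in range(3):
    x = 0.0
    for _ in range(5):
        x = a[i] * x + b[i]
    h_indep[i] = x
```

This per-coordinate independence is exactly what makes the diagonal parameterization cheap: an N x N matrix multiply collapses to an elementwise product of length N.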

  • @grimsk
    @grimsk 9 months ago +1

    It feels like this is getting closer and closer to concepts from physics.. 🙂

  • @yccui
    @yccui 5 months ago

    If all the matrices are learnable, I wonder why the authors use the HiPPO matrix to initialize A? What's the point?

    • @gabrielmongaras
      @gabrielmongaras  5 months ago

      I was actually wrong about the HiPPO "A" matrix being learnable. I think this matrix is actually static, which makes sense as it adds some basic structure to the model.
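For reference, here is a sketch of the HiPPO-LegS matrix being discussed, following the formula from the HiPPO paper (the code itself is my illustration, not from the video); indices n and k are 0-based, and the matrix is lower triangular with negative diagonal.

```python
import numpy as np

def hippo_legs(N):
    """Build the N x N HiPPO-LegS matrix: A[n,k] = -sqrt((2n+1)(2k+1))
    for n > k, -(n+1) on the diagonal, and 0 above the diagonal."""
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = -np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = -(n + 1)
    return A
```

Initializing A this way bakes in the structure HiPPO derives for memorizing input history, which is the "basic structure" the reply refers to.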