Informer attention Architecture - FROM SCRATCH!

CodeEmporium

2,483 views

Added 28 Aug 2024

Comments • 22

@LeoLan-vv1nq 2 months ago ⁺¹
Amazing work, can't wait for the next episode!
@mohamadalikhani2665 a month ago ⁺¹
Thank you for your amazing content. Where can I access the draw.io file?
@neetpride5919 2 months ago ⁺³
Why aren't the padding tokens appended during data preprocessing, before the feedforward layer turns the inputs into the key, query, and value vectors?
@slayer_dan 2 months ago ⁺¹
Adding padding before forming K, Q, and V vectors would insert extra tokens into the input sequences, altering their lengths and potentially distorting the underlying data structure. As a result, the subsequent computation of K, Q, and V vectors would incorporate these padding tokens, affecting the model's ability to accurately represent the original data. During the attention calculation, these padding tokens would influence the attention scores, potentially diluting the focus on the actual content of the input sequences. This could lead to less effective attention patterns and hinder the model's ability to learn meaningful representations from the data.
Furthermore, applying padding after forming K, Q, and V vectors allows for the efficient use of masking techniques to exclude padding tokens from the attention mechanism. By setting the attention scores corresponding to padding positions to negative infinity before the softmax operation, the model effectively ignores these tokens during attention calculation. This approach preserves the integrity of the input sequences, ensures accurate attention computations, and maintains the model's focus on relevant information within the data.
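The masking step described above (setting the scores of padding positions to negative infinity before the softmax) can be sketched in plain Python. This is a minimal illustration, not code from the video: it assumes a single attention head, matrices stored as lists of lists, and a boolean `key_mask` in which `False` marks a padding key. The function name `masked_attention` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(q: Matrix, k: Matrix, v: Matrix, key_mask: List[bool]) -> Matrix:
    """Scaled dot-product attention; key_mask[j] is False for padding keys."""
    d = len(q[0])
    out = []
    for qi in q:
        # Scaled scores q_i . k_j / sqrt(d); padding keys are forced to -inf.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if keep else -math.inf
            for kj, keep in zip(k, key_mask)
        ]
        # Subtract the max for numerical stability (assumes at least one real key).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0, so padding gets zero weight
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value rows, one output column at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

Because the padded key's weight is exactly zero, whatever value sits in its row of `v` has no effect on the output, which is the "ignores these tokens" behavior described in the comment.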
P.S. I used ChatGPT to format my answer because it can do this thing better.
@neetpride5919 2 months ago
@@slayer_dan How could it possibly save computing power to pad the matrices with multiple 512-element vectors, rather than simply appending tokens to the initial sequence of tokens?
@deltamico 2 months ago ⁺¹
Take it with a grain of salt, but I think if you hardcode the mask so that padded positions are never attended to, the model doesn't need to learn that extra behavior for the [pad] token, so training is more stable.
@jarhatz a month ago ⁺¹
@@neetpride5919 Multiplying matrices on the GPU can be optimized by efficiently sizing the matrices such that they fit more cleanly in GPU cache. For example, suppose you have two skinny tall matrices that you want to multiply together. Sometimes, the kernel operations that occur across one (tall) axis can be the bottleneck in compute time. There are instances in optimization where padding matrices with 0s to uniform square shapes or multiples of the cache block size can speed up the kernel operations on the GPU.
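The correctness half of this argument (zero-padding both matrices to a multiple of a block size and cropping the result gives the same product) can be checked with a small pure-Python sketch. It cannot show any speedup, which depends on the actual GPU kernel and hardware. The block size of 4 and the helper names `pad_to_multiple`, `matmul`, and `padded_matmul` are hypothetical choices for illustration.

```python
from typing import List

Matrix = List[List[float]]


def pad_to_multiple(m: Matrix, block: int) -> Matrix:
    """Zero-pad the rows and columns of m up to the next multiple of block."""
    rows, cols = len(m), len(m[0])
    new_rows = -(-rows // block) * block  # ceiling division, scaled back up
    new_cols = -(-cols // block) * block
    padded = [row + [0.0] * (new_cols - cols) for row in m]
    padded += [[0.0] * new_cols for _ in range(new_rows - rows)]
    return padded


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive triple-loop matrix product (a stand-in for a GPU kernel)."""
    inner = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def padded_matmul(a: Matrix, b: Matrix, block: int) -> Matrix:
    """Multiply block-aligned, zero-padded copies, then crop to the true shape."""
    full = matmul(pad_to_multiple(a, block), pad_to_multiple(b, block))
    return [row[: len(b[0])] for row in full[: len(a)]]
```

The padded entries only ever contribute `0.0 * x` terms to each dot product, so cropping the padded result recovers the original product exactly.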
@adelAKAdude 2 months ago
Great video, thanks!
Question: in the third question, how do we sample a subset of keys and queries "depending on importance"?
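In the Informer paper, "importance" is a per-query score: the maximum of a query's scaled dot products with the keys minus their mean. Keys are only sampled at random to estimate this score cheaply, and the top u = c·ln L_Q queries are kept for full attention. The pure-Python sketch below illustrates that selection step under stated assumptions; it is not the video's code. The factor `c`, the per-query sample size c·⌈ln L_K⌉, sampling without replacement, and the helper name `probsparse_top_queries` are all assumptions made for this illustration.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def probsparse_top_queries(q: Matrix, k: Matrix, c: int = 1, seed: int = 0) -> List[int]:
    """Indices of the top-u 'active' queries under a max-minus-mean measurement."""
    rng = random.Random(seed)
    l_q, l_k, d = len(q), len(k), len(q[0])
    # Both counts follow a c * ln(L) pattern; c is an assumed sampling factor.
    u = max(1, min(l_q, c * math.ceil(math.log(l_q))))
    n_sample = max(1, min(l_k, c * math.ceil(math.log(l_k))))
    scores = []
    for qi in q:
        # Score each query against a random subset of keys, not all L_K keys.
        idx = rng.sample(range(l_k), n_sample)
        dots = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in idx]
        # Max minus mean: a large gap means the query focuses on a few keys.
        scores.append(max(dots) - sum(dots) / len(dots))
    # Keep the u queries with the largest measurement; only these get full attention.
    return sorted(range(l_q), key=lambda i: scores[i], reverse=True)[:u]
```

A query whose scores are flat across keys (for example, an all-zero query) gets a measurement of 0 and is ranked last, while a query that strongly prefers some keys rises to the top.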
@samson6707 2 days ago
Can I find the flowchart of the Informer model on GitHub? And is draw.io free?
@Ishaheennabi 2 months ago ⁺²
Love from Kashmir, India, bro! ❤❤❤
@sudlow3860 2 months ago
With regard to the quiz I think it is B D B. Not sure how this is going to launch a discussion though. You present things very well.
@CodeEmporium 2 months ago
Ding ding ding! Good work on the quiz! While this may or may not spark a discussion, just wanted to say thanks for participating :)
@rpraver1 2 months ago
Also, great video as always. Hoping you cover encoder-only and decoder-only transformers in the future...
@CodeEmporium 2 months ago
Yep! For sure. Thank you so much!
@user-qd2oc6xq8n 2 months ago
Can you suggest an interactive AI neural network model for a school project? Your videos are nice and I understand them easily. Please tell me.
@-beee- 2 months ago ⁺¹
I would love it if the quizzes eventually had answers in the comments. I know this is a fresh video, but I want to check my work, not just have a discussion 😅
@dumbol8126 2 months ago
Is this the same as what TimesFM uses?
@eadweard. 2 months ago
In answer to your question, I can either:
A) mono-task
or
B) screw up several things at once
@theindianrover2007 2 months ago
cool!
@CodeEmporium 2 months ago
Thank you 🙏
@rpraver1 2 months ago
Not sure if just me, but starting at about 4:50 your graphics are so dark...
maybe go to a white background or light gray, like your original png...
@CodeEmporium 2 months ago
Yea. Let me try brightening them up for future videos if I can. Thanks for the heads up
