Illustrated Guide to Transformers Neural Network: A step by step explanation

  • Published 19 Jun 2024
  • Transformers are all the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with a step-by-step explanation and illustrations of how transformers work.
    CORRECTIONS:
    The sine and cosine functions are actually applied to the embedding dimensions and time steps! (A short sketch of this follows below.)
    ⭐ Play and Experiment With the Latest AI Technologies at grandline.ai ⭐
    Hugging Face Write with Transformers
    transformer.huggingface.co/
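
    Regarding the CORRECTIONS note, here is a minimal NumPy sketch of the sinusoidal positional encoding from "Attention Is All You Need", with sine on even embedding dimensions and cosine on odd ones, indexed by time step (the function name and shapes are illustrative):

      import numpy as np

      def positional_encoding(seq_len, d_model):
          pos = np.arange(seq_len)[:, None]              # (seq_len, 1) time steps
          i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2) even dims
          angles = pos / np.power(10000.0, i / d_model)  # (seq_len, d_model/2)
          pe = np.zeros((seq_len, d_model))
          pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
          pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
          return pe

    Each time step gets a unique pattern across the embedding dimensions, and this matrix is added to the token embeddings before the first encoder layer.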

Comments • 595

  • @Leon-pn6rb
    @Leon-pn6rb 3 years ago +197

    this is great but would've loved it if you could have taken a sample sentence as input and shown us how it transforms as it moves through the different parts of the transformer.
    Perhaps an idea for the next video!

    • @tunisitherapie3078
      @tunisitherapie3078 1 year ago +4

      @The A.I. Hacker - Michael Phi please do!

    • @aysesalihasunar9563
      @aysesalihasunar9563 2 months ago

      The video actually led me to expect this example as well! It would be highly beneficial.

  • @architkhare729
    @architkhare729 3 years ago +11

    Wow, this was great. I have watched a number of videos on the transformer models, and they have all contributed to my understanding, but this puts everything together so neatly. Amazing, please keep making more such videos.

  • @valentinfontanger4962
    @valentinfontanger4962 2 years ago +7

    I used multiple sources to learn about the transformer architecture. Regarding the decoder part, you really helped me understand what the input was and how the different operations are performed! Thanks a lot :)

  • @abail7010
    @abail7010 1 year ago +7

    I have been struggling with this architecture for an eternity now and this is the first time I really understood what's going on in this graphic. Thank you so much for this nice and clear explanation!

    • @leif1075
      @leif1075 1 year ago

      What about the architecture made you struggle if I may ask?

  • @jenishah9825
    @jenishah9825 3 years ago +5

    This video marks an end to my search for one place explanation of Transformers. Thanks a lot for putting this up! :)

  • @mrowkenesser
    @mrowkenesser 1 year ago +5

    Man, thanks for this video. Reading a paper as a newbie is super difficult, but explanations like the ones you've posted for key, value and query, as well as the reasoning for masking, are very, very helpful. I subscribed to your channel and am looking forward to new stuff.

  • @manikantansrinivasan5261
    @manikantansrinivasan5261 11 months ago

    This is literally the best explanation of Transformers I have ever seen!

  • @yishaibasserabie5765
    @yishaibasserabie5765 1 year ago +1

    This is by far the best explanation I’ve ever seen on Transformer Networks. Very very well done

  • @lone0017
    @lone0017 3 years ago +3

    Brilliant explanation with visually intuitive animations ! I rarely comment or subscribe to anything but this time I instantly do both after watching the video. And how coincidental it is that this was uploaded on my birthday. Hope to see more videos from you.

  • @sank_y
    @sank_y 3 years ago +156

    12:56 The encoder's hidden states are the key-value pairs, and in the decoder, the previous output is compressed into a query. The next output is produced by mapping this query against the set of keys and values.

    • @mrexnx
      @mrexnx 3 years ago +23

      this is critical! I was pretty confused on this for a while until I realized he swapped the Query and Values by accident.

    • @Random4Logic
      @Random4Logic 2 years ago +4

      ah someone else realised it. this comment should be pinned ^^

    • @MJJ12337
      @MJJ12337 2 years ago +3

      you are correct

    • @leif1075
      @leif1075 1 year ago

      @@mrexnx Correct me if I'm wrong, but isn't the only reason you put the mask so it doesn't attend to "future" words in the sentence the nature of the English language, since English is written left to right, unlike some other languages? Otherwise you shouldn't have that mask, because you would need to attend to words on the right or maybe the left also?

    • @fahmidhossainSakib
      @fahmidhossainSakib 1 year ago +1

      @@leif1075 I also thought something like that, which means that in the case of Arabic, this direction of masking should not work!
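
      On the script-direction question in the two replies above: the look-ahead mask operates on token positions in generation order, not on the visual direction of a script, so it applies to Arabic just as well. A minimal NumPy sketch of such a mask, assuming the usual formulation (the names here are illustrative):

        import numpy as np

        def look_ahead_mask(size):
            # True where attention is allowed: position i may attend to j <= i.
            return np.tril(np.ones((size, size), dtype=bool))

        def mask_scores(scores, mask):
            # Disallowed positions get -inf, so softmax assigns them zero weight.
            return np.where(mask, scores, -np.inf)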

  • @MinhNguyen-ro6lm
    @MinhNguyen-ro6lm 3 years ago +657

    I must say you’ve given the best explanation of transformers I’ve seen, and it’s saved me lots of time studying the original paper. Please produce more vids like this, I would recommend the BERT family and the GPT family as well 👏👍

    • @xtremechaos5771
      @xtremechaos5771 3 years ago +12

      I agree. I can't seem to find a good explanation on the BERT model

  • @mariosessa3828
    @mariosessa3828 2 years ago

    Thanks for your explanation, very clean and well built in every argument about transformers. I was so lucky to come across this video randomly on YouTube. Good job!

  • @Random4Logic
    @Random4Logic 2 years ago +149

    There is actually a small mistake at 12:56: the encoder's outputs are the VALUES and KEYS for the decoder's second attention layer.
    So it is: Value and Key from the encoder are combined with Query from the decoder (see the sketch after this thread).
    From the "Attention Is All You Need" paper: "In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
    and the memory keys and values come from the output of the encoder."

    • @curiousseeker3784
      @curiousseeker3784 10 months ago

      I know that's true, but it still doesn't make sense. What's the use of the masked attention scores we just computed in that multi-head attention layer, since they're not carried forward?

    • @weibangzhang3994
      @weibangzhang3994 9 months ago +1

      yes you are right

    • @vineetsrivastava4906
      @vineetsrivastava4906 8 months ago +5

      I asked ChatGPT (GPT-4) about the same thing; look at the response:
      "Let's break down the process:
      Encoders in Transformers:
      They take the input sequence and process it through multiple layers of self-attention and feed-forward networks.
      The final output from the last encoder is used as the "Keys" (K) and "Values" (V) for the subsequent decoder.
      Decoders in Transformers:
      Decoders also have multiple layers.
      In the first self-attention mechanism of the decoder, the target sequence (for example, the partially generated translation) attends to itself. This is analogous to the self-attention in the encoder but has a slight difference. In the decoder's self-attention, future tokens are masked, so they can't be attended to. This ensures that the prediction for a particular word doesn’t depend on future words in the sequence.
      In the subsequent multi-head attention mechanism, the "Queries" (Q) come from the decoder's previous layer, and they attend to the "Keys" (K) and "Values" (V) from the encoder's output.
      This essentially means that the decoder is using the information from the encoder to help generate the next token in the sequence.
      So, your statement is correct: "Value and Key from Encoder with Query from Decoder are combined." In the Transformer's decoder, for every step in its layers, the Queries (Q) from the decoder attend to the Keys (K) and Values (V) from the encoder output."

    • @yugiblox3274
      @yugiblox3274 7 months ago

      Unless it’s a decoder-only transformer

    • @joachimheirbrant1559
      @joachimheirbrant1559 2 months ago

      Indeed, it is like this: the dot product of the keys and queries constructs the relation between the input and the already generated output. If both K and Q came from the encoder, it wouldn't capture the relation between the input and the already generated output.
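
    To make the correction in this thread concrete, here is a minimal NumPy sketch of encoder-decoder ("cross") attention, with queries projected from the decoder and keys and values from the encoder output (the names, shapes, and weights are illustrative stand-ins, not the video's code):

      import numpy as np

      def softmax(x):
          e = np.exp(x - x.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)

      def cross_attention(dec_hidden, enc_output, Wq, Wk, Wv):
          Q = dec_hidden @ Wq   # queries from the decoder  (tgt_len, d_k)
          K = enc_output @ Wk   # keys from the encoder     (src_len, d_k)
          V = enc_output @ Wv   # values from the encoder   (src_len, d_k)
          scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product
          return softmax(scores) @ V               # (tgt_len, d_k)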

  • @anshuljain2258
    @anshuljain2258 2 years ago

    Half of it went through my head. Just beautiful. I'll watch it many more times.. That's how I know the content is gooood.

  • @elorine8801
    @elorine8801 3 years ago

    This illustrated explanation is just so well done :OOOOO
    I'm a novice at deep neural networks, and just by watching the video I understood everything!
    Completely recommended for understanding Transformers :)
    Good work :D

  • @sloanNYC
    @sloanNYC 1 year ago

    Incredibly interesting. It is amazing how much processing and storage is required to achieve this.

  • @lifewhimsy
    @lifewhimsy 1 year ago +1

    This is THE BEST transformer video I have encountered.

  • @cocoph
    @cocoph 3 hours ago

    This is the best explanation of transformer models; please keep this channel going. There are lots of models that still need explaining!

  • @Dexter01
    @Dexter01 4 years ago +56

    This tutorial is absolutely brilliant. I have to watch it again and read the illustrated guide; there is so much info!! Thank you!!!

  • @Hooowwful
    @Hooowwful 3 years ago

    Favourite video on the topic! I'm reasonably knowledgeable on ML, but the other 5-10 videos I've tried so far all resulted in increased confusion. This is clear. Nice one 👍🏿

  • @tingyizhou8736
    @tingyizhou8736 2 years ago

    This is one of the best introductory videos I've seen on this subject. Thank you!

  • @gudisamahesh
    @gudisamahesh 3 months ago

    This seems to be one of the best videos on Transformers

  • @flwi
    @flwi 1 year ago +1

    That's a very good explanation imo! Thanks for taking the time to produce such a gem.

  • @Lolzor87a
    @Lolzor87a 3 years ago +1

    Wow. This is some really good explanation! I don't have much NLP background beyond RNN/LSTM and things before DL (n-grams), but I wanted to know more about the attention mechanism for robotics applications (my field). Most other explanations either skimmed over the mathematics or used NLP-specific nomenclature/concepts that made them hard to understand for non-NLP people.
    This was some good stuff! Much appreciated, and keep up the good work!

  • @helloWorld01010
    @helloWorld01010 1 year ago +4

    You did an amazing job explaining the workflow … I looked for more similar stuff… please continue … I hope you will be back to help people like me

  • @danilob2b2
    @danilob2b2 3 years ago

    I watched a second time, not to understand the video better but to appreciate it. It is very well done and pretty clear. Thank you.

  • @Waterlmelon
    @Waterlmelon 3 months ago

    Amazing explanation, honestly this is the first time I understand how Transformers work.

  • @mitch7w
    @mitch7w 9 months ago +4

    Best explanation I've seen so far, thanks so much! 😃

  • @garymail4393
    @garymail4393 1 year ago

    This is the best video I have seen about transformers. Very articulate and concise. Great job

  • @fghgffgvbgh
    @fghgffgvbgh 2 years ago

    Thanks a lot. This is by far the clearest explanation of the paper. Kudos. Hope you can do similar videos for, say, the BERT and XLNet architectures as well.

  • @mohitjoshi8818
    @mohitjoshi8818 1 year ago

    Your videos are the BEST, I understood RNNs, LSTMs, GRUs and Transformers in less than an hour.
    Thank you.

  • @-long-
    @-long- 2 years ago

    My first read about Michael Phi was "Stop Installing Tensorflow using pip for performance sake!" on the Towards Data Science blog (as I recall you were "Michael Nguyen" at that time). My first impression was "oh, this guy is good at explaining". Then I read several of his blogs, and now here I am. I never knew that you had a channel. You are one of the best educators I've ever known. Thanks so much.

  • @alvinphantomhive3794
    @alvinphantomhive3794 3 years ago

    Now I have two great heroes who explain complex concepts using mind-blowing visualizations: first is 3b1b for complex math topics, then Michael Phi for complex machine learning architectures! Just wow... salute, sir! Thank you so much!

  • @udbhavprasad3521
    @udbhavprasad3521 3 years ago +1

    Honestly this is the best explanation I've ever seen on transformers and attention

  • @TheForresthu
    @TheForresthu 3 years ago

    The explanation of the Transformer architecture is clear, and the animation in the presentation is really good; it catches my attention :)

  • @biplobbiswas5478
    @biplobbiswas5478 3 years ago

    The best explanation so far. Loved the animated illustration.

  • @nikhilnanda5922
    @nikhilnanda5922 2 years ago

    This was beautiful.
    This was the best explanation out there. You, Sir, are a person of the highest quality.

  • @dineshbhosale421
    @dineshbhosale421 3 years ago

    This is the best explanation ever! So genius! Need more videos like this

  • @aakashgarg2970
    @aakashgarg2970 4 years ago

    Such a lucid explanation it is. Thanks for posting!!

  • @josicoSiete
    @josicoSiete 4 years ago

    Amazing explanation Michael! Thank you for your time!!!!

  • @Controllerhead
    @Controllerhead 1 year ago +5

    Incredible video! I hope you are doing well and find the time to make more, especially with the recent popularity explosion of AI.

  • @revenantnox
    @revenantnox 1 year ago

    This was super helpful thank you. I read the original paper and absorbed like 70% of it but this clarified several things.

  • @TimothyParker1
    @TimothyParker1 2 years ago

    Great deep dive into transformers. Helped me understand this architecture.

  • @ViratSingh-nq7ok
    @ViratSingh-nq7ok 3 years ago

    Simple and coherent explanations. Brilliant

  • @morphos2
    @morphos2 2 years ago +1

    The best video on this channel, Michael. Do you think you can make a bunch more like this, with this visual style (white-over-black drawings) and clear, calm explanation of the diagrams?

  • @martian.07_
    @martian.07_ 2 years ago

    Best video ever on transformers, trust me, I tried others. Only positional encoding is missing, but the rest is gold.
    Thank you.

  • @CodeEmporium
    @CodeEmporium 4 years ago +2

    Nice work! Love the visuals for this abstract topic. Just found your channel. Keep em coming!!

    • @theaihacker777
      @theaihacker777 4 years ago +4

      Thanks! Your content is super helpful as well and has helped me before

  • @iskhwa
    @iskhwa 2 years ago

    I keep coming back to this video. It's great.

  • @tanveerulmustafa9232
    @tanveerulmustafa9232 2 months ago

    This explanation is INCREDIBLE!!!

  • @TheUmaragu
    @TheUmaragu 6 months ago +1

    A complex process; I need to listen to this multiple times to fully understand it.

  • @akhileshm8089
    @akhileshm8089 3 years ago

    This is the best explanation of transformers anywhere on the web

  • @davefar2964
    @davefar2964 1 year ago

    Thanks, I particularly liked that you went into as much detail for the decoder as for the encoder.

  • @user-sv5vb1mj1q
    @user-sv5vb1mj1q 4 years ago

    Best explanation I have seen so far. Great job! You destroyed almost all my questions.

  • @alexanderblumin6659
    @alexanderblumin6659 2 years ago

    Very deep explanation, a brilliant talent for giving somebody an intuition

  • @gkirangk4946
    @gkirangk4946 3 years ago

    Wow... one of the best videos I have watched on transformers... so simple to grasp. Please make more videos.

  • @sansin-dev
    @sansin-dev 4 years ago +1

    Fantastic. Thank you!

  • @danicarovo8818
    @danicarovo8818 10 months ago

    Thank you for this amazing explanation. It really helped after an insufficient explanation in my DL lecture. The prof did not even mention that the final part is a classification over the vocabulary for an NLP task!
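
    A minimal NumPy sketch of the final step the comment above mentions: a linear projection to vocabulary size followed by a softmax, i.e. a classification over the vocabulary (names and shapes are illustrative):

      import numpy as np

      def output_probabilities(dec_out, W_vocab, b_vocab):
          # dec_out: (seq_len, d_model); W_vocab: (d_model, vocab_size)
          logits = dec_out @ W_vocab + b_vocab   # (seq_len, vocab_size)
          e = np.exp(logits - logits.max(axis=-1, keepdims=True))
          return e / e.sum(axis=-1, keepdims=True)   # each row sums to 1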

  • @sergeyzaitsev3319
    @sergeyzaitsev3319 1 year ago

    Great video! Thank you! The only small issue with the explanation is that you describe how it works at inference time, when there are no future tokens because you haven't generated them yet. The triangular masking is needed only when training the transformer, when we actually do have these "future tokens". I think it is better to state this explicitly, because otherwise it instantly raises questions like "where did we get the future tokens?"
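
    To illustrate the inference side described above, here is a minimal greedy-decoding sketch; the model interface is an assumption for illustration, not the video's code. No future tokens exist at inference because each token is produced in turn:

      def greedy_decode(model, src_tokens, bos_id, eos_id, max_len=50):
          tokens = [bos_id]
          for _ in range(max_len):
              # assumed interface: returns next-token logits for each position
              logits = model(src_tokens, tokens)
              tokens.append(int(logits[-1].argmax()))  # most probable next token
              if tokens[-1] == eos_id:
                  break
          return tokens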

  • @Auditor1337
    @Auditor1337 1 year ago +2

    While I still have some questions, this is a pretty good explanation, I mean I actually have an idea of how this works! Gonna watch it like 2 more times.

  • @parker1981xxx
    @parker1981xxx 3 years ago

    Perfect explanation of the concept, thank you!

  • @stevey7997
    @stevey7997 1 year ago

    This is by far the best explanation that I have seen.

  • @viniciusmonteirodelira9872

    Great work here!! Thank you for this excellent explanation!

  • @toddwmac
    @toddwmac 1 year ago

    If you only knew how relevant this would be 2 years later. Thank you!

  • @rangv733
    @rangv733 3 years ago

    Wonderfully explained ! Thank you.

  • @footygods792
    @footygods792 4 years ago +1

    Well done, this is brilliant !

  • @MaptaGss
    @MaptaGss 3 years ago

    hands down the best explanation of transformer models!

  • @atirrup3470
    @atirrup3470 1 year ago +3

    12:55 A small mistake: K and V should be the encoder stack's output, and Q is the output of the decoder's first (masked) multi-head attention sublayer. Still, this guide is really awesome! Thanks for your effort bro!

  • @JulianHarris
    @JulianHarris 5 months ago

    Amazing. I still don’t really understand how the Q, K and V values are calculated, but I learnt a lot more about this seminal paper than others provided - thank you! 🙏
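
    On how Q, K and V are calculated: in the paper they are simply three learned linear projections of the same input embeddings. A minimal NumPy sketch (the random weights stand in for parameters that would be learned during training):

      import numpy as np

      rng = np.random.default_rng(0)
      d_model, d_k = 512, 64
      x = rng.normal(size=(10, d_model))    # embeddings for 10 tokens

      Wq = rng.normal(size=(d_model, d_k))  # learned in practice
      Wk = rng.normal(size=(d_model, d_k))
      Wv = rng.normal(size=(d_model, d_k))
      Q, K, V = x @ Wq, x @ Wk, x @ Wv      # three projections of the same tokens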

  • @dnaphysics
    @dnaphysics 1 year ago +15

    Good explanation. What boggles my mind is that this architecture can not only produce reasonable sentences, but there can also be some logic going on behind the sequence of sentences, as we've seen in ChatGPT. It is mind-boggling that there must be some amount of deeper conceptualization represented in the embeddings too! Amazing

    • @DS-nv2ni
      @DS-nv2ni 8 months ago

      No, and it's not even understandable how you got to such a conclusion.

  • @janeditchfield3976
    @janeditchfield3976 1 year ago

    Your explanation of Q, K and V is the thing that finally did it for me, I get it!

  • @tabindahayat3492
    @tabindahayat3492 8 days ago

    Woah! Exquisite. It's a 15-min video, but I spent over an hour taking notes and understanding it. You have done a great job, keep it up. Thank you so much! Such explanations are rare. ;)

  • @Matieu666
    @Matieu666 3 months ago

    Best explanation I've seen - thanks!

  • @stevemurch3245
    @stevemurch3245 1 year ago

    Outstanding explanation and visuals. Well done.

  • @user-jh8yy3vn5y
    @user-jh8yy3vn5y 2 years ago +2

    This is incredible. I've been watching videos and reading papers about transformers and attention for days; this is the best material so far.

  • @ThaileangSung
    @ThaileangSung 3 years ago

    Very clear and clean explanation. Thanks.

  • @Priya-dn4jz
    @Priya-dn4jz 3 years ago +1

    Amazing explanation!! Please make more videos on deep learning, it would be a great help....cheers!!

  • @jinwookchoi3532
    @jinwookchoi3532 3 years ago

    Fantastic!
    Thank you for a marvelous presentation.

  • @piyalikarmakar5979
    @piyalikarmakar5979 2 years ago

    I must say this is the best explanation I have ever seen... no confusion, no doubt left in mind. Thanks a lot, sir. It would be helpful if you could kindly do a video on language models like GPT and BERT.

  • @jamgplus334
    @jamgplus334 3 years ago

    wow! you explained it so clearly and it really helps my understanding, thanks

  • @StratosFair
    @StratosFair 1 year ago

    Best video on transformers on YouTube, thank you so much

  • @federicaf
    @federicaf 3 years ago +2

    Amazing! thank you so much - great quality of the video and content

  • @remymarion7663
    @remymarion7663 1 year ago

    Perfect explanation of the Transformers!!! Thanks.

  • @AlainLEGRAND75
    @AlainLEGRAND75 2 years ago

    Thank you for this video. It's a great piece of work, so easy to understand, where others are confusing in their explanations, and I probably would be too if I were to do it.

  • @vishwajeetparadkar1420

    This is Brilliant, Thank you for this.

  • @VishalSingh-tm6we
    @VishalSingh-tm6we 1 year ago

    Thanks for the effort you put into making the animation on the slide.

  • @TheCJD89
    @TheCJD89 2 years ago

    Great breakdown. Really easy to follow

  • @jeremyhofmann7034
    @jeremyhofmann7034 2 years ago +2

    This transformer tutorial is more than meets the eye

  • @emeebritto
    @emeebritto 8 months ago

    the best explanation that I've seen. 👏

  • @guillaumehai
    @guillaumehai 9 months ago +1

    This was fantastic, thanks!

  • @RanveerSingh-pm3eg
    @RanveerSingh-pm3eg 4 years ago

    Best explanation I have ever seen... thanks for the video

  • @siddhanthhegde227
    @siddhanthhegde227 3 years ago

    Brooo you are seriously my god😭😭🙏🙏...thanks a lot for this video...no one... literally no one could teach me transformers and your video just got drilled into my mind...please make other videos like this for BERT, GPT, XLNet, XLM etc... I'm really thankful to you

  • @Scranny
    @Scranny 3 years ago +8

    Wow Michael, this is a superb explanation of the transformer architecture. You even went into detail about the meaning of the Q, K, V vectors and the masking concepts, which were hard for me to grasp. I bounced around through 3-4 videos about the transformer architecture, and for each one I claimed it was the best explanation on the topic. But your video takes the cake and explains it in half the time of the others. Thank you for sharing! Also, great job on the visuals, which are on par with 3blue1brown's animations.

  • @rramjee1
    @rramjee1 3 years ago +1

    Beautifully explained. Thanks for this video. Very methodical. It would also help if you could make a video elaborating on the loss function used in Transformers.

  • @fabricioarendt.6047
    @fabricioarendt.6047 3 years ago

    Really nice high quality video. Much appreciated

  • @TeresaVentaja
    @TeresaVentaja 1 year ago

    I am not technically skilled in ML, and I understood at a high level how this works. I feel so grateful for this video 🙏

  • @ali_adeeb
    @ali_adeeb 3 years ago

    Dude you are insanely good! Keep up the good work!

  • @vivekmankar5823
    @vivekmankar5823 2 years ago

    This is the best explanation on transformers. Thank you so much for the video.

  • @dustinlucht3133
    @dustinlucht3133 1 year ago

    Thank you so much for this explanation! You saved me a lot of time man

  • @user-yh8xn8vp6d
    @user-yh8xn8vp6d 1 year ago

    Thanks! Thank you so much. I've looked up all the resources on the Internet but was still confused by the mechanism. This is a really clear and detailed explanation.

  • @ahmedazaz2152
    @ahmedazaz2152 3 years ago

    Very well explained! Keep up the good work.