Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention

  • Published 31. 05. 2024
  • Visual Guide to Transformer Neural Networks (Series) - Step by Step Intuitive Explanation
    Episode 0 - [OPTIONAL] The Neuroscience of "Attention"
    • The Neuroscience of “A...
    Episode 1 - Position Embeddings
    • Visual Guide to Transf...
    Episode 2 - Multi-Head & Self-Attention
    • Visual Guide to Transf...
    Episode 3 - Decoder’s Masked Attention
    • Visual Guide to Transf...
    This video series explains the math, as well as the intuition, behind the Transformer Neural Networks that were first introduced in the “Attention is All You Need” paper.
    --------------------------------------------------------------
    References and Other Great Resources
    --------------------------------------------------------------
    Attention is All You Need
    arxiv.org/abs/1706.03762
    Jay Alammar - The Illustrated Transformer
    jalammar.github.io/illustrated...
    The A.I Hacker - Illustrated Guide to Transformers Neural Networks: A step by step explanation
    jalammar.github.io/illustrated...
    Amirhoussein Kazemnejad Blog Post - Transformer Architecture: The Positional Encoding
    kazemnejad.com/blog/transform...
    Yannic Kilcher YouTube Video - Attention is All You Need
    www.youtube.com/watch?v=iDulh...

Comments • 612

  • @HeduAI
    @HeduAI  3 years ago +34

    *CORRECTIONS*
    A big shoutout to the following awesome viewers for these 2 corrections:
    1. @Henry Wang and @Holger Urbanek - At (10:28), "d_k" is actually the hidden dimension of the Key matrix, not the sequence length. In the original paper ("Attention is All You Need"), the model dimension is 512, which is split across 8 heads to give d_k = 64.
    2. @JU PING NG - The result of the concatenation at (14:58) is supposed to be 7 x 9 instead of 21 x 3 (that is to say, the z matrices are concatenated horizontally, not vertically). With this we can apply nn.Linear(9, 5) to get the final 7 x 5 shape.
    Here are the timestamps associated with the concepts covered in this video:
    0:00 - Recaps of Part 0 and 1
    0:56 - Difference between Simple and Self-Attention
    3:11 - Multi-Head Attention Layer - Query, Key and Value matrices
    11:44 - Intuition for Multi-Head Attention Layer with Examples

    • @amortalbeing
      @amortalbeing 2 years ago +2

      Where's the first video?

    • @HeduAI
      @HeduAI  a year ago +4

      ​@@amortalbeing Episode 0 can be found here - czcams.com/video/48gBPL7aHJY/video.html

    • @amortalbeing
      @amortalbeing a year ago

      @@HeduAI thanks a lot really appreciate it:)

    • @omkiranmalepati1645
      @omkiranmalepati1645 a year ago

      Awesome... So the d_k value is 3?

    • @jasonwheeler2986
      @jasonwheeler2986 a year ago +1

      @@omkiranmalepati1645 d_k = embedding dimensions // number of heads
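The shapes discussed in the pinned correction and this reply can be sketched in a few lines. The following is an illustrative numpy reconstruction (not code from the video), using the correction's example sizes: 7 tokens, embedding size 9, 3 heads, final output width 5, with randomly initialized weights standing in for learned ones:

```python
import numpy as np

# 7 tokens, model dimension 9, 3 heads -> per-head dimension d_k = 9 // 3 = 3
seq_len, d_model, n_heads, d_out = 7, 9, 3, 5
d_k = d_model // n_heads

rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))        # token embeddings, 7 x 9

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z_heads = []
for _ in range(n_heads):
    W_q = rng.normal(size=(d_model, d_k))      # per-head projection matrices
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v        # each 7 x 3
    A = softmax(Q @ K.T / np.sqrt(d_k))        # attention weights, 7 x 7
    z_heads.append(A @ V)                      # z matrix, 7 x 3

Z = np.concatenate(z_heads, axis=1)            # horizontal concat -> 7 x 9 (not 21 x 3)
W_o = rng.normal(size=(d_model, d_out))        # plays the role of nn.Linear(9, 5)
print(Z.shape, (Z @ W_o).shape)                # (7, 9) (7, 5)
```

Concatenating along axis=1 (columns) is what makes the final projection back to the output width possible; stacking vertically would give 21 x 3 and break the matrix multiply.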

  • @thegigasurgeon
    @thegigasurgeon a year ago +155

    Need to say this out loud: I saw Yannic Kilcher's video, read tons of material on the internet, went through at least 7 playlists, and this is the first time I really understood the inner mechanism of the Q, K and V vectors in transformers. You did a great job here

  • @nitroknocker14
    @nitroknocker14 2 years ago +201

    All 3 parts have been the best presentation I've ever seen of Transformers. Your step-by-step visualizations have filled in so many gaps left by other videos and blog posts. Thank you very much for creating this series.

    • @HeduAI
      @HeduAI  2 years ago +9

      This comment made my day :,) Thanks!

    • @bryanbaek75
      @bryanbaek75 2 years ago

      Me, too!

    • @lessw2020
      @lessw2020 2 years ago +1

      Definitely agree. These videos really crystallize a lot of knowledge, thanks for making this series!

    • @Charmente2014
      @Charmente2014 2 years ago

      ش

    • @devstuff2576
      @devstuff2576 2 years ago

      ​@@HeduAI absolutely awesome . You are the best.

  • @mrkshsbwiwow3734
    @mrkshsbwiwow3734 6 days ago

    This is the best explanation of transformers on YouTube.

  • @nurjafri
    @nurjafri 3 years ago +71

    Damn. This is exactly what a developer coming from another background needs.
    Simple analogies for rapid understanding.
    Thanks a ton.
    Keep uploadinggggggggggg plss

    • @Xeneon341
      @Xeneon341 3 years ago +1

      Agreed, very well done. You do a very good job of explaining difficult concepts to a non-industry developer (fyi I'm an accountant) without assuming a lot of prior knowledge. I look forward to your next video on masked decoders!!!

    • @HeduAI
      @HeduAI  3 years ago +4

      @@Xeneon341 Oh nice! Glad you enjoyed these videos! :)

  • @ML-ok9nf
    @ML-ok9nf 7 months ago +6

    Absolutely underrated, hands down one of the best explanations I've found on the internet

  • @rohtashbeniwal9202
    @rohtashbeniwal9202 a year ago +4

    This channel needs more love (the way she explains is out of the box). I can say this because I have 4 years of experience in data science; she did a lot of hard work to get so much clarity in the concepts (love from India)

    • @HeduAI
      @HeduAI  a year ago +1

      Thank you Rohtash! You made my day! :) धन्यवाद

  • @rishiraj8225
    @rishiraj8225 12 days ago

    Coming back after a year, just to revise the basic concepts. It is still the best video on YT. Thanks Hedu AI

  • @adscript4713
    @adscript4713 a month ago +1

    As someone NOT in the field reading the Attention paper, after having watched DOZENS of videos on the topic this is the FIRST explanation that laid it out in an intuitive manner without leaving anything out. I don't know your background, but you are definitely a great teacher. Thank you.

    • @HeduAI
      @HeduAI  a month ago

      So glad to hear this :)

  • @forresthu6204
    @forresthu6204 2 years ago +3

    Self-attention is a villain that has stumped me for a long time. Your presentation has helped me better understand this genius idea.

  • @HuyLe-nn5ft
    @HuyLe-nn5ft 9 months ago +5

    The important detail that sets you apart from other videos and websites is that you not only provided the model's architecture with numerous formulas but also demonstrated them in vectors and matrices, successfully walking us through each complicated and trivial concept. You really did a good job!

  • @kafaayari
    @kafaayari 2 years ago

    I won't say this is the best explanation so far, but this is the only explanation. Others are just repeating the original paper.

  • @wireghost897
    @wireghost897 10 months ago

    Finally a video on transformers that actually makes sense. Not a single lecture video from any of the reputed universities managed to cover the topic with such brilliant clarity.

  • @rohanvaidya3238
    @rohanvaidya3238 3 years ago +10

    Best explanation ever on Transformers !!!

  • @rayxi5334
    @rayxi5334 a year ago +1

    Better than the best Berkeley professor! Amazing!

  • @chaitanyachhibba255
    @chaitanyachhibba255 3 years ago +10

    Were you the one who wrote transformers in the first place? Because no one explained it like you did. This is undoubtedly the best info I have seen. I hope you keep posting more videos. Thanks a lot.

    • @HeduAI
      @HeduAI  3 years ago +1

      This comment made my day! :) Thank you.

  • @ja100o
    @ja100o a year ago +1

    I'm currently reading a book about transformers and was scratching my head over the reason for the multi-headed attention architecture.
    Thank you so much for the clearest explanation yet that finally gave me this satisfying 💡-moment

  • @andybrice2711
    @andybrice2711 a month ago

    This really is an excellent explanation. I had some sense that self-attention layers acted like a table of relationships between tokens, but only now do I have more sense of how the Query, Key, and Value mechanism actually works.

  • @malekkamoua5968
    @malekkamoua5968 2 years ago +11

    I've been stuck for so long trying to get Transformer Neural Networks, and this is by far the best explanation! The examples are so fun, making it easier to comprehend. Thank you so much for your effort!

  • @EducationPersonal
    @EducationPersonal 7 months ago +1

    This is one of the best Transformer videos on YouTube. I hope YouTube always recommends this Value (V), aka the video, as the first Key (K), aka the Video Title, when someone uses the Query (Q) "Transformer"!! 😄

  • @alankarmisra
    @alankarmisra 7 months ago

    3 days, 16 different videos, and your video "just made sense". You just earned a subscriber and a life-long well-wisher.

  • @oludhe7
    @oludhe7 a month ago

    Literally the best series on transformers. Even clearer than StatQuest and Luis Serrano, who also make things very clear

  • @sujithkumar5415
    @sujithkumar5415 a year ago

    This is quite literally the best attention mechanism video out there guys

  • @shubheshswain5480
    @shubheshswain5480 3 years ago +1

    I went through many videos from Coursera, YouTube, and some online blogs, but none explained the Query, Key, and Value concepts so clearly. You made my day.

    • @HeduAI
      @HeduAI  3 years ago

      Glad to hear this Shubhesh :)

  • @krishnakumarprathipati7186

    The MOST MOST MOST MOST ..........................useful and THE BEST video ever on Multi head attention........Thanks a lot for your work

    • @HeduAI
      @HeduAI  3 years ago

      So glad you liked it! :)

  • @devchoudhary8892
    @devchoudhary8892 a year ago +1

    Best, best, best explanation of transformers. You are adding so much value to the world.

  • @persianform
    @persianform a year ago

    The best explanation of attention models on earth!

  • @madhu1987ful
    @madhu1987ful a year ago

    Wow. Just wow!! This video needs to be in the top position when searching for content on transformers and their explanation

    • @HeduAI
      @HeduAI  a year ago +1

      So glad to see this feedback! :)

  • @wayneqwele8847
    @wayneqwele8847 4 months ago

    Thank you for taking the time to explain, from a linear algebra perspective, what actually happens. Many teachers on YouTube are comfortable just leaving it at math symbols and labels. Showing what actually happens to the matrix values has sharpened my intuition of what happens under the hood. Thank you.🙏

  • @pedroviniciuspereirajunho7244
    @pedroviniciuspereirajunho7244 11 months ago

    Visualizing the matrices helped me understand transformers better.
    Again, thank you very much!

  • @Srednicki123
    @Srednicki123 a year ago

    I just repeat what everybody else said: these videos are the best! thank you for the effort

  • @newbie8051
    @newbie8051 12 days ago

    Ah, this makes everything simple and makes sense.
    Thanks for the easy-to-follow explanation!

  • @user-ne2nr2yi1h
    @user-ne2nr2yi1h 5 months ago

    The best video I've ever seen for explaining the transformer.

  • @nizamphoenix
    @nizamphoenix 7 months ago

    Having been a professional in this field for ~5 years, I can say this is by far the best explanation of attention.
    Amused as to why this doesn't pop up at the top of YT's recommendations for attention. Probably YT's attention needs some attention to fix its Q, K, Vs

    • @HeduAI
      @HeduAI  7 months ago

      You made my day :)

  • @MGMG-li6lt
    @MGMG-li6lt 3 years ago +19

    Finally! You delivered me from long nights of searching for good explanations of transformers! It was awesome! I can't wait to see part 3 and beyond!

    • @HeduAI
      @HeduAI  3 years ago +1

      Thanks for this great feedback!

    • @HeduAI
      @HeduAI  3 years ago +2

      “Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
      czcams.com/video/gJ9kaJsE78k/video.html

  • @sebastiangarciaacosta5468
    @sebastiangarciaacosta5468 3 years ago +15

    The best explanation I've ever seen of such a powerful architecture. I'm glad to have found this joy after searching for positional encoding details while implementing a Transformer from scratch today. Valar Morghulis!

    • @HeduAI
      @HeduAI  3 years ago +2

      Valar Dohaeris my friend ;)

  • @franzanders7762
    @franzanders7762 2 years ago

    I can't believe how good this is.

  • @darkcrafteur165
    @darkcrafteur165 a year ago

    I never post, but right now I need to thank you. I really don't believe there exists a better way to understand self-attention than watching your video. Thank you!

  • @cracksomeface
    @cracksomeface a year ago +1

    I'm a grad student currently applying NLP - this is literally the best explanation of self-attention I have ever seen. Thank you so much for a great vid!

  • @shivam6565
    @shivam6565 11 months ago

    Finally I understood the concept of query, key and value. Thank you.

  • @bendarodes61
    @bendarodes61 2 years ago

    I've watched many video series about transformers, this is by far the best.

  • @hubertkanyamahanga2782
    @hubertkanyamahanga2782 8 months ago

    I am just speechless, this is unbelievable! Bravo!

  • @giridharnr6742
    @giridharnr6742 a year ago

    It's one of the best explanations of Transformers. Just mind-blowing.

  • @vanhell966
    @vanhell966 18 days ago

    Amazing work. Really appreciate you making complex topics simple, with the touch of anime and TV series. Amazing.

  • @RafidAslam
    @RafidAslam 2 months ago

    Thank you so much! This is by far the clearest explanation that I've ever seen on this topic

  • @jamesshady5483
    @jamesshady5483 a year ago

    This explanation is incredible and better than 99% of what I found on the Internet. Thank you!

  • @SuilujChannel
    @SuilujChannel a year ago

    thanks for these great videos! The visualizations and extra explanations on details are perfect!

  • @Abhi-qf7np
    @Abhi-qf7np 2 years ago +1

    You are the best 😄😄. This is THE best explanation of the Transformer model I have ever seen on YouTube. Thank you so much for this video.

  • @SOFTWAREMASTER
    @SOFTWAREMASTER 9 months ago

    Most underrated video about transformers. Going to recommend this to everyone. Thank you

  • @dominikburkert2824
    @dominikburkert2824 3 years ago +1

    best transformer explanation on YouTube!

    • @HeduAI
      @HeduAI  3 years ago

      So glad to hear this! :D

  • @aaryannakhat1842
    @aaryannakhat1842 2 years ago

    Spectacular explanation! This channel is sooo underrated!

  • @JDechnics
    @JDechnics a year ago

    Holy shit, was this a good explanation! Other blogs literally copy what the paper states (which is kinda confusing), but you explained it in such an intuitive and fun way! That's what I call talent!!

  • @raunakdey3004
    @raunakdey3004 a year ago

    Really love coming back to your videos to get a recap on multi-headed attention and transformers! Sometimes I need to make my own specialized attention layers for the dataset in question, and sometimes it just helps to listen to you talk about transformers and attention! Really intuitive, and it helps me break out of some weird loop of algorithm design I might have gotten myself stuck in. So thank you so so much :D

  • @frankietank8019
    @frankietank8019 8 months ago +1

    Hands down the best video on transformers I have seen! Thank you for taking your time to make this video.

  • @sowmendas812
    @sowmendas812 a year ago

    This is literally the best explanation for self-attention I have seen anywhere! Really loved the videos!

  • @zhehanhuang4675
    @zhehanhuang4675 3 years ago +1

    really good intuition for self-attention and multi-head attention

    • @HeduAI
      @HeduAI  3 years ago

      I am glad to hear that :)

    • @zhehanhuang4675
      @zhehanhuang4675 3 years ago

      @@HeduAI Hi, thanks for your reply. When I read some papers, they mention an "attention map" - is that the same thing as the "attention filter" mentioned in your video?
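For what it's worth, the two terms name the same object: the "attention map" in most papers is the seq_len x seq_len matrix of row-softmaxed, scaled dot-product scores, which this series calls the "attention filter". A minimal numpy sketch with made-up sizes (4 tokens, d_k = 3):

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_k = 4, 3                      # illustrative sizes only
Q = rng.normal(size=(seq_len, d_k))      # query vectors
K = rng.normal(size=(seq_len, d_k))      # key vectors

scores = Q @ K.T / np.sqrt(d_k)          # raw compatibility scores, 4 x 4
# row-wise softmax -> the "attention map" / "attention filter"
attn_map = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

print(attn_map.shape)                    # (4, 4); each row sums to 1
```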

  • @jirasakburanathawornsom1911

    Hands down the best transformer explanation. Thank you very much!

  • @cihankatar7310
    @cihankatar7310 a year ago

    This is the best explanation of the transformer architecture, with a lot of basic analogies! Thanks a lot!

  • @wolfie6175
    @wolfie6175 2 years ago

    This is an absolute gem of a video.

  • @prashantchauhan1742
    @prashantchauhan1742 2 years ago

    This is gold. I was confused after going through the paper, and boom, this cleared it up.

  • @geetanshkalra8340
    @geetanshkalra8340 2 years ago

    This is by far the best video to understand Attention Networks. Awesome work !!

  • @ghostvillage1
    @ghostvillage1 a year ago

    Hands down the best series I've found on the web about transformers. Thank you

  • @MCMelonslice
    @MCMelonslice 11 months ago

    This is the best resource for an intuitive understanding of transformers. I will without a doubt point everyone towards your video series. Thank you so much!

  • @1HourBule
    @1HourBule a year ago

    The best video on Self-attention.

  • @onthelightway
    @onthelightway 2 years ago

    Incredibly well explained! Thanks a lot

  • @skramturbo8499
    @skramturbo8499 a year ago

    I really like the fact that you ask questions within the video. In fact, those are the same questions one has when first reading about transformers. Keep up the awesome work!

  • @Scaryder92
    @Scaryder92 a year ago

    Amazing video, showing how the attention matrix is created and what values it assumes is really awesome. Thanks!

  • @clintcario6749
    @clintcario6749 a year ago

    These videos are really incredible. Thank you!

  • @fernandonoronha5035
    @fernandonoronha5035 2 years ago

    I don't have words to describe how much these videos saved me, thank you!

  • @adarshkone9384
    @adarshkone9384 10 months ago

    I have been trying to understand this topic for a long time; glad I found this video now

  • @adityaghosh8601
    @adityaghosh8601 2 years ago

    Blown away by your explanation . You are a great teacher.

  • @minruihu
    @minruihu a year ago

    It is impressive; you explain such complicated topics in a vivid and easy way!!!

  • @mariosconstantinou8271

    These videos are amazing, thank you so much! Best explanation so far!!

  • @jackderrida
    @jackderrida a year ago

    Holy crap, this tutorial is good! I've had GPT-4 generate me so many analogies to refresh my understanding of the same concepts you perfectly explain here.

  • @hesona9759
    @hesona9759 a year ago

    The best video I've ever watched, thank you so much

  • @kazeemkz
    @kazeemkz 5 months ago

    Spot on analysis. Many thanks for the clear explanation.

  • @robertco7
    @robertco7 a year ago

    This is very clear and well-thought out, thanks!

  • @artukikemty
    @artukikemty a year ago

    Thanks for posting, by far this is the most didactic Transformer presentation I've ever seen. AMAZING!

  • @bhavyaghai1924
    @bhavyaghai1924 11 months ago

    Educational + Entertaining. Nice examples and figures. Loved it!

  • @jojo01925
    @jojo01925 2 years ago

    Thank you for the video. Best explanation I've seen.

  • @pythondev2631
    @pythondev2631 a year ago

    The best video on multihead attention by far!

  • @danielarul2382
    @danielarul2382 a year ago

    One of the best explanations on Attention in my opinion.

  • @oliverhu1025
    @oliverhu1025 a year ago

    Probably the best explanation of transformers I've found online. I read the paper, watched Yannic's video, some paper-reading videos, and a few others, but the intuition was still missing. This connects the dots. Keep up the great work!

  • @hewas321
    @hewas321 a year ago

    No way. This video is insane!! The most accurate and excellent explanation of self-attention mechanism. Subscribed to your channel!

  • @carlosandresrocharuiz2555

    It's the most incredible channel on YouTube and people don't appreciate it :(

  • @melihekinci7758
    @melihekinci7758 a year ago

    This is the best explanation I've ever seen!

  • @PratikChatse
    @PratikChatse 2 years ago

    Amazing !! loved the explanation! Subscribed

  • @rasyidanakbarf2482
    @rasyidanakbarf2482 10 months ago

    i love this vid so much, now i understand the whole multi-head self-attention thing very clearly, thanks!

  • @binhle9475
    @binhle9475 a year ago +1

    Your attention to detail and information structuring are just exceptional. The Avatar and GoT references on top were hilarious and made things perfect. You literally made a story out of complex deep learning concepts. This is just brilliant.
    You have such a beautiful mind (if you get the reference :D). Please consider making more videos like this; such a gift is truly precious. May the force be always with you. 🤘

  • @jboyce007
    @jboyce007 5 months ago

    If only I had seen your videos earlier. As everyone in the comments says, these are THE BEST videos on the subject matter found anywhere! Thank you so very much for helping us all!

  • @srikanthkarapanahalli

    Awesome analogy and explanation !

  • @haowenjohnwei7547
    @haowenjohnwei7547 9 months ago

    The best video I ever watched! Thank you very much!

  • @maryamkhademi
    @maryamkhademi 2 years ago

    Thank you for putting so much effort in the visualization and awesome narration of these series. These are by far the best videos to explain transformers. You should do more of these videos. You certainly have a gift!

    • @HeduAI
      @HeduAI  a year ago

      Thank you for watching! Yep! Back on it :) Would love to hear which topic/model/algorithm you would most like to see on this channel. Will try to cover it in upcoming videos.

  • @bochengxiao1352
    @bochengxiao1352 2 years ago +1

    Thank you so much! It's the best Transformer video ever! I really hope for more on other models.

    • @HeduAI
      @HeduAI  a year ago

      Glad to hear that! :) Do let me know if there are certain models that you would like to see covered in future videos.

  • @AdityaRajPVSS
    @AdityaRajPVSS 2 years ago

    Awesome. Hats off to your level of conceptual understanding.

  • @markpadley890
    @markpadley890 3 years ago

    Outstanding explanation and well delivered, both verbally and with the graphics. I look forward to the next in this series

    • @HeduAI
      @HeduAI  3 years ago

      “Part 3 - Decoder’s Masked Attention” is out. Thanks for the wait. Enjoy! Cheers! :D
      czcams.com/video/gJ9kaJsE78k/video.html

  • @jackziad
    @jackziad 3 years ago +17

    Your videos are so good at getting complex ideas across in an intuitive way. You are like the 3Blue1Brown equivalent for AI. Keep it up and keep producing high-quality video content, at your own pace of course 😋

    • @HeduAI
      @HeduAI  3 years ago +8

      3Blue1Brown is one of my favorite channels! Therefore, you comparing these videos to that channel is one of the best compliments ever. Thank you! :)

    • @rishiraj8225
      @rishiraj8225 10 months ago

      @@HeduAI yes.. this is awesome explanation comparable to 3Blue1Brown.. make more..

  • @jinyunghong
    @jinyunghong 2 years ago

    Great explanation! Thank you so much!

  • @pakaponwiwat2405
    @pakaponwiwat2405 8 months ago

    This is the best explanation ever. Thank you a lot!

  • @alirezamogharabi8733
    @alirezamogharabi8733 2 years ago

    Great explanation and visualization, thanks a lot. Please keep making such helpful videos.