Self-Attention with Relative Position Representations - Paper explained

  • Added 29 Aug 2024

Comments • 41

  • @nicohambauer
    @nicohambauer 3 years ago +7

    Great series covering different kinds of positional encodings! Love it!

  • @hkazami
    @hkazami 1 year ago +2

    Great explanations of important technical key points in an intuitive way!

  • @WhatsAI
    @WhatsAI 3 years ago +9

    Thank you for pursuing this series! Love the NLP-related videos as I am not in the field of AI. Always well explained, seriously, props!
    Please do more of them 🙌

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +1

      Hey, thanks! But now ai am confused: What's AI is not in the field of AI? 😅

    • @WhatsAI
      @WhatsAI 3 years ago +2

      @@AICoffeeBreak I haven't worked with NLP tasks!

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +1

      @@WhatsAI aaa, you mean "in the field of NLP". I think you typed AI instead of NLP by mistake. Unless you think that NLP == AI. 😅

  • @subhadipnandi294
    @subhadipnandi294 1 month ago +1

    This is incredibly useful. Thank you so much

  • @amphivalo
    @amphivalo 2 years ago +3

    Such a good explanation! Thank you so much

  • @SuilujChannel
    @SuilujChannel 2 years ago +3

    Thank you very much for these videos! I like your style too :D

  • @seyeeet8063
    @seyeeet8063 2 years ago +4

    Thanks for the video. I have a general suggestion that I think would improve the quality of your videos:
    it would be really helpful if you could provide a simple numerical example in addition to the explanation, so viewers understand the concept better.
    Considering your talent for visualization, it would not be very hard for you and would help understanding a lot.

  • @user_2439
    @user_2439 2 years ago +1

    Such an awesome explanation!! It really helped me understand this concept. Thank you :)

  • @mkamp
    @mkamp 10 months ago +2

    Great video as always. ❤
    No RoPE video yet, right?

    • @AICoffeeBreak
      @AICoffeeBreak 10 months ago +3

      No, sorry. So many topics to cover, so little time.

  • @jqd3589
    @jqd3589 2 years ago +2

    Really a good job, I learned a lot from this series of videos. Could you please list some of the papers about relative positions used in graphs? Very grateful.

  • @justinwhite2725
    @justinwhite2725 3 years ago +5

    Oh how awesome. I've been thinking about positional encodings for images where I have broken the image into grids. I've been wondering exactly whether I should track both the x and y positions or just treat it like an array and have only one dimension for all segments (a small sketch of both options follows after this thread).
    My hypothesis was that the neural net would figure it out either way.

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +4

      And did they? Ah, you do not have any experimental results yet.
      Plot twist: order does not matter anyway (I am half-joking and referring to those papers in NLP showing that language models care unexpectedly little about word order).
      Reference: "Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little" by Sinha et al. 2021
      arxiv.org/pdf/2104.06644.pdf
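
    A minimal sketch of the two grid options mentioned above (my own illustration, not from the video or the comments; the grid size, embedding dimension, and names like pos_1d are made up for the example). It contrasts one embedding table indexed by the flattened patch index with separate row and column tables that are summed.

    import numpy as np

    rng = np.random.default_rng(0)
    H, W, d = 4, 4, 8                      # hypothetical 4x4 patch grid, embedding dim 8

    # Option 1: treat the grid as one sequence of H*W patches (one vector per flat index)
    pos_1d = rng.standard_normal((H * W, d))

    # Option 2: separate row (y) and column (x) tables, combined by addition
    pos_row = rng.standard_normal((H, d))
    pos_col = rng.standard_normal((W, d))

    def encoding_1d(r, c):
        return pos_1d[r * W + c]

    def encoding_2d(r, c):
        return pos_row[r] + pos_col[c]

    # Option 2 stores H + W vectors instead of H * W and shares parameters across
    # rows and columns; whether the network "figures it out either way" is the
    # empirical question raised in the comment.
    print(encoding_1d(2, 3).shape, encoding_2d(2, 3).shape)   # both (8,)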

  • @ambujmittal6824
    @ambujmittal6824 3 years ago +6

    Notification squad, where are you?

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +5

      Quickest comment in the history of this channel. 😂 I pushed the "publish" button just a few seconds ago!

  • @AliAhmad-vm2pk
    @AliAhmad-vm2pk 7 months ago +1

    Great!

  • @DANstudiosable
    @DANstudiosable 3 years ago +4

    I asked for this first, long back, but my comment was not mentioned in the video 😢
    Anyway, great explanation as always 🎉🥳

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +5

      Sorry, I could not remember where you made that comment, so I could not screenshot it. I only went through the comments of the first video in the positional encoding series, where I asked whether people wanted to see relative position representations. But you are the reason I was motivated to do the whole encoding series, so thanks!

  • @hannesstark5024
    @hannesstark5024 3 years ago +6

    👌

  • @ScottzPlaylists
    @ScottzPlaylists 10 months ago

    I think I see an error in some slides.
    The slide you show at czcams.com/video/DwaBQbqh5aE/video.html, and several times earlier than that, seems to be wrong.
    The token X3 column seems to have a number pattern for the positional embeddings that doesn't match the patterns in the other columns. It seems it should be a31...a35 instead of a31, a12, a13, a14, a15.
    Am I missing something?

  • @hassenhadj7665
    @hassenhadj7665 2 years ago +2

    Please, can you explain the Dual Aspect Collaborative Transformer?

  • @diogoaraujo495
    @diogoaraujo495 1 year ago +2

    Hello! Awesome explanation!
    I just have a small doubt (I hope someone can explain it).
    So, self-attention is itself permutation-invariant unless you use positional encoding.
    It makes sense that absolute positional encoding makes the self-attention mechanism permutation-variant. However, I couldn't figure out whether the same happens with relative positional encoding. Because, if in relative positional encoding we only care about the distance between the tokens, shouldn't this make the self-attention mechanism permutation-invariant?
    So my question is: does the use of relative positional encoding make the self-attention mechanism permutation-invariant (unlike if we use absolute positional encoding)?

    • @AICoffeeBreak
      @AICoffeeBreak 1 year ago +1

      Thanks for this question, I'm happy I finally found some time to respond to it.
      The short answer is: relative positional embeddings do not make / keep the transformer permutation invariant.
      In other words, both absolute and relative positional embeddings make the transformer permutation variant.
      Take for example a sentence of two tokens A and B. Both relative and absolute encodings assign a different value to the two positions, so exchanging A and B will assign them different vectors (a small numerical sketch of this follows after this thread).

    • @diogoaraujo495
      @diogoaraujo495 1 year ago +1

      @@AICoffeeBreak Okay, thanks!!

    • @ludvigericson6930
      @ludvigericson6930 1 year ago

      They are invariant to isomorphisms of the graph. In a path digraph, such as for sequences, there are no non-trivial isomorphisms. However, for cyclic path digraphs of K vertices there are K symmetries. For an undirected path graph, we would have two isomorphisms: forwards and backwards.

    • @ludvigericson6930
      @ludvigericson6930 1 year ago

      For a 2D lattice graph, I think mirroring is a symmetry, but I'm not sure. This is assuming that you have an undirected graph.

    • @ludvigericson6930
      @ludvigericson6930 1 year ago

      Undirected implies, in terms of the notation in the video, that a_ij = w_{j-i} = w_{i-j} = a_ji.
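
    A small numerical sketch of the answer above (toy code, not the paper's exact formulation; the bias values and dimensions are made up). It adds a scalar relative-position bias per offset to a single attention head and checks whether a token's output stays the same when the two tokens are swapped.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 4
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    rel_bias = {-1: 0.7, 0: 0.0, +1: -0.3}    # one bias per relative offset j - i

    def attend(X):
        # single-head attention with a relative-position term added to the scores
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(d)
        n = X.shape[0]
        for i in range(n):
            for j in range(n):
                scores[i, j] += rel_bias[j - i]
        weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
        return weights @ V

    A_tok, B_tok = rng.standard_normal(d), rng.standard_normal(d)
    out_AB = attend(np.stack([A_tok, B_tok]))   # order A, B
    out_BA = attend(np.stack([B_tok, A_tok]))   # order B, A

    # Token A's output at position 0 vs. at position 1: since w_{+1} != w_{-1},
    # the attention weights differ, so the outputs differ and the model is not
    # permutation invariant even with purely relative biases.
    print(np.allclose(out_AB[0], out_BA[1]))    # expected: False

    If the bias is forced to be symmetric (w_{+1} == w_{-1}, the undirected case mentioned in the reply above), the two outputs coincide in this two-token example.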

  • @gouthaamiramesh714
    @gouthaamiramesh714 1 year ago

    Can you please make a video on sensor data with transformers, and a video on hybrid CNN-transformer models?

  • @wilfredomartel7781
    @wilfredomartel7781 3 days ago +1

    🎉❤

  • @neilteng4161
    @neilteng4161 1 year ago

    I think there is a typo in the matrix at 5:11.

  • @wibulord926
    @wibulord926 2 years ago +2

    Hello

  • @nurullahates4585
    @nurullahates4585 2 years ago

    Good, but you speak too fast

    • @SuilujChannel
      @SuilujChannel 2 years ago +6

      I think it's the perfect speed. You could slow down the video in the YouTube player settings yourself.