Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  • Published: 11 Sep 2024
  • Paper here: arxiv.org/abs/...
    The annotated S4: srush.github.i...
    Notes: drive.google.c...

Comments • 14

  • @gabrielmongaras
    @gabrielmongaras  9 months ago +10

    I forgot to mention that this model is trained like a normal transformer. Since everything is causal, you should be able to train with the same efficient parallel technique a transformer uses: a single forward pass for an entire sequence of data.
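To make the point above concrete, here is a minimal sketch (mine, not from the video or paper) of a diagonal SSM recurrence applied to a whole length-L sequence in one forward call; because the update is causal, y_t depends only on inputs up to t. The names `A_bar`, `Bx`, and `C` are illustrative stand-ins for the discretized parameters, and this sequential loop is just a reference implementation — in practice Mamba computes the same thing with a hardware-efficient scan.

```python
import numpy as np

def ssm_scan(A_bar, Bx, C):
    """Run h_t = A_bar_t * h_{t-1} + Bx_t over a full sequence.

    A_bar, Bx: (L, D, N) per-step diagonal transition and input terms.
    C: (L, N) output projection. Returns y of shape (L, D).
    """
    L, D, N = A_bar.shape
    h = np.zeros((D, N))
    ys = []
    for t in range(L):                 # causal: y_t sees only inputs up to t
        h = A_bar[t] * h + Bx[t]       # elementwise update per (channel, state)
        ys.append((h * C[t]).sum(-1))  # project state to output, shape (D,)
    return np.stack(ys)                # (L, D)
```

The entire sequence goes through in one call, which is what makes transformer-style teacher-forced training possible; at inference you can instead carry `h` forward one token at a time.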

  • @berkk1993
    @berkk1993 9 months ago +6

    I just opened your channel to ask you for a Mamba video, and here I see this video. You are awesome, dude. I can't express how much you contribute to my life. Thank you many times!!!

  • @AM-yk5yd
    @AM-yk5yd 9 months ago +8

    19:50 I think A is DxN because they use a diagonal matrix. They mention S4D, and that paper also has an example of a linear initialization: "A = -0.5 + 1j * np.pi * np.arange(N//2) # S4D-Lin initialization". It's structured, after all.
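Expanding the line quoted in the comment, here is the S4D-Lin initialization as given in the S4D paper, with a note (mine) on what each part does; `N` is the state size and the `N//2` entries are the seeds of complex-conjugate pairs.

```python
import numpy as np

N = 16                                      # state size (N//2 conjugate-pair seeds)
A = -0.5 + 1j * np.pi * np.arange(N // 2)   # S4D-Lin initialization
# Real part -1/2 gives every mode the same decay rate; the linearly spaced
# imaginary parts pi*n set each mode's oscillation frequency.
```

Because A is diagonal, storing just this length-N//2 complex vector (per channel) is enough, which is why the full A tensor can be D x N rather than a dense N x N matrix per channel.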

  • @Anonn724
    @Anonn724 9 months ago +4

    Please don't stop with these videos. They are extremely useful to go through with you. Much love

  • @orrimoch5226
    @orrimoch5226 8 months ago +1

    Wow Gabriel, great job!
    I like your calm attitude and simple way of explaining this complex subject!
    As an electrical engineer and a data scientist, I highly appreciate your content!

  • @marshallmcluhan33
    @marshallmcluhan33 9 months ago +3

    Thanks for the vid. I can't wait to see if it's overhyped or not, hehe. Tri Dao knows his attention mechanisms.

  • @MatterExplained
    @MatterExplained 9 months ago +3

    Thanks for doing this paper, I was a bit lost on state space models

    • @acasualviewer5861
      @acasualviewer5861 9 months ago +1

      I was a bit lost.. now I'm more lost. ;)

    • @MatterExplained
      @MatterExplained 9 months ago

      @acasualviewer5861 haha, I did watch some lectures by the first author though

  • @saculzemog
    @saculzemog 8 months ago +3

    24:28 shouldn't A, B, and C be LxN, not LxD?

  • @ml-ok3xq
    @ml-ok3xq 9 months ago +2

    I think it's independent because you can diagonalize the state transition matrix and then each value only interacts with itself.
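The decoupling this comment describes can be checked with a toy example (my sketch, not from the video): once the state transition matrix is diagonal, the recurrence h_t = A h_{t-1} + b splits into independent scalar recurrences, so each state coordinate only ever interacts with itself.

```python
import numpy as np

a = np.array([0.9, 0.5, -0.3])    # diagonal entries of A
b = np.array([1.0, 2.0, 3.0])

# Vectorized diagonal recurrence over 5 steps.
h = np.zeros(3)
for _ in range(5):
    h = a * h + b                 # elementwise: coordinate i never sees j

# Same result computed one coordinate at a time, proving independence.
h_indep = np.zeros(3)
for i in range(3):
    x = 0.0
    for _ in range(5):
        x = a[i] * x + b[i]
    h_indep[i] = x
```

This per-coordinate independence is exactly what makes the diagonal parameterization cheap: an N x N matrix multiply collapses to an elementwise product of length N.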

  • @grimsk
    @grimsk 9 months ago +1

    It feels like this is getting closer and closer to concepts from physics.. 🙂

  • @yccui
    @yccui 5 months ago

    If all the matrices are learnable, I wonder why the authors use the HiPPO matrix to initialize A? What's the point?

    • @gabrielmongaras
      @gabrielmongaras  5 months ago

      I was actually wrong about the HiPPO "A" matrix being learnable. I think this matrix is actually static, which makes sense as it adds some basic structure to the model.
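For reference, here is a sketch of the HiPPO-LegS matrix being discussed, following the formula from the HiPPO paper (the code itself is my illustration, not from the video); indices n and k are 0-based, and the matrix is lower triangular with negative diagonal.

```python
import numpy as np

def hippo_legs(N):
    """Build the N x N HiPPO-LegS matrix: A[n,k] = -sqrt((2n+1)(2k+1))
    for n > k, -(n+1) on the diagonal, and 0 above the diagonal."""
    A = np.zeros((N, N))
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = -np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = -(n + 1)
    return A
```

Initializing A this way bakes in the structure HiPPO derives for memorizing input history, which is the "basic structure" the reply refers to.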