NLP Demystified 14: Machine Translation With Sequence-to-Sequence and Attention

  • Published 2 Aug 2024
  • Course playlist: • Natural Language Proce...
    Whether it's translation, summarization, or even answering questions, a lot of NLP tasks come down to transforming one type of sequence into another. In this module, we'll learn to do that using encoders and decoders. We'll then look at the weaknesses of the standard approach and enhance our model with attention. In the demo, we'll build a model to translate languages for us. (A minimal code sketch of the encoder-decoder idea follows this description.)
    Colab notebook: colab.research.google.com/git...
    Timestamps
    00:00:00 Seq2Seq and Attention
    00:00:37 Seq2Seq as a general problem-solving approach
    00:02:17 Translating language with a seq2seq model
    00:05:53 Machine translation challenges
    00:09:07 Effective decoding with Beam Search
    00:13:04 Evaluating translation models with BLEU
    00:16:23 The information bottleneck
    00:17:56 Overcoming the bottleneck with Attention
    00:22:39 Additive vs Multiplicative Attention
    00:26:47 [DEMO] Neural Machine Translation WITHOUT Attention
    00:50:59 [DEMO] Neural Machine Translation WITH Attention
    01:04:53 Attention as information retrieval
    This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
    Visit www.nlpdemystified.org/ to learn more.
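
As a rough illustration of the encoder-decoder approach described above, here is a minimal sketch assuming TensorFlow/Keras (the framework used in the Colab notebook). The vocabulary sizes, layer widths, and variable names are placeholders, not the notebook's actual model:

import tensorflow as tf

vocab_src, vocab_tgt, emb_dim, units = 5000, 5000, 64, 128   # placeholder sizes

# Encoder: read the source sentence and compress it into a final hidden state.
src = tf.keras.Input(shape=(None,), dtype="int64")
enc_emb = tf.keras.layers.Embedding(vocab_src, emb_dim, mask_zero=True)(src)
_, enc_state = tf.keras.layers.GRU(units, return_state=True)(enc_emb)

# Decoder: generate the target sentence conditioned on that state
# (teacher-forced with the shifted target sequence during training).
tgt_in = tf.keras.Input(shape=(None,), dtype="int64")
dec_emb = tf.keras.layers.Embedding(vocab_tgt, emb_dim, mask_zero=True)(tgt_in)
dec_seq = tf.keras.layers.GRU(units, return_sequences=True)(dec_emb, initial_state=enc_state)
logits = tf.keras.layers.Dense(vocab_tgt)(dec_seq)

model = tf.keras.Model([src, tgt_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
model.summary()

During training the decoder input is the target sentence shifted right (teacher forcing); at inference, tokens are generated one at a time from the model's own predictions.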

Comments • 41

  • @BuddingAstroPhysicist
    @BuddingAstroPhysicist 1 year ago +4

    First of all, thanks a lot for these videos. I think they're among the best on the internet. I have one doubt: at 22:15, shouldn't the input to the scoring function be h1, s1 instead of h1, s0 when calculating attention for the second output?

    • @futuremojo
      @futuremojo  1 year ago

      Yep! Nice catch. That's a mistake in the diagram. It should be s1.

    • @BuddingAstroPhysicist
      @BuddingAstroPhysicist 1 year ago +1

      @@futuremojo OK, I thought so. Thanks a lot again :)
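
Summarizing the correction above in symbols: with additive (Bahdanau-style) attention, the score for output step t compares each encoder hidden state h_i against the previous decoder state s_{t-1}. The parameter symbols below are the usual learned weights and are illustrative, not taken from the slides:

$$ e_{t,i} = v_a^\top \tanh(W_a s_{t-1} + U_a h_i), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}, \qquad
c_t = \sum_i \alpha_{t,i} h_i $$

So for the second output (t = 2), each h_i is scored against s_1, as corrected above.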

    • @byotikram4495
      @byotikram4495 11 months ago

      @futuremojo In a similar context, I want to ask a question. In this part, you're showing how the model generates outputs at inference time, right? From earlier examples, we know that we have to apply teacher forcing while training. Am I correct?
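
To illustrate the distinction being asked about: during training the decoder is teacher-forced (fed the ground-truth previous token), while at inference it runs autoregressively (fed its own previous prediction). Below is a minimal, self-contained greedy-decoding sketch; the one-step decoder, vocabulary size, and token ids are placeholders rather than the notebook's code:

import tensorflow as tf

vocab, units, start_id, end_id, max_len = 100, 16, 1, 2, 10   # placeholder values

# A tiny stand-in for a trained one-step decoder: embed the previous token,
# run one GRU cell, and project to vocabulary logits.
embed = tf.keras.layers.Embedding(vocab, units)
cell = tf.keras.layers.GRUCell(units)
proj = tf.keras.layers.Dense(vocab)

def decoder_step(prev_token, state):
    x = embed(tf.constant([[prev_token]]))[:, 0, :]   # (1, units)
    out, [state] = cell(x, [state])
    return proj(out), state

# Training uses teacher forcing: the decoder is fed the ground-truth previous token.
# Inference is autoregressive, as below: the decoder is fed its own last prediction.
state = tf.zeros((1, units))       # stand-in for the encoder's final hidden state
token, result = start_id, []
for _ in range(max_len):
    logits, state = decoder_step(token, state)
    token = int(tf.argmax(logits, axis=-1)[0])
    if token == end_id:
        break
    result.append(token)
print(result)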

  • @MadhukaraPhatak-xb4op
    @MadhukaraPhatak-xb4op 1 year ago +1

    Really love this series. Thank you for sharing these videos and notebooks.

  • @ueihgnurt
    @ueihgnurt 4 months ago +1

    My god this video is genius.

  • @cheridhanlissassi8716

    Thanks a lot for sharing. Words can't express my gratitude right now. The explanations and illustrations are very good. Wish you all the best, and thanks once again.

  • @klausschmidt982
    @klausschmidt982 1 year ago +3

    I love your clear and succinct explanations. I really appreciate the effort you put in these videos. Thank you.

    • @futuremojo
      @futuremojo  1 year ago

      Thank you, Klaus. I'm glad you're getting value from it.

  • @mahmoudreda1083
    @mahmoudreda1083 1 year ago +1

    You are the BEST, Thank you.

  • @vipulmaheshwari2321
    @vipulmaheshwari2321 11 months ago

    In the future, when I launch my own company, your series will be an essential foundation for NLP. Your teaching is top-notch, your presentations are engaging, and your exceptional clarity in explanations truly stands out! YOU ARE THE BEST! Long live, brother!

  • @exxzxxe
    @exxzxxe 5 months ago

    Very well done! Thanks.

  • @ungminhhoai4510
    @ungminhhoai4510 1 year ago

    Your course is truly enlightening for anyone who wants to start learning ML.

  • @johnbarnes7485
    @johnbarnes7485 1 year ago +1

    Loving this series.

    • @futuremojo
      @futuremojo  1 year ago +1

      Thanks, John. Working hard on the last module on transformers.

    • @johnbarnes7485
      @johnbarnes7485 1 year ago

      @@futuremojo Great!

  • @anujsolanki5588
    @anujsolanki5588 6 months ago

    Best channel

  • @horoshuhin
    @horoshuhin 1 year ago +3

    I can't express how great this series on NLP is. Every video is like a Christmas present. I'm really interested in how you approached your own learning about NLP. What have you found helped you along the way? Thank you, Nitin.

    • @futuremojo
      @futuremojo  1 year ago +9

      Thanks for the kind, motivating message! My approach to NLP is similar to most technical things. The top things:
      1) Use multiple resources (books, videos, blog posts, etc.). We're in a golden age of autodidacticism, and we can blend multiple resources to get a cohesive picture of a subject. And we don't even need to go through every resource in its entirety. Perhaps this book explains one concept more intuitively than the other, but the other one fleshes out the math better, and this other resource shows one approach to implementing the concept. Another benefit is that multiple resources act like a voting mechanism. A *lot* of materials out there, particularly the blog posts, have conflicting facts. Trying to get at what's true and what's false really forces one to dig deep into the fundamentals.
      2) Start at the right level of difficulty, then expand outwards. In my case, it was important to cover both theory and practice, so I'd start with a resource that I could understand. If it was too difficult, I looked for something more practical. Once I was comfortable with that, I revisited the more theoretical stuff, but just enough to serve the goal of helping others gain a solid grounding in the subject. And the more I learned, the further I could expand.
      3) Lean into the pain. Learning this stuff was often frustrating because so much out there is hand-wavy or opaque or doesn't answer "why?". Implementing this stuff was even more painful and frustrating. But I kept going because I knew I would eventually get it. And it's important that one believes one can learn anything if one persists long enough.
      4) Few things solidify your understanding and act as a reality check better than teaching it to others.
      5) And of course, actually putting things into practice through code.

  • @rabailkamboh8857
    @rabailkamboh8857 1 year ago +1

    That's a brilliant way to explain these difficult topics. Thank you so much. Also, please make a video on transformer models for neural machine translation; that's quite a hot topic and much needed.

    • @futuremojo
      @futuremojo  1 year ago +1

      Thank you, Rabail. Transformers (along with pretraining and fine-tuning) is coming up next. Sign up at the course site for updates.

  • @CSKdataLab
    @CSKdataLab 1 year ago +1

    Your videos are among the best: good flow of topics and concise language. But when explaining the attention theory and code, two terms are being used interchangeably: "encoder output sequence" (y1, y2, y3, and so on) and "encoder hidden states" (h1, h2, h3, and so on). This is creating a lot of confusion and making it difficult to follow along with the tutorial... maybe something is wrong with my understanding.
    Can you please make a similar series explaining "Generative Models for Images" and "Reinforcement Learning"?
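
On the "encoder output sequence" vs. "encoder hidden states" point: for a plain unidirectional RNN encoder, the output emitted at each timestep is that timestep's hidden state, which is why the two terms end up interchangeable. A small sketch assuming TensorFlow/Keras, with made-up shapes:

import tensorflow as tf

# Hypothetical shapes for illustration only.
batch, src_len, emb_dim, units = 2, 7, 32, 64
embedded = tf.random.normal((batch, src_len, emb_dim))   # stand-in for embedded source tokens

gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
all_states, final_state = gru(embedded)

# The "encoder output sequence" y1..yN and the "encoder hidden states" h1..hN
# are the same tensors: all_states[:, i, :] is the hidden state after reading token i+1.
print(all_states.shape)    # (2, 7, 64)
print(final_state.shape)   # (2, 64)
tf.debugging.assert_near(all_states[:, -1, :], final_state)   # yN is hN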

  • @SaikatDas-jd9xd
    @SaikatDas-jd9xd 8 months ago

    Hi there! Loved the series on NLP. Can you please share any link or resource on how to code up the accuracy function like you did with the loss? I would like to calculate accuracy per epoch.
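
A common way to do this (a sketch only, not the notebook's code; it assumes integer target ids, prediction logits over the vocabulary, and padding id 0) is a masked token-level accuracy that mirrors the masked loss:

import tensorflow as tf

def masked_accuracy(y_true, y_pred):
    """Token-level accuracy that ignores padded positions (assumes padding id 0).

    y_true: (batch, seq_len) integer target ids.
    y_pred: (batch, seq_len, vocab) prediction logits or probabilities.
    """
    pred_ids = tf.argmax(y_pred, axis=-1, output_type=tf.int64)
    y_true = tf.cast(y_true, tf.int64)
    match = tf.cast(pred_ids == y_true, tf.float32)
    mask = tf.cast(y_true != 0, tf.float32)           # 0 = padding
    return tf.reduce_sum(match * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)

# Example with dummy data:
y_true = tf.constant([[5, 7, 0, 0]])                  # last two positions are padding
y_pred = tf.one_hot([[5, 2, 0, 0]], depth=10)         # one wrong non-padded token
print(masked_accuracy(y_true, y_pred).numpy())        # 0.5

Averaging this over the batches of an epoch gives a per-epoch accuracy that ignores padded positions.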

  • @ateyashuborna1554
    @ateyashuborna1554 1 year ago

    Hey, the video is really amazing. However, I was hoping you could share how to implement and use the BLEU score with your model.
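
One common approach (a sketch only; it uses NLTK, which is not something shown in the video, and the token lists are placeholders) is to score the model's tokenized predictions against reference translations with corpus_bleu:

from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is a token list; each entry in references is a list of one or
# more reference token lists for the corresponding sentence (placeholders here).
hypotheses = [["i", "like", "cats"], ["she", "reads", "books"]]
references = [[["i", "like", "cats"]], [["she", "is", "reading", "books"]]]

# Smoothing avoids zero scores when some n-gram orders have no matches.
smooth = SmoothingFunction().method1
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"Corpus BLEU: {score:.3f}")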

  • @ristaaryantiwi3795
    @ristaaryantiwi3795 1 year ago

    When I try to execute the method translator_trainer.fit(), I get:
    NameError: name 'encoder' is not defined

  • @aliabasnezhad7872
    @aliabasnezhad7872 1 year ago +1

    Great playlist! Are you planning to add more videos to this playlist? Thanks!

    • @futuremojo
      @futuremojo  1 year ago +1

      Thanks, Ali. Yep, there is one more module that's going to be released this month. It's going to cover transformers, pre-training, and transfer learning. We'll go over transformers in depth, code one from scratch, and then learn how to use pre-trained transformers for our own projects. Sign up for updates on the course homepage at nlpdemystified.org.

    • @aliabasnezhad7872
      @aliabasnezhad7872 1 year ago +1

      @@futuremojo Great, looking forward to it!

  • @curdyco
    @curdyco 1 year ago

    Why is the padding (0) converted to the OOV token at 34:57?
    The index for the OOV token is 1 when I print source_tokenizer.word_index, so why is 0 converted to it?
    Does this mean that indices 0 and 1 are both reserved for it?
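
If the setup follows the usual Keras Tokenizer pattern (an assumption; this is not copied from the notebook), index 0 never appears in word_index because pad_sequences reserves it for padding, while the OOV token gets index 1:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustration only (not the notebook's exact setup).
tok = Tokenizer(oov_token="<unk>")
tok.fit_on_texts(["i like cats", "i like dogs"])
print(tok.word_index)                                 # {'<unk>': 1, 'i': 2, 'like': 3, ...}

seqs = tok.texts_to_sequences(["i like hamsters"])    # unseen word -> 1 (the OOV id)
print(pad_sequences(seqs, maxlen=5))                  # padding positions are filled with 0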

  • @onlysainaa5764
    @onlysainaa5764 9 months ago

    What is this model's accuracy or BLEU score? How do I work it out, brother?

  • @amparoconsuelo9451
    @amparoconsuelo9451 1 year ago

    Where, when, and how can I download the source code corresponding to your clear explanations of NLP, together with the libraries? I am halfway through watching all your videos, which I downloaded to my cell phone. I will watch them again with the source code.

  • @sebastianbejarano350
    @sebastianbejarano350 1 year ago

    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4706: character maps to <undefined> :(
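
This error usually means the file is being opened with the platform's default codec (e.g. cp1252 on Windows) rather than UTF-8. A possible fix, assuming the dataset file is UTF-8 encoded (the filename below is a placeholder):

# Read the dataset with an explicit encoding; "dataset.txt" is a placeholder path.
with open("dataset.txt", encoding="utf-8") as f:
    text = f.read()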

  • @jenibites
    @jenibites 1 year ago +2

    How can it translate "my name is Udi" if it never saw the word "Udi"?

    • @futuremojo
      @futuremojo  1 year ago

      If it's a word-level model, it can't. If your model uses characters or subwords, *maybe* it can translate the individual components of the word to get something sensible, but there needs to be enough data for that to happen. Names, especially from low-resource languages, are a special case that is hard to translate without prior exposure.

    • @jenibites
      @jenibites 1 year ago

      @@futuremojo What about CopyNet? Could it help?

    • @futuremojo
      @futuremojo  1 year ago

      @@jenibites I don't know. I don't know what CopyNet does and haven't looked at it.
      How would a system translate the Hungarian name "Andras" to its English equivalent, "Andrew", without data? And in practical cases, I imagine one wouldn't want to translate the name at all but keep it the same.