NLP Demystified 14: Machine Translation With Sequence-to-Sequence and Attention
- Published 2 Aug 2024
- Course playlist: • Natural Language Proce...
Whether it's translation, summarization, or even answering questions, a lot of NLP tasks come down to transforming one type of sequence into another. In this module, we'll learn to do that using encoders and decoders. We'll then look at the weaknesses of the standard approach, and enhance our model with Attention. In the demo, we'll build a model to translate languages for us.
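The encoder-decoder setup described above can be sketched in a few lines of Keras. This is a minimal illustration, not the notebook's actual code; the vocabulary size, embedding dimension, and unit count are arbitrary placeholders.

```python
import tensorflow as tf

# Illustrative hyperparameters; the video's demo uses its own values.
VOCAB_SIZE, EMBED_DIM, UNITS = 5000, 128, 256

# Encoder: embeds the source sequence and compresses it into a final state.
encoder_inputs = tf.keras.Input(shape=(None,))
enc_emb = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(encoder_inputs)
enc_outputs, enc_h, enc_c = tf.keras.layers.LSTM(
    UNITS, return_sequences=True, return_state=True)(enc_emb)

# Decoder: starts from the encoder's final state (the "information
# bottleneck") and predicts the target sequence one token at a time.
decoder_inputs = tf.keras.Input(shape=(None,))
dec_emb = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(decoder_inputs)
dec_outputs, _, _ = tf.keras.layers.LSTM(
    UNITS, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[enc_h, enc_c])
logits = tf.keras.layers.Dense(VOCAB_SIZE)(dec_outputs)

model = tf.keras.Model([encoder_inputs, decoder_inputs], logits)
```

With attention (covered from 17:56), the decoder would additionally attend over `enc_outputs` at every step instead of relying only on the final state.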
Colab notebook: colab.research.google.com/git...
Timestamps
00:00:00 Seq2Seq and Attention
00:00:37 Seq2Seq as a general problem-solving approach
00:02:17 Translating language with a seq2seq model
00:05:53 Machine translation challenges
00:09:07 Effective decoding with Beam Search
00:13:04 Evaluating translation models with BLEU
00:16:23 The information bottleneck
00:17:56 Overcoming the bottleneck with Attention
00:22:39 Additive vs Multiplicative Attention
00:26:47 [DEMO] Neural Machine Translation WITHOUT Attention
00:50:59 [DEMO] Neural Machine Translation WITH Attention
01:04:53 Attention as information retrieval
This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
Visit www.nlpdemystified.org/ to learn more.
First of all, thanks a lot for these videos; I think they're among the best on the internet. I have one question: at 22:15, shouldn't the input to the scoring function be h1, s1 instead of h1, s0 when calculating attention for the second output?
Yep! Nice catch. That's a mistake in the diagram. It should be s1.
@@futuremojo Ok thought so , Thanks a lot again :)
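For reference, the additive (Bahdanau-style) scoring discussed around 22:39 can be sketched with plain NumPy. The weight matrices `Wa`, `Ua`, and vector `va` follow the usual notation; all values here are random illustrative data, not trained parameters.

```python
import numpy as np

def additive_attention(s, H, Wa, Ua, va):
    """Additive attention: score decoder state s against each encoder
    state in H, softmax the scores, and return the weighted context.
    s: decoder state (d,); H: encoder hidden states (T, d)."""
    scores = np.tanh(H @ Ua.T + s @ Wa.T) @ va   # one score per encoder step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over encoder steps
    context = weights @ H                         # weighted sum of h1..hT
    return weights, context

rng = np.random.default_rng(0)
d, T = 4, 5
Wa, Ua, va = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
H = rng.normal(size=(T, d))       # encoder hidden states h1..h5
s1 = rng.normal(size=d)           # decoder state used for the second output
weights, context = additive_attention(s1, H, Wa, Ua, va)
```

Per the correction above, the state scored for the second output is s1, the decoder state after the first output step.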
@futuremojo In a similar context, I want to ask a question. At this point, you're showing how the model generates outputs at inference time, right? From the earlier examples, we know we have to apply teacher forcing during training. Am I correct?
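The distinction the question raises can be sketched with a toy decoder step. The `decoder_step` function below is a hypothetical stand-in (arbitrary modular arithmetic, not a real model); the point is only the difference in what each loop feeds the decoder.

```python
import numpy as np

def decoder_step(prev_token, state):
    """Toy stand-in for one decoder step: returns logits and a new state.
    The dynamics are arbitrary; only the data flow matters here."""
    state = (state + prev_token) % 7
    logits = np.eye(7)[state]
    return logits, state

# Training with teacher forcing: each step consumes the GOLD previous
# token, regardless of what the model would have predicted.
gold = [1, 4, 2, 6]
state = 3
for prev in [0] + gold[:-1]:          # start token (0), then shifted gold
    logits, state = decoder_step(prev, state)

# Inference: each step consumes the model's OWN previous prediction,
# since no gold target exists.
state, prev, preds = 3, 0, []
for _ in range(4):
    logits, state = decoder_step(prev, state)
    prev = int(np.argmax(logits))
    preds.append(prev)
```

Teacher forcing keeps training stable and parallelizable; the mismatch between the two regimes (exposure bias) is one reason decoding strategies like beam search matter at inference time.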
Really love this series. Thank you for sharing these videos and notebooks.
My god this video is genius.
Thank you a lot for sharing. Words can't express my gratitude now. The explanation, the illustration are very good. Wish you all the best and thanks once again.
I love your clear and succinct explanations. I really appreciate the effort you put in these videos. Thank you.
Thank you, Klaus. I'm glad you're getting value from it.
You are the BEST, Thank you.
In the future, when I launch my own company, your series will be an essential foundation for NLP. Your teaching is top-notch, your presentations are engaging, and your exceptional clarity in explanations truly stands out! YOU ARE THE BEST! Long live, brother!
Thank you!
Very well done! Thanks.
Your course is truly enlightening for anyone who wants to start learning ML.
Loving this series.
Thanks, John. Working hard on the last module on transformers.
@@futuremojo Great!
Best channel
I can't express how great this series on NLP is. Every video is like a Christmas present. I'm really interested in how you approached your own learning about NLP. What have you found helped you along the way? Thank you, Nitin.
Thanks for the kind, motivating message! My approach to NLP is similar to most technical things. The top things:
1) Use multiple resources (books, videos, blog posts, etc.). We're in a golden age of autodidacticism, and we can blend multiple resources to get a cohesive picture of a subject. We don't even need to go through every resource in its entirety. Perhaps one book explains a concept more intuitively, while another fleshes out the math better, and a third shows one approach to implementing the concept. Another benefit is that multiple resources act like a voting mechanism. A *lot* of materials out there, particularly blog posts, contain conflicting facts. Trying to work out what's true and what's wrong really forces one to dig deep into the fundamentals.
2) Start at the right level of difficulty, then expand outwards. In my case, it was important to cover both theory and practice, so I'd start with a resource that I could understand. If it was too difficult, I looked for something more practical. Once I was comfortable with that, I revisited the more theoretical stuff, but just enough to serve the goal of helping others gain a solid grounding in the subject. And the more I learned, the further I could expand.
3) Lean into the pain. Learning this stuff was often frustrating because so much out there is hand-wavy or opaque or doesn't answer "why?". Implementing this stuff was even more painful and frustrating. But I kept going because I knew I would eventually get it. And it's important that one believes one can learn anything if one persists long enough.
4) Few things solidify your understanding and act as a reality check like teaching a subject to others.
5) And of course, actually putting things into practice through code.
That's a brilliant way to explain these difficult topics. Thank you so much. Also, please make a video on transformer models for neural machine translation; that's quite a hot topic and much needed.
Thank you, Rabail. Transformers (along with pretraining and fine-tuning) are coming up next. Sign up at the course site for updates.
Your videos are among the best: a good flow of topics and concise language. But when explaining the attention theory and code, two terms are being used interchangeably: "encoder output sequence" (y1, y2, y3, and so on) and "encoder hidden states" (h1, h2, h3, and so on). This is creating a lot of confusion, making it difficult to follow along with the tutorial... or maybe something is wrong with my understanding.
Can you please make a similar series explaining "Generative Models for Images" and "Reinforcement Learning".
Hi there! I loved the series on NLP. Can you please share a link or resource on how to code up the accuracy function like you did with the loss? I would like to calculate accuracy per epoch.
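One common way to do this (following the masked-loss pattern used in TensorFlow seq2seq tutorials; this is a sketch, not the notebook's actual code) is a token-level accuracy that ignores padding positions:

```python
import tensorflow as tf

def masked_accuracy(y_true, y_pred):
    """Token-level accuracy that ignores padding (assumed to be id 0).
    y_true: (batch, time) integer ids; y_pred: (batch, time, vocab) logits."""
    pred_ids = tf.argmax(y_pred, axis=-1, output_type=y_true.dtype)
    match = tf.cast(pred_ids == y_true, tf.float32)
    mask = tf.cast(y_true != 0, tf.float32)   # 0 = padding, excluded
    return tf.reduce_sum(match * mask) / tf.reduce_sum(mask)
```

You could pass this as a metric to `model.compile(metrics=[masked_accuracy])` to get a per-epoch figure; without the mask, accuracy is inflated by the model trivially predicting padding.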
Hey, the video is really amazing. However, I was hoping you could share how to implement and use the BLEU score with this model.
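A minimal single-reference BLEU, following the definition covered at 13:04 (clipped n-gram precisions combined with a brevity penalty), can be written from scratch. This is a simplified illustration with crude smoothing; for real evaluations, use a vetted implementation such as NLTK's `sentence_bleu` or sacreBLEU.

```python
import math
from collections import Counter

def sentence_bleu(reference, candidate, max_n=4):
    """Simplified BLEU for one tokenized (reference, candidate) pair."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clipped count: a candidate n-gram only matches up to the
        # number of times it appears in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Crude smoothing so a single zero precision doesn't zero the score.
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat sat on the mat".split()
score = sentence_bleu(ref, hyp)   # identical sentences score 1.0
```

To score the model, you would decode each test sentence, tokenize both the output and the reference, and average the scores (or, better, compute corpus-level BLEU with a library).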
When I try to execute translator_trainer.fit(), I get: NameError: name 'encoder' is not defined
Great playlist! Are you planning to add more videos to this playlist? Thanks!
Thanks, Ali. Yep, there is one more module that's going to be released this month. It's going to cover transformers, pre-training, and transfer learning. We'll go over transformers in depth, code one from scratch, and then learn how to use pre-trained transformers for our own projects. Sign up for updates on the course homepage at nlpdemystified.org.
@@futuremojo Great, looking forward to it!
Why is padding (0) being converted at 34:57?
The index is 1 when I print source_tokenizer.word_index, so why is 0 converted too?
Does this mean that indexes 0 and 1 are both reserved?
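What's likely happening (assuming the notebook uses Keras's `Tokenizer` with an OOV token, which is a common setup; the sample text below is made up): `word_index` numbering starts at 1, the OOV token takes index 1, and index 0 is never assigned to any word because padding reserves it.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# word_index starts at 1; if oov_token is set, it takes index 1.
# Index 0 is reserved for padding by pad_sequences.
tok = Tokenizer(oov_token='[UNK]')
tok.fit_on_texts(['my name is udi'])

seqs = tok.texts_to_sequences(['my name is udi', 'udi'])
padded = pad_sequences(seqs, maxlen=4, padding='post')
```

So both 0 and 1 are special, but for different reasons: 1 is a real vocabulary entry standing in for unknown words, while 0 is a structural padding value that a decoder should mask out.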
What is this model's accuracy, or BLEU score? How do I work that out, brother?
Where and how can I download the source code corresponding to your clear explanations of NLP, together with the libraries? I'm halfway through watching all your videos, which I downloaded to my phone. I'll watch them again with the source code.
The notebooks are here: github.com/nitinpunjabi/nlp-demystified
@@futuremojo Thanks.
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4706: character maps to <undefined> :(
Adding encoding='utf-8' to open() solved it.
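For anyone hitting the same thing: on Windows, `open()` defaults to the locale codec (often cp1252, the 'charmap' codec), which fails on many UTF-8 bytes. Passing the encoding explicitly fixes it. A self-contained sketch using a temporary file rather than the notebook's dataset:

```python
import os
import tempfile

# Write some non-ASCII text as UTF-8, then read it back.
path = os.path.join(tempfile.gettempdir(), 'nmt_demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Grüße, světe\n')

# Without encoding='utf-8', this line can raise UnicodeDecodeError
# on Windows, where the default codec is the locale's 'charmap'.
with open(path, encoding='utf-8') as f:
    text = f.read()
```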
How can it translate “my name is udi” if it never saw the word “udi”?
If it's a word-level model, it can't. If your model uses characters or subwords, *maybe* it can translate the individual components of the word into something sensible, but there needs to be enough data for that to happen. Names, especially from low-resource languages, are a special case that is hard to translate without prior exposure.
@@futuremojo what about CopyNet? Could it help?
@@jenibites I don't know. I don't know what CopyNet does and haven't looked at it.
How would a system translate the Hungarian name "Andras" to its English equivalent, "Andrew", without data? And in practical cases, I imagine one wouldn't want to translate the name at all, but keep it the same.