Future Mojo
NLP Demystified 15: Transformers From Scratch + Pre-training and Transfer Learning With BERT/GPT
CORRECTION:
00:34:47: that should be "each a dimension of 12x4"
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html
Transformers have revolutionized deep learning. In this module, we'll learn how they work in detail and build one from scratch. We'll then explore how to leverage state-of-the-art models for our projects through pre-training and transfer learning. We'll learn how to fine-tune models from Hugging Face and explore the capabilities of GPT from OpenAI. Along the way, we'll tackle a new task for this course: question answering.
Colab notebook: colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystified_transformers_and_pretraining.ipynb
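For a flavor of the pre-training and transfer-learning demo, here is a minimal sketch (not the notebook's exact code) of extractive question answering with a pre-trained Hugging Face model; the model name and example strings are illustrative placeholders.

```python
# Minimal sketch of extractive question answering with Hugging Face Transformers.
# Assumes `pip install transformers` plus a backend such as PyTorch; the model
# name and example text are placeholders, not the course's exact setup.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What architecture replaced recurrence with self-attention?",
    context="The Transformer is a neural network architecture that replaced "
            "recurrence with self-attention and now dominates NLP.",
)
print(result["answer"], result["score"])
```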
Timestamps
00:00:00 Transformers from scratch
00:01:05 Subword tokenization
00:04:27 Subword tokenization with byte-pair encoding (BPE)
00:06:53 The shortcomings of recurrent-based attention
00:07:55 How Self-Attention works
00:14:49 How Multi-Head Self-Attention works
00:17:52 The advantages of multi-head self-attention
00:18:20 Adding positional information
00:20:30 Adding a non-linear layer
00:22:02 Stacking encoder blocks
00:22:30 Dealing with side effects using layer normalization and skip connections
00:26:46 Input to the decoder block
00:27:11 Masked Multi-Head Self-Attention
00:29:38 The rest of the decoder block
00:30:39 [DEMO] Coding a Transformer from scratch
00:56:29 Transformer drawbacks
00:57:14 Pre-Training and Transfer Learning
00:59:36 The Transformer families
01:01:05 How BERT works
01:09:38 GPT: Language modelling at scale
01:15:13 [DEMO] Pre-training and transfer learning with Hugging Face and OpenAI
01:51:48 The Transformer is a "general-purpose differentiable computer"
This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
Visit www.nlpdemystified.org/ to learn more.
Views: 66,374

Videos

NLP Demystified 14: Machine Translation With Sequence-to-Sequence and Attention
13K views · 1 year ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html Whether it's translation, summarization, or even answering questions, a lot of NLP tasks come down to transforming one type of sequence into another. In this module, we'll learn to do that using encoders and decoders. We'll then look at the weaknesses of the standard approach, and enhance our model with Attention. In the d...
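As a small illustration of the attention idea this module introduces, here is a hedged NumPy sketch of dot-product attention (one of several scoring functions; the video may use a different variant), with illustrative shapes only.

```python
# Tiny NumPy sketch of dot-product attention as used in seq2seq models:
# the decoder state scores each encoder state, softmax turns the scores into
# weights, and the context vector is the weighted sum of encoder states.
import numpy as np

def attention(decoder_state, encoder_states):
    # decoder_state: (d,), encoder_states: (timesteps, d)
    scores = encoder_states @ decoder_state      # one score per source timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over timesteps
    context = weights @ encoder_states           # weighted sum, shape (d,)
    return context, weights

enc = np.random.randn(5, 8)   # 5 source timesteps, hidden size 8
dec = np.random.randn(8)
context, weights = attention(dec, enc)
print(weights.round(2), context.shape)
```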
NLP Demystified 13: Recurrent Neural Networks and Language Models
9K views · 1 year ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html We'll learn how to get computers to generate text through a technique called recurrence. We'll also look at the weaknesses of the bag-of-words approaches we've seen so far, how to capture the information in word order, and in the demo, we'll build a part-of-speech tagger and text-generating language model. Colab notebook: ...
NLP Demystified 12: Capturing Word Meaning with Embeddings
8K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html We'll learn a method to vectorize words such that words with similar meanings have closer vectors (aka "embeddings"). This was a breakthrough in NLP and boosted performance on a variety of NLP problems while addressing the shortcomings of previous approaches. We'll look at how to create these word embeddings and how to use...
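For a taste of the embedding idea, here is an illustrative sketch of training skip-gram word vectors with gensim's Word2Vec; the toy corpus and hyperparameters are placeholders, not the course's data or settings.

```python
# Illustrative sketch of training word embeddings with gensim's Word2Vec (skip-gram).
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

model = Word2Vec(sentences, vector_size=50, window=2, sg=1, min_count=1, epochs=50)
print(model.wv["cat"][:5])                   # first few dimensions of the embedding
print(model.wv.most_similar("cat", topn=3))  # nearest neighbours in embedding space
```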
NLP Demystified 11: Essential Training Techniques for Neural Networks
6K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html In our previous deep dive into neural networks, we looked at the core mechanisms behind how they learn. In this video, we'll explore all the additional details when it comes to effectively training them. We'll look at how to converge faster to a minimum, when to use certain activation functions, when and how to scale our f...
NLP Demystified 10: Neural Networks From Scratch
13K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html Neural Networks have led to incredible breakthroughs in all things AI, but at the core, they're pretty simple. In this video, we'll learn how neural networks work and how they "learn". By the end, you'll have a clear understanding of how neural networks work under the hood. We'll take a bottom-up approach starting with sim...
NLP Demystified 9: Automatically Finding Topics in Documents with Latent Dirichlet Allocation
9K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html What do you do when you need to make sense of a pile of documents and have no other information? In this video, we'll learn one approach to this problem using Latent Dirichlet Allocation. We'll cover how it works, then build a model with spaCy and Gensim to automatically discover topics present in a document and to search ...
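As a rough outline of the kind of topic-modelling workflow described here, this is a minimal gensim LDA sketch (not the notebook's exact pipeline); the tiny corpus is a placeholder.

```python
# Minimal gensim LDA sketch: build a dictionary and bag-of-words corpus,
# fit LdaModel, and print the discovered topics.
from gensim import corpora
from gensim.models import LdaModel

texts = [
    ["space", "rocket", "launch", "orbit"],
    ["election", "vote", "policy", "senate"],
    ["rocket", "orbit", "satellite"],
    ["vote", "senate", "bill"],
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```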
NLP Demystified 8: Text Classification With Naive Bayes (+ precision and recall)
9K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html In this module, we'll apply everything we've learned so far to a core task in NLP: text classification. We'll learn: - how to derive Bayes' theorem - how the Naive Bayes classifier works under the hood - how to train a Naive Bayes classifier in scikit-learn and along the way, deal with issues that come up. - how things can...
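To illustrate the kind of classifier this module trains, here is a hedged scikit-learn sketch of Multinomial Naive Bayes over count vectors; the tiny spam/ham dataset is illustrative only.

```python
# Hedged sketch of text classification with Multinomial Naive Bayes in scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["win cash now", "cheap pills offer", "meeting at noon", "project status update"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)

print(clf.predict(["cash offer now", "status meeting"]))
# With a held-out test set you could report precision and recall via
# sklearn.metrics.precision_score and recall_score.
```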
NLP Demystified 7: Building Models (ML modelling overview, bias, variance, evaluation)
6K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html Through a high-level overview of modelling, we'll - clearly define "machine learning" - look at the different types of machine learning - learn how to evaluate model performance - learn what bias and variance are - see what to do about overfitting and underfitting - explore practical concerns for model deployment. If you'r...
NLP Demystified 6: TF-IDF and Simple Document Search
8K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html We look at the problems of the previous bag-of-words approach, then use an improved technique (TF-IDF) to overcome them. In the demo, we'll use spaCy and scikit-learn to build TF-IDF vectors and build a simple document search engine. Colab notebook: colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/note...
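Here is a minimal sketch of the TF-IDF document-search idea described above (not the notebook's exact code): vectorize the corpus, vectorize the query, and rank documents by cosine similarity. The corpus and query are placeholders.

```python
# Sketch of a simple TF-IDF document search with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the transformer uses self attention",
    "recurrent networks process tokens sequentially",
    "tf idf weights rare terms more heavily",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)           # (n_docs, n_terms) sparse matrix

query_vec = vectorizer.transform(["self attention transformer"])
scores = cosine_similarity(query_vec, doc_vectors).ravel()
ranking = scores.argsort()[::-1]                          # best-matching documents first
print([(corpus[i], round(float(scores[i]), 3)) for i in ranking])
```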
NLP Demystified 5: Basic Bag-of-Words and Measuring Document Similarity
10K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html After preprocessing our text, we take our first step in turning text into numbers so our machines can start working with them. We'll explore: - a simple "bag-of-words" (BoW) approach. - learn how to use cosine similarity to measure document similarity. - the shortcomings of this BoW approach. In the demo, we'll use a combi...
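As a rough illustration of the bag-of-words approach and cosine similarity mentioned here, this is a hedged sketch using raw count vectors; the example sentences are placeholders.

```python
# Sketch of basic bag-of-words vectors plus cosine similarity between documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "the cat lay on the rug", "stocks fell sharply today"]

bow = CountVectorizer().fit_transform(docs)   # each row is a word-count vector
sims = cosine_similarity(bow)

print(sims.round(2))   # docs 0 and 1 should be far more similar than either is to doc 2
```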
NLP Demystified 4: Advanced Preprocessing (part-of-speech tagging, entity tagging, parsing)
11K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html We'll look at tagging our tokens with useful information including part-of-speech tags and named entity tags. We'll also explore different types of sentence parsing to help extract the meaning of a sentence. In the demo, we'll explore how to get these things done with spaCy and how to use the library's "matchers" and other...
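For a sense of the tagging the demo covers, here is a minimal spaCy sketch of part-of-speech tags and named entities; it assumes the small English model has been downloaded, and the example sentence is a placeholder.

```python
# Sketch of part-of-speech tagging and named-entity recognition with spaCy.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is opening a new office in Toronto next year.")

for token in doc:
    print(token.text, token.pos_, token.dep_)   # part-of-speech and dependency label

for ent in doc.ents:
    print(ent.text, ent.label_)                 # named entities, e.g. ORG, GPE, DATE
```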
NLP Demystified 3: Basic Preprocessing (case-folding, stop words, stemming, lemmatization)
10K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html Depending on our goal, we may preprocess text further. We'll cover case-folding, stop word removal, stemming, and lemmatization. We'll go over their use cases, their tradeoffs, and how to get them done using spaCy. Colab notebook: colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystifie...
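Here is a hedged sketch of the preprocessing steps named above (case-folding, stop-word removal, lemmatization) done in one pass with spaCy; the sentence is a placeholder.

```python
# Sketch of basic preprocessing with spaCy: lemmatize, case-fold, drop stop words.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The striped bats were hanging on their feet and ate best fishes")

tokens = [
    token.lemma_.lower()                      # lemmatize, then case-fold
    for token in doc
    if not token.is_stop and not token.is_punct
]
print(tokens)
```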
NLP Demystified 2: Text Tokenization
13K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html The usual first step in NLP is to chop our documents into smaller pieces in a process called Tokenization. We'll look at the challenges involved and how to get it done. Colab notebook: colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystified_preprocessing.ipynb Timestamps: 00:00 Tokeni...
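As a quick illustration of tokenization, here is a minimal spaCy sketch that splits a short text into sentences and tokens; the text is a placeholder.

```python
# Sketch of tokenization with spaCy: split a document into sentences and tokens.
# Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Dr. Smith arrived at 9 a.m. She didn't stay long.")

for sent in doc.sents:
    print([token.text for token in sent])   # note how "didn't" splits into "did" + "n't"
```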
NLP Demystified 1: Introduction
23K views · 2 years ago
Course playlist: czcams.com/play/PLw3N0OFSAYSEC_XokEcX8uzJmEZSoNGuS.html In this introduction, we learn what makes NLP useful, what makes it challenging, and what we'll learn in this course. Timestamps: 00:00:00 Introduction 00:00:27 Applications of NLP 00:01:24 What makes NLP challenging 00:03:58 The evolution of NLP 00:05:16 What you'll get from this course 00:06:04 What we'll cover in this c...

Comments

  • @thecubeguy2087
    @thecubeguy2087 · 1 day ago

    First of all, thank you so much for this course. I have understood each and everything up to this point. I have a couple of questions, though. I am still confused about the relationship between Latent Dirichlet Allocation and Collapsed Gibbs Sampling. You talked about Latent Dirichlet Allocation for a long time but then kind of shifted to CGS. I am having trouble understanding the similarities. (I get the multiplying thing, finding the probabilities, and the shooting-the-dart thing.) I'd love an explanation. Thanks.

  • @Shirley_BK
    @Shirley_BK · 5 days ago

    why aren't university teachers like you?

  • @nisargpatel1443
    @nisargpatel1443 · 26 days ago

    Concise and easily understandable. Thanks a lot for the series.

  • @weeb9133
    @weeb9133 · 1 month ago

    Just completed the entire playlist. It was an absolute delight to watch; this last lecture was a favorite of mine because you explained it in the form of a story. Thank you so much for sharing this knowledge with us, and I hope to learn more from you :D

  • @prashlovessamosa
    @prashlovessamosa · 1 month ago

    Where are you, buddy? Cook something, please.

  • @theindianrover2007
    @theindianrover2007 · 2 months ago

    Awesome

  • @Engineering_101_
    @Engineering_101_ · 2 months ago

    11

  • @MachineLearningZuu
    @MachineLearningZuu · 2 months ago

    Ma bro just dropped the "Best NLP Course" on Planet Earth and disappeared.

  • @basiaostaszewska7775
    @basiaostaszewska7775 · 2 months ago

    Thank you it was very well done!

  • @novantha1
    @novantha1 · 2 months ago

    What was provided: A high quality, easily digestible, and calm introduction to Transformers that could take almost anyone from zero to GPT in a single video. What I got: It will probably take me longer than I'd like to get good at martial arts.

  • @Momiji1998
    @Momiji1998 · 3 months ago

    This series is incredible! I can't believe we get to access such content for free online... what an era

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w · 3 months ago

    This is really high quality content. Why did it take so long for YouTube to recommend this?

  • @peace-it4rg
    @peace-it4rg · 3 months ago

    bro really made transformer video with transformer

  • @SawanSalhotra
    @SawanSalhotra · 3 months ago

    Best explanation. Crisp and exhaustive.

  • @varshapandey_daily_lifestyle

    It's really amazing work on NLP

  • @chrs2436
    @chrs2436 · 3 months ago

    the code in the notebook doesn't work 😮‍💨

  • @ueihgnurt
    @ueihgnurt · 4 months ago

    My god this video is genius.

  • @exxzxxe
    @exxzxxe · 4 months ago

    Very well done! Thanks.

  • @AshishMishra-rk4df
    @AshishMishra-rk4df · 4 months ago

    Great work 👍

  • @gauravmalik3911
    @gauravmalik3911 · 4 months ago

    This is one of the best videos I've ever enjoyed while learning machine learning. Explaining everything from conditional probability to the Naive Bayes demo in a detailed yet concise way is an art. Wow, this is an excellent playlist.

  • @gauravmalik3911
    @gauravmalik3911 · 4 months ago

    Excited for this course

  • @kevinoudelet
    @kevinoudelet · 5 months ago

    Thank you !

  • @kevinoudelet
    @kevinoudelet · 5 months ago

    Thank you !!!

  • @kevinoudelet
    @kevinoudelet · 5 months ago

    thx so much

  • @LakshmiDevi_jul31
    @LakshmiDevi_jul31 · 5 months ago

    I'm a research scholar and came across your channel. I was truly amazed at how you broke down the concepts and explained them.

  • @user-qj3ig7qz3y
    @user-qj3ig7qz3y · 5 months ago

    I don't understand why the input is always 512 tokens... how do I make it bigger?

  • @xuantungnguyen9719
    @xuantungnguyen9719 · 5 months ago

    Like, what the hell. You made it so simple to learn. I kept consuming and taking notes, adding thoughts and perspective, feeling super productive. (I'm using Obsidian to link concepts.) About three years ago, the best explanation I could get was probably from Andrew Ng, and I have to admit yours is so much better. My opinion might be biased since I was going back and forth in NLP time after time, but looking at the comment section I'm pretty sure my opinion is validated.

  • @puzan7685
    @puzan7685 · 5 months ago

    Hello goddddddddddddd. Thank you so much

  • @arrekusua
    @arrekusua · 5 months ago

    Thank you so much for these videos!! Definitely one of the best videos on NLP out there!

  • @AhonaCreatorUnicorn2024
    @AhonaCreatorUnicorn2024 · 5 months ago

    Excellent video

  • @anujsolanki5588
    @anujsolanki5588 · 6 months ago

    Best channel

  • @techaztech2335
    @techaztech2335 · 6 months ago

    I am a bit confused about the cosine similarity metric. I thought the cosine similarity range is from -1 to 1, instead of 0 to 1. I've seen the 0 to 1 threshold being used elsewhere as well, but I do notice that more popular embedding models generate negative vector elements, and naturally the normalized versions produce ranges from -1 to 1. Can you please clarify this? I've been struggling to wrap my head around it.

  • @chenw1923
    @chenw1923 · 6 months ago

    You kind of sound like Casually Explained

  • @kazeemkz
    @kazeemkz · 6 months ago

    Many thanks for the detailed explanation. Your video has been helpful.

  • @lochanaemandi6405
    @lochanaemandi6405 · 6 months ago

    In SGNS, when you are talking about matrices of context and target embeddings (10000 * 300), what do these matrices have/contain before the training has started (collection of one hot encodings or arbitrary numbers)? At 17:00, I also did not understand how only taking the target word embeddings would be sufficient to capture similarity between words.

  • @wilsonbecker1881
    @wilsonbecker1881 · 6 months ago

    Best ever

  • @lochanaemandi6405
    @lochanaemandi6405 · 6 months ago

    omggg, kudos to your efforts!!!!! I really wish you had more subscribers

  • @daryladhityahenry
    @daryladhityahenry · 6 months ago

    Hi! I'm on my first episode, currently in this lesson. I'm really excited and hope to learn a lot. Will you create more tutorials on these kinds of topics? Or will these 15 videos alone kind of transform me into some expert (remember, "kinda" expert) in NLP and transformers, so I can do pre-training myself and fine-tune it perfectly? (Assuming I have the capability to gather the data?) Thanks!

  • @SatyaRao-fh4ny
    @SatyaRao-fh4ny · 6 months ago

    Very helpful set of videos. However, it is unclear how it is that the weights determined for one set of input values X1 and the corresponding expected output value Y1 will hold for any other set of input values X2 and their corresponding output value Y2. In your example, the weights computed for inputs x1=2, x2=3 and expected output y=0 may be different for any other inputs and expected output.

  • @youmna4045
    @youmna4045 · 6 months ago

    There really aren't enough words to express how thankful I am for this awesome content. It's amazing that you've made it available to everyone for free. Thank you so much. May Allah (God) help you like you help others.

  • @SatyaRao-fh4ny
    @SatyaRao-fh4ny · 6 months ago

    These are very helpful videos, thank you! There are still a few concepts that are unclear. You have mentioned that documents are segmented into a list of sentences, and each sentence is segmented into a list of tokens. This implies that the list of tokens is empty to begin with, and after tokenization, we end up with a list of tokens (a token vocabulary?) specific to the corpus we provide. But later, when you start the tokenization using spaCy, you are loading some db??? What is this doing? Shouldn't spaCy just be a program/tool that has some "advanced rules" to tokenize a document that we provide and create a new token vocabulary from scratch, rather than using its own db/list created from some unknown corpus as a starting point? And finally, why tokenize a sentence at a time: because a document's size can be large? Could it have read in a fixed number of words at a time, say 100 words, and then tokenized them? A "sentence" should have no meaning for the tokenizer, is this right? Actually, how does a tokenizer even "know" when a sentence starts/ends?!? Thanks for any clarifications!

    • @nebvoice
      @nebvoice · 1 month ago

      The db you are referring to is the statistical model that was trained on some annotated data (I forget the name). That is the thing that tokenizes the given document or sentences. spaCy is just a module that helps us tokenize our data according to that statistical model. ... At least, I think so. Just a beginner....

  • @SnoozeDog
    @SnoozeDog · 7 months ago

    Fantastic sir

  • @haitrieunguyen0203
    @haitrieunguyen0203 · 7 months ago

    Thanks for your vids

  • @AradAshrafi
    @AradAshrafi · 7 months ago

    What an amazing tutorial. Thank you

  • @anissahli-gl9ud
    @anissahli-gl9ud · 7 months ago

    I'm from France and I just stumbled upon this superb playlist, which to me is the most complete one on YouTube! Thank you, a big thank you to you! It's hard to find training of such quality.

  • @mostafaadel3452
    @mostafaadel3452 · 7 months ago

    Can you share the slides, please?

  • @SaikatDas-jd9xd
    @SaikatDas-jd9xd · 7 months ago

    Hi there! Loved the series on NLP. Can you please share any link or resource on how to code up the accuracy function like you did with the loss? I would like to calculate the accuracy for each epoch.

  • @onlysainaa5764
    @onlysainaa5764 · 8 months ago

    What is this model's accuracy? Or BLEU score? How do I work it out, brother?

  • @CC-nz2oc
    @CC-nz2oc · 8 months ago

    Hello. Thank you for such a detailed course. I have a question about using pre-trained language models. My language (Azerbaijani) is not yet available in the library. Will you cover this topic further, or is it not worth spending time on learning without such a model?

  • @malikrumi1206
    @malikrumi1206 · 8 months ago

    Why is there no 'end of entity' tag? I'm sure some might say it's redundant and unnecessary because when you come to the 'o' you are *obviously* at the end of the entity. But it is just as possible that the 'o' is a mistake, especially in a long multi word name. An end tag would be both more explicit and eliminate any ambiguity. But maybe that's just me....