Part 2 | Python | Training Word Embeddings | Word2Vec |

  • Added 27. 07. 2024
  • In this video, we will learn about training word embeddings by writing Python code. To train word embeddings, we need to solve a fake problem. This is a problem we do not actually care about; what we care about are the weights obtained after training the model. These weights are extracted and act as the word embeddings.
    This is part 2/2 on training word embeddings. In part 1 we covered the theory behind training word embeddings; in this part, we code the same in Python.
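    As a rough illustration of this fake-problem idea, here is a minimal, self-contained Python sketch (not the exact code from the video; the toy data, layer choices, and hyperparameters are assumptions):

        import numpy as np
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Input, Dense

        vocab_size = 5   # toy vocabulary size (assumption)
        emb_dim = 2      # embedding dimension (assumption)

        # One-hot (target word, context word) pairs stand in for real bigram data.
        X = np.eye(vocab_size)
        Y = np.roll(np.eye(vocab_size), 1, axis=0)

        # The "fake problem": predict a context word from a target word.
        model = Sequential([
            Input(shape=(vocab_size,)),
            Dense(emb_dim, use_bias=False),          # this layer's weights become the embeddings
            Dense(vocab_size, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='categorical_crossentropy')
        model.fit(X, Y, epochs=100, verbose=0)

        # Discard the task, keep the weights: one emb_dim-sized row per word.
        embeddings = model.get_weights()[0]   # shape (vocab_size, emb_dim)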
    ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
    📕 Complete Code: github.com/Coding-Lane/Traini...
    ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
    Timestamps:
    0:00 Intro
    2:13 Loading Data
    3:25 Removing stop words and tokenizing
    5:11 Creating Bigrams
    7:37 Creating Vocabulary
    9:29 One-hot Encoding
    14:41 Model
    19:35 Checking results
    21:57 Useful Tips
    ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
    Follow my entire playlist on Recurrent Neural Network (RNN) :
    📕 RNN Playlist: • What is Recurrent Neur...
    ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
    ✔ CNN Playlist: • What is CNN in deep le...
    ✔ Complete Neural Network: • How Neural Networks wo...
    ✔ Complete Logistic Regression Playlist: • Logistic Regression Ma...
    ✔ Complete Linear Regression Playlist: • What is Linear Regress...
    ➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
    If you want to ride on the Lane of Machine Learning, then Subscribe ▶ to my channel here: / @codinglane

Comments • 39

  • @sampathcse16
    @sampathcse16 21 days ago

    Super good explanation. Very in-depth insights given. Thank you for taking the time to explain it.

  • @ribamarsantarosa4465
    @ribamarsantarosa4465 11 months ago +2

    When I see 13k subscribers, 2.5k views, and only 85 likes, I see why good programmers and good data scientists are rare. You can't get a more concise, hands-on presentation that teaches you how to create a language model.

    • @CodingLane
      @CodingLane  11 months ago +1

      Hehe… really appreciate your comment 🥺

  • @karunesharora3302
    @karunesharora3302 a year ago +2

    Good and understandable explanation. It would be great if you could upload a video where character-level embeddings are also learnt, where we split a word into characters and do the same process.

  • @mustafatuncer4780
    @mustafatuncer4780 11 months ago

    I really liked it. Wonderful tutorial.

  • @matousmacak7467
    @matousmacak7467 2 months ago

    Thank you for the video as well as the whole code 😊 You could set a random seed to always obtain the same randomly initialized weights.
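    A minimal sketch of that suggestion, assuming the notebook uses NumPy and TensorFlow/Keras (the seed value is arbitrary):

        import random
        import numpy as np
        import tensorflow as tf

        random.seed(42)         # Python's built-in RNG
        np.random.seed(42)      # NumPy (shuffling, any manual arrays)
        tf.random.set_seed(42)  # Keras weight initialization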

  • @fersilvil
    @fersilvil a year ago

    Thank you for your teaching

  • @Kapilwankhede22
    @Kapilwankhede22 2 months ago +1

    Thanks Jay, the whole playlist is awesome. Thank you so much for creating these wonderful videos and educating us...

  • @freedmoresidume
    @freedmoresidume 2 years ago +2

    I'm happy that I found this channel. You have a gift for teaching.

    • @CodingLane
      @CodingLane  2 years ago

      Thank you! It means a lot to me 😇

    • @akshaysaxena7920
      @akshaysaxena7920 11 months ago

      @@CodingLane Hello, I found your video very informative, with clear fundamentals. Just a quick question: what if I have a corpus of 37 million sentences in total and 150 thousand unique words? How do I train my model without the complexity of having 150 thousand input nodes for one-hot encoding?
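      One common workaround (an assumption on my part, not something shown in the video) is to skip the explicit one-hot matrix entirely and feed integer word indices into a Keras Embedding layer, which performs the same lookup without materializing 150-thousand-wide vectors; a sketch with placeholder sizes:

        import numpy as np
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Input, Embedding, Flatten, Dense

        vocab_size = 150_000   # from the question
        emb_dim = 100          # placeholder

        # Toy integer-encoded (target, context) pairs -- illustrative only.
        X = np.array([[0], [1], [2]])
        Y = np.array([1, 2, 3])

        model = Sequential([
            Input(shape=(1,)),
            Embedding(vocab_size, emb_dim),            # same math as one-hot @ W, no one-hot needed
            Flatten(),
            Dense(vocab_size, activation='softmax'),
        ])
        # sparse_categorical_crossentropy lets Y stay as integer indices too.
        model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
        model.fit(X, Y, epochs=1, verbose=0)

      The full 150k-way softmax output is still costly; real word2vec implementations additionally use negative sampling or hierarchical softmax for that part.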

  • @NkembehBenjaminSCC
    @NkembehBenjaminSCC a month ago

    Thanks, I really enjoy the playlist.

  • @vigneshmathivanan6052
    @vigneshmathivanan6052 2 years ago +1

    Very Informative. Thanks for doing this.

  • @abcdedcba561
    @abcdedcba561 a month ago

    When creating the bigrams (one for each pair of adjacent words), the order should matter. So why do you insert all the possible combinations into the bigrams list?
    I think the order of the words as they appear in the corpus is important for capturing the relationship between adjacent words.

  • @Movies-iz5em
    @Movies-iz5em 9 months ago

    Wonderful explanation. Please add your LinkedIn profile to the YT About section.

  • @DHAtEnclaveForensics
    @DHAtEnclaveForensics 4 months ago +1

    unique_words = set(filtered_data.flat)
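    # i.e. flatten the tokenized array and keep only the unique tokens
    # (assumes filtered_data is a NumPy array, since .flat is a NumPy attribute)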

  • @user-hi3en2gj3q
    @user-hi3en2gj3q 2 years ago +1

    Thanks, that was very useful.

  • @pavanparvathanenii4471
    @pavanparvathanenii4471 a year ago +1

    Man, you are doing really well. You stopped uploading videos on machine learning algorithms. When will you resume, bro?

    • @CodingLane
      @CodingLane  a year ago

      Hi Pavan, sorry for not uploading videos. I will upload the next video in the upcoming few days, but it might take some time to upload more videos after that. Thanks for supporting the channel. And I am glad you find my channel useful 🙂

  • @ernestmodise4953
    @ernestmodise4953 a month ago

    Hi Jay, don't laugh at me, I have always been programming in an external compiler (Java / Matlab). After launching your GitHub template, the Jupyter notebook is not in edit mode. What must I do?

  • @alantan3004
    @alantan3004 11 months ago

    How do you make a prediction after you have trained your own model?
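    (One possible answer, sketched under the assumptions of the video's one-hot setup: model is the trained Keras model, and word_to_index / index_to_word are hypothetical lookup dicts built alongside the vocabulary.)

        import numpy as np

        x = np.zeros((1, vocab_size))
        x[0, word_to_index['king']] = 1              # one-hot input for the target word

        probs = model.predict(x)[0]                  # softmax over the whole vocabulary
        print(index_to_word[int(np.argmax(probs))])  # most likely context word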

  • @michaelvangulik85
    @michaelvangulik85 2 years ago +1

    Hello, sir, and same to you. Thank you for this video. India is now Bhaarat Aatmanirbhar and has popular search engines looking for talent to engage! India is in need of popular search engines and any help is good.

  • @aliathar891
    @aliathar891 a year ago

    Hello bro, when do we apply the cosine similarity function?
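    (Typically once training is done, to measure how close two learned embeddings are; a small sketch, with embeddings and word_to_index assumed to come from the training notebook:)

        import numpy as np

        def cosine_similarity(a, b):
            # cos(theta) = a . b / (|a| * |b|)
            return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

        v_king = embeddings[word_to_index['king']]
        v_queen = embeddings[word_to_index['queen']]
        print(cosine_similarity(v_king, v_queen))  # nearer 1.0 means more similar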

  • @lilitkharatyan2372
    @lilitkharatyan2372 5 months ago

    Is this feasible for a very large vocabulary?

  • @ameybikram5781
    @ameybikram5781 a year ago

    King appears as a target word twice, so will two vectors be created for king? And do we then take the average?

  • @MrJdude39
    @MrJdude39 2 months ago

    You never stated which Python packages are needed to create the bigrams, do the tokenization, and remove the stop words. Are you using NLTK?

    • @matousmacak7467
      @matousmacak7467 2 months ago +1

      He coded his own functions. See the whole code in the Jupyter notebook provided in the description.

  • @AnkitGupta-fm7pd
    @AnkitGupta-fm7pd a year ago

    You are using the wrong weights for plotting. weights[0] refers to the weights of the first layer, weights[1] to the biases of the first layer, weights[2] to the weights of the second layer, and weights[3] to the biases of the second layer. You can confirm this by running the following code:
        for layer in model.layers:
            print(layer.get_config(), layer.get_weights())
    Alternatively, you can use the following, where the first index [1] refers to the second layer (layer indexing starts from 0) and the second index [0] refers to the weights:
        weights = model.layers[1].get_weights()[0]
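    Once the right matrix is extracted, a quick way to eyeball the embeddings is a 2-D scatter plot; a sketch assuming weights has shape (vocab_size, emb_dim) and vocab is a list of words with vocab[i] matching row i (PCA is just one reduction choice):

        import matplotlib.pyplot as plt
        from sklearn.decomposition import PCA

        coords = PCA(n_components=2).fit_transform(weights)
        plt.scatter(coords[:, 0], coords[:, 1])
        for i, word in enumerate(vocab):
            plt.annotate(word, (coords[i, 0], coords[i, 1]))
        plt.show()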

  • @roopeshroope2026
    @roopeshroope2026 2 years ago +1

    Is this video word2vec from scratch?

  • @alidakhil3554
    @alidakhil3554 2 years ago +1

    Can you please offer private mentor support? As a paid service.

    • @CodingLane
      @CodingLane  2 years ago

      Hi Ali, I would love to… but currently my schedule is jam-packed, so I won't have time to give private mentorship.

    • @alidakhil3554
      @alidakhil3554 2 years ago +1

      @@CodingLane It is just one hour, with questions about multivariate linear regression.

    • @CodingLane
      @CodingLane  2 years ago

      @@alidakhil3554 Alright, can you message me on WhatsApp or by email?

    • @alidakhil3554
      @alidakhil3554 2 years ago

      @@CodingLane Could you please share your contact info? Thanks