A Tutorial on Semi-Supervised Learning

  • Date added: 10 Nov 2020
  • In this video tutorial, we discuss
    1) What semi-supervised learning is, and its application cases
    2) A simple SVM-based strategy for semi-supervised learning
    3) Confidence as a strategy to propagate labels
    Notebook Link:- www.kaggle.com/saptarsi/a-sim...
    Blog Link: - towardsdatascience.com/a-simp...

Comments • 21

  • @startrek3779
    @startrek3779 2 years ago +1

    Very informative and clear. Thank you for your effort!
    The following are the steps for the self-learning algorithm.
    1. Train a supervised classifier on the labelled data.
    2. Use the resulting classifier to make predictions on the unlabelled data.
    3. Add the most confident of these predictions to the labelled data set.
    4. Re-train the classifier on both the original labelled data and the newly obtained pseudo-labelled data.
    5. Repeat steps 2-4 until no unlabelled data remain.
    There are two hyperparameters to set, the maximum number of iterations and the number of unlabelled examples to add at each iteration.
    One issue of self-learning is if we add many examples with incorrect predictions to the labelled data set, the final classifier may be worse than the classifier only trained on the original labelled data.
    I hope this answer may help someone interested in semi-supervised learning.
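    The steps above can be sketched with scikit-learn. This is a minimal illustration, not the video's notebook code; the names self_train, n_per_iter, and max_iter are illustrative.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def self_train(X_lab, y_lab, X_unlab, n_per_iter=10, max_iter=20):
        """Self-learning: repeatedly pseudo-label the most confident
        unlabelled examples and retrain the classifier on the union."""
        X_lab, y_lab = X_lab.copy(), y_lab.copy()
        X_unlab = X_unlab.copy()
        clf = SVC(probability=True).fit(X_lab, y_lab)   # step 1: train on labelled data
        for _ in range(max_iter):
            if len(X_unlab) == 0:                       # step 5: stop when nothing is left
                break
            proba = clf.predict_proba(X_unlab)          # step 2: predict on unlabelled data
            conf = proba.max(axis=1)                    # confidence = max class probability
            top = np.argsort(conf)[-n_per_iter:]        # step 3: most confident predictions
            pseudo = clf.predict(X_unlab[top])
            X_lab = np.vstack([X_lab, X_unlab[top]])    # add pseudo-labelled examples
            y_lab = np.concatenate([y_lab, pseudo])
            X_unlab = np.delete(X_unlab, top, axis=0)
            clf = SVC(probability=True).fit(X_lab, y_lab)  # step 4: retrain
        return clf
    ```

    The two hyperparameters mentioned above appear as max_iter and n_per_iter.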

  • @awon3
    @awon3 1 year ago +1

    What are X_train1 and y_train1? You use them, but they were never defined.

  • @Jack-dx7qb
    @Jack-dx7qb 3 years ago +1

    Very informative and crystal clear!

  • @subhasreebose8178
    @subhasreebose8178 3 years ago +1

    Very clear and vivid explanation Sir.

    • @SaptarsiGoswami
      @SaptarsiGoswami  3 years ago

      Thanks a lot, Subhasree. The other semi-supervised techniques are intense; maybe some other time. Keep watching, and do share with people who you feel will be interested.

  • @LAKXx
    @LAKXx 2 years ago +1

    Was very helpful, thank you, good sir.

  • @val5778
    @val5778 3 years ago

    Thank you so much!!! Very nice and clear explanation

    • @SaptarsiGoswami
      @SaptarsiGoswami  3 years ago +1

      Thanks a lot, glad it was helpful. Please share with your colleagues and friends.

  • @wazibansar1683
    @wazibansar1683 3 years ago

    Very nice Sir as always.

  • @arghyakusumdas54
    @arghyakusumdas54 2 years ago

    Thanks, Sir, for the video, which was very easy to understand.
    However, I was wondering: if the labelled dataset contains samples of only 2 classes (no sample of a possible 3rd class) and the unlabelled data contains samples of that 3rd class (without the label), then I think the classifier trained on the labelled data cannot predict properly, and its confidence for both known classes would be low. Can any strategy be adopted in this case?

  • @chrisleivon8567
    @chrisleivon8567 2 years ago

    Why is there 22 in acc = np.empty(22)? I mean, can we put some lower number instead of 22?
    I am stuck on re-training with labelled and pseudo-labelled data.

  • @RupshaliDasgupta
    @RupshaliDasgupta 2 years ago

    Can anyone please provide the link to the dataset?

  • @hamidawan687
    @hamidawan687 3 years ago +1

    Can you please tell the formula/equation of the predict_probability function? How is unlabelled data used in this function?

    • @SaptarsiGoswami
      @SaptarsiGoswami  3 years ago +1

      Any classifier gives probabilities. From the unlabelled data, we can see where we are more confident and then add those examples to the training set. There is a trade-off, though: if you add only the ones you are super confident about, they will be similar to observations already present in the training set, so not much diversity; on the other hand, if you compromise on confidence, you can add noise.
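      That trade-off can be shown with a small sketch; the function name select_confident, the threshold values, and the probability matrix are illustrative, not from the notebook:

      ```python
      import numpy as np

      def select_confident(proba, threshold):
          """Return indices of unlabelled examples whose maximum
          class probability reaches the confidence threshold."""
          conf = proba.max(axis=1)
          return np.where(conf >= threshold)[0]

      # Example predicted probabilities for 4 unlabelled points (3 classes)
      proba = np.array([[0.9, 0.05, 0.05],
                        [0.7, 0.2,  0.1],
                        [0.4, 0.3,  0.3],
                        [0.5, 0.4,  0.1]])

      # Strict threshold: few, very sure picks (less diversity)
      print(select_confident(proba, 0.8))   # [0]
      # Loose threshold: more picks, higher risk of noisy pseudo-labels
      print(select_confident(proba, 0.45))  # [0 1 3]
      ```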

    • @hamidawan687
      @hamidawan687 3 years ago

      @@SaptarsiGoswami Thanks a lot for your response. I am confused about what confidence is. Is it a user-defined parameter to find similarity? For example, find the majority class of labelled training instances which have a similarity measure of confidence (e.g. 75%) or higher?
      Your help will save me weeks of effort. Thanks in advance.

    • @SaptarsiGoswami
      @SaptarsiGoswami  3 years ago

      @@hamidawan687, well, confidence is a term I have used here. Let's say it's a three-class classification problem, and the output for one observation is (0.7, 0.2, 0.1), which indicates the class probabilities of the three classes. The maximum here is 0.7, and at a crude level, I can say this is the confidence of the decision. If in another case we get the output (0.4, 0.3, 0.3), the max is 0.4; of course, my confidence is lower there.
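      As a minimal sketch of that definition (confidence = the maximum predicted class probability; the helper name confidence is illustrative):

      ```python
      import numpy as np

      def confidence(class_probs):
          """Confidence of a decision = the largest class probability."""
          return float(np.max(class_probs))

      print(confidence([0.7, 0.2, 0.1]))  # 0.7: fairly confident decision
      print(confidence([0.4, 0.3, 0.3]))  # 0.4: low-confidence decision
      ```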

    • @hamidawan687
      @hamidawan687 3 years ago

      @@SaptarsiGoswami
      Got your point. By confidence, you mean what I call the majority class in the labelled training instances. Thank you very much for your kind, positive response. Actually, I have been searching for implementation details of pseudo-labelling. This is one of the few techniques I know of; another is known as Expectation-Maximization (EM), but I am still unable to find what it exactly means. At least your response has clarified my concept of confidence. Thanks again.