How to use SMOTE, Borderline SMOTE, ADASYN to handle class imbalance

  • Added 6 Jul 2024
  • We discuss:
    1) The problem of class imbalance
    2) Basic approaches of resampling and augmentation
    3) Synthetic data generation using SMOTE
    4) Issues with SMOTE and how Borderline SMOTE addresses them
    5) Improvement using ADASYN

Comments • 23

  • @sajadms4121 2 years ago

    Thank you so much for the video, but I have a question: in ADASYN, do the farthest instances get a higher chance of being sampled in order to avoid overfitting? If so, what happens if such an instance is noisy?

  • @pallaviroy6631 3 years ago +1

    This session gives a clear understanding of class imbalance... Well explained... Thank you

  • @Theroadrunner2002 3 years ago

    Just used SMOTE for a project recently. Nice session, and thanks for sharing 🙏🏻

    • @SaptarsiGoswami 3 years ago

      Thanks. The scikit-learn ecosystem supports Borderline SMOTE out of the box too (via the imbalanced-learn package).

  • @wazibansar1683 3 years ago +1

    Excellent video, sir. It is systematically presented. The video starts with a discussion of the imbalanced class problem, then moves on to solutions using undersampling and oversampling, along with their limitations. After that, SMOTE is explained, together with how Borderline SMOTE improves on it. Finally, the concept of ADASYN is presented with an example.
    Based on the video, I have one question: when selecting the nearest neighbours of a minority-class point in SMOTE, do we check which class the neighbours belong to?

    • @SaptarsiGoswami 3 years ago +1

      The neighbors must be taken from the minority-class observations.
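To make the reply above concrete, here is a minimal pure-Python sketch of the SMOTE sampling step. It is an illustration under assumptions, not the code used in the video: the function name `smote`, its parameters, and the toy data are all hypothetical. The point to notice is that the neighbour search only ever sees minority samples.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical sketch: create n_synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Neighbours are searched among minority samples only;
        # majority samples are never passed to this function.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + lam * (t - b) for b, t in zip(base, target)))
    return synthetic


# Toy minority class with four 2-D points (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two minority samples, all of `new_points` stay inside the bounding box of the original minority data.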

  • @pcooi7811 1 year ago +1

    Thank you sir.

  • @suyashspeaks97 2 years ago +2

    Great video, sir! Thanks a ton. I just have a very fundamental doubt.
    The SMOTE algorithm seems to create new data points that are linearly dependent on the existing ones. Doesn't this mean that these points add no new information for the ML algorithm to learn from? Thanks in advance for your response.

    • @SaptarsiGoswami 2 years ago

      Interesting question. It is certainly not possible to add completely new information; we can only enhance what is there. SMOTE just tries to add some more points, often around the boundary.
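For reference, the interpolation step the question refers to can be written out as follows. This is the commonly stated form of SMOTE's generation rule; the symbols $x_i$ (a minority sample), $x_{nn}$ (one of its minority-class nearest neighbours) and $\lambda$ are notation chosen here, not taken from the video.

```latex
\[
x_{\text{new}} = x_i + \lambda\,(x_{nn} - x_i)
               = (1 - \lambda)\,x_i + \lambda\,x_{nn},
\qquad \lambda \sim \mathcal{U}(0, 1)
\]
```

Each synthetic point is a convex combination of two existing minority points, so it lies on the line segment between them. That is why no genuinely new information is added, while the density of minority samples in that region still increases.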

  • @memoonashehzadi9660 11 months ago

    In SMOTE, on what basis do we pick a point from the minority class in step 1?

  • @aritrabrahma2117 3 years ago +2

    Well explained, sir. Just a few questions.
    1. How many synthetic points should be generated between the points of the original minority class?
    2. Why do bridge points create a problem in SMOTE?

    • @SaptarsiGoswami 3 years ago

      1. The answer to question 1 depends on how much balance you want to create. In the example given, if you have 10 minority points and you want to create 20 synthetic points, you choose two neighbors for each.
      2. A bridge is a problem because these points are more similar to the majority class.
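The arithmetic in answer 1 can be sketched as a small helper. This is an assumption-laden illustration: the function name `synthetic_per_point` is hypothetical, and spreading any remainder over the first few points is one possible convention, not something stated in the video.

```python
def synthetic_per_point(n_minority, n_synthetic):
    """Hypothetical helper: how many synthetic points (i.e. how many
    neighbours to interpolate towards) each minority point should get."""
    base, extra = divmod(n_synthetic, n_minority)
    # The first `extra` points get one additional synthetic sample
    # so that the counts add up to n_synthetic exactly.
    return [base + 1 if i < extra else base for i in range(n_minority)]


# The example from the reply: 10 minority points, 20 synthetic points.
plan = synthetic_per_point(10, 20)
```

With 10 minority points and 20 synthetic points, every entry of `plan` is 2, matching the "two neighbors" in the reply.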

  • @solwanmohamed9400 1 year ago

    I need the material.

  • @sinan_islam 1 year ago

    Has anyone had a case where SMOTE made an ML model's performance even worse?

  • @debjit08 3 years ago

    Nice session, sir ❤️. Can you share some more information (via links or videos) about the ADASYN concept?

    • @SaptarsiGoswami 3 years ago +1

      Yes Debjit, I went through that part a little quickly. Please see towardsdatascience.com/class-imbalance-smote-borderline-smote-adasyn-6e36c78d804

  • @rebeenali4317 3 years ago

    Regarding ADASYN: why are more samples created for a sample that has a high ri? If we create more samples for it, we will kind of reduce the gap between the classes, and it is risky to create samples in this area because its nearest neighbours belong to the majority class.

    • @SaptarsiGoswami 3 years ago

      Hi Rebeen, thanks for watching the video till the end. You are right, but that's a risk you will have to take, because creating samples in a dense region of the minority class does not add any knowledge there.