Imbalanced Data in Machine Learning | Undersampling | Oversampling | SMOTE
- Added on 27. 06. 2024
- Imbalanced data refers to datasets where the class distribution is heavily skewed, with one class significantly outnumbering the others. Handling the imbalance matters because it can lead to biased models that perform poorly on the minority classes. This video covers techniques that help mitigate that bias and improve predictive performance on minority classes: undersampling (reducing majority class samples), oversampling (increasing minority class samples), SMOTE (Synthetic Minority Over-sampling Technique), and ensemble methods (combining multiple models).
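As a rough illustration of the three resampling ideas named above, here is a minimal standard-library sketch. It is not the code from the linked notebook: the toy 96:4 dataset, the function names (`random_undersample`, `random_oversample`, `smote_like`), and the parameter `k` are all assumptions made for this example, and the SMOTE-style function is a simplified interpolation between minority neighbours rather than a full implementation. In practice, a library such as imbalanced-learn is the usual choice, and resampling is applied to the training split only.

```python
import random
from collections import Counter


def random_undersample(X, y, majority, seed=0):
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    keep = sorted(rng.sample(maj, len(mino)) + mino)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, minority, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    n_extra = (len(y) - len(mino)) - len(mino)  # rows needed to reach parity
    extra = [rng.choice(mino) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like(X, y, minority, k=3, seed=0):
    """Add synthetic minority rows by interpolating towards a nearby minority row."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_extra = (len(y) - len(pts)) - len(pts)
    X_new, y_new = list(X), list(y)
    for _ in range(n_extra):
        base = rng.choice(pts)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        others = [p for p in pts if p is not base]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to nb
        X_new.append([a + gap * (b - a) for a, b in zip(base, nb)])
        y_new.append(minority)
    return X_new, y_new


# Hypothetical toy data: 96 majority rows (label 0) and 4 minority rows (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(96)]
X += [[data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)] for _ in range(4)]
y = [0] * 96 + [1] * 4

X_u, y_u = random_undersample(X, y, majority=0)
X_o, y_o = random_oversample(X, y, minority=1)
X_s, y_s = smote_like(X, y, minority=1)

# Each resampled label list ends up with equal class counts.
print(Counter(y_u), Counter(y_o), Counter(y_s))
```

Undersampling throws away most of the majority rows, oversampling repeats the same few minority rows many times, and the SMOTE-style variant fills in new points on the line segments between existing minority rows, which is why the three can behave quite differently on the same data.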
Code - colab.research.google.com/dri...
============================
Did you like my teaching style?
Check my affordable mentorship program at : learnwith.campusx.in
DSMP FAQ: docs.google.com/document/d/1O...
============================
📱 Grow with us:
CampusX's LinkedIn: / campusx-official
CampusX on Instagram for daily tips: / campusx.official
My LinkedIn: / nitish-singh-03412789
Discord: / discord
E-mail us at support@campusx.in
✨ Hashtags✨
#Datascience #Machinelearning #Imbalanceddata #CampusX
⌚Time Stamps⌚
00:00 - Intro
00:54 - What is Imbalanced Data?
04:10 - Problems with Imbalanced Data
08:00 - Imbalanced Data Demo
11:13 - Why is studying imbalanced data important?
16:58 - Undersampling
25:56 - Oversampling
31:06 - SMOTE
42:43 - Ensemble Learning
47:06 - Cost Sensitive Learning
51:30 - Other techniques
I had to reupload this video because I forgot to include the part on ensemble techniques due to an editing error in the previous upload. Check timestamps.
Sir, please make a video on the difference between encoding and embedding
Sir, please make a video on A/B testing
I am writing to request your assistance in creating videos that delve into metaheuristic approaches, such as genetic algorithms, ant colony optimization, and others. It has come to my attention that there is a noticeable scarcity of resources covering these topics on platforms like CZcams.
Sir, please make a video about A/B testing
Thank you sir for the best series on CZcams, I just completed it in 2 months by watching 4 hr daily at 1.5x speed
Another fantastic video by Nitish! Wonderful!!!
Thank you so much, Nitish 😊 You are the best in everything. 🎉 Thanks for being my teacher 😊🙏
I understood everything sir
Thank you so much
You are the best
Awesome Content
Thank you so much for this video, very helpful sir 🤌
Thank you very much sir
very helpful video
Thanks Sir
Hi Sir,
Big Fan!!
I was searching for a class imbalance video, and you uploaded it at the right time.
I am training an ANN model for customer churn prediction where my dataset has a class imbalance of 96:4. I have tried upsampling, downsampling, SMOTE, SMOTE-ENN, and class weights, but none of them gave promising results: the model fails to predict the minority class well, and the recall is very low. What should be done when the model does not predict well on the minority class? I have also trained an XGBoost classifier, but that model did not perform well either.
Please continue your LLM transformers series, and also please upload videos on NLP NER and topic modeling.
Sir, please do a session on cross-validation. There's no separate video on cross-validation in the ML playlist.
Thanks
Sir, I truly admire your work and love all of your videos, learning so much from them. Thank you!!!
I have one question: at the end of the video you said that in spam filtering the false positive is the critical one, but if a message is spam and is classified as not spam (a false negative), isn't that the critical case? False negatives are generally considered more dangerous here because they can expose the recipient to potential harm.
I think the false positive is more critical, because it may send your important mail to spam, which is more harmful than showing some spam mails as important.
Sir, also make a video on the multi-label classification problem.
Sir, when will you start a new batch for DSMP?
Nitish Sir, please update your Machine Learning Roadmap and add links to your new videos (we want more and more of your videos).
Please make a new video on transformers 🙏
Dear Nitish sir, please make a video on how to fine-tune the Llama LLM on our custom data.
Nitish: at 7:00, it should be "testing data" for determining the accuracy. Am I correct?
Sir, please do some work on the MLOps playlist.
Hello sir, how can I connect with you? I need urgent help, please.
Is this series complete, or is anything remaining, sir?
🖤
By the time I reached the end of the playlist, it feels like you have gone from young to old.
it's better
Nitesh bhai, your knowledge is perfect, but the videos are so long that we cannot watch them in full even when we want to. Please try to make shorter videos 🙏🤝👍
Everyone is talking about LLMs, and you are still stuck on machine learning.
Bhai, Nitesh sir is covering LLM videos too. To us, these concepts are still gold and they are used everywhere.
@abhinavkale4632 Bhai, all of sir's videos are on my laptop; in total I have studied up to the history of LLMs so far.
From an interviewer's perspective, an imbalanced dataset is a common topic in interviews. Focusing on simple topics can increase your chances of success in cracking the interview.
Sir, you are far behind.