How To Handle Missing Values in Categorical Features

  • Added 18 Aug 2019
  • Hello all, here is a video that explains in detail how we can handle missing values in categorical features.
    You can buy my book on Finance with Machine Learning and Deep Learning from the URL below:
    amazon url: www.amazon.in/...
    Buy the best book on Machine Learning and Deep Learning with Python, sklearn and TensorFlow from the URL below:
    amazon url: www.amazon.in/...
    Connect with me here:
    Twitter: / krishnaik06
    Facebook: / krishnaik06
    instagram: / krishnaik06
    Subscribe to my unboxing channel:
    / @krishnaikhindi
    Below are the various playlists I have created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
    Deep Learning Playlist: • Tutorial 1- Introducti...
    Data Science Projects playlist: • Generative Adversarial...
    NLP playlist: • Natural Language Proce...
    Statistics Playlist: • Population vs Sample i...
    Feature Engineering playlist: • Feature Engineering in...
    Computer Vision playlist: • OpenCV Installation | ...
    Data Science Interview Question playlist: • Complete Life Cycle of...
    🙏🙏🙏🙏🙏🙏🙏🙏
    YOU JUST NEED TO DO
    3 THINGS to support my channel
    LIKE
    SHARE
    &
    SUBSCRIBE
    TO MY YOUTUBE CHANNEL

Comments • 117

  • @soumikchakraborty90 · 5 years ago · +14

    You are just awesome bro. Please make a video on AIC, AUC, ROC curve.

  • @pallabsaha4098 · 5 years ago · +30

    Very well explained. If you could show the same on a dataset and code that would be very helpful. Thank you sir for your videos. Love them all.

  • @gabrielburgos2533 · 1 year ago · +1

    You are the MVP: when no one has the answer, you do.

  • @aksontv · 4 years ago · +1

    Finally found the right man to learn data science and ML from. Thank you sir!

  • @duvanmartinez8586 · 5 years ago · +4

    Great work, you're awesome, you're the best YouTuber I've found.

  • @mohitupadhayay1439 · 2 years ago

    This was such an amazing life saver. I didn't even know I had this question and the video just popped up.
    Didn't find this tutorial anywhere else.

  • @shivambhayre5056 · 5 years ago · +2

    I have no words to say, just thanks 🙏

  • @AmitYadav-ig8yt · 4 years ago · +15

    Sir, you took a dataset that has missing values in just one column, and you explained predicting the missing values by using the other columns as the training set. Let's say we have a dataset in which every column has some missing values. In such a case, which columns should be used to predict the missing values?

  • @doop9134 · 1 year ago

    I was stuck for days trying to figure out how to predict missing data using ML. This helped me understand so so so much better! 😍 Thank you so much!! 🙏💚

  • @abinashkumarsinha8958 · 2 years ago

    This helped me a lot in my project work. Very useful and very well explained.

  • @abdulhakeem4715 · 2 months ago

    Clean explanation.

  • @mohiuddinshojib2647 · 1 year ago

    That is really informative.

  • @divyaharshad9985 · 6 months ago · +1

    For technique 3, will it lead to multicollinearity in the data?

  • @tumul1474 · 4 years ago · +1

    Thank you sir! Amazing video as always.

  • @Nursin-rg1ey · 1 year ago

    Thanks very much, sir.

  • @hv3300 · 5 years ago · +1

    Excellent video, as usual.

  • @amedyasar9468 · 3 years ago

    It was quite a short explanation, with nice points that are easy to understand.
    Thanks!

  • @AmitYadav-ig8yt · 4 years ago · +1

    One more question: in some datasets we find columns with many categories; for example, a car-name column will have many car names. In such a case, if we use this unsupervised technique to create clusters, won't there be too many clusters?

  • @out_aloud · 3 years ago · +2

    Hello sir, maybe I am here too late, but I still hope you will acknowledge this question, as it might be of immense value. I have a disputed question that revolves around the KNN imputer, scaling and the concept of data leakage. As the KNN imputer works on the same principles as the KNN algorithm, it shares the pros and cons of KNN, right? So won't it be better to simply scale the data first? Also, if I am separating out the train and test data in order to avoid data leakage, should I split the data and then scale and impute? Or should I impute and then split and scale? If I split first, which is the most common preference, which statistics should I use for the user input? And lastly, how should I handle the label-encoded columns, if any? Nobody is discussing this, even though it is one of the most important problems a person is likely to face. Can you please make a video on this?
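A minimal sketch of the "split first, then fit scaling and imputation on the training part only" order the question asks about. It assumes scikit-learn and a purely numeric toy DataFrame; the column names and values are made up. `StandardScaler` ignores NaNs when fitting and passes them through when transforming, so it can sit in front of `KNNImputer` in one `Pipeline`. Both steps learn their statistics from the training split only, and the same fitted pipeline is then reused on the test split (and, later, on user input).

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric toy data with a few NaNs.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 47, 51, 38, 29, np.nan, 44, 36],
    "income": [30, 45, 52, np.nan, 80, 58, 35, 41, np.nan, 50],
})

# 1) Split first, so test rows never influence the fitted statistics.
train, test = train_test_split(df, test_size=0.3, random_state=0)

# 2) Scale, then impute: KNN distances are scale-sensitive, and
#    StandardScaler skips NaNs during fit and keeps them in transform.
prep = Pipeline([
    ("scale", StandardScaler()),
    ("impute", KNNImputer(n_neighbors=3)),
])

train_ready = prep.fit_transform(train)  # fit on the training split only
test_ready = prep.transform(test)        # reuse the training statistics
```

The imputed values come out in scaled units, which is what a downstream model in the same pipeline would consume. Label-encoded categorical columns are not covered by this sketch.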

  • @lukaszmichalak9985 · 4 years ago · +3

    Don't you increase the correlation between features with those methods? If so, what will that do to the output model, i.e. to the prediction?

  • @sandyjust · 4 years ago · +2

    Great explanation of the concept. With the unsupervised technique, we might end up in a situation where both male and female fall under group 2. Then what would our approach be?

    • @kaustabhmandal7483 · 4 years ago

      I have also observed that in this video. You can assign the category with the maximum frequency in that cluster.
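A minimal sketch of the cluster-based fill described in this reply. It assumes scikit-learn's `KMeans` and a made-up toy DataFrame. The rows are clustered on the complete numeric features (no output column is used), and then each missing `Gender` is filled with the most frequent known category in that row's cluster. The choice of 2 clusters is an assumption that fits the toy data; on real data the number has to be tuned. A cluster with no known category at all would need a fallback, such as the global mode.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of rows.
df = pd.DataFrame({
    "height": [150, 152, 149, 151, 181, 179, 183, 180],
    "weight": [50, 52, 49, 51, 82, 79, 84, 80],
    "Gender": ["F", "F", None, "F", "M", None, "M", "M"],
})

# Cluster on the complete numeric features only.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = km.fit_predict(df[["height", "weight"]])

# Most frequent known Gender per cluster (mode() ignores missing values).
cluster_mode = df.groupby("cluster")["Gender"].agg(lambda s: s.mode().iloc[0])

# Fill each missing Gender with its cluster's majority category.
df["Gender"] = df["Gender"].fillna(df["cluster"].map(cluster_mode))
```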

  • @andyjackson4563 · 2 years ago

    Thanks for explaining these methods

  • @Geethu_Mohan_DA · 1 year ago

    Easy to understand. Thank you

  • @ankurbanerji6605 · 3 years ago · +1

    Great explanation, sir! Can you explain how to handle missing values in multiple columns of a dataset?

  • @saurabhpathare4157 · 3 years ago · +1

    I am always reluctant to delete rows or use the mode for categorical values. This video explains a lot. Good approach! In technique 3, which classifier do you recommend for the best efficiency?
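For reference, the mode baseline this comment is wary of takes only a few lines with scikit-learn's `SimpleImputer(strategy="most_frequent")`. The toy column below is made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical categorical column with two missing entries.
df = pd.DataFrame({"Gender": ["M", "F", np.nan, "M", "M", np.nan, "F"]})

# Replace each NaN with the most frequent category seen during fit.
imputer = SimpleImputer(strategy="most_frequent")
df[["Gender"]] = imputer.fit_transform(df[["Gender"]])
```

The learned mode is kept in `imputer.statistics_`, so the same value can be applied to new data with `imputer.transform(...)`.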

  • @Saikrishna-lx9it · 5 years ago · +2

    Hi bro, can you make an end-to-end chatbot video using Rasa NLU? It would be useful for everyone who is interested in NLP.

  • @hindajjouri9151 · 7 months ago

    Thank you

  • @thatguyadarsh · 3 years ago

    Amazing!! Using an ML model to predict the NaN values... that is clever, sir.

  • @MegaJaivardhan · 5 years ago · +1

    Love you bro. Could you make a video on AUC and the ROC curve?

  • @sandeepnallala48 · 2 years ago

    Doing great work, Krish. Thanks a lot. Loved your videos :)

  • @keshavbansal5148 · 4 years ago

    Started this playlist today, loving it.

  • @AutitsicDysexlia · 3 years ago

    This is what I did in DAX, but I did it in a more complex way... because I was using DAX. But it's effectively a RandomForest method that I used.

  • @Susa270 · 2 years ago

    Hello
    @Krish Naik
    Hope you are doing well 🙂
    First of all, I would like to thank you for such knowledgeable videos. Most of the time your videos are really a beam of hope.
    Can you please let me know where I can check the actual code for the above-mentioned concepts? It is a little difficult to apply them in a live scenario.
    Please guide me; a humble request.

  • @shivambhayre5056 · 5 years ago · +3

    If it is a quantitative variable, we can replace the missing value with the mean.

    • @AmitYadav-ig8yt · 4 years ago · +1

      Is it a question? If yes, then yep, you can take the mean to replace quantitative missing values.
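A minimal pandas sketch of the mean replacement discussed in this thread. It also shows the median, the usual alternative when the column is skewed or has outliers. The column name and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one large outlier (300.0).
df = pd.DataFrame({"salary": [30.0, 40.0, np.nan, 50.0, 300.0, np.nan]})

# Mean imputation: NaNs are skipped when computing the mean (here 105.0).
mean_filled = df["salary"].fillna(df["salary"].mean())

# Median imputation: less affected by the outlier (here 45.0).
median_filled = df["salary"].fillna(df["salary"].median())
```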

  • @aronpollner · 1 year ago

    Is there a multivariate imputer implementation for categorical values, like a class from sklearn?

  • @anandacharya9919 · 5 years ago

    Thank you for this video. Please also make a video on how to handle missing values and outliers in continuous variables.

  • @AmitYadav-ig8yt · 4 years ago · +2

    Sir, can we get the code for the "create a classifier algorithm" method for missing values?

  • @shaileshsahu9551 · 4 years ago

    Please add a video to the Data Science and ML playlist on how to create our own predictor or estimator (classifier) algorithm to predict both categorical and continuous variables.

  • @madunishant6052 · 5 years ago · +1

    Thanks! 😊

  • @anuragmishra6262 · 4 years ago · +1

    Can you please show a practical implementation of the same?
    Thanks 😊

  • @user-vy4jo3lt2v · 10 months ago

    If we want to apply the classifier algorithm to multiple columns, is that possible?

  • @pankajkar2008 · 5 years ago · +1

    Pure concepts.

  • @fahimekheradmand5880 · 5 years ago

    Excellent, thank you.

  • @tahamansoor599 · 4 years ago

    It's great; it would be better if you showed us a hands-on example with a dataset.

  • @madhurchaudhary5109 · 3 years ago

    Hi Krish, this is well explained!! I have an ID column that has unique values, but for some records the ID is null. How can I handle this type of data?

  • @itsmoolya · 4 years ago

    This is a good explanation!

  • @muzamilshah8028 · 4 years ago

    Let's say I want to predict the value of f1 in row 2, as you mentioned. But what if we also have missing values in f2 and f3, just not in the same row? What do we do in that scenario????

  • @ashwinkrishnan4285 · 4 years ago

    Suppose we apply a classifier algorithm to predict whether the Gender feature is male or female from the other features, including the output feature, in the training dataset, and use it to fill the missing values of the Gender feature (test dataset). When we finally build the model to predict the output class, won't it be influenced? Wouldn't data leakage have happened, since we used the output to fill the missing column values?
    Please clarify this point, Krish.

    • @chirathabey7729 · 3 years ago · +1

      It won't matter much. Even though we train with the output feature included, it is used ONLY for predicting the missing samples, and there are far fewer missing samples than complete ones. If there are considerably more missing samples, and they also occur in many other features, then it will certainly create a bias in the final prediction.

  • @chirumadderla8129 · 2 years ago

    If there are several missing values in solar radiation data during night-time and early-morning hours, how should they be handled? The dataset I am considering covers one year.

  • @preetnandeshwar5331 · 3 years ago

    Which method for missing categorical values suits which dataset, and why? Or do we just have to use them by TRIAL AND ERROR?
    Please, someone help me. I am a beginner.

  • @ZUBINABRAHAM · 3 years ago

    Thanks for the video, it was informative.
    Can we use KNN?

  • @CheeseKransky12 · 4 years ago

    Thanks Krish

  • @napoleonx5259 · 1 year ago

    Well done, Krishna ❤

  • @mitultank7872 · 2 years ago

    If I have missing values in a numerical column and I want to fill them based on another, categorical column, how can I handle that?
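One common way to do this, sketched with pandas under made-up column names: compute the mean of the numeric column within each category, and use that mean to fill the category's missing values. `groupby(...).transform("mean")` keeps the original row index, so `fillna` lines up row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: numeric "price" with gaps, categorical "city" complete.
df = pd.DataFrame({
    "city":  ["A", "A", "A", "B", "B", "B"],
    "price": [10.0, np.nan, 20.0, 100.0, 120.0, np.nan],
})

# Per-row mean of "price" within the row's "city" group (NaNs ignored).
group_mean = df.groupby("city")["price"].transform("mean")

# Fill each missing price with the mean of its own city.
df["price"] = df["price"].fillna(group_mean)
```

Swapping `"mean"` for `"median"` gives the outlier-robust variant.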

  • @amitjajoo9510 · 4 years ago

    Sir, thanks for making the feature engineering playlist.

  • @daniellazarolazaro1033

    Thank you so much, this video actually helps a lot when you just got started like me hahahha, as I was saying, thank you so much for this great great great work!!!

  • @raghavkumar8333 · 4 years ago

    Sir, I have a student attrition dataset where I need to predict why students who were admitted in the 1st year drop out in the 2nd year. A year consists of 2 terms, and I have students' grades (a, b, c, d) in 6 different courses for the 1st and 2nd terms. Most of the 2nd-term grade columns for these 6 courses are missing. Intuitively, I think this could be a reason for dropping out. My questions are:
    1) Should I impute missing values in this case? It is possible that they are not really missing, because those students had already dropped out. So, should I create dummy variables?
    2) If I impute the missing values, what technique should I use to impute those missing categorical variables?

  • @jaiminshah143 · 3 years ago

    How do we handle missing (NaN) values in a column with binary values, i.e. just 0 or 1?

  • @ommehta4501 · 2 years ago

    If we have a categorical date feature with some missing values, please tell me how to deal with it.

  • @sadikbilal5149 · 2 years ago

    Nice. Do you have code to implement these techniques?

  • @RK-un6ou · 3 years ago

    Why do we fill NaN values with the mean or median? And why doesn't it affect the dataset?
    Can you explain this a bit?

  • @theoutlet9300 · 3 years ago

    Since we are using the output to predict our feature and then the feature to predict our output, wouldn't it cause problems in prediction?

  • @nasiksami2351 · 3 years ago

    Amazing!

  • @Raja-tt4ll · 4 years ago

    Very nice video.

  • @ashokpalivela311 · 4 years ago

    Thank you 😍

  • @ele_wings7521 · 4 years ago

    Thank you sir...

  • @sandipansarkar9211 · 2 years ago

    Finished watching.

  • @VikasSharma-ye7pu · 4 years ago

    Hi Krish... Please make a video explaining two Kaggle competition projects...

  • @AmitYadav-ig8yt · 4 years ago · +1

    Just a request: could you please upload the code for this too? I have seen that the code for the techniques is missing in many videos. It would be very helpful if you provided us the code. Thanks a lot.

  • @shaikhkashif9973 · 1 year ago

    Sir, should we handle outliers first or fill the null values first? Please answer.

  • @aditya_baser · 4 years ago

    Here, you only had one categorical column. What if you have multiple categorical columns? How do you go about missing-value treatment in that case?

  • @RajaKumar-ne9bt · 2 years ago

    Why are we skipping the output when doing clustering?

  • @1a17890 · 5 months ago

    Sirji, can you kindly show how it's done?

  • @sachinborgave8094 · 5 years ago

    Hello sir...
    Please make a video on how to fill missing categories using logistic regression...

  • @abhipraydumka8587 · 4 years ago

    Can you tell me how to assign a unique category, let's say U (undefined), to missing categorical data?

  • @Analystmind · 1 year ago

    What if my model's missing values are not categorical but numeric?

  • @janinajochim1843 · 4 years ago

    Thank you for the video!
    Would you happen to know what to do in cases where the value is "missing by design"?
    I have a case where I am using the variable "Father's reaction to pregnancy". It has missing values for participants who did not know the father of the child, because they didn't get this question :/

    • @sawradipsaha5377 · 4 years ago

      Maybe you can treat that as a separate category.
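Treating missing values as their own category, as suggested here (and as the earlier question about a "U (undefined)" label asks), is a one-liner in pandas. The label "Missing" and the toy values are assumptions; any label that does not clash with a real category works. For values that are missing by design, this keeps the information that the question was never asked, instead of guessing an answer.

```python
import numpy as np
import pandas as pd

# Hypothetical survey column; NaN means the question was not asked.
df = pd.DataFrame({"father_reaction": ["happy", np.nan, "upset", np.nan, "happy"]})

# Keep the missingness as information: NaN becomes an explicit category.
df["father_reaction"] = df["father_reaction"].fillna("Missing")
```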

  • @RAJI11000 · 4 years ago

    Sir, how can I impute if the feature values look like "100 mbps"?

  • @chandrasekarank8583 · 4 years ago

    Sir, what if I label-encode the data and then use a simple imputer that replaces the NaN values with the mean or median, as I want?
    Sir, please tell me whether this is a valid way to do it.

  • @sriraj8392 · 2 years ago

    Sir, will you teach offline classes...?

  • @bismeetsingh352 · 4 years ago

    What do you do when you have missing values in textual data?

  • @analistaremoto · 3 years ago

    Niiiiiice!

  • @chinmaybhat9636 · 4 years ago

    Can you show the same thing by taking a dataset and demonstrating it?

  • @RishikeshGangaDarshan · 3 years ago

    How do we handle this in a regression problem?

  • @Justme-dk7vm · 3 months ago

    Sir why do you have the same voice as my college chairman? 😩💓

  • @sachinborgave8094 · 5 years ago

    Excellent, sir. Can you please provide Python source code showing how to fill missing category data using logistic regression?

  • @AmitYadav-ig8yt · 4 years ago

    You said to create a classifier to predict the missing values. What should we do if we have a linear regression problem with missing values? Should we create a classifier for that too? Please respond.

    • @chirathabey7729 · 3 years ago

      Yes, if the missing value you are trying to predict belongs to a categorical variable. When you are predicting a missing value, your output variable is the variable with the missing values, and the rest of the variables become the input variables. You can think of it as solving an entirely independent problem.
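A minimal sketch of this idea. It assumes scikit-learn's `RandomForestClassifier` and made-up columns; any classifier would do, and logistic regression, which other commenters ask about, is another option. Rows where `Gender` is known form the training set, the other columns are the inputs, and the fitted model predicts `Gender` only for the rows where it is missing. Here the inputs are numeric and complete. In real data they would first need their own missing-value treatment and encoding.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: Gender (the column to impute) has two gaps.
df = pd.DataFrame({
    "height": [150, 152, 181, 179, 149, 183, 151, 180],
    "weight": [50, 52, 82, 79, 49, 84, 51, 80],
    "Gender": ["F", "F", "M", "M", None, "M", "F", None],
})

features = ["height", "weight"]   # inputs: the other columns
known = df["Gender"].notna()      # rows usable for training

# Train on the rows where the target column (Gender) is present.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(df.loc[known, features], df.loc[known, "Gender"])

# Predict Gender only for the rows where it is missing.
df.loc[~known, "Gender"] = clf.predict(df.loc[~known, features])
```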

  • @cutyoopsmoments2800 · 5 years ago · +4

    Bro, I want to build my career in Machine Learning. Kindly guide me...

  • @192Kiran · 4 years ago

    Krish, could you please do this with datasets?

  • @clivefernandes5435 · 4 years ago

    Is method 3 widely used? I've never heard of it.

  • @kumarraju2923 · 4 years ago

    How are the initial clusters selected for missing values?

  • @akshayvilayatkar7985 · 4 years ago

    How can we handle alphanumeric missing values in a dataset? I cannot get past this problem. Please help, Krish.

  • @dineshkumar-kc7vt · 4 years ago

    I'm unable to overcome this problem. I first applied get_dummies to the dataset, and now I want to handle the missing values, but I'm getting this error: TypeError: '(slice(None, None, None), slice(0, 2, None))' is an invalid key
    Please help me.

    • @chirathabey7729 · 3 years ago

      Do the missing-value treatment first, before you apply one-hot encoding.
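The order suggested in this reply, sketched with pandas on a made-up column. First fill the categorical NaNs (here with the mode, as one possible choice), then one-hot encode. After `get_dummies`, a NaN category simply becomes a row of zeros, so there is nothing left for an imputer to find. Separately, the error text in the question looks like NumPy-style `X[:, 0:2]` slicing applied to a pandas DataFrame. `df.iloc[:, 0:2]` is the pandas equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one categorical column with gaps, one numeric column.
df = pd.DataFrame({
    "color": ["red", np.nan, "blue", "red", np.nan],
    "size":  [1, 2, 3, 4, 5],
})

# 1) Missing-value treatment on the raw categorical column (mode here).
df["color"] = df["color"].fillna(df["color"].mode()[0])

# 2) Only then one-hot encode; every row gets exactly one dummy set to 1.
encoded = pd.get_dummies(df, columns=["color"])

# Positional column slicing on a DataFrame uses .iloc, not df[:, 0:2].
first_two = encoded.iloc[:, 0:2]
```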

  • @archanapereira1333 · 4 years ago

    How do we identify the dependent and independent variables in a dataset?

    • @chirathabey7729 · 3 years ago · +1

      It depends on the problem description, which states what the problem is. Your output (dependent) variable is the one that answers your problem; the rest of the features become your independent variables.

  • @nhprml6324 · 4 years ago

    We can replace missing values with the corresponding feature's mean value.

  • @junaidlatif2881 · 1 year ago

    But how do we apply it?

  • @jaypatil4786 · 4 years ago

    I have one easy question, but I can't remember the answer now: please tell me how to see how many missing values there are in a dataset.
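The usual pandas idiom for this question: `isnull()` (alias `isna()`) marks each missing cell as True, and `sum()` counts the True values per column. The toy frame below is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns.
df = pd.DataFrame({
    "Gender": ["M", np.nan, "F", np.nan],
    "Age":    [23, 31, np.nan, 40],
})

missing_per_column = df.isnull().sum()        # count of NaNs in each column
missing_total = int(df.isnull().sum().sum())  # NaNs in the whole DataFrame
missing_share = df.isnull().mean()            # fraction missing per column
```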

  • @vasusharma1773 · 4 years ago

    Sir, if you could just show this in code, it would be very helpful.

  • @arjyabasu1311 · 4 years ago

    Sir, please upload the implementation of these methods!!

    • @harshtiwari8765 · 4 years ago

      Can you send me the feature engineering notes that were given by Krish Naik?
      Help is appreciated.

  • @martinlyuba5105 · 11 months ago

    Great tutorial. Your email, please?