Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews

  • Added on 26 Jul 2024
  • Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.
    Feature hashing
    en.wikipedia.org/wiki/Feature...
    🟢Get all my free data science interview resources
    www.emmading.com/resources
    🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
    🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
    🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
    🔵 Data Science Resume Checklist www.emmading.com/data-science...
    ✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
    // Comment
    Got any questions? Something to add?
    Write a comment below to chat.
    // Let's connect on LinkedIn:
    / emmading001
    ====================
    Contents of this video:
    ====================
    00:00 Introduction
    00:48 Categorical Data
    02:22 Ordinal Features & Class Labels
    03:38 One-Hot Encoding
    05:32 Dummy Encoding
    06:30 Problems of One-Hot & Dummy Encoding
    07:26 Feature Hashing
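
To make the one-hot vs. dummy encoding distinction from the chapter list concrete, here is a minimal pure-Python sketch. The `colors` feature and both helper functions are hypothetical illustrations, not code from the video; in practice a library call such as `pandas.get_dummies` (with `drop_first=True` for dummy encoding) does the same job.

```python
def one_hot_encode(values, levels):
    """Return one indicator column per level (k columns for k levels)."""
    return [[1 if v == level else 0 for level in levels] for v in values]


def dummy_encode(values, levels):
    """Drop the first level as the baseline (k - 1 columns for k levels)."""
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


colors = ["red", "green", "blue", "green"]  # hypothetical feature values
levels = sorted(set(colors))                # ['blue', 'green', 'red']

one_hot = one_hot_encode(colors, levels)    # 3 columns per row
dummy = dummy_encode(colors, levels)        # 2 columns; 'blue' is all zeros
```

With dummy encoding the baseline level is represented by a row of zeros, which avoids the perfectly collinear columns that one-hot encoding produces alongside an intercept term.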

Comments • 16

  • @emma_ding 1 year ago +1

    Many of you have asked me to share my presentation notes, and now… I have them for you! Download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!

  • @linghaoyi 1 year ago

    Thank you. Merry Christmas and Happy New Year!

  • @qingxiawang161 1 year ago

    Hi Emma, thank you very much for the informative video. I really learned a lot from it! Keep up the good work ❤

  • @junlizhou7167 1 year ago

    Thanks for the informative video Emma! Love the Notion notes you created

    • @emma_ding 1 year ago +1

      So glad you enjoyed it! Thank you for watching. 😊

  • @hsuya3925 1 year ago +3

    Hi Emma, very informative video. Thanks for working on all these types of videos and sharing them with us. I wanted to know: is your Notion page public? If not, could you share it?

    • @Doctor_monk 1 year ago

      I have been waiting for these as well. :)

    • @emma_ding 1 year ago

      Of course! I'm working on getting all notes organized and sharable in one location, will let you know as soon as they are ready! :)

    • @emma_ding 1 year ago

      @sukumargv @hsuya3925 Here you go! You can now download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!

  • @nitishjambhurkar7990 1 year ago

    Hi Emma, thank you so much for this insight. In addition, I'd like to know how to handle very large datasets. I was asked about this in an interview and couldn't answer it correctly. How would you handle such datasets, and what steps would you take to load them? It would be great if you could make a video on this topic.

  • @jet3111 1 year ago

    Hi Emma, thank you for the very informative video. It would be great to discuss embedding methods for handling categorical data.

    • @emma_ding 1 year ago

      Great suggestion! I've added it to my list of content ideas. 😊 Thanks for watching!

  • @rakeshkumarsharma2250 1 year ago +1

    How do I convert a PIN code / postal code?
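
One option for a high-cardinality field such as a postal code is the feature hashing covered at 07:26 in the video. The sketch below is only an illustration under assumptions: the bucket count of 8, the SHA-256 hash, and the sample codes are arbitrary choices, and different codes can collide in the same bucket.

```python
import hashlib


def hash_bucket(value, n_buckets):
    """Map a string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(value, n_buckets=8):
    """Return a fixed-length indicator vector for a categorical value."""
    vector = [0] * n_buckets
    vector[hash_bucket(value, n_buckets)] = 1
    return vector


postal_codes = ["94105", "10001", "560034"]  # hypothetical examples
encoded = [hash_encode(code) for code in postal_codes]
```

The vector length stays fixed no matter how many distinct codes appear, which is the main appeal; the trade-off is that collisions make the encoding lossy. scikit-learn's `FeatureHasher` is a library alternative to a hand-rolled version like this.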

  • @saudiorchestra6443 11 months ago

    How do we deal with a category that appears for the first time in the test data? For example, in the training data I have a column for jobs. The training data contains these jobs:
    Doctor, Nurse, Lab technician, Administrator
    I used one-hot encoding for the job column. What if the test data contains an additional job, Surgeon? How do we handle this situation?
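
One common way to handle this is to learn the levels from the training data only and encode any unseen level as a row of zeros. The sketch below assumes that behavior is acceptable for the model; the `SimpleOneHotEncoder` class is a hypothetical illustration, and scikit-learn's `OneHotEncoder(handle_unknown="ignore")` offers the same behavior in a library.

```python
class SimpleOneHotEncoder:
    """One-hot encoder that remembers the levels seen during fit."""

    def fit(self, values):
        # Levels come from the training data only.
        self.levels_ = sorted(set(values))
        return self

    def transform(self, values):
        # A value not seen during fit matches no level, so its row is all zeros.
        return [[1 if v == level else 0 for level in self.levels_] for v in values]


train_jobs = ["Doctor", "Nurse", "Lab technician", "Administrator"]
test_jobs = ["Nurse", "Surgeon"]  # "Surgeon" never appears in training

encoder = SimpleOneHotEncoder().fit(train_jobs)
encoded_test = encoder.transform(test_jobs)
```

Here "Surgeon" keeps the same four columns as the training data but carries no signal. Other options include mapping rare and unseen levels to an explicit "Other" category, or using feature hashing, which needs no fitted vocabulary.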

  • @sruthimallarapu7662 1 year ago +1

    Hi Emma, can decision trees handle string categorical values (for example, a "gender" column that takes "M" or "F")? Is it not necessary to convert the strings to numbers?

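    • @georgezevallos 7 months ago

      Most ML libraries need strings converted to numerical values before training; scikit-learn's decision trees, for instance, do not accept string features. Even NLP models do this. Some libraries, such as CatBoost, accept categorical columns directly and encode them internally. Hope it helps.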
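
A minimal sketch of such a conversion for the "gender" example above, assuming a simple integer mapping is acceptable. The helper name `fit_label_mapping` is hypothetical; integer codes are reasonable for a two-level feature like this one, while for nominal features with more levels a one-hot encoding avoids implying an order.

```python
def fit_label_mapping(values):
    """Assign each distinct string a stable integer code."""
    return {level: idx for idx, level in enumerate(sorted(set(values)))}


gender = ["M", "F", "F", "M"]           # hypothetical column values
mapping = fit_label_mapping(gender)     # 'F' and 'M' each get an integer code
encoded = [mapping[v] for v in gender]  # numeric column a tree can split on
```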