Feature Selection in Python | Machine Learning Basics | Boston Housing Data

  • Published 3 Jul 2024
  • Hey everyone! I'm a first-year machine learning PhD student. My research focuses on recommender systems applications in sports science, including case-based reasoning techniques to support marathon runners. I have a YouTube channel about my experience doing a PhD and productivity tips, but I have also made a few videos related to my research. This video is about feature selection in Python, using the example of a kNN regressor on the Boston Housing Data. There are timestamps in the description, and the Python notebook is available for download. I hope this will be of interest to some of you, and be sure to subscribe if you would like to see more.
    Previous videos on Machine Learning:
    Supervised vs Unsupervised Learning: • Supervised vs Unsuperv...
    Case-Based Reasoning: • Case-Based Reasoning |...
    kNN: • Introduction to kNN: k...
    Link to downloadable python notebooks: drive.google.com/open?id=10jx...
    Timestamps:
    00:00 - intro
    00:55 - what is feature selection / why
    03:30 - description of the Boston Housing Dataset
    06:00 - recap of the last model
    07:47 - filtering features by variance
    10:20 - filtering features by correlation
    15:40 - feature selection using a wrapper - sequential feature selection
    21:24 - relationships between variables
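A minimal sketch of the two filter steps from the timestamps above (filtering by variance, then ranking by correlation with the target). This uses a synthetic dataset as a stand-in, since load_boston has been removed from recent scikit-learn; the threshold and feature names are illustrative, not taken from the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import VarianceThreshold

# Synthetic stand-in for the Boston Housing data; feature names are made up.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(5)])
df["constant"] = 1.0  # a zero-variance column, to show the filter working

# Filter 1: drop features whose variance falls below a threshold.
selector = VarianceThreshold(threshold=0.01)
selector.fit(df)
kept = df.columns[selector.get_support()]

# Filter 2: rank the surviving features by absolute correlation with the target.
corr = df[kept].apply(lambda col: np.corrcoef(col, y)[0, 1]).abs()
top_features = corr.sort_values(ascending=False).index.tolist()
print(kept.tolist(), top_features)
```

The zero-variance column is removed by the first filter; the second step only orders features, so you would still pick a cutoff (or use a wrapper) to decide how many to keep.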
    Purchase my PhD Student and Productivity Notion Templates store.phdandproductivity.com
    Try the Notion template before you buy: www.notion.so/PhD-Planning-VI...
    Join my email list for regular PhD and Productivity advice and a 10% discount on my Products: www.phdandproductivity.com/
    Support My Channel: If you like my content and want to support my channel and get access to exclusive content, then join the channel membership here, starting from €1.99 a month: / @phdandproductivity
    Connect with Me
    Instagram: / phdandproductivity
    Twitter: / phdproductivity
    For business enquiries only: ciara@phdandproductivity.com
    Shop my Favourites for Working from Home and Productivity: www.amazon.co.uk/shop/ciaraxf...
    Check out my Startup Daysier
    website:
    Instagram: / daysier.co
    ** Disclaimer ** Some of the links are affiliate links, meaning that if you make a purchase using the link, I earn a small commission at no extra charge to you. If you do decide to use one of these links, then thank you for your support. ******
  • Howto & Style

Comments • 71

  • @chantis1794
    @chantis1794 4 years ago +5

    Thanks for this video. It is very helpful to me especially for filtering the most important features.

  • @shannonbytelaar2918
    @shannonbytelaar2918 a year ago +1

    This really is an informative video! Thanks for the concise tutorial on feature selection.

  • @da_ta
    @da_ta 3 years ago

    Apart from the low sound, great explanation and clear steps. Thank you for doing this!

  • @zeeshan3703
    @zeeshan3703 3 years ago +1

    Great Insights, and now I have SUBSCRIBED, and can't wait to see more from you. Huge Thank You!!!

  • @rafidhaque2722
    @rafidhaque2722 2 years ago

    thank you so much for providing the python scripts...your video was very helpful for me

  • @caraf6562
    @caraf6562 2 years ago +1

    I skipped this video when it was posted because I wasn't really doing anything related to machine learning, but lately I've been looking more into math modeling and linear regression to find ways it can support my experimental research. Excellent tutorial, and I'm excited to try to implement some of these things on my own data sets! Still in the data collection & cleaning phase, but making progress :)

    • @PhDandProductivity
      @PhDandProductivity  2 years ago

      That's great! It seems like, more and more, no matter what science (or even subject generally) you work in, machine learning techniques will creep in!

  • @muhammadjamalahmed8664
    @muhammadjamalahmed8664 4 years ago +2

    Really helpful.. Thank you..

  • @maythamsaeed533
    @maythamsaeed533 a year ago

    Thanks very much for this informative post

  • @uzhankocaman7457
    @uzhankocaman7457 3 years ago

    very informative. thanks!

  • @arindamjain6892
    @arindamjain6892 3 years ago +1

    Loved it. Thanks

  • @musicbeast2079
    @musicbeast2079 3 years ago +1

    best video I have ever come across

  • @data_first
    @data_first 2 years ago

    This video is great!

  • @user-qc3vf9uo9g
    @user-qc3vf9uo9g 3 years ago +1

    Really helpful, Thanks!

  • @info-dawg
    @info-dawg 3 years ago +1

    Amazing! Thanks

  • @dashsingh30095
    @dashsingh30095 3 years ago

    Doing a great job

  • @zuhramajid
    @zuhramajid 3 years ago +1

    thank you so much😍 pleaseee make more machine learning videos

  • @hussainsalih3520
    @hussainsalih3520 2 years ago

    amazing keep it up

  • @narothamreddy9310
    @narothamreddy9310 2 years ago +4

    I have a question: should we remove outliers from the data before the feature selection process, or not?

  • @xmine9077
    @xmine9077 2 years ago +1

    Thanks a lot for this

  • @simon-4530
    @simon-4530 4 years ago +1

    Great video 😊 thanks

  • @riccardosecci2637
    @riccardosecci2637 4 years ago +1

    Thank you for this videos :)

    • @PhDandProductivity
      @PhDandProductivity  4 years ago +1

      Riccardo Secci glad you like it!

    • @riccardosecci2637
      @riccardosecci2637 4 years ago +1

      @@PhDandProductivity I am soon to be a PhD student in a field similar to yours (Bioinformatics) 😊 so I'm trying to deepen my understanding of machine learning! Your videos are very clear and explanatory.

  • @chanellioos
    @chanellioos 2 years ago

    This 🔥 Kira

  • @markgoh8302
    @markgoh8302 3 years ago

    Thanks for the tutorial. I am starting out in ML. I have a question: I noticed in the video you used r2 to score your feature selection. At what point in the process are you concerned that you have overfitted the model?

  • @ajaysingh001
    @ajaysingh001 4 years ago +2

    Thanks, it's really nice

  • @saeedhassiny7454
    @saeedhassiny7454 3 years ago

    Please, how can I do feature selection on a 1D feature vector that represents the features extracted from an image?

  • @harshithbangera7905
    @harshithbangera7905 3 years ago +2

    Thanks 👍...

  • @alifiaz7792
    @alifiaz7792 3 years ago +1

    Well Explained

  • @finnzhang1323
    @finnzhang1323 3 years ago

    What should we use instead of R-squared in classification problems?

  • @lionelshaneyfelt5607
    @lionelshaneyfelt5607 3 years ago +1

    You're awesome, thank you

  • @mengshuangfu8206
    @mengshuangfu8206 3 years ago +6

    Thanks for the detailed explanation! I have some questions. I'm running an unsupervised learning study: what if I have 200+ features and most of them don't seem to be correlated with each other? How do I reduce features? And what's the best way to determine how many clusters I should choose?

    • @adeshinajohn3988
      @adeshinajohn3988 2 years ago

      Hi, Fu. I'm sure you will have solved your challenge by now. Can you share what you did? Thanks

  • @xiaodongyang4841
    @xiaodongyang4841 a year ago

    Hi, your video is beneficial for my study, but I have a question: do your methods apply to financial data analysis? For instance, financial market prediction.

  • @abhinavvivek3259
    @abhinavvivek3259 2 years ago +2

    I really appreciate this tutorial. It's really very helpful. However, I do have one quick question: why didn't you use the train_test_split function to split the dataset and predict y on the test dataset?

    • @PhDandProductivity
      @PhDandProductivity  2 years ago +4

      Thanks! Because I used cross-validation instead, which splits the data into training and test sets several times and then averages the error over the runs.
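A minimal sketch of that cross-validation setup, assuming cross_val_score with 5 folds and a synthetic dataset in place of the Boston Housing data (feature counts and k are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for the housing dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5-fold cross-validation: each fold serves as the test set once,
# and the R^2 scores are averaged over the runs.
scores = cross_val_score(KNeighborsRegressor(n_neighbors=5), X, y,
                         cv=5, scoring="r2")
print(scores.mean())
```

This replaces a single train_test_split: instead of one train/test partition, every sample is used for testing exactly once across the five runs.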

  • @m.randayandika3779
    @m.randayandika3779 3 years ago

    Thanks for the video! It's really good. But I have a problem with your code in the wrapper section because I have a different dataset. Can you help me solve my problem?
    Thanks a lot

  • @vigneshpadmanabhan
    @vigneshpadmanabhan 5 months ago

    We could look into data transformations as a way to enhance the model, right? I mean, by applying the right transformations to each column, we have the potential to maximize their effectiveness, right?

  • @manishbolbanda9872
    @manishbolbanda9872 4 years ago +2

    EDA, feature engineering, and feature selection: are these independent of the machine learning algorithm to be used?
    Any answer would be appreciated. Thanks.

    • @PhDandProductivity
      @PhDandProductivity  4 years ago +1

      manish bolbanda it can depend on which procedure you use, because different features will work better with different learning algorithms. So wrapper feature selection is often preferred over filtering when you are using a particular algorithm.
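Wrapper selection around a specific learner can be sketched with scikit-learn's SequentialFeatureSelector, which scores candidate feature subsets using the estimator itself (the estimator, fold count, and number of features below are illustrative, on synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data with only 3 genuinely informative features out of 8.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)

# Forward selection: greedily add the feature that most improves
# cross-validated score for this particular estimator (kNN here).
sfs = SequentialFeatureSelector(KNeighborsRegressor(),
                                n_features_to_select=3,
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask over the 8 features
```

Because the wrapper evaluates subsets with the target model, swapping kNN for another estimator can change which features get selected, which is the point made in the reply above.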

  • @bouabdellahtahar5126
    @bouabdellahtahar5126 3 years ago

    How can I select features using a swarm algorithm in Python?

  • @dataaholic
    @dataaholic 4 years ago +1

    Really nice video. Loved it. Can you make videos on implementing kernels and Gaussian processes in Python?
    Thanks in advance

    • @PhDandProductivity
      @PhDandProductivity  4 years ago +2

      Shubham Shakya unfortunately that's not something that I'm knowledgeable about.

    • @hiruki8
      @hiruki8 2 years ago

      @@PhDandProductivity is it something you've since become more knowledgeable about? 👀

  • @siddharthchauhan8285
    @siddharthchauhan8285 3 years ago

    Would it be helpful to join this community for someone who is new to machine learning / a beginner?

  • @voshark7586
    @voshark7586 2 years ago

    Sorry, can you tell me the role of feature selection, please?
    Thank you so much.

  • @saikiranalagatham3555
    @saikiranalagatham3555 3 years ago +1

    Can we select features based on gain ratio or information gain, like a tree-based approach?

  • @pranavipatel9259
    @pranavipatel9259 2 years ago

    Try putting the code on a webpage and linking to it in the description

  • @andisupriadichan5188
    @andisupriadichan5188 4 years ago

    Hi Kira, thank you for your video. I'm Chan. Can I ask some questions about your PhD research? I'm still a little bit confused about novelty in PhD research; my focus is data mining and ML. Can you tell me about it, or make a video on this?

    • @PhDandProductivity
      @PhDandProductivity  4 years ago +1

      Andi Supriadi Chan hi, thanks for commenting. I'm not really sure what you are asking.

  • @akashprabhakar6353
    @akashprabhakar6353 3 years ago +1

    Thanks for this video... I have several doubts:
    1. Shouldn't we exclude features based on their correlation with other features, and not with the dependent variable?
    2. I am working on a similar housing dataset with 61 features, so will these forward and backward wrapper methods work? How will I find the best combination of features to select, as you did here, given that this dataset has a small number of features?
    3. Could you please explain the pipeline code line with StandardScaler() and KNeighbors... is it automatically standardizing the numerical features and then applying kNN?
    4. My dataset has many categorical variables, and after creating dummy variables the number of features has gone up. So how do I select features using forward or backward selection, since selecting individual levels of categories will not make any sense? How do I select important categorical features?
    Your reply is highly appreciated :)

    • @PhDandProductivity
      @PhDandProductivity  3 years ago +3

      Hi Akash, sorry for the delay.
      1. The point of removing variables that have low correlation with the dependent variable is that they do not predict the dependent variable and so won't be useful to the model. The risk of only looking at correlation with other variables is that you could remove a potentially useful variable because of its correlation with a useless variable.
      2. It works the same with a larger number of features.
      3. The pipeline uses cross validation with a StandardScaler: first it divides the data into 5 splits of training and test data, fits a scaler on the training data and applies it to the test data, then fits the kNN on the training data and evaluates it on the test data.
      4. With categorical features, if some feature levels are not useful, I would recommend grouping some together, e.g. levels 1-3, 4, 5-9.
      Hope that helps.
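The pipeline behaviour described in point 3 can be sketched like this (a minimal sketch on synthetic data standing in for the housing dataset; within each CV split the scaler is fitted only on the training fold):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The pipeline standardizes the features and then fits kNN; under
# cross_val_score the scaler is re-fitted on each training fold, so no
# information from the test fold leaks into the scaling step.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print(scores.mean())
```

Scaling matters for kNN in particular, since distances would otherwise be dominated by features with large numeric ranges.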

    • @akashprabhakar6353
      @akashprabhakar6353 3 years ago

      @@PhDandProductivity thank you very much for this detailed reply

    • @tommy626
      @tommy626 3 years ago

      @@PhDandProductivity thanks for your video and reply! Just a quick follow-up on this question: for those variables that have low correlation with the dependent variable, is the conclusion to keep them or remove them? Thx!

  • @akshitmiglani5419
    @akshitmiglani5419 2 years ago

    Hi there,
    Thank you for sharing your knowledge. I have a question:
    When we check the correlation, should we not check it only with continuous variables? I'm not sure what "correlation" tells us with categorical variables.
    Doubt (in a broad sense): should we not first check whether our variables are continuous/categorical and then choose the methods?
    For example: t-test/correlation for continuous variables and a chi-squared test for categorical variables after doing label encoding.
    This has always bothered me. Looking forward to finding out where the gap in my understanding is. Thank you!

  • @XX-vu5jo
    @XX-vu5jo 4 years ago

    PhD but doing basic stuff lol

    • @cgmiguel
      @cgmiguel 3 years ago +9

      That's why she's teaching. These are not PhD-level topics.