🚀 Data Cleaning/Data Preprocessing Before Building a Model - A Comprehensive Guide

  • Added on 14 Nov 2023
  • Welcome to Learn_with_Ankith! 📊 In this tutorial, we'll delve into the crucial steps of data preprocessing to ensure your datasets are in prime condition before feeding them into your machine learning models. A clean and well-prepared dataset is the foundation for accurate and reliable model predictions.
    Data_set link: www.kaggle.com/datasets/kumar...
    📌 Topics Covered:
    Import Necessary Libraries: Learn the essential libraries required for efficient data manipulation and analysis.
    Read File: Understand how to import data from various sources and formats into your Python environment.
    Sanity Check:
    Identify and handle missing values effectively.
    Explore the dataset's shape, information, and spot duplicates.
    Conduct a garbage check to maintain data integrity.
    Exploratory Data Analysis (EDA):
    Dive into descriptive statistics for a deeper understanding of your data.
    Visualize data distributions with histograms and box plots.
    Uncover patterns and relationships with scatter plots and correlation heatmaps.
    Missing Value Treatment:
    Implement strategies using mode, median, and KNNImputer to handle missing data.
    Outlier Treatment:
    Explore methods to detect and deal with outliers that can impact model performance.
    Encoding of Data:
    Convert categorical variables into a format suitable for machine learning algorithms.
    🔧 Whether you're a beginner or a seasoned data scientist, mastering these preprocessing techniques is fundamental for building robust and accurate machine learning models.
    #DataPreprocessing, #DataCleaning, #MachineLearning, #DataScience, #DataAnalysis, #PythonProgramming, #Tutorial, #ExploratoryDataAnalysis, #OutlierDetection, #MissingValueTreatment, #DataVisualization, #Programming, #DataManipulation, #CodingTips, #FeatureEngineering, #DataQuality, #Pandas, #NumPy, #Matplotlib, #Seaborn, #DataInsights, #TechTutorial, #DataEngineering, #MachineLearningModels, #AIProgramming, #DataAnalytics, #DataWrangling, #TechEducation, #PythonTips, #Statistics, #DataSkills, #ProgrammingLife, #Algorithm, #TechTalk, #CodingCommunity, #DataPrep, #CodeNewbie, #DataQualityCheck, #LearnDataScience, #ProgrammingJourney
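
The topics listed above can be sketched in a few lines of pandas. This is only a minimal illustration on a tiny made-up DataFrame, not the notebook used in the video: the column names and values are hypothetical, and the choice of median for numeric columns and mode for categorical ones is one common convention rather than the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "country": ["A", "B", "B", "C", "C"],
    "status": ["Developed", "Developing", "Developing", None, "Developing"],
    "gdp": [100.0, 20.0, 20.0, np.nan, 30.0],
})

# Sanity check: shape, missing values per column, fully duplicated rows.
n_rows, n_cols = df.shape
missing_per_column = df.isnull().sum()
n_duplicates = int(df.duplicated().sum())

# Missing value treatment: median for numeric columns, mode for the rest.
clean = df.copy()
for col in clean.columns:
    if pd.api.types.is_numeric_dtype(clean[col]):
        clean[col] = clean[col].fillna(clean[col].median())
    else:
        clean[col] = clean[col].fillna(clean[col].mode()[0])

# Encoding: one-hot encode the categorical columns, dropping the first
# level of each to avoid a redundant column.
encoded = pd.get_dummies(clean, columns=["country", "status"], drop_first=True)
```

Outlier treatment and KNN-based imputation are left out here to keep the sketch short.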

Comments • 45

  • @gloomyday4524
    @gloomyday4524 3 months ago +12

    You don't know how much this video helps clueless students like me. You did such a good thing, bro; I hope everything always goes easy in your life!

  • @bombasticiti
    @bombasticiti 7 months ago +1

    Nice, Thank you for feeding my mind!🙂

  • @yasink18
    @yasink18 1 month ago +2

    Thank you so much for making such a simple video.
    Can you make more videos on handling different types of outliers, and on how to tell which outliers need to be handled and which can be ignored?

  • @vrishabhbhonde6899
    @vrishabhbhonde6899 2 months ago +2

    Thanks a lot sir. Very helpful and very clear steps

  • @percidaman4409
    @percidaman4409 2 months ago +1

    Thanks man this was so great, you really helped me

  • @kiruthickagp
    @kiruthickagp 7 months ago +3

    Very clearly explained

  • @AmahaGebretsadikan
    @AmahaGebretsadikan 4 months ago +1

    I like the organisation and contents of the presentation.

  • @mitchellyula4447
    @mitchellyula4447 10 days ago

    Thank you for this walkthrough. This will help me on my next project for school.

  • @anurag17091977
    @anurag17091977 2 months ago

    stupendous video. keep it up bro.

  • @onlyguitars
    @onlyguitars 7 months ago

    Hi! Great video, very helpful and love how each step is clearly outlined! Just a question. In the outliers why change the value to the UW and LW, and not just drop those rows? Thank you!
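
The two options in this question can be compared with a small sketch. It assumes that UW and LW stand for the upper and lower box-plot whiskers, computed with the usual 1.5 × IQR rule; the data are made up, and this is not necessarily how the video computes them.

```python
import pandas as pd

# Hypothetical column with one extreme value.
s = pd.Series([10, 12, 11, 13, 12, 11, 95], name="value")

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower_whisker = q1 - 1.5 * iqr
upper_whisker = q3 + 1.5 * iqr

# Option 1: cap values at the whiskers, keeping every row.
capped = s.clip(lower=lower_whisker, upper=upper_whisker)

# Option 2: drop the rows that fall outside the whiskers.
dropped = s[(s >= lower_whisker) & (s <= upper_whisker)]
```

Capping keeps the row, and therefore the values in its other columns, while limiting the influence of the extreme value. Dropping shrinks the dataset, which matters more when outliers are spread across many columns. Which option is preferable depends on the data and the model.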

  • @melissameeker3189
    @melissameeker3189 25 days ago

    Thank you so much you helped me understand

  • @nabinbk1065
    @nabinbk1065 2 months ago

    thank you sir. you are great

  • @hiteshsharma8368
    @hiteshsharma8368 2 months ago

    Nice video, thanks brother ❤

  • @alfredturkson1319
    @alfredturkson1319 1 month ago +1

    How did you set up your Jupyter notebook? Please share the settings so I can make mine look like yours.

  • @Akash-us3mo
    @Akash-us3mo 3 months ago +1

    Thank you

  • @Balaji-wb7cp
    @Balaji-wb7cp 2 months ago

    Superb bro

  • @rekhamalik3663
    @rekhamalik3663 7 months ago

    Amazing!
    Can you please make a video with complex JSON files, e.g. stock market data?

  • @bhaskarmondal7461
    @bhaskarmondal7461 8 months ago +1

    Thank you so much, sir, for providing this particular kind of tutorial, which is specifically targeted at machine learning rather than data analysis. I had been looking for something just like this for the last few days.

    • @learnwithankit383
      @learnwithankit383 8 months ago +2

      Great to hear that you found the tutorial helpful!

    • @bhaskarmondal7461
      @bhaskarmondal7461 8 months ago

      @learnwithankit383 Again, thank you for your efforts :)

  • @AB51002
    @AB51002 8 months ago +4

    Could you also make a video exploring and cleaning text data? Something like what LLMs train on, but obviously much smaller. Something like 1GB of text perhaps. I can't find any online resources targeting that specifically, and it could help many people learn how to better filter text dataset for higher quality datasets. Thank you in advance!

  • @yasinimudy8688
    @yasinimudy8688 3 months ago

    Nice video. However, I would like to know whether the ".fit_transform" method of KNNImputer causes data leakage when applied to fill null values.
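
On the leakage question: calling fit_transform on the full dataset before splitting lets test rows influence the values imputed into the training rows, which is generally treated as leakage. A common way to avoid it is to fit the imputer on the training split only and then reuse it on the test split. The sketch below illustrates that pattern on random data; the array shape, missing-value rate, and n_neighbors=3 are arbitrary assumptions, not values from the video.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

# Hypothetical numeric data with roughly 10% of values knocked out.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[rng.random(X.shape) < 0.1] = np.nan

# Split first, so the test rows stay unseen during fitting.
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

imputer = KNNImputer(n_neighbors=3)
X_train_filled = imputer.fit_transform(X_train)  # neighbours come from train only
X_test_filled = imputer.transform(X_test)        # reuse the already fitted imputer
```

The same idea applies to median or mode imputation: compute the statistic on the training split and apply it to both splits.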

  • @raghavendraraodk7855
    @raghavendraraodk7855 2 months ago

    Sooper

  • @mohitjoshi8984
    @mohitjoshi8984 7 months ago

    Hello, I need help with the correlation part: it is showing NaN and 0.0.
    Please help.

  • @maskedvillainai
    @maskedvillainai 4 months ago +1

    You can skip literally every step here by uploading your data to Hugging Face and opening the auto train data viewer tool that's auto generated for you. It already includes the answers to all of these problems, with no code or time spent, making it a task you don't need to focus on.

  • @gayathrikrishnamoorty4243
    @gayathrikrishnamoorty4243 2 months ago

    What should we do if we find duplicates in the dataset?
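
One common answer, sketched here on a made-up DataFrame, is to count fully duplicated rows and then drop them while keeping the first occurrence. Whether that is appropriate depends on the data, since repeated rows are sometimes legitimate observations; the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical data in which the last row repeats the second one.
df = pd.DataFrame({"country": ["A", "B", "B"], "year": [2000, 2001, 2001]})

# Count rows that exactly repeat an earlier row.
n_dupes = int(df.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
```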

  • @muhammadsamir2243
    @muhammadsamir2243 1 month ago

    Please share the notebook link

  • @cryptofile4002
    @cryptofile4002 27 days ago

    @Learn with Ankith can you pls offer the code for this?

  • @amanagrawal1976
    @amanagrawal1976 1 month ago +1

    Pls provide jupyter notebook code

  • @ayushjaiswal350
    @ayushjaiswal350 21 days ago

    okay video

  • @bhushansonawane5915
    @bhushansonawane5915 1 month ago

    Hello sir, how can i connect with you ? Need urgent help please

  • @devanshupatnaik_video6387
    @devanshupatnaik_video6387 1 month ago

    Is this a data cleaning method?

  • @user-yk9zr4ud5q
    @user-yk9zr4ud5q 1 month ago

    Normalization?

  • @iizrael
    @iizrael 2 months ago

    How can I install pandas and the other libraries in my notebook? Mine shows an error when I try importing them as you did.

    • @learnwithankit383
      @learnwithankit383 2 months ago

      Try executing !pip install pandas in Jupyter Notebook.

  • @nguyenthiyenhuong2344
    @nguyenthiyenhuong2344 3 months ago

    Where is normalization, please?

  • @davidprayogo3944
    @davidprayogo3944 6 months ago

    Please add the code script next time.

  • @user-pu7ye8lu3c
    @user-pu7ye8lu3c 3 months ago +1

    WORTH VARMA WORTH

  • @lilaclove1709
    @lilaclove1709 2 months ago

    🙂

  • @prabhatkumar-0145
    @prabhatkumar-0145 8 months ago

    Please provide a CSV file as well.

    • @learnwithankit383
      @learnwithankit383 8 months ago +1

      www.kaggle.com/datasets/kumarajarshi/life-expectancy-who

  • @mayfield7835
    @mayfield7835 1 day ago

    700th like

  • @bevg1
    @bevg1 7 months ago

    slow down a bit...