How to Do Data Exploration (step-by-step tutorial on real-life dataset)

  • Added on 5 June 2024
  • 🐼 All you need to know about Pandas in one place! Download my Pandas Cheat Sheet (free) - misraturp.gumroad.com/l/pandascs
    👇Learn how to complete your first real-world data science project
    Hands-on Data Science course
    www.misraturp.com/hods
    In this video we learn how to explore a real-life dataset from NYC using Python and Pandas. We will dive deep into the data to find potential problems and issues to fix, and to extract insights from it.
    If you’d like to follow along, find the data here: data.cityofnewyork.us/Environ...
    NYC Open Data - opendata.cityofnewyork.us/
    00:00 Welcome
    00:34 Some notes on data exploration
    01:15 Dataset explanations
    05:26 First look into our dataset
    06:58 Understanding columns
    09:22 Filtering out the unnecessary columns
    11:45 Missing value check
    12:49 Numerical values check
    15:08 Outliers check
    19:57 Categorical values check
    24:57 Explore distribution of binary columns
    26:40 Summary
    👋 Keep in touch?
    ==========================
    🐥 Twitter - / misraturp
    🔗 LinkedIn - / misraturp
    📹 CZcams - / @misraturp
    🌎 Website - misraturp.com/
    Courses & resources
    ============================
    📙 Fundamentals of Deep Learning in 25 pages
    misraturp.gumroad.com/l/fdl
    👩‍💻 Hands-on Data Science: Complete your first portfolio project
    www.misraturp.com/hods
    📥 Streamlit template
    misraturp.gumroad.com/l/stemp
    🤖 Deep Learning 101 with Python and Keras (FREE)
    • 50 Days of Deep Learning
    🏃‍♀️ Data Science Kick-starter mini-course (FREE)
    misraturp.gumroad.com/l/kick-...
    🐼 Pandas cheat sheet (FREE)
    misraturp.gumroad.com/l/pandascs
    📝 NNs hyperparameters cheat sheet (FREE)
    misraturp.gumroad.com/l/hcs
  • Film & Animation
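
The chapter outline in the description (first look, filtering columns, missing values, numerical values, outliers, categorical values, binary columns) can be sketched in Pandas roughly as follows. This is only a minimal sketch, not the code from the video, which was not published. The column names (`tree_id`, `tree_dbh`, `health`, `root_stone`, `borough`), the tiny in-memory table and the threshold of 50 inches are all assumptions made for illustration; only the name `tree_census_subset` appears in the comments below.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded CSV; every column name and value
# here is an assumption for illustration, not taken from the video.
tree_census = pd.DataFrame({
    "tree_id": [1, 2, 3, 4, 5, 6],
    "tree_dbh": [3, 21, 11, 450, 0, 8],  # trunk diameter in inches
    "health": ["Good", "Fair", None, "Good", "Poor", "Good"],
    "root_stone": ["No", "Yes", "No", "No", "Yes", "No"],
    "borough": ["Queens", "Brooklyn", "Queens", "Bronx", "Queens", "Brooklyn"],
})

# First look: size of the table and a few sample rows.
print(tree_census.shape)
print(tree_census.head())

# Filter out the columns that are not needed for the analysis.
tree_census_subset = tree_census[["tree_id", "tree_dbh", "health", "root_stone"]]

# Missing value check: number of missing entries per column.
missing_counts = tree_census_subset.isna().sum()

# Numerical values check: count, mean, min, max and quartiles.
numeric_summary = tree_census_subset.describe()

# Outliers check: look at rows above an assumed plausibility threshold
# instead of dropping them straight away.
big_trees = tree_census_subset[tree_census_subset["tree_dbh"] > 50]

# Categorical values check: frequency of each category, missing included.
health_counts = tree_census_subset["health"].value_counts(dropna=False)

# Binary columns: share of each of the two values.
root_stone_share = tree_census_subset["root_stone"].value_counts(normalize=True)

print(missing_counts, numeric_summary, big_trees, health_counts, root_stone_share, sep="\n\n")
```

With a real file, the in-memory table would be replaced by a `pd.read_csv(...)` call on the downloaded CSV; the remaining steps stay the same.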

Comments • 148

  • @prinjeshshah9128
    @prinjeshshah9128 Před rokem +22

    For the last 15 days I had been searching for a video where someone explains their reasoning during data exploration. This video is now bookmarked for my future data exploration work. Thank you very much.

    • @misraturp
      @misraturp  Před rokem +2

      That's so nice to hear! Keep up the good work Prinjesh!

  • @santdayalverma8255
    @santdayalverma8255 Před rokem

    Hi Misra, I love your videos. The way you explain topics in a simple manner is really helpful. Thank you so much!

  • @SFW7
    @SFW7 Před rokem +1

    Thank you so much for making this video! I'm just starting out with using python for data analysis and this video is so informative and inspiring. :)

  • @user-up8uc9zn4x
    @user-up8uc9zn4x Před 7 měsíci

    Honestly, you're the most honest and humble trader on CZcams!!

  • @stephanielynn9808
    @stephanielynn9808 Před rokem +1

    Found you as I start to work on a project for my MS in analytics! This has definitely helped me make better progress than before.

  • @lianyan1412
    @lianyan1412 Před rokem +1

    I really like how you explained the thought process when it comes to data understanding and exploration. Learnt a lot!

  • @T4ngerin0
    @T4ngerin0 Před 3 lety

    It‘s a pleasure to watch your Videos. Thank you Misra! 🙏🏽

    • @misraturp
      @misraturp  Před 3 lety +1

      That's very nice to hear, thank you!

  • @swadhikarc7858
    @swadhikarc7858 Před 6 měsíci

    Very useful indeed.. learnt how to think like a data analyst as a beginner

  • @Ol16511
    @Ol16511 Před měsícem

    Wonderful explanation, thank you!

  • @djmd808
    @djmd808 Před rokem +6

    Just wanted to let you know that your videos breathed new life into my Udacity Nanodegree experience. I have really been stalling on the last two classes, but this video and the cleaning one (along with others) were so much easier to watch and understand than the Udacity instruction provided in the course. Thank you!

    • @misraturp
      @misraturp  Před rokem

      Wow, thank you. That's great to hear! Best of luck with your courses. :)

  • @lara-rosetadman3499
    @lara-rosetadman3499 Před 2 lety +19

    Hi Misra, this video was so helpful! I'm starting my master's in Data Science this September and you've honestly been such a role model for me! Thank you and keep up the great work!

    • @misraturp
      @misraturp  Před 2 lety +1

      That's so nice to hear Lara-Rose, thank you! Best of luck in your master's. I'm sure you'll do great!

    • @zeinomadikizela4783
      @zeinomadikizela4783 Před rokem

      Which University

    • @lara-rosetadman3499
      @lara-rosetadman3499 Před rokem

      @@zeinomadikizela4783 University of the West of England (Bristol)

    • @Ha-mb4yy
      @Ha-mb4yy Před rokem

      @@lara-rosetadman3499 im at cdf

  • @martynafay
    @martynafay Před rokem

    thank you for this video! i was very overwhelmed by a dataset that i was looking at and didn't know how to start. great video

  • @siddhantnaik
    @siddhantnaik Před 2 lety

    Very nicely done!
    Looking forward to more data cleaning videos.
    Keep the good work going on.
    Thanks!!

  • @jbruges
    @jbruges Před 2 lety

    this video was very helpful, I'm going to watch all of them

  • @cloudnatives9105
    @cloudnatives9105 Před rokem

    Excellent video.. thanks for sharing! Learned a lot on the data exploration.
    Ms. Misra.. Do you have a video on just plotting data while doing the exploration?

  • @ExtraKanin
    @ExtraKanin Před rokem +2

    Hi Misra. I'm 18 minutes into the video and I'm still able to follow. thank you for this! Most of the articles I see on Google just jump straight into data cleaning and I don't even know how they detected the data errors they're trying to clean in the first place.

  • @abhishekjadhav7340
    @abhishekjadhav7340 Před 2 lety

    Thank you Misra, I personally like your videos a lot. You really teach well.

  • @massoudkadivar8758
    @massoudkadivar8758 Před rokem

    I love the way you teach!
    Thank you

  • @swapnildeshpande21
    @swapnildeshpande21 Před 2 lety

    Best explanation and video to kickstart thinking about datasets. I want more quality videos like this. Keep it up!

  • @aureliensimon8685
    @aureliensimon8685 Před 8 měsíci

    Thank you so much for this great video !

  • @gotitgotya
    @gotitgotya Před rokem

    its very informative ... thank you so much for uploading this❣

  • @gunhild1951
    @gunhild1951 Před 2 lety +6

    Great video! Very informative and helpful 😃 I would love it if you could upload more videos like this where you go through the steps of exploratory data analysis. I love that you really bring up every little thing about how it could be when doing EDA. For example, where you talked about the data explanation and how it could be in real life: the documentation of the dataset fields is sometimes not so obvious, and you therefore need to talk to someone responsible, etcetera. This kind of information helps me picture how it could be. Maybe it's silly, but a simple thing like that helped me big time. Overall, a perfect how-to video. More videos like this would be appreciated! 😄

    • @misraturp
      @misraturp  Před 2 lety

      This is super useful feedback for me Gunhild thank you! Personally, I also really like little details like that that can really give one a feeling for what to expect from a job so nice to hear you thought it was helpful. I will prioritize similar videos in the future!

    • @gunhild1951
      @gunhild1951 Před 2 lety

      ​@@misraturp Hi again,
      Thanks for replying Misra! :) Well of course, I have to show my appreciation when the content is so good. Exactly, it's the feeling and to be able to visualize things that helps the most when you're to trying to familiarize with a new subject matter. Thanks, that would be perfect if you could upload more :)

  • @ramvenkatachalam8153
    @ramvenkatachalam8153 Před rokem

    Wow. Great dataset, just what I was looking for. I was searching for a video where someone explains all the important things in data science. This video is now bookmarked for my future data exploration work. Thank you very much, Misra. Your channel is the best in the world.

    • @misraturp
      @misraturp  Před rokem +2

      Thank you Ram! That's very nice to hear. :)

    • @ramvenkatachalam8153
      @ramvenkatachalam8153 Před rokem +1

      Hi Misra. Please post more videos like this to help everyone in their career growth. Thanks a lot.

  • @erfanmoosavi9428
    @erfanmoosavi9428 Před rokem

    Thank you so much Mısra! You explain so gooood!

  • @leonbeler2711
    @leonbeler2711 Před 2 lety +1

    Thank you so much for making this video! I am just getting started in the field and your video has given me lots of tips and tricks that would've taken me months to figure out by myself. Also: yay for more women in Data-Science.

    • @misraturp
      @misraturp  Před 2 lety

      Thank you! And I'm glad it was helpful. :)

  • @udaykumar4B3
    @udaykumar4B3 Před rokem

    Thank you so much 🙏 your video helped me a lot, keep doing this🤞

  • @mehdismaeili3743
    @mehdismaeili3743 Před 2 lety

    Hi, you are a good teacher. Thanks for your useful videos.

    • @misraturp
      @misraturp  Před 2 lety

      Thank you, that's great to hear!

  • @sergiopellitero4136
    @sergiopellitero4136 Před 10 měsíci

    This is the kind of video that I am looking for. Let's see if it is interesting.

  • @Labbsatr1
    @Labbsatr1 Před 2 lety

    That was a very productive video, Mısra. Thank you very much.

  • @onurerdogan2236
    @onurerdogan2236 Před 3 lety

    Very useful video.Thanks for sharing

  • @diegomartins7214
    @diegomartins7214 Před 7 měsíci

    Thank you!

  • @felixisnr1
    @felixisnr1 Před 2 lety

    really helpful. big thanks!

    • @misraturp
      @misraturp  Před 2 lety

      You're welcome! Glad it helped. :)

  • @mr.abu-baker8bp460
    @mr.abu-baker8bp460 Před 2 lety

    So helpful, thanks a lot!!

  • @KeithEmsAndrew-pf2pn
    @KeithEmsAndrew-pf2pn Před 3 měsíci

    thanks to your videos

  • @arpangoyal7337
    @arpangoyal7337 Před rokem

    Hi Misra, just came across your channel and absolutely loved this video, so crisp and informative!
    I have just 1 suggestion: it would be so much more helpful if the video was like "Raw" (maybe a separate, longer video which includes the difficulties you came across and how you solved those?). That said, subscribed to your channel and hoping to learn more about Analytics!

    • @misraturp
      @misraturp  Před rokem

      Hey Arpan, thank you for your nice words and also taking the time to give your feedback. :) That actually is a good idea. I might do a blind data exploration with a dataset I've never seen before soon!

  • @Olumasei
    @Olumasei Před 2 lety

    You teach better than my MSc lecturers.
    Thank you

    • @misraturp
      @misraturp  Před 2 lety

      You are very welcome Samuel. That's nice to hear that you like the videos. :)

  • @pimpirisnais
    @pimpirisnais Před 2 lety

    Thanks, very practical

  • @jeevan1409
    @jeevan1409 Před rokem

    You saved my project ❤️

  • @axelrasmussen5365
    @axelrasmussen5365 Před 2 lety

    Great video. Thanks

  • @pliniado
    @pliniado Před 2 lety

    Thanks Misra, I'm a Python student (intermediate level, I think) and this video was just what I was looking for.

  • @dpratte
    @dpratte Před rokem

    Very nice job! In my experience over many years in data warehousing, good luck finding the 'data dictionary' type document you describe. Most orgs don't have the discipline to maintain that. So you'll need to develop relationships with the people who can help you! Thanks Again!

  • @Kristina_Tsoy
    @Kristina_Tsoy Před rokem

    Great video, thanks a lot!

  • @arnabroy4870
    @arnabroy4870 Před rokem

    Mam, First And Foremost you are very Beautiful,and Secondly your tutorial is awesome, it gave me a lot of insights about how to do data cleaning..
    Thank You Mam,Lots of Love and Respect from India❤️

  • @arindammaji4685
    @arindammaji4685 Před rokem

    a definite bookmark video for data analysts

  • @andrespino8552
    @andrespino8552 Před 3 lety

    Awesome. Thank you so much :)

  • @harikishan437
    @harikishan437 Před 2 lety

    First of all, thanks a lot. I took coaching in data science, but I never had an exact idea of what we need to do in pre-processing, what things we need to look at and what is enough for us. After watching this I got clarity on preprocessing and on what I should do in that step. Again, thanks a lot @Misra Turp 🤝🤝🤝🤝❣❣❣

    • @misraturp
      @misraturp  Před 2 lety

      You are very welcome Hari! I'm glad it was helpful!

  •  Před 9 měsíci

    Hi Misra, thanks for your helpful video. You're so good at data, but you could learn a bit more about trees 😄

  • @ziaurrahman626
    @ziaurrahman626 Před 4 měsíci

    Thanks for your nice explanation. Could you please share the dataset and code?

  • @froylanrodriguez7624
    @froylanrodriguez7624 Před rokem

    I am a year late but this video is GREAT!! Thanks a bunch

  • @hrithiksingh5131
    @hrithiksingh5131 Před 2 lety

    thanks, great informative video. Please make videos on creating portfolio projects using python pandas,matplotlib

    • @misraturp
      @misraturp  Před rokem

      Great suggestion! If you're interested I have a course where I teach how to build a portfolio project: www.soyouwanttobeadatascientist.com/hods

  • @KapitanAliRidho
    @KapitanAliRidho Před 2 lety

    Thanks for the video Misra. Done subs

  • @govindant8360
    @govindant8360 Před 3 lety

    Awesome Mam 👌👌

  • @ziaurrahman626
    @ziaurrahman626 Před 2 měsíci

    Thank you so much. Please share the code and dataset.

  • @ueto1985
    @ueto1985 Před rokem

    It worked!
    Thank you..

  • @rangabharath4253
    @rangabharath4253 Před 3 lety

    Awesome

  • @aleh3627
    @aleh3627 Před 6 měsíci

    We have very wide trees in Argentina. Easily over 3 meters in diameter. They are not officially classified as trees, but the trunk is definitely of that diameter. The roots are also huge and tend to extend to the surface. Usually they become playgrounds for kids. I used to play around them all the time as a kid.

    • @misraturp
      @misraturp  Před 6 měsíci

      That's crazy. What are those trees called?

    • @aleh3627
      @aleh3627 Před 6 měsíci

      @@misraturp we call them "gomero". Not sure about the proper name. If you Google "gomero argentina", you can find them.

  • @chauhermione137
    @chauhermione137 Před 2 lety

    thank you so much! Your video is really helpful

  • @billionairepodcast
    @billionairepodcast Před 2 měsíci

    Hi @misraturp, why did you say ID is a categorical value? And why did you say it's continuous when it's a whole number, i.e. discrete? Thank you.

  • @peaceandlove8862
    @peaceandlove8862 Před 3 měsíci

    Can you please help me get the dataset? Thanks.

  • @shayp20
    @shayp20 Před rokem

    Hi Misra. Thank you for this video. Is there a chance you can post the code you write in this video?

    • @misraturp
      @misraturp  Před rokem

      I do not have the code I developed in this video yet but I might make a similar video again soon and share the code. Stay tuned!

  • @marypazcuessy3004
    @marypazcuessy3004 Před 15 dny

    tree_census_subset isn't loading after I remove all the unneeded columns, not sure what I'm doing wrong

  • @superawesomecaptainmcfluff9506

    Actually a really funny video. At around 14:39, you said you didn't know what an inch was. I was amused and then I remembered that you were Dutch haha. Great video! Love it! P.S. Imperial units suck!

    • @misraturp
      @misraturp  Před 2 lety +1

      Have to say I agree. Can't really see any reason to use the imperial system. :D

    • @superawesomecaptainmcfluff9506
      @superawesomecaptainmcfluff9506 Před 2 lety

      @@misraturp Oh wow! Having you reply is so cool. I did a project on the NYC Collisions dataset using your Streamlit templates.
      Love it. Thanks for the help! I can't wait to see your channel grow to 100k+ subs especially with DS/ML being a rapidly growing field.
      I have one Q if you don't mind answering: Do you have any tips on making my GitHub profile more attractive to recruiters, and really making sure the projects done properly showcase my skills?

    • @misraturp
      @misraturp  Před 2 lety +1

      Hey ​@@superawesomecaptainmcfluff9506 ,
      Thanks a lot for your support!
      For the github account, I would make sure to include a bunch of things:
      * In the README, mention all the libraries, programming languages, technologies and ML algorithms you used for that project. Recruiters are looking for keywords, so give them as many keywords as you can.
      * Have a lessons learned, or future work kind of analysis of your work. Doesn't have to be long. This will serve to show that you are aware of the shortcomings and what can be done better in projects.
      * Make sure you have headings, comments and small notes that structure and explain your code.
      This should be a good start. Just dumping code in github unfortunately doesn't work. Data science is more about understanding and explaining your code than the code itself.
      This question inspired me to write this up a bit longer though. So I will go send my email subscribers an email about this now. :D

    • @superawesomecaptainmcfluff9506
      @superawesomecaptainmcfluff9506 Před 2 lety

      @@misraturp Wow, you've been so incredibly helpful! I follow your newsletter a lot so hope to see your email soon!
      Thanks for all the tips, I especially liked the one about "lessons learned" and improving upon that. Thanks again!

    • @misraturp
      @misraturp  Před 2 lety +1

      @@superawesomecaptainmcfluff9506 You are very welcome! And thank you for the question. I love your username by the way, is it okay if I mention it in the email?

  • @janewade5619
    @janewade5619 Před 3 lety

    Also, for outliers: the max sidewalk width in NYC is 30 ft (360 inches). So the max width of a tree would be a quarter of that (90 inches)?

    • @misraturp
      @misraturp  Před 3 lety

      Hey, I haven't even thought about including that information, that's a great idea! Kudos!

    • @csanchez9536
      @csanchez9536 Před 2 lety

      Actually the biggest known tree diameter is like 24 meters or something like that. 450 inches is very reasonable
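
The plausibility bound discussed in this thread can be turned into a simple outlier flag. The following is a minimal sketch under stated assumptions: the 30 ft sidewalk width and the "quarter of that" rule come from the comment above, while the column name `tree_dbh` and the sample diameters are made up for illustration. Rows are only flagged, not dropped, since the replies disagree on what counts as a realistic maximum.

```python
import pandas as pd

# Arithmetic from the comment: 30 ft sidewalk, 12 inches per foot, one quarter.
FEET_TO_INCHES = 12
max_sidewalk_width_in = 30 * FEET_TO_INCHES        # 360 inches
plausible_max_dbh_in = max_sidewalk_width_in / 4   # 90 inches

# Hypothetical diameters in inches; `tree_dbh` is an assumed column name.
diameters = pd.DataFrame({
    "tree_id": [1, 2, 3, 4],
    "tree_dbh": [4, 38, 95, 450],
})

# Flag, rather than delete, anything above the domain-based bound so the
# suspicious rows can be inspected by hand.
diameters["suspect"] = diameters["tree_dbh"] > plausible_max_dbh_in
suspects = diameters[diameters["suspect"]]
print(suspects)
```

A bound like this encodes domain knowledge, so it is worth revisiting if someone who knows the data argues that larger values are legitimate.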

  • @HarishKumar-qt3mr
    @HarishKumar-qt3mr Před rokem

    23:49 to 23:51, just listen to it, really 😍

  • @paragandozdroch3791
    @paragandozdroch3791 Před rokem

    Can someone please let me know how to get the search bar dropdown shown at 23:47? Thank you.

  • @rajsonawane8607
    @rajsonawane8607 Před 2 lety

    Hi Misra, Thank you so much for this video. Can you please demonstrate using JSON instead of CSV from the same website?

    • @misraturp
      @misraturp  Před 2 lety

      Honestly, if I were working with JSON files, I would first make them into dataframes and then do the data exploration. So the approach would not change. :)
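
The approach described in this reply (turn the JSON into a dataframe first, then explore as usual) might look like the sketch below. The JSON payload and its field names are invented for illustration; `pd.json_normalize` is used here as one possible way to flatten nested records, which is an assumption about the shape of the data rather than something stated in the video.

```python
import json

import pandas as pd

# Hypothetical JSON payload; the field names are assumptions.
raw_json = """
[
  {"tree_id": 1, "health": "Good", "location": {"borough": "Queens", "zip": "11375"}},
  {"tree_id": 2, "health": null,   "location": {"borough": "Bronx",  "zip": "10451"}}
]
"""

records = json.loads(raw_json)

# Flatten nested objects into dotted column names such as "location.borough".
trees = pd.json_normalize(records)

# From here on, exploration is the same as with a CSV-backed dataframe.
print(trees.columns.tolist())
print(trees.isna().sum())
```

For a flat JSON file without nesting, `pd.read_json` would usually be enough on its own.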

  • @CaribouDataScience
    @CaribouDataScience Před 2 lety

    What playlist is this video part of?

    • @misraturp
      @misraturp  Před rokem

      Here it is: czcams.com/video/qxpKCBV60U4/video.html

  • @trend758
    @trend758 Před rokem

    How many of you are watching this video for the mentor in the video 😂😂😂 I am watching for her.

  • @nagasai5243
    @nagasai5243 Před 9 měsíci

    Pls tell me how to unnest

  • @yunisahmed4772
    @yunisahmed4772 Před rokem

    a good Teacher and a beautiful girl ! i love you

  • @lubin2764
    @lubin2764 Před 3 lety

    Any idea what is om.datasets ?

    • @misraturp
      @misraturp  Před 2 lety

      Could you elaborate Lubin about what you mean by om.datasets?

  • @misraturp
    @misraturp  Před 3 lety +11

    👉 Get real world data science experience by doing hands-on work
    www.misraturp.com/hods

  • @jpgunlukleri
    @jpgunlukleri Před rokem

    Do you also have a channel where you upload videos in Turkish?

  • @asjadnaeem2557
    @asjadnaeem2557 Před 2 lety

    Hey! Can I get a copy of the code?

    • @misraturp
      @misraturp  Před 2 lety

      I don't have it uploaded anywhere unfortunately.

  • @ramsri992
    @ramsri992 Před 8 měsíci

    You look so beautiful mam and nice explanation

  • @travisfubu9053
    @travisfubu9053 Před rokem +1

    I kind of like the way you explain things in the tutorial, but I think working on income data or maybe some cancer research data would have been simpler. I mean, trees are kinda not interesting enough to fully engage with the dataset lol

    • @misraturp
      @misraturp  Před rokem +1

      Fair enough. It's just sometimes a bit tricky to find datasets that allow one to work on it publicly like this. In the latest videos I've been using a dataset on open positions in New York. Maybe that'd be better suited.

    • @travisfubu9053
      @travisfubu9053 Před rokem +1

      @@misraturp Yes, definitely. Maybe it's just me. I'm an economics graduate, so a dataset on trees kind of made me go "meh, why trees"

    • @misraturp
      @misraturp  Před rokem

      @@travisfubu9053 Hahah alright, next time something a bit more real-life-like for you!

  • @mahammad4051
    @mahammad4051 Před rokem

    Please do it in Turkish too

  • @ataires
    @ataires Před rokem

    You are so pretty and so good at explaining the process. Thumbs up!

  • @lysedeborah2179
    @lysedeborah2179 Před 9 měsíci

    You are so beautiful

  • @magicmedia7950
    @magicmedia7950 Před 2 měsíci

    Great video. But she moves too fast!

  • @dimakapranov7634
    @dimakapranov7634 Před rokem

    en.wikipedia.org/wiki/Sequoiadendron_giganteumac