Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)

  • Added 15. 05. 2024
  • Practice your Python Pandas data science skills with problems on StrataScratch!
    stratascratch.com/?via=keith
    Data & code used in this Tutorial: github.com/KeithGalli/pandas
    Python Pandas Documentation: pandas.pydata.org/pandas-docs/...
    Let me know if you have any questions!
    In this video we walk through many of the fundamental concepts needed to use the Python Pandas data science library. We start off by installing pandas and loading in an example CSV. We then look at different ways to read the data: a column, rows, a specific cell, etc. We also cover ways to read data based on conditions. We then move into some more advanced ways to sort & filter data. We look at making conditional changes to our data. We also start doing aggregate stats using the groupby function. We finish the video by talking about how you would work with a very large dataset (many gigabytes).
    I realized as I upload this video there are some additional things I want to talk about in a later video. The first thing that comes to mind immediately is using the apply() function on a dataframe to alter the data using a custom or lambda function. If you have questions on this or anything else before I get around to making a part 2, feel free to write me a note in the comments.
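As a rough illustration of the apply() idea mentioned above (the video itself does not cover it), here is a minimal sketch. The three-row DataFrame and the new column names are made up for the example; Series.apply and DataFrame.apply are the actual pandas calls.

```python
import pandas as pd

# Hypothetical three-row sample; the tutorial itself uses the Kaggle Pokemon CSV.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Snorlax"],
    "HP": [45, 39, 160],
    "Attack": [49, 52, 110],
})

# Series.apply calls the function once per value in the column.
df["Name Length"] = df["Name"].apply(len)

# DataFrame.apply with axis=1 passes each row to the lambda as a Series.
df["Stronger Stat"] = df.apply(
    lambda row: "HP" if row["HP"] > row["Attack"] else "Attack", axis=1
)

print(df)
```

Row-wise apply is convenient but slow on large frames, so a vectorized expression is usually preferable when one exists.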
    If you enjoyed this video, be sure to throw it a like and make sure to subscribe to not miss any future videos!
    Thanks for watching friends! Happy coding! :)
    Join the Python Army to get access to perks!
    CZcams - / @keithgalli
    Patreon - / keithgalli
    ---------------------------------------------
    Follow me on social media!
    Instagram | / keithgalli
    Twitter | / keithgalli
    ---------------------------------------------
    Link to original source of data from Kaggle: www.kaggle.com/abcsds/pokemon
    ---------------------------------------------
    Video Outline!
    0:00 - Why Pandas?
    1:46 - Installing Pandas
    2:03 - Getting the data used in this video
    3:50 - Loading the data into Pandas (CSVs, Excel, TXTs, etc.)
    8:49 - Reading Data (Getting Rows, Columns, Cells, Headers, etc.)
    13:10 - Iterate through each Row
    14:11 - Getting rows based on a specific condition
    15:47 - High Level description of your data (min, max, mean, std dev, etc.)
    16:24 - Sorting Values (Alphabetically, Numerically)
    18:19 - Making Changes to the DataFrame
    18:56 - Adding a column
    21:22 - Deleting a column
    22:14 - Summing Multiple Columns to Create new Column.
    24:14 - Rearranging columns
    28:06 - Saving our Data (CSV, Excel, TXT, etc.)
    31:47 - Filtering Data (based on multiple conditions)
    35:40 - Reset Index
    37:41 - Regex Filtering (filter based on textual patterns)
    43:08 - Conditional Changes
    47:57 - Aggregate Statistics using Groupby (Sum, Mean, Counting)
    54:53 - Working with large amounts of data (setting chunksize)
    -------------------------
    If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
    *I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

Comments • 2K

  • @KeithGalli
    @KeithGalli 3 years ago +271

    Hey ya'll! I created a second channel with more Python content (including additional Pandas tips & tricks).
    Please consider subscribing 😊
    czcams.com/users/techtrekbykeithgalli

    • @sam7250ii
      @sam7250ii 3 years ago +2

      You cleverly edited the code between 25:50 to 25:59 list(df.columns.values) to list(df.columns)😉👍

    • @Vribejs
      @Vribejs 2 years ago +1

      I get the error "Cannot mask with non-boolean array containing NA / NaN values" when using df.loc (at 40:49 in the video). Why?
      This is the code: df.loc[df['Our Global Company'].str.contains('Smith', regex=True)]. I imported a different .xlsx table when practising.
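A likely cause of this error is empty cells in the column: str.contains returns NaN for them, so the mask is no longer purely boolean. A minimal sketch of one common fix, assuming missing values should simply count as "no match" (the sample rows below are invented to stand in for the commenter's spreadsheet):

```python
import pandas as pd

# Hypothetical stand-in for the imported .xlsx table; None marks an empty cell.
df = pd.DataFrame(
    {"Our Global Company": ["Smith & Co", None, "Jones Ltd", "Smithson"]}
)

# na=False makes str.contains return False for missing values,
# so the result is a clean boolean mask that df.loc accepts.
mask = df["Our Global Company"].str.contains("Smith", regex=True, na=False)
matches = df.loc[mask]

print(matches)
```

Dropping the empty rows first with dropna(subset=[...]) would be another option if those rows are not needed at all.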

    • @yidizhou9899
      @yidizhou9899 2 years ago +5

      @@Vribejs go google it... you can't expect him to do it for you. He checked the documentation just to give us a good overview of pandas.... google out your error if not you will not learn.

    • @chiraggupta1897
      @chiraggupta1897 2 years ago

      I have been working on an Excel workbook with 8 worksheets. I am performing operations on the data and want to place a DataFrame in the 6th sheet, in place of its data. But every time I do, all the other sheets vanish and a single sheet is created with the DataFrame. Please help me append a df into an existing Excel file.

    • @benten5018
      @benten5018 2 years ago +2

      Hey Keith, can you please help me download the CSV file on an Android tablet?
      Sorry for bad English.

  • @jcspaziano
    @jcspaziano 3 months ago +61

    I know this is 5 years old but I learned more about using Pandas from this one video than all the other videos ive watched on the topic combined! Just awesome! Thank you!

    • @KeithGalli
      @KeithGalli 3 months ago +2

      Glad that it is still helpful!!

  • @_Nelyen
    @_Nelyen 6 months ago +78

    This video was super helpful, thank you Keith!
    In case anyone gets to the end of this video: around 48:00, Keith talks about the groupby operator and starts the section "Aggregate Statistics using Groupby (Sum, Mean, Counting)". You might run into errors because of a change introduced in Pandas version 2.0.0.
    Instead of writing: df.groupby(["Type 1"]).mean()
    Try writing: df.groupby(["Type 1"]).mean(numeric_only=True)
    Starting with version 2.0.0, the default value of numeric_only changed from True to False, so the mean is attempted on string columns too, which raises an error about not being able to convert strings. Hope this is helpful, have a good one!
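The fix above can be tried on a small example. This is only a sketch: the four rows are invented, and only the column names mirror the tutorial's dataset.

```python
import pandas as pd

# Hypothetical sample; "Name" is a string column that cannot be averaged.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"],
    "Type 1": ["Grass", "Grass", "Fire", "Fire"],
    "HP": [45, 60, 39, 58],
    "Attack": [49, 62, 52, 64],
})

# numeric_only=True drops non-numeric columns before taking the mean.
means = df.groupby(["Type 1"]).mean(numeric_only=True)

print(means)
```

Selecting the columns explicitly, for example df.groupby("Type 1")[["HP", "Attack"]].mean(), gives the same numbers and makes the intent visible in the code.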

  • @RisingLoaf
    @RisingLoaf 1 year ago +241

    This 1 hour video did more for me than entire semester of my Data Analysis course... Amazing

  • @Orion3000k
    @Orion3000k 3 years ago +56

    Mannnn you're one of the best Python go-tos PERIOD. Straight to the point and easy to understand. Thanks for teaching us all!

  • @LureUnitFtw
    @LureUnitFtw 5 years ago +89

    One of the best tutorial that I've ever seen in CZcams! Thumbs UP!

  • @nimaonta1725
    @nimaonta1725 3 years ago +49

    Dude you deserved all the subs for this video alone. You explained everything so good. keep it up :)

  • @brandongarza1366
    @brandongarza1366 2 years ago +3

    I haven't started this yet, but based on your previous videos I know this is going to be great. Thanks Keith, you are a great teacher.

  • @not_proton
    @not_proton 3 years ago +33

    Wasted an hour watching a completely useless video on pandas, didn't understand a thing......
    Then found this pure gold of a video, it really helped me a lot. Why didn't I click it earlier............

    • @KeithGalli
      @KeithGalli 3 years ago +16

      lol you had me in the first half 😂

    • @KeithGalli
      @KeithGalli 3 years ago +8

      glad it helped!

    • @not_proton
      @not_proton 3 years ago +4

      @@KeithGalli yeah, really nice job explaining it
      Currently watching the other pandas video (real life problems)

  • @klauscheang7063
    @klauscheang7063 4 years ago +105

    Excellent!! I like the way you organize the videos on different topics and functions of working with data. Please make more videos on how to work data science in Python. E.g. Statistical analysis (descriptive statistics, t-test, linear regression) or data processing tutorial (like what we do in SQL).

  • @nikluz3807
    @nikluz3807 4 years ago +11

    this is an excellent tutorial, especially the filtering/conditional changes section. I have always loved how google sheets has built in queries, and I wanted to be able to do a lot of the same things using pandas. This essentially gave me all of the power I needed! thanks!

  • @dicspringdkz8234
    @dicspringdkz8234 2 years ago +20

    Keith
    You are more than a teacher. Your level of simplicity in explaining Python in details is out of the moon. Keep up the good work. Your video is always my “go to” any time.
    Again, thanks a lot for using your skills as a blessing to people around the world.

  • @DennisGorshteyn
    @DennisGorshteyn 3 years ago +9

    You break down all the details in a way that I can't believe this is for free. Very high quality stuff. I was up and running with this library in short order

  • @shayonghoshroy7208
    @shayonghoshroy7208 4 years ago +349

    Best Pandas tutorial on CZcams, especially 24:25

  • @bharathianjeneya2111
    @bharathianjeneya2111 3 years ago +6

    On point Keith. 5 hrs worth training covered in an hour. Made my day.

  • @nikithroumpari2553
    @nikithroumpari2553 2 years ago +75

    A struggling biologist here thanks you! We are mostly dealing with big data and it can get a little overwhelming, but you made it a lot easier!

  • @cindyshaw2485
    @cindyshaw2485 3 years ago +6

    Thank you, Keith, for making this super helpful tutorial. You're a great teacher!

  • @pivo6499
    @pivo6499 5 years ago +593

    I can't believe I watched this for free, thank you so much!

    • @johnwiley1221
      @johnwiley1221 4 years ago +3

      This was pretty good. I would also check udemy or r/learnpython for other free resources. Found a 30 hour FREE pandas course there the other day

    • @johnwiley1221
      @johnwiley1221 4 years ago

      www.udemy.com/course/the-ultimate-pandas-bootcamp-advanced-python-data-analysis/?couponCode=FF041817B54B4BC9EB6B

    • @quartercast
      @quartercast 3 years ago +8

      @@johnwiley1221 It's not free now, unfortunately :(

    • @musclemusic123
      @musclemusic123 3 years ago

      ki

    • @shambhav9534
      @shambhav9534 3 years ago +3

      The documentation is also free.

  • @nutrathriveyoutube7056
    @nutrathriveyoutube7056 5 years ago +29

    This is an amazing tutorial! Please keep publishing like this. very well explained!
    I would love to see about matplotlib, numpy and if you can get inside machine learning

  • @ProdMGD
    @ProdMGD 2 years ago +3

    Great video to get people up and running. It took me two hours to watch, take notes, and test out some examples. I feel like this was time very well spent. Thank you for this.

  • @piotr5830
    @piotr5830 10 months ago +15

    Hi Keith - not sure you will read this but wanted to sincerely thank you for this tutorial. 3 years ago this was the first python video I ever watched after graduating from unrelated subject. Today I'm typing this from a business class lounge at JFK, on my way to London where I just got a job as a quant developer at a hedge fund, building pricing models and infra for trading. Worked hard for this but if not for your videos I could be at a very different place. Thank you from the bottom of my heart, your work means a lot to many people. Cheers!

  • @orfeaspapaioannou2755
    @orfeaspapaioannou2755 4 years ago +30

    dude this is an amazing introduction to pandas. Really helpful, thanks a lot

  • @bensondube5646
    @bensondube5646 5 years ago +8

    Excellent Tutorial Keith. Very clear, at the right speed and interesting to learn from. This material is very suitable for a self learner. Keep it up.

  • @RockIT1
    @RockIT1 3 years ago +22

    I like the way he interacts with his viewers

  • @kanstantsinhupalau6337
    @kanstantsinhupalau6337 1 year ago +1

    Saved my day! I started learning Pandas, but then I missed several months due to circumstances, and this video about the basics helped me make a quick comeback. Thank you!

  • @MiguelMusic123
    @MiguelMusic123 4 years ago +11

    This video helped me massively! I've been learning through online Python courses with people trying to act and telling unnatural jokes, but your video felt super natural and easy to watch. Many thanks!

  • @amiliavachford183
    @amiliavachford183 5 months ago +4

    Thanks for the useful video.
    If anybody has a problem calculating the mean of the data grouped by Type 1, use this:
    df = pd.read_csv('modified.csv')
    df.groupby(['Type 1']).mean(numeric_only=True)
    instead of this:
    df = pd.read_csv('modified.csv')
    df.groupby(['Type 1']).mean()
    That way, string-type columns are left out of the mean and sum calculations.

    • @vissokis
      @vissokis 3 months ago

      Thanks, it helped a lot... I can't understand the error when all the values are numeric already.

    • @llamaland1737
      @llamaland1737 1 month ago

      So has it been updated now, since you can only perform the method on int or float columns ...

  • @faizalimuhammadzoda4731
    @faizalimuhammadzoda4731 2 years ago +45

    There is something to the way Keith teaches that keeps me coming back.
    Besides being a good teacher and utilizing techniques which help people grasp the material quickly and remember for long time, he sends forth a wave of positivism. He is such a positive, energetic person.
    Thanks for sharing your knowledge. May it grow and enable you to bless more people with it.

  • @zacharyyarost5804
    @zacharyyarost5804 3 years ago

    This is such high effort content. I was amazed that you actually went back and sped up the video where you said you would. 11/10 great tutorial. Thanks!

  • @takakosuzuki2514
    @takakosuzuki2514 4 years ago +3

    Been looking for a complete tutorial on Pandas. This is amazing! Thank you.

  • @viveknayak9899
    @viveknayak9899 4 years ago +7

    Comprehensive, perfectly paced.... Lovely tutorial!

  • @adedokunagunbiade5324
    @adedokunagunbiade5324 1 year ago +2

    I watched the entire video in 30 minutes and learned more than I did with hours of video content. Amazing work.

  • @Chuukwudi
    @Chuukwudi 3 years ago

    From the bottom of my heart, Thank you very much. May you never lack. May the elements, forces, and the entire Creation align itself for your own good.

  • @bijoysaraf650
    @bijoysaraf650 4 years ago +5

    Very simple yet comprehensive tutorial on Pandas. You had my attention throughout. I do use Pandas for data analytics along with numpy. That said I learnt quite a few tips and tricks.
    Thank you for sharing your knowledge. Way to go Keith!
    Liked and subscribed.

  • @bentrash7885
    @bentrash7885 4 years ago +26

    Awesome tutorial! One advice I'd have for any python developers is to get in practice of working within virtual environments. Really helps to avoid conflicts when you're working on a project which may require some older versions of a library but your other projects may require latest ones, stuff like that.

  • @paulblades2325
    @paulblades2325 2 years ago +4

    Thank you so much for your time and effort. This is the best python tutorial I have watched. Straight forward and well organized. I appreciate the time stamps.

  • @AndrewMann205
    @AndrewMann205 5 years ago +5

    Between jobs for the first time in decades I wanted to learn data science using software other than just Excel and Access. Your video was well explained and frankly better than anything else I have seen so far involving Python and Pandas. Thank you for a job well done.

  • @andyn6053
    @andyn6053 4 years ago +7

    WOW! This was just what I have been looking for! Fantastic tutorial! You explained everything very well and clear from start to finish. Best Pandas tutorial on youtube for sure! Thanks man :)

  • @vzntoup
    @vzntoup 3 years ago

    Like, seriously, The best of the best Pandas Course I have done so far! Starts off easily and basically and the explodes!

  • @remy0705
    @remy0705 4 months ago +1

    This 1 hour course is all I need for my data analysis course. This is the best video I found on CZcams. Thanks ❤️❤️❤️

  • @micsierra806
    @micsierra806 5 years ago +26

    Excellent tutorial; exactly what I was looking for. Liked and subbed. Thank you for sharing your expertise.

  • @prubin18
    @prubin18 2 years ago +7

    Great video! One of the best pandas tutorials I've seen.
    I have one comment though. When you run (at 40:00)
    df.loc[df['Name'].str.contains('Mega')]
    you are actually including Meganium in this filter, even though it is not a Mega pokemon. So, one needs to include a space after Mega, such as:
    df.loc[df['Name'].str.contains('Mega ')]
    One can see that this makes a difference because when you run
    len(df.loc[df['Name'].str.contains('Mega')]) and len(df.loc[df['Name'].str.contains('Mega ')]) to get the number of rows, there are two distinct outputs (49 and 48, respectively).
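The difference is easy to reproduce on a handful of names. This is a small sketch with an invented four-row sample, so the counts here are 3 and 2 rather than the 49 and 48 reported for the full dataset.

```python
import pandas as pd

# Hypothetical sample of names; the real dataset has several hundred rows.
df = pd.DataFrame({
    "Name": [
        "Meganium",
        "VenusaurMega Venusaur",
        "CharizardMega Charizard X",
        "Pikachu",
    ]
})

# 'Mega' matches any name containing those four letters, Meganium included.
loose = df.loc[df["Name"].str.contains("Mega")]

# 'Mega ' (with a trailing space) only matches names with a separate "Mega" word.
strict = df.loc[df["Name"].str.contains("Mega ")]

print(len(loose), len(strict))
```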

  • @martistarti2374
    @martistarti2374 2 years ago

    Omgeeeeee!!!! Thank you so much!!! I've searched sooooo many videos trying to help with the delimiter problem I've had (i didn't know that was the problem) and you're the ONLY one I've found that even mentions it!!! 🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾🙌🏾

  • @sarashafiee1973
    @sarashafiee1973 2 months ago

    wow! this is amazing, Thanks a lot! I love how things don't go as planned and you just find a way, it adds so much to the video. Amazing tutorial!

  • @jamesdonly518
    @jamesdonly518 5 years ago +19

    Ok I've been learning Pandas for a while now, over many different sources, and this one video has shown me much more helpful little hints and tips than all of the other material I've looked at previously!!! Thannnnnk you! Please do more Pandas stuff as this has been so awesome =]

  • @rutzyco
    @rutzyco 3 years ago +106

    Coming from the R environment, I must say this is an excellent tutorial to learn about Pandas. I'm very happy to learn that the tools I use in R for data management can be implemented in a similar way in Python. Thanks for taking the time to put this together! Great job.

    • @konata_fan
      @konata_fan 2 years ago +1

      Same here

    • @bretfolger631
      @bretfolger631 1 year ago +1

      I agree - coming to Python from RStudio and after looking at videos all day this is definitely the most helpful and intuitive video!

    • @ratansharma8026
      @ratansharma8026 1 year ago +1

      sometimes the syntax may be getting confused for python and r right? if you use both

    • @manan-543
      @manan-543 8 months ago

      can someone tell me why is r so encouraged in the data science/analysis circle when python can do everything and more and it is so intuitive

    • @rutzyco
      @rutzyco 8 months ago +1

      @@manan-543 I think Python is far more general and overall can do a lot more, but in my field, packages associated with statistical models are far more abundant in R than in Python. For example, I'm not sure Python comes even close to R for the implementation of Bayesian hierarchical models, GLMMs, GAMMs, etc. Also, methods papers often publish packages in R, so it seems to remain the default for statistics. Until the statisticians start switching in large numbers I'm not sure this is gonna change anytime soon; and when it does, it probably will be Julia, not Python.

  • @nuclearhotel2172
    @nuclearhotel2172 3 years ago

    Your iterative approach is very effective to expand concepts without overloading. Great job. On to the next one.

  • @woaq4486
    @woaq4486 3 years ago

    I have my final exam in my data structures course soon, this was a great way to study and work through things my class covered months ago, thanks so much!

  • @rehanbaig71
    @rehanbaig71 3 years ago +3

    The best pandas tutorial, best mentor having strong grip on subject

  • @bidhanbhattarai8863
    @bidhanbhattarai8863 4 years ago +6

    Makes me want to play the old Emerald games again, wonderful tutorial, keep them coming

  • @8rameshb
    @8rameshb 3 years ago +1

    The best tutorial I have seen so far on data analytics. I now see how python/pandas helps in data analytics. Thank you very much for making and sharing this video.

  • @gegao3198
    @gegao3198 1 year ago +1

    Keith, you are the best Python instructor! Very easy to follow. Thank you!

  • @hughjazz8416
    @hughjazz8416 3 years ago +57

    I have bought multiple Udemy courses on pandas and this one blows them all out of the water, and it’s free! I’m deff subbing!

  • @joelprestonsmith
    @joelprestonsmith 4 years ago +3

    Great tutorial. I'm just starting with Python, and this is a great video for picking up a lot of knowledge fast. You asked for suggestions about other videos. I'd definitely like to see more tutorials that are about cleaning data. That's the hardest part, I think. The most laborious and time consuming. I'm learning the re module (regular expressions) for Python, but it's going SLOWLY.

  • @HansOnProduction1984
    @HansOnProduction1984 2 years ago

    Keith, I stumbled across your video from random search on deeper understanding of pandas . I felt like you did a great job presenting the material. Well done man, it was easy to follow and understand. I did appreciate the part at the end with the chunk size and group by - would like further explanation of those concepts. Thanks.

  • @crtnnn
    @crtnnn 1 year ago +2

    Started my PhD in hydrogeology and learning Python from the scratch. I love your work, keep it up!

  • @DavidWhitt
    @DavidWhitt 5 years ago +7

    Dude... you should make more videos... you are a natural born teacher!!

  • @Diegtz555
    @Diegtz555 2 years ago +10

    Wow, thanks for this tutorial. I'm starting on python and took a course of udemy, but it was confusing, with your explanations many doubts are cleared up. Thanks Keith:)

  • @MichaelPeterDalsgaard
    @MichaelPeterDalsgaard 3 years ago +2

    I swear this is the most useful python channel on CZcams. Top stuff.

  • @gillesderoo2027
    @gillesderoo2027 2 years ago

    You are the GOAT. Your explanations using Pokemon makes so much sense.

  • @takako230
    @takako230 2 years ago +5

    Awesome video Keith! I'm a beginner programmer but your explanation is super clear! Thanks for the videos:)

  • @skyblue021
    @skyblue021 4 years ago +17

    Thank you Keith for this video, absolutely amazing and valuable for many! THANK YOU!

  • @stephenbouldin8163
    @stephenbouldin8163 2 years ago +2

    What an excellent video. I have watched so so many tutorials, but this is definitely one of the very best.

  • @garthhorne617
    @garthhorne617 2 years ago

    I have been learning python and using pandas for about 3 months now and done innumerable searches on the internet with questions regarding use of specific statements and coding. I wished I had come across your video earlier! You are a born teacher and know how to layout and explain complex terms and concepts. How can someone that looks so young have such a strong grasp on presentation and user needs? The concepts you explain are the same things I have sought information on for 3 months but all in one place and succinctly explained. Thank you for all your work.

  • @jiangxu3895
    @jiangxu3895 4 years ago +6

    I just went through your numpy tutorial. And that's the reason I come here. Thumb up!

  • @disagio9517
    @disagio9517 3 years ago +4

    I came for the tutorial, stayed for the cutesy pokemon stuff, really warmed my heart

  • @bencole8301
    @bencole8301 3 years ago

    This was such a good walk through for Pandas covering so much information. Thank you so much, I hope you continue to do more videos.

  • @budwhyy9016
    @budwhyy9016 1 year ago

    No BS, To the point!
    Man, probably the best tutorial out there. 🔥🔥
    You have a sub right here. Thank you so very much for explaining this like no one ever has!

  • @mdhidayat5706
    @mdhidayat5706 2 years ago +3

    Awesome tutorial Keith, I learnt a lot by following your hour long tutorial.
    Created a new notebook instead of using the GIT version as it doesn't show what happens before you commented the code.

  • @cdgxflower2679
    @cdgxflower2679 5 years ago +12

    I've been looking for a good pandas and python video for quite sometime now. I have to say that this is really amazing. You've explained it so well that a beginner like me could easily understand. Great job and thank you. Can't wait for more videos. (if possible, matplotlib)

  • @shpigunov
    @shpigunov 3 years ago +2

    Thank you Keith for an amazing crash course! I've filled in the gaps I've had with Pandas and now ready to apply this library in my work project! Many thanks!

  • @dondata718
    @dondata718 1 year ago

    this is my second python tutorial follow along, thanks man u explained well, its appreciated

  • @gustinelimurilo
    @gustinelimurilo 3 years ago +19

    53:30 you can use .size() to get the count of each Pokemon type instead of adding a new column.
    It would look like this:
    df.groupby(['Type 1']).size()
    Great tutorial!!
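A quick sketch of the .size() suggestion above, on an invented four-row sample (only the "Type 1" column name comes from the tutorial's dataset):

```python
import pandas as pd

# Hypothetical sample rows for illustration.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Grass", "Fire", "Water"],
})

# size() returns the number of rows in each group as a Series,
# so no helper "count" column is needed.
counts = df.groupby(["Type 1"]).size()

print(counts)
```

df["Type 1"].value_counts() gives the same counts, sorted from most to least frequent.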

  • @saurabh-patil
    @saurabh-patil 4 years ago +4

    This tutorial helped me alot. Thank you so much!

  • @idonotcomplyrevolution
    @idonotcomplyrevolution 2 years ago +2

    you've been really helpful mate, been struggling with pandas/numpy and finally im getting somewhere! more of this please!!!

  • @IsItBehindTheFridge
    @IsItBehindTheFridge 2 years ago

    Love the way you say 'axsending'.
    Great tutorial!

  • @modernafsolutions3233
    @modernafsolutions3233 4 years ago +8

    Wow man! Holy smokes that was such an amazing breakdown. I came into this knowing nothing about Pandas and now I want to get back to work with my personal data! Thank you so so so so much. I’m off to find the documentation!

    • @KeithGalli
      @KeithGalli 4 years ago +2

      Glad you enjoyed! Your comment made my day :)

  • @stephanierodriguez1035
    @stephanierodriguez1035 4 years ago +5

    This was such a great introduction to pandas and on DataFrame. This is exactly what I was looking for.
    Since I hadn't previously downloaded pandas onto my mac, and didn't feel like installing anaconda either, I was running into some troubles installing pandas with just "pip install pandas" so I thought I would include the instructions as to how I did it.
    simply do:
    pip install pandas --user
    If nose and tornado aren’t downloaded do:
    pip install nose --user then pip install tornado --user (nose needs to be installed first)
    then terminal also suggested I add it to my path, so I did:
    sudo nano /etc/paths
    add the path at the end of the file
    do ^X and then Y then hit enter

  • @atraps7882
    @atraps7882 3 years ago +1

    Day 1 on my journey to learn data analysis with python, this vid and kaggle's free pandas course is just what i needed to give me more motivation to keep learning.

  • @johnwalton1656
    @johnwalton1656 2 years ago

    This helped me so much Keith thank you. Work does not want me to use excel anymore for any data frame so I have to learn python. I have spent nearly a month trying to learn, even something as simple as adding a data set to python I was getting wrong. Now I am so confident with the work I have produced.

  • @kylieying2
    @kylieying2 5 years ago +9

    Thanks for posting! As an MIT student taking a data analysis class, this video was very helpful, more useful than the other tutorials online!!

    • @kipishism
      @kipishism 5 years ago

      Found it very useful too!

    • @kregg34
      @kregg34 5 years ago +4

      "As an MIT student"
      Weird flex but ok

  • @KeithGalli
    @KeithGalli 5 years ago +973

    Video Outline!
    0:45 - Why Pandas?
    1:46 - Installing Pandas
    2:03 - Getting the data used in this video
    3:50 - Loading the data into Pandas (CSVs, Excel, TXTs, etc.)
    8:49 - Reading Data (Getting Rows, Columns, Cells, Headers, etc.)
    13:10 - Iterate through each Row
    14:11 - Getting rows based on a specific condition
    15:47 - High Level description of your data (min, max, mean, std dev, etc.)
    16:24 - Sorting Values (Alphabetically, Numerically)
    18:19 - Making Changes to the DataFrame
    18:56 - Adding a column
    21:22 - Deleting a column
    22:14 - Summing Multiple Columns to Create new Column.
    24:14 - Rearranging columns
    28:06 - Saving our Data (CSV, Excel, TXT, etc.)
    31:47 - Filtering Data (based on multiple conditions)
    35:40 - Reset Index
    37:41 - Regex Filtering (filter based on textual patterns)
    43:08 - Conditional Changes
    47:57 - Aggregate Statistics using Groupby (Sum, Mean, Counting)
    54:53 - Working with large amounts of data (setting chunksize)
    Thanks for watching friends! :)
    Let me know if you have any questions

    • @dtran288
      @dtran288 5 years ago +4

      YES!!! THANK YOU!

    • @shadow2frost325
      @shadow2frost325 5 years ago +7

      Thank you so much for posting this! I have a test in Python soon, so I've been watching this for a review. You explain everything so well and make it easy to follow. I also like how the data was from Pokémon - it makes it more relatable.

    • @dchitan1234
      @dchitan1234 4 years ago +2

      great tutorial

    • @tejasnareshsuvarna7948
      @tejasnareshsuvarna7948 4 years ago +34

      A reference notes to help you while you watch the video.
      docs.google.com/document/d/16qcfjwLp1vV-5VnIOGuDC2vxkHQ534_RzQd2Gihk7x8/edit?usp=sharing

    • @Tropax1
      @Tropax1 4 years ago +2

      Hey dude, love this video by the way, but I have a question: can this data be used for machine learning? I have my exams coming up where I have to find a dataset to make predictions and stuff. Are these pokemon cards, and do they have labels and features, if you understand what I'm talking about? Any help would be greatly appreciated. Thanks in advance.

  • @MatBat__
    @MatBat__ 3 years ago +2

    Bro I started a data science internship at the beginning of the year, we use a lot of pandas and you have been saving my life since day 1.
    Thanks again, you are a godsend! Subbed on both channels, cheers!

  • @000maestro000
    @000maestro000 3 years ago

    This tutorial is built so well, I keep going back to this. thanks

  • @xnick_uy
    @xnick_uy 2 years ago +12

    27:15 It seems that the dataframe got scrambled up a bit there, most likely from having the cell running multiple times. Even when there was an error message, it appears that either the Total or the Legendary column was moved to the left of HP. Upon running the cell again (with the corrected version?) it calculated a new Total adding the previous values and generating corrupted results.
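One way to guard against this kind of double counting, if the stat columns are otherwise picked by position, is to select them by name so that an existing Total column can never slip into the sum. A minimal sketch, with an invented two-row sample and an assumed column list:

```python
import pandas as pd

# Hypothetical sample with a subset of the stat columns.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Assumed list of stat columns; extend it to match the real dataset.
stat_cols = ["HP", "Attack", "Defense"]

# Selecting by name keeps "Total" out of the sum, so re-running the
# cell (simulated here by repeating the line) gives the same values.
df["Total"] = df[stat_cols].sum(axis=1)
df["Total"] = df[stat_cols].sum(axis=1)

print(df)
```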

  • @jasonaraosfuentes2130
    @jasonaraosfuentes2130 5 years ago +79

    This is an extremely useful tutorial. You explain so well bro. Thank you very much. Liked and subscribed. Hugs.

  • @tototoysentertainment9483

    i watched more than 10 different videos about pandas, this is the most easy and understandable one. Worth your time!

  • @harshitsarda3297
    @harshitsarda3297 2 years ago

    literally one of the most useful videos on pandas ever

  • @yomajo
    @yomajo 5 years ago +11

    Great job dude.

  • @mohitjain4943
    @mohitjain4943 5 years ago +6

    finally.. a new video... I was waiting for a Long Time😍😋

  • @yagovpf
    @yagovpf 2 years ago

    Thanks for this content. I really appreciate when people pass their knowledge on. I'm starting with Data Analytics, after getting to know the basics of Python. Your video helped me a ton! Hope we get to see other videos with pandas use cases.

  • @franciscoortega104
    @franciscoortega104 6 hours ago

    Thanks Keith for this video! I'm new on data science I'm using your videos to practice and learn a lot more. Really thanks!

  • @SMFahim-vo5zn
    @SMFahim-vo5zn 4 years ago +8

    When I start making money with these knowledge, I'll give you some share!

  • @philipcoppage3592
    @philipcoppage3592 3 years ago +5

    SQL person w/ limited exposure to Python here. This was useful as hell.

  • @sunnysky115
    @sunnysky115 2 months ago

    This is one of the best Python videos that I've seen online. Thank you.

  • @oziomaignatius9778
    @oziomaignatius9778 2 years ago

    Thanks so much, you are a life saver. I have been stuck at running just pandas since Monday.

  • @BrandonS-lk2qc
    @BrandonS-lk2qc 3 years ago +3

    I learned so much, thank you. Then at the end...that music tho. I lost it! LOL! Did not see it coming.

  • @DrewLevitt
    @DrewLevitt 2 years ago +7

    In the chunksize section, you pick a well-documented bad practice, namely calling pd.concat inside a for loop. As the loop runs repeatedly, this operation becomes more and more expensive (because new_df gets longer and longer). Per the pandas documentation, the better approach is to append each df to a list and then pd.concat the list elements just once, after the for loop.

    • @terabhaininja9
      @terabhaininja9 1 year ago

      Hello, can you please provide with a tutorial for that? Quite new and clueless here.

    • @terabhaininja9
      @terabhaininja9 1 year ago +3

      dataHere = []
      for chunk in pd.read_csv('modified.csv', chunksize=5):
          dataHere.append(chunk)

      newnew = pd.concat(dataHere)
      This looks right?
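The pattern discussed in this thread (collect the chunks in a list, then call pd.concat once after the loop) can be sketched in a self-contained form. The in-memory CSV, its values, and the HP filter are assumptions made for illustration; the video reads 'modified.csv' from disk instead.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for 'modified.csv'.
csv_text = (
    "Name,Type 1,HP\n"
    "Bulbasaur,Grass,45\n"
    "Ivysaur,Grass,60\n"
    "Charmander,Fire,39\n"
    "Squirtle,Water,44\n"
    "Snorlax,Normal,160\n"
)

chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Shrink each chunk before keeping it; here, only rows with HP >= 40.
    chunks.append(chunk[chunk["HP"] >= 40])

# One concat after the loop; ignore_index renumbers the rows from 0.
result = pd.concat(chunks, ignore_index=True)

print(result)
```

Reducing each chunk (filtering or aggregating) before appending is what keeps memory use low; appending the raw chunks and concatenating them simply rebuilds the full file in memory.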

  • @aaronbaldwin2845
    @aaronbaldwin2845 3 years ago +1

    This is so great. I appreciate you taking your time to do this. It takes the mystery out of Pandas and got me started. Thanks

  • @vicbits
    @vicbits 2 years ago

    This is the best Pandas video I have ever watched. Thank you so much for this.