Effective Pandas I Matt Harrison I PyData Salt Lake City Meetup

  • Added 27. 08. 2024

Comments • 49

  • @MartyAckerman310
    @MartyAckerman310 1 year ago +21

    I'm not exaggerating when I say this video changed my life.
    I went from a guy who did everything upstream in SQL and grudgingly used Pandas to a guy who uses Pandas for everything.
    The approach Matt demonstrates also translates generally to PySpark.
    I'm now considered the go-to guy for Pandas and PySpark code in my department. There's so much bad code around, often written, it seems, by people with advanced degrees and MATLAB backgrounds. I could make a full-time job out of cleaning up bad code.
    Dot chain FTW!

    • @mattharrison721
      @mattharrison721 1 year ago +1

      Thanks! Glad to help.

    • @amilkyboi
      @amilkyboi 9 months ago

      Heh, MATLAB and bad coding practices - the two are never far from one another it seems.
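The dot-chain style praised above can be sketched like this (a minimal example; the frame and column names are made up, not from the talk):

```python
import pandas as pd

raw = pd.DataFrame({"city": [" NYC", "SLC ", " NYC"],
                    "temp_f": [68.0, 75.0, 70.0]})

# One pipeline, read top to bottom: clean, derive, aggregate
summary = (raw
    .assign(city=lambda d: d.city.str.strip(),
            temp_c=lambda d: (d.temp_f - 32) * 5 / 9)
    .groupby("city")
    .temp_c
    .mean()
    .round(1)
)
print(summary)  # NYC -> 20.6, SLC -> 23.9
```

Each step returns a new frame or series, so intermediate variables (and the bugs that stale ones cause) disappear.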

  • @DavidDobr
    @DavidDobr 2 years ago +5

    90 minutes of pure gold.
    Thanks Matt!

    • @mattharrison721
      @mattharrison721 2 years ago

      Thanks David. 👍🙏 Make sure you check out my book, Effective Pandas, if you appreciated this.

    • @scottlucas232
      @scottlucas232 2 years ago +1

      Agree completely! Fantastic presentation! I learned more here than in the past two years.

  • @ninhluong5004
    @ninhluong5004 1 year ago +3

    This is easily the best pandas guide I have watched so far.

  • @AkashRana1111
    @AkashRana1111 2 years ago +7

    This is gold! Matt did an amazing job showing best practices for using pandas, with a lot of intuition about how pandas functions run under the hood.

  • @bendirval3612
    @bendirval3612 2 years ago +9

    This was a ridiculously useful video. I feel like I've watched a lot of Python videos, but I think this might be the most practically useful one for people who are not brand new to pandas, the people who use it all the time.

  • @flipside5482
    @flipside5482 6 months ago

    This man is a living data legend.
    Mass respect.

  • @nickhodgskin
    @nickhodgskin 2 years ago +10

    Really interesting talk. I was doubtful about chaining at first, but you have converted me :) A very informative talk, thanks.

    • @mattharrison721
      @mattharrison721 2 years ago +1

      Thanks for coming around Nick. 😉 Hope you find these techniques useful to you.

  • @erginceyhan
    @erginceyhan 2 years ago +1

    Great presentation. As others said, pure gold. If there were a button called "pure gold", I would have clicked it; a simple like is not enough. It also changed my view of code organization. Thanks for sharing.

  • @annagora6409
    @annagora6409 1 year ago

    Matt, a big thank you for the chaining idea!

  • @johannes-euquerofalaralema4374

    By far the best pandas video I have ever seen

  • @aoihana1042
    @aoihana1042 1 year ago

    This tutorial had so many gems! Thanks Matt

  • @rephechaun
    @rephechaun 1 year ago

    This is mind blowing... Thank you very much!

  • @gregorywpower
    @gregorywpower 1 year ago

    I can’t wait for you to give another talk on polars!

  • @bullbranch
    @bullbranch 1 year ago

    Excellent Pandas best practices video. I was already a big user of chaining but for some reason hadn't used append much. This is a cleaner way to do things and I will be using it. My next notebook is going to be much easier to maintain and much easier to build. Thanks Matt!

  • @FRANKWHITE1996
    @FRANKWHITE1996 1 year ago +1

    Thanks for sharing ❤

  • @whkoh7619
    @whkoh7619 2 years ago +1

    Thanks Matt, this was an incredible presentation. Came here from the Real Python podcast, just bought the book too!

  • @santchev1326
    @santchev1326 2 years ago

    Really interesting, many thanks to Matt and Pydata :)

  • @NearLWatson
    @NearLWatson 2 years ago +1

    I was looking into how to speed up my pandas operations, since I read that Python itself is faster than R and that vectorized pandas should be faster than plain Python loops; I'm happy I came here.
    Excellent tips that I am going to experiment with, hopefully achieving quicker output times.
    Excellent session nevertheless.

  • @Davidkiania
    @Davidkiania 2 years ago +1

    I really love this session and it’s completely changed the way I process data going forward.
    Thanks a lot !

  • @elidrissii
    @elidrissii 2 years ago +1

    Here from your HN comment. Super informative.

  • @ioannisnikolaospappas6703

    Thanks for the wonderful pandas insights, Matt and PyData!

  • @firefoxmetzger9063
    @firefoxmetzger9063 2 years ago +4

    1:18:00 For the specific question being asked (find duplicates in a primary key) there is a much simpler solution than what Matt Harrison suggested: df[df.duplicated("primary_key", keep=False)]. It selects all rows with non-unique values in the "primary_key" column, i.e., all the rows that are duplicated.
    Matt solves the more general problem of "find all rows whose primary_key value occurs more than N times". A more concise (though perhaps less readable) solution would be something like
    df[df.primary_key.map(df.primary_key.value_counts()) > N]

    • @kernel2006
      @kernel2006 2 years ago

      An alternative to your approach is to use .transform() with .groupby(), to act effectively like a SQL window function that counts the primary keys, but whose result is the same length as the original data (rather than being collapsed due to aggregation).
      Something like:
      num_dups = df.groupby('key')['key'].transform('size') # has same index as df
      df.loc[num_dups > N]
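For reference, a toy frame (made up, not from the talk) on which the duplicate-finding approaches in this thread can be compared:

```python
import pandas as pd

df = pd.DataFrame({"primary_key": ["a", "a", "b", "c", "c", "c"],
                   "val": range(6)})
N = 2

# 1) All rows whose key is duplicated at all
dup_rows = df[df.duplicated("primary_key", keep=False)]

# 2) Rows whose key occurs more than N times, via value_counts + map
vc_mask = df.primary_key.map(df.primary_key.value_counts()) > N

# 3) Same, via a groupby/transform "window" count
gb_mask = df.groupby("primary_key")["primary_key"].transform("size") > N

print(dup_rows)              # the two "a" rows and the three "c" rows
assert vc_mask.equals(gb_mask)
```

Approaches 2 and 3 produce identical masks; the transform version reads most like the SQL window function it mimics.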

  • @hazemmosaad3440
    @hazemmosaad3440 1 year ago

    Really interesting and informative talk.
    Thanks

  • @samplaying4keeps
    @samplaying4keeps 1 year ago

    Thank you for this! This is super helpful. I learned so much!

  • @grumpy_techo
    @grumpy_techo 2 years ago

    Thanks for your 'rant', Matt. I have your recent books and still realised something that I should be doing with my data. 👌

    • @mattharrison721
      @mattharrison721 2 years ago

      Thanks Tyrone! Good luck with your Pandas. 😉🐼

  • @jongcheulkim7284
    @jongcheulkim7284 1 year ago

    Thank you

  • @antecavlina8897
    @antecavlina8897 2 years ago

    Just a tip:
    at 48:30, when commenting line by line upwards, you can point the mouse at the desired line, hold (I think) Alt (the pointer switches to a thin crosshair), then drag up or down across the lines and type #.
    It's like doing a block comment...
    I'm still looking for a way to do that without the mouse; not sure whether to use something like a Vim extension, if there is one...

  • @mischaminnee
    @mischaminnee 1 year ago

    Awesome!

  • @abimaeldominguez4126
    @abimaeldominguez4126 2 years ago

    I have a problem with aggregations: sometimes, if you aggregate two columns and one column has a cell with a NaN, .groupby will ignore it. I know you can keep those NaNs, but I would like to see a use case for when it is a good idea to keep NaNs while using a .groupby and when it is not.
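For reference, the behavior being described: by default .groupby drops rows whose group key is NaN, and pandas ≥ 1.1 exposes dropna=False to keep them (a minimal sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", None, "b"],
                   "value": [1, 2, 3, 4]})

# Default: the row whose group key is NaN silently disappears
by_group = df.groupby("group")["value"].sum()               # a -> 3, b -> 4

# Keep NaN as its own group, e.g. when "missing" is a meaningful category
by_group_keep = df.groupby("group", dropna=False)["value"].sum()
print(by_group_keep)                                        # a -> 3, b -> 4, NaN -> 3
```

Keeping NaNs is useful when missingness itself carries information (e.g. "unlabeled" records you still need to total); dropping them is fine when the key is required and NaN means a data error.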

  • @tariqaziz1795
    @tariqaziz1795 2 years ago

    Sir, the apply method gave me an error such as "unhashable series".
    How do I fix that?

  • @dragangolic6515
    @dragangolic6515 1 year ago

    Great video, I need this data set.
    Where can I find it?

  • @pmiron
    @pmiron 1 year ago

    Can someone identify the font he uses in JupyterLab? :D

    • @JimmieChoi93
      @JimmieChoi93 3 months ago

      'Lato' I guess

    • @pmiron
      @pmiron 3 months ago

      @@JimmieChoi93 I just tried and I don't think it is Lato.

    • @JimmieChoi93
      @JimmieChoi93 2 months ago

      @@pmiron Damn. Here's an idea: screenshot it to ChatGPT and ask.

    • @pmiron
      @pmiron 2 months ago

      @@JimmieChoi93 Haha, I actually did try with some screenshots. It recognizes that it is a notebook with a monospace font, but then suggests it might be the default JupyterLab font, or Consolas, Menlo, etc. I also tried WhatTheFont and Font Squirrel with no luck.

  • @joecookieee
    @joecookieee 2 years ago

    Thanks for the video, Matt, this is awesome.
    Can you explain how you got those numbers at 57:30?
    6_220 / 125
    Thank you!

    • @walkingintopeople
      @walkingintopeople 2 years ago

      235.215 is the conversion ratio between mpg and L/100 km. It's a constant the presenter looked up on a search engine ahead of time.
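For reference, the constant follows from 1 US gallon ≈ 3.7854 L and 1 mile ≈ 1.6093 km, so L/100 km = 235.215 / mpg (a quick sketch; the 25 mpg figure is illustrative, not from the video):

```python
# mpg and L/100 km are reciprocal scales sharing one constant
LITERS_PER_GALLON = 3.785411784
KM_PER_MILE = 1.609344
CONST = 100 * LITERS_PER_GALLON / KM_PER_MILE  # ~= 235.215

def mpg_to_l_per_100km(mpg: float) -> float:
    return CONST / mpg

print(round(CONST, 3))                    # 235.215
print(round(mpg_to_l_per_100km(25), 2))   # 9.41 L/100 km for a 25 mpg car
```

Because the scales are reciprocal, the same function converts in both directions: CONST / (L/100 km) gives mpg back.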