How to use groupby() to group categories in a pandas DataFrame

  • Added 4 Sep 2024

Comments • 145

  • @ShiladityaBiswasNow
    @ShiladityaBiswasNow 3 years ago +37

    Thanks a lot! You saved me days! I'm literally crying rn. So precise and to the point. Love the content

    • @ChartExplorers
      @ChartExplorers  3 years ago +3

      I'm glad it helped! Groupby was always a sore spot for me when I was learning, but now that I know it I use it all the time.

  • @lightningmi
    @lightningmi 2 years ago +3

    Good step-by-step tutorial. But one thing you missed: grouping by multiple columns and applying a different aggregate function to each, for example [column A, column B] with A=sum and B=average. Something like that.
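A minimal sketch of what this comment asks for, using a small made-up frame with `pclass`, `survived` and `fare` columns (illustrative values, not the video's data): passing a dict to `agg()` assigns one aggregate function per column.

```python
import pandas as pd

# Small made-up stand-in for the Titanic columns used in the video
df = pd.DataFrame({
    "pclass":   [1, 1, 2, 2, 3, 3],
    "survived": [1, 0, 1, 1, 0, 0],
    "fare":     [80.0, 60.0, 20.0, 30.0, 8.0, 10.0],
})

# The dict maps each column to its own function:
# sum of 'survived' and mean of 'fare' for every pclass
summary = df.groupby("pclass").agg({"survived": "sum", "fare": "mean"})
print(summary)
```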

  • @DuniyaJahan1
    @DuniyaJahan1 2 years ago +1

    🙏🙏🚩🚩🙏🙏 Truly, sir, a great lecture. I had been trying to understand groupby in pandas for the last 25 days, but no one was able to clear my confusion. But you, sir, explained it brilliantly and I am really so obliged to you. Thanks! I subscribed and shared it on my Facebook page, from Banaras City, India 😄😄😄🙏🙏🙏🙏🙏🙏

  • @crystalchaung1576
    @crystalchaung1576 1 year ago

    I had to watch this a couple of times to hear the part around 4:18 about why groupby will only return those who survived. It is good you added that. Now that I understand that, I can take a shot at age groups for the Titanic.

  • @athief
    @athief 2 years ago

    It's great to have a 5-min quick & dirty dive, but a couple more seconds here and there would help: saying that "agg" means "aggregate", that if we want more than one column summarised we must provide a list (hence the double brackets), etc. A simple explanation like that makes things easier to remember.

  • @imad_uddin
    @imad_uddin 3 years ago +3

    I have seen three of your videos so far, all were very well thought out. Really helpful. You deserve many more subscribers!

  • @MohsinAli-yd9js
    @MohsinAli-yd9js 2 years ago +2

    At 5:39, when setting labels for 'age_bins', how did it know which age group is young, which is middle and which is old? You did not set ranges such as 0 to 20 for young, 21 to 60 for middle and above 60 for old. Or does it do that implicitly?

    • @JopieSchaft
      @JopieSchaft 2 years ago

      Using bins=3 as a parameter to the pd.cut() function automatically divides the age range into 3 equal-width intervals (equal spans of age, not equal numbers of passengers). See my comment to Xuan Tran for how you can find out what it does or what you could do differently.

  • @rashadm.sadigov4366
    @rashadm.sadigov4366 1 year ago

    Dude, thank you sooo much. Finally someone with proper English explained things properly

  • @saisarath623
    @saisarath623 2 years ago +1

    Really helpful tricks. Thank you!

  • @sgerodes
    @sgerodes 3 years ago +3

    Brilliant. It had exactly what I needed. Multiple groups and the splitting trick

  • @Aleqsie
    @Aleqsie 8 months ago

    OK, this is mad comprehensive information, explained amazingly briefly and clearly in just 7 minutes.

  • @blueciel_03
    @blueciel_03 8 months ago

    Thanks a lot, it's really informative for my upcoming exam.

  • @XuanTran-ri1hn
    @XuanTran-ri1hn 2 years ago +4

    Hi. Thank you for your video. May I ask how you know exactly which ages go into which bin? The ages are put into 3 bins, but I am unclear which exact ages each bin contains. For example, what is the age range for 'young' in this case?

    • @JopieSchaft
      @JopieSchaft 2 years ago +1

      @Adeel Khan I can think of 3 approaches to this:
      - Group by age_bins, then take the minimum and maximum age: df.groupby(['age_bins'])['age'].agg(['min', 'max'])
      - Use retbins=True in the pd.cut() function; it makes pd.cut() also return the bin edges.
      - Define the bins yourself, i.e. bins=[0, 20, 60, 120] (instead of bins=3 as in the video) will divide the passengers into the age ranges (0, 20], (20, 60] and (60, 120]
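A sketch of the first two approaches above on a handful of made-up ages (illustrative values, not the Titanic file). `retbins=True` returns the edges pandas chose; with `bins=3` they split the min-to-max range into three equal widths, with the lowest edge nudged down slightly so the minimum age is included.

```python
import pandas as pd

ages = pd.Series([2, 15, 22, 29, 35, 50, 64, 80])  # made-up ages

# retbins=True makes pd.cut also return the bin edges it computed
age_bins, edges = pd.cut(ages, 3, labels=["young", "middle_age", "old"], retbins=True)
print(edges)

# Observed youngest and oldest age that landed in each labelled bin
df = pd.DataFrame({"age": ages, "age_bins": age_bins})
ranges = df.groupby("age_bins", observed=True)["age"].agg(["min", "max"])
print(ranges)
```

For this sample the edges come out at roughly 1.92, 28, 54 and 80, so 'young' means ages up to 28 here; the video's data will give different edges.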

  • @carolinamalosabastos2648
    @carolinamalosabastos2648 9 months ago

    Great video! So clear... It helps me a lot! Thanks from Brazil!

  • @skye5107
    @skye5107 9 months ago

    Thanks a lot, I had been searching articles for this for weeks.

  • @denisml42
    @denisml42 2 years ago +4

    Thanks for the great video. I'm wondering how you could group the ages in intervals of 10 years. I feel like you probably wouldn't use cut for that, since you would need to know the highest/lowest age in order to determine how many cuts you need. Do you have a recommendation on how to do that?
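One way to get 10-year groups without looking up the age range by hand, sketched on made-up data: floor-divide the age by 10, or compute the `pd.cut` edges from the data's maximum. The column names `decade` and `age_range` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"age": [0.5, 4, 17, 23, 38, 41, 59, 72],
                   "survived": [1, 1, 0, 1, 0, 1, 0, 0]})  # made-up values

# Option 1: floor-divide to get the start of each 10-year interval (0, 10, 20, ...)
df["decade"] = (df["age"] // 10 * 10).astype(int)
by_decade = df.groupby("decade")["survived"].mean()
print(by_decade)

# Option 2: pd.cut with edges derived from the data, so the max age
# never has to be known in advance
top = (int(df["age"].max()) // 10 + 1) * 10      # first multiple of 10 above the max
edges = list(range(0, top + 1, 10))              # [0, 10, ..., top]
df["age_range"] = pd.cut(df["age"], bins=edges, right=False)  # [0, 10), [10, 20), ...
print(df["age_range"].value_counts(sort=False))
```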

  • @jackfarah7494
    @jackfarah7494 7 months ago

    Simple and informative. I love this video and am saving it for future reference! Thank you!

  • @tonianibal7585
    @tonianibal7585 1 year ago

    Thank you very much for sharing! It really helped me, it was exactly what I was looking for. People like you are blessed and good people helping to develop this world! I just subscribed and followed, and will share in my groups!

  • @youknownothing_
    @youknownothing_ 1 year ago

    Great video. It would be great if you also provided the link to the notebook.

  • @mrb7931
    @mrb7931 1 year ago

    Thanks a lot! You saved my day, now I can calculate means by category in my datasets

  • @lawngreenlyp
    @lawngreenlyp 2 years ago

    This is a very good video for explanation. Thanks so much from Hong Kong.

  • @afonsoosorio2099
    @afonsoosorio2099 2 years ago

    Awesome 👌. Crystal clear 🔮.
    I especially like the bin trick, straightforward. That is really amazing 👏 😍. Before, I had to break values into intervals using numpy select() or a user-defined function with apply() to get the same result as the bin method.
    Keep it up.

  • @Monkeysal07
    @Monkeysal07 2 years ago

    THANK YOU!!! that last tip is a life saver

  • @fashaikh5339
    @fashaikh5339 3 years ago +1

    VERY CLEAR. PLEASE, IF YOU CAN, EXPLAIN HOW TO DO AN INTERSECTION IN CASE WE HAVE A (ONE-TO-MANY) RELATIONAL DATABASE? THANKS

  • @InteligenciadeNegocios

    This is one of the best videos EVER! Really helpful! Thanks a LOT!

  • @rohitekka2674
    @rohitekka2674 3 years ago +1

    Concise, short, illustrious!! Thanks a lot!!!

  • @zebramc3693
    @zebramc3693 1 year ago

    Thank you for your detailed demonstrations.

  • @aishwaryapattnaik3082
    @aishwaryapattnaik3082 1 year ago +1

    Just what we needed . Awesome content 🙌🏼

  • @ThanhVo-zs7ns
    @ThanhVo-zs7ns 2 years ago

    Very good and funny videos bring a great sense of entertainment!

  • @ericc1317
    @ericc1317 2 years ago +1

    The as_index=False tip is great! When doing this with .count() instead of sum (for example, I'm doing a project with code in the format df.groupby(['x'], as_index=False)['y'].count()), is there any way to keep the original y column along with the new y "count" column in the resulting DataFrame? With this method the count of y replaces the original y.
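One way to keep `y` and add its count is `transform()`, which returns a result aligned row-for-row with the original frame instead of one row per group. A sketch with the comment's placeholder names `x` and `y` and made-up values:

```python
import pandas as pd

df = pd.DataFrame({"x": ["a", "a", "b", "b", "b"],
                   "y": [10, 20, 30, 40, 50]})  # placeholder names, made-up values

# transform() returns one value per original row (aligned to df's index),
# so the group count sits next to the untouched 'y' column
df["y_count"] = df.groupby("x")["y"].transform("count")
print(df)
```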

  • @osoriomatucurane9511
    @osoriomatucurane9511 11 months ago

    Hi Bradon, awesome tutorial. At 4:41, survived by class, mean and sum: a proportion would have been more meaningful. How do I get a percentage there, I mean the proportion survived (survival rate) by class? Using transform?????
    For aggregation it only allows sum, mean, count, ...
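Assuming `survived` is coded 1/0 (which is why `sum()` counts survivors in the video), the group mean already is the survival proportion, so no extra aggregate is needed; `transform` only comes in if you want that rate on every passenger's row. A sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
                   "survived": [1, 1, 0, 1, 0, 0, 0, 1, 0]})  # made-up values

# With a 0/1 column, the mean of each group is the survival proportion
rate = df.groupby("pclass")["survived"].mean()

# Count, survivors and rate side by side, plus the rate as a percentage
summary = df.groupby("pclass")["survived"].agg(["sum", "count", "mean"])
summary["pct_survived"] = summary["mean"] * 100
print(summary)

# Only if the rate is wanted on each passenger's own row
df["class_rate"] = df.groupby("pclass")["survived"].transform("mean")
```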

  • @coledd9487
    @coledd9487 2 years ago +1

    Hey there, for some reason when I try doing Single Group, Multiple Columns (like at 2:19), I keep getting an error basically stating that it thinks my 'fare' column is filled with strings as opposed to floats. As such, I can't do sum/mean/numeric methods on that data.
    I can't seem to get around it.

    • @ChartExplorers
      @ChartExplorers  2 years ago

      Hey Cole DD, sometimes when you read in your data pandas thinks the data is a string even though it should be integers or floats. This video here czcams.com/video/evKYySLSzyk/video.html discusses how to convert datatypes of columns and some common problems that you may run into when doing so. Let me know if that works.

  • @vitorribeirosa
    @vitorribeirosa 1 year ago

    Neat and objective!!!
    Thanks for sharing. I do appreciate your content.

  • @pazenriqueguillermo
    @pazenriqueguillermo 2 years ago +1

    Great video! One question... Let's say you do the first example, group survivors by class and sum(), but I want the result sorted in descending order (the class with the most survivors to the least). How would you do that?

    • @coledd9487
      @coledd9487 2 years ago +1

      .sort_values(ascending=False)

  • @nivviyer_
    @nivviyer_ 2 years ago +1

    Thank you so much sir !!

  • @ssrwarrior7978
    @ssrwarrior7978 3 years ago

    Wow, you made it easy for me and saved me a lot of time. THANK YOU

  • @andrenevares7543
    @andrenevares7543 2 years ago

    Great explanation! Good JOB! Thumbs up!

  • @Jitendrakumar-du1ng
    @Jitendrakumar-du1ng 2 years ago +1

    thanks for the great video, it really helped me.

  • @VRUNO
    @VRUNO 2 years ago

    You got a new follower, Sir!
    Really clear, really well explained. God, finally I understand :D Thanks so much!

  • @ZirothTech
    @ZirothTech 2 years ago

    Great video, thanks!

  • @onurkoc6869
    @onurkoc6869 2 years ago

    You explain very well, professor :))

  • @TheShrikhande
    @TheShrikhande 3 years ago +1

    What if I have a dataframe with two date columns (start-date, end-date) along with other attributes and I wish to create bins for each year incorporating both those date columns.
    How do you think I can manage to do that?

  • @bnadir3930
    @bnadir3930 2 years ago

    Great video! How can I get the max() value grouped by a column and still have all of the DataFrame's columns shown?

  • @michaelcruz1322
    @michaelcruz1322 3 years ago +1

    How did python determine which age_bin to place the individual into? You never specified the age-ranges associated with the categories?

    • @ChartExplorers
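      @ChartExplorers  3 years ago

      Hi Michael, good question. The age bins were created with the pandas cut method. cut turns continuous data into categorical data by grouping it into bins, and passing 3 as the bins argument (as in the video) splits the range of ages into three equal-width intervals; you can specify however many bins you want. The labels are assigned to those intervals in order, so 'young' is the lowest age range. Note that equal width does not mean equal counts: with 12 values the bins will generally not hold 4 values each (pandas.qcut is the function that aims for equal-sized groups). pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html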

    • @Monkeysal07
      @Monkeysal07 2 years ago

      Maybe this will allow you to specify the ranges of the bins. The labels list has to be one element shorter than the bins list, because N bin edges define N-1 intervals.
      df['age_cat'] = pd.cut(df['age'],
      bins=[x for x in range(0,100, 5)],
      labels=[x for x in range(5,100, 5)],
      right=True)
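To illustrate the difference between equal-width and equal-count bins discussed in this thread, here is a sketch on twelve made-up ages with one outlier: `pd.cut(..., 3)` splits the age range into equal widths, `pd.qcut(..., 3)` splits by quantiles, and explicit edges with labels leave the ranges up to you. Labels are matched to the intervals in order, from lowest to highest.

```python
import pandas as pd

ages = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 80])  # made-up, one outlier

# bins=3: three equal-WIDTH age ranges, so counts per bin can be very uneven
width_counts = pd.cut(ages, 3).value_counts(sort=False)
print(width_counts)

# pd.qcut: three bins holding (roughly) equal NUMBERS of values
quantile_counts = pd.qcut(ages, 3).value_counts(sort=False)
print(quantile_counts)

# Explicit edges: you choose the ranges; labels has one fewer entry than bins
labelled = pd.cut(ages, bins=[0, 20, 60, 120], labels=["young", "middle_age", "old"])
```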

  • @hansrc4469
    @hansrc4469 1 year ago

    When I use groupby for multiple columns like you did, it shows me a message about using a list instead of square brackets.

  • @nurshibumi
    @nurshibumi 2 years ago

    Thank you for your time and effort!
    I have a question. I have a dataset with a few columns, including "Fuel_Type". The fuel types are petrol, diesel and CNG. All I want is to group by Fuel_Type and store copies of the petrol and diesel subsets in separate variables. How can I do that? I have been searching for hours :))) Please answer me
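A sketch of two ways to do this, assuming the column holds the strings 'Petrol', 'Diesel' and 'CNG' (adjust to the exact spelling in your file; the other columns and rows are made up). `get_group()` pulls one group out as a DataFrame, and iterating over the groupby yields every group. A plain boolean mask such as `cars[cars['Fuel_Type'] == 'Petrol']` works just as well.

```python
import pandas as pd

cars = pd.DataFrame({"Car_Name": ["a", "b", "c", "d", "e"],
                     "Fuel_Type": ["Petrol", "Diesel", "CNG", "Petrol", "Diesel"],
                     "Price": [3.5, 7.0, 2.9, 4.1, 6.2]})  # made-up rows

grouped = cars.groupby("Fuel_Type")

# get_group() returns the rows of one group as its own DataFrame
petrol = grouped.get_group("Petrol").copy()
diesel = grouped.get_group("Diesel").copy()

# Or keep every group in a dict keyed by fuel type
by_fuel = {fuel: frame.copy() for fuel, frame in grouped}
```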

  • @tinayesibanda3070
    @tinayesibanda3070 10 months ago

    How can I combine groupby with a distinct count on one of the categorical columns and then a sum on some of the numeric columns?

  • @mohamedfawzy5453
    @mohamedfawzy5453 1 year ago

    Great explanation! Thank you.

  • @govindrajput8503
    @govindrajput8503 2 years ago

    Hi, thanks for this. How do I show groupby results for more than one variable with more than one aggregate function, without the index? So basically multiple groups as columns + aggregation with more than one function.
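This question and the distinct-count one just above can both be handled with named aggregation, where each output column names its own input column and function (including `nunique` for a distinct count), while `as_index=False` keeps the grouping keys as regular columns. A sketch with hypothetical column names and made-up values:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":   ["N", "N", "N", "S", "S"],
    "channel":  ["web", "web", "shop", "web", "web"],
    "customer": ["c1", "c2", "c1", "c3", "c3"],
    "amount":   [10.0, 20.0, 5.0, 7.0, 3.0],
})  # hypothetical columns

# Named aggregation: output_name=(input_column, function).
# as_index=False keeps both grouping keys as ordinary columns instead of an index.
out = sales.groupby(["region", "channel"], as_index=False).agg(
    n_customers=("customer", "nunique"),   # distinct count of a categorical column
    total=("amount", "sum"),
    average=("amount", "mean"),
)
print(out)
```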

  • @rajibroy1170
    @rajibroy1170 1 year ago

    You are a savior

  • @jakobstigsson9687
    @jakobstigsson9687 2 years ago

    Hey, thanks for the video. I have a dataframe that has a column with 0-4 in value, but I wish to group it by 0 and then 1-4. How would that be possible? Is it a big difference?
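One approach is to group by a key derived from the column rather than by the column itself. This sketch assumes hypothetical `score` (values 0 to 4) and `value` columns and uses `np.where` to label each row '0' or '1-4'; `pd.cut` with explicit edges would also work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [0, 1, 4, 0, 2, 3],
                   "value": [5, 1, 2, 7, 3, 4]})  # hypothetical columns

# Build a two-level key from the 0-4 column, then group on that key
key = np.where(df["score"] == 0, "0", "1-4")
result = df.groupby(key)["value"].sum()
print(result)
```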

  • @mohamedkhaled902
    @mohamedkhaled902 1 year ago

    Very helpful , keep it up ❤

  • @gabriellopes0
    @gabriellopes0 1 year ago

    Great explanation!

  • @javierclement3047
    @javierclement3047 1 year ago

    It seems to me like this function doesn’t really need to exist. I feel like I could make all of these manipulations relatively easily with Boolean operations.
    Can someone explain the advantage of using groupby()? Because it’s easier? Or is there something I’m missing?
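Boolean masks can indeed produce the same numbers. A sketch comparing both on made-up data: the mask version needs one filter per category and a list of categories you supply, while `groupby` finds the categories itself and returns a labelled result in one call, which matters more as you add grouping columns or aggregate functions.

```python
import pandas as pd

df = pd.DataFrame({"pclass":   [1, 1, 2, 2, 3, 3, 3],
                   "survived": [1, 0, 1, 1, 0, 0, 1]})  # made-up values

# Boolean-mask version: one filter per category, categories listed explicitly
by_mask = {c: df.loc[df["pclass"] == c, "survived"].sum() for c in df["pclass"].unique()}

# groupby version: one call finds every category and returns a labelled Series
by_group = df.groupby("pclass")["survived"].sum()

# Both give the same numbers
assert all(by_group[c] == by_mask[c] for c in by_mask)
print(by_group)
```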

  • @ericzheng4815
    @ericzheng4815 2 years ago

    When trying out this example: df['age_bins'] = pd.cut(df['age'], 3, labels=('young','middle_age', 'old')), I got an error: TypeError: can only concatenate str (not "float") to str. I don't know why. I looked at the manual and the code seems good to me.

  • @AIdevel
    @AIdevel 1 year ago

    I have a problem: it keeps giving me a KeyError, it doesn't recognize the names of the columns. How can I solve it? Please help me

  • @sebastianperalta4775
    @sebastianperalta4775 2 years ago

    Thanks for the video.

  • @czr372
    @czr372 1 year ago

    Saved me looots of hours haha! thanx!

  • @rohanbangash5827
    @rohanbangash5827 2 years ago

    How would we put the result of a groupby function as a column in our dataframe?

  • @ahovebismark4001
    @ahovebismark4001 1 year ago

    so please, I need a personal favor, I need to make labels for a plot I generated from a groupby method, any help with that?

  • @pramishprakash
    @pramishprakash 1 year ago

    Great video sir

  • @ibrar6121
    @ibrar6121 11 months ago

    In the Quick Tip Section, How did the program know that 29 is Middle_age, 2 is Young_age and 50 is old???

  • @aliyananwar3727
    @aliyananwar3727 2 years ago

    I came here to understand concept of groupby but left with emotions we men sacrificed. 🥺

  • @fashaikh5339
    @fashaikh5339 3 years ago +1

    I have a data frame with three columns: one for restaurants_id, the second for its categories (one or more categories) and the third for its zone. For each restaurant I need to calculate how many restaurants in its zone share at least one category with it, and put the result in a new column. How can I do that?

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      Hi F Ashaikh, is it possible for you to email me your data (or provide me with some made up data that is similar to the data you have). That will help me see what is going on a little better. My email is bradonvalgardson@gmail.com

    • @fashaikh5339
      @fashaikh5339 3 years ago

      I did , thank you very much for your help.

  • @danielrico3352
    @danielrico3352 2 years ago

    Thanks for the video! I have a question. If I want to select one specific biological sex, how could I write that code? For example, just females.
    df.groupby(["pclass", [sex] == female])["survived].sum()
    Would it be right to write it like this?
    Thanks in advance!
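The grouping list can't hold a condition like that. Instead, filter the rows first with a boolean mask and then group, or group by both columns and select the 'female' level afterwards. A sketch with made-up rows, assuming `sex` holds the strings 'female' and 'male':

```python
import pandas as pd

df = pd.DataFrame({"pclass":   [1, 1, 2, 2, 3, 3],
                   "sex":      ["female", "male", "female", "female", "male", "female"],
                   "survived": [1, 0, 1, 1, 0, 1]})  # made-up rows

# Filter first with a boolean mask, then group the remaining rows by class
female_survivors = df[df["sex"] == "female"].groupby("pclass")["survived"].sum()

# Alternatively group by both columns and take the 'female' slice afterwards
both = df.groupby(["pclass", "sex"])["survived"].sum()
female_only = both.xs("female", level="sex")
print(female_survivors)
```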

  • @maxons.e4643
    @maxons.e4643 2 years ago

    How do you sort the data when different conditions are involved in the groupby?

  • @MatthieuKhairallah
    @MatthieuKhairallah 1 year ago

    Thanks a lot!

  • @premprakash6863
    @premprakash6863 2 years ago

    I want to group by mobile number and merge the messages received. How can I do that?

  • @yili6498
    @yili6498 2 years ago

    very clear, thxxx

  • @pritisingh2432
    @pritisingh2432 3 years ago +1

    Hey, I'm having a problem with groupby, as it is giving a DataError: No numeric types to aggregate. Could you please help?

    • @ChartExplorers
      @ChartExplorers  3 years ago

      Hi Priti, will you run df.dtypes and let me know if there are any numeric (float or int) datatypes in your dataframe? If they are all objects, check out this video on how to convert objects into numeric values czcams.com/video/evKYySLSzyk/video.html (hopefully that will solve your problem). If this doesn't solve your problem, will you copy and paste your groupby statement and send it to me please?

    • @pritisingh2432
      @pritisingh2432 3 years ago

      @@ChartExplorers
      # Visualize Churn Rate by Gender
      plot_by_gender = churn_dataset.groupby('gender').Churn.mean().reset_index()
      plot_data = [
          go.Bar(
              x=plot_by_gender['gender'],
              y=plot_by_gender['Churn'],
              width=[0.3, 0.3],
              marker=dict(
                  color=['orange', 'green'])
          )
      ]
      plot_layout = go.Layout(
          xaxis={"type": "category"},
          yaxis={"title": "Churn Rate"},
          title='Churn Rate by Gender',
          plot_bgcolor='rgb(243,243,243)',
          paper_bgcolor='rgb(243,243,243)',
      )
      fig = go.Figure(data=plot_data, layout=plot_layout)
      po.iplot(fig)
      This is giving me the error. Can you suggest an alternative?
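"No numeric types to aggregate" usually means the column being averaged holds text. A sketch assuming `Churn` stores 'Yes'/'No' strings (an assumption; check `churn_dataset['Churn'].unique()` first): map them to 1/0 and the group mean becomes a churn rate that the plotting code above can use unchanged. Any value other than 'Yes' or 'No' would become NaN under this mapping.

```python
import pandas as pd

# Hypothetical stand-in: 'Churn' stored as text, which mean() cannot aggregate
churn_dataset = pd.DataFrame({"gender": ["Female", "Male", "Female", "Male", "Female"],
                              "Churn":  ["Yes", "No", "No", "No", "Yes"]})

# Convert the text labels to 1/0 so the group mean is a churn rate
churn_dataset["Churn"] = churn_dataset["Churn"].map({"Yes": 1, "No": 0})

plot_by_gender = churn_dataset.groupby("gender")["Churn"].mean().reset_index()
print(plot_by_gender)
```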

  • @varshakamble2095
    @varshakamble2095 2 years ago

    Thanks by heart

  • @shaikhjunaid8693
    @shaikhjunaid8693 1 year ago

    Sir, how would you solve the problem when you have to determine the top 5 highest-rated players for every position in the FIFA dataset?

    • @YoungerLei
      @YoungerLei 1 year ago

      Hi, it might be fifa.groupby(by='position').apply(lambda group: group.sort_values(by='rate', ascending=False).head(n=5))
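An alternative to `apply` that avoids calling a Python function per group: sort the whole frame once, then take `groupby(...).head(n)`, which keeps the first n rows of each group in the sorted order. The column names `position` and `rate` follow the reply above and may not match the real FIFA dataset; the rows are made up.

```python
import pandas as pd

fifa = pd.DataFrame({
    "name":     ["p1", "p2", "p3", "p4", "p5", "p6", "p7"],
    "position": ["GK", "GK", "GK", "ST", "ST", "ST", "ST"],
    "rate":     [80, 91, 85, 70, 95, 88, 90],
})  # hypothetical columns and values

top_n = 2  # use 5 for the original question

# Sort once by rating, then keep the first top_n rows of each position
best = (fifa.sort_values("rate", ascending=False)
            .groupby("position")
            .head(top_n))
print(best)
```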

  • @MachineLearningPro
    @MachineLearningPro 9 months ago

    Great video

  • @AimarZayyan
    @AimarZayyan 2 years ago

    Hi, how do I get the sum for a specific value of the pclass column, for example 1 only?

    • @ChartExplorers
      @ChartExplorers  2 years ago

      I'm not sure I understand your question. Are you looking to filter the dataframe so that only pclass = 1 is contained in the dataframe? You could use a boolean mask pclass1 = df[df['pclass'] == 1]. If that's what you are looking for you can check out this video on filtering which I think you will find helpful czcams.com/video/ni9ng4Jy3Z8/video.html

  • @crunchnos
    @crunchnos 2 years ago

    Thank you so f much!

  • @paar6128
    @paar6128 1 year ago

    Waow, you're amazing man :))

  • @MagnusAnand
    @MagnusAnand 3 years ago

    excellent tutorial

  • @jaskaransingh3200
    @jaskaransingh3200 1 year ago

    Nice. helpful

  • @kiko1955
    @kiko1955 2 years ago

    How do I make a graph with the result of a groupby?

  • @isaacenobun6370
    @isaacenobun6370 2 years ago

    Thanks man

  • @shoaibsoomro
    @shoaibsoomro 2 years ago

    At 5:54, applying pd.cut did not work for me; it gives the error
    TypeError: can only concatenate str (not "float") to str
    Solution: I used these two lines, which solved the issue:
    df['age'] = df['age'].replace('?',0) #clean data
    df['age']=df.age.astype('float64') #convert data type to float
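Replacing '?' with 0 does make `pd.cut` run, but it turns unknown ages into age 0, which pulls them into the youngest bin and can shift the bin edges. A sketch of an alternative on made-up values: `pd.to_numeric(..., errors='coerce')` converts non-numeric entries to NaN, and `pd.cut` leaves NaN rows without a bin. Reading the file with `pd.read_csv(..., na_values='?')` has the same effect up front.

```python
import pandas as pd

df = pd.DataFrame({"age": ["22", "?", "38", "4", "?"]})  # ages read in as text

# Non-numeric entries such as '?' become NaN instead of a fake age of 0
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# NaN ages simply get no bin; the edges are computed from the real ages only
df["age_bins"] = pd.cut(df["age"], 3, labels=["young", "middle_age", "old"])
print(df)
```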

  • @russellmubaya2662
    @russellmubaya2662 3 years ago

    Can we then plot a graph of any sort using the table we've just grouped?
    @Chart Explorers

  • @brainwaves2389
    @brainwaves2389 2 years ago +1

    thanks

  • @houndofjustice5
    @houndofjustice5 3 years ago +1

    Hello, is there any way to put all values in one column depending on their index? If the value I'm trying to group by is, let's say, Switzerland, and it has multiple Happiness ratings (one for each year), how do I put all the ratings in the same column, just separated by commas, without summing them up?

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      Great question Ivan. Try this out and see if it works for you.
      First I create a dictionary of data with 3 different countries and some happiness scores.
      Then I create a DataFrame with this data.
      Then I use the groupby function to group each country and then use apply(list) to create a list of all the values in each group.
      import pandas as pd
      data_dict = {'country':['country_1','country_2','country_3','country_1','country_1',
      'country_2','country_3','country_2','country_3','country_1'], 'happiness':[3,1,3,5,7,4,1,2,3,4]}
      df = pd.DataFrame(data_dict)
      df_grouped = df.groupby('country')['happiness'].apply(list)

    • @houndofjustice5
      @houndofjustice5 3 years ago

      @@ChartExplorers Thank you for the swift answer. I managed to do it for one column, but I'm trying to do it for multiple columns, basically just uniting rows with the same country value and separating the values with commas. It works when I do it for happiness score, but if I add happiness rank it just outputs the strings "happiness score" and "happiness rank", not the values. I tried it as a list but it's still not working.
      I did it with this code which works for Happiness Score:
      frame.groupby(['Country'])['Happiness Score'].apply(lambda x:' , '.join(x.astype(str))).reset_index()

    • @ChartExplorers
      @ChartExplorers  3 years ago

      @@houndofjustice5 I think I see what you are asking. So you want to group by country and then list out all the values for that country in the happiness and rank columns.
      Let me know if this works. If not, I am setting up a Discord server for Chart Explorers. That might be a better medium for problem solving.
      import pandas as pd
      # Example Data
      data_dict = {'country':['country_1','country_2','country_3','country_1','country_1',
      'country_2','country_3','country_2','country_3','country_1'],
      'happiness':[3,1,3,5,7,4,1,2,3,4],
      'rank':[1,2,3,4,5,6,7,8,9,10]}
      df = pd.DataFrame(data_dict)
      # groupby with list for multiple columns
      df_grouped = df.groupby('country')[['happiness','rank']].agg(lambda x: list(x))

    • @SudhirKumar-ry4gk
      @SudhirKumar-ry4gk 3 years ago

      Please help. I have employee data in which each employee made multiple sales. If an employee made any sale of more than 50000, I want every row with that employee's ID to say Excellent, and the rest Low.
      Like
      Emp ID    Sale    Status
      Emp1001   5000    Excellent
      Emp1001   45000   Excellent
      Emp1001   2000    Excellent
      Emp1002   5000    Low
      Emp1003   2500    Low

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      Hi @@SudhirKumar-ry4gk, so you are wanting to group by employee Id and for employees that had sales greater than $50,000 mark them as excellent otherwise mark them as low? Is that correct?
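A sketch of the rule described above: `transform('max')` gives every row its employee's largest sale, and `np.where` turns that into a label shared by all of that employee's rows. No sale in the sample table exceeds 50000, while the sample labels are consistent with a cutoff of 5000, so the threshold below is a placeholder to set to whatever rule you actually need.

```python
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "emp_id": ["Emp1001", "Emp1001", "Emp1001", "Emp1002", "Emp1003"],
    "sale":   [5000, 45000, 2000, 5000, 2500],
})

THRESHOLD = 5000  # placeholder cutoff; set to 50000 or whatever rule you need

# Each row gets its employee's largest sale, so the label applies to all their rows
best_sale = sales.groupby("emp_id")["sale"].transform("max")
sales["status"] = np.where(best_sale > THRESHOLD, "Excellent", "Low")
print(sales)
```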

  • @marchanselthomas
    @marchanselthomas 1 year ago

    to the point!

  • @richarda1630
    @richarda1630 3 years ago +1

    nice ! thanks :)

  • @souravde2283
    @souravde2283 3 years ago +1

    Awesome.

  • @laychansethaaerd
    @laychansethaaerd 3 years ago +1

    Perfect

  • @ainahannani4489
    @ainahannani4489 3 years ago

    How do I make a Poisson distribution of a groupby column?

    • @ChartExplorers
      @ChartExplorers  3 years ago

      I'm not sure. I would need to see your data and know more context to better understand what you are trying to accomplish.

  • @mohammadmfd682
    @mohammadmfd682 3 years ago

    very good

  • @jha6783
    @jha6783 11 months ago

    how do you know what is young, middle_age or old. This is not defined.

  • @pursh2002
    @pursh2002 3 years ago

    # function that groups data by attribute1 and calculates per-group statistics for attribute2
    (mean and count). How do we make a function for this?
    def get(data, attr1, attr2, statistic):

    • @ChartExplorers
      @ChartExplorers  3 years ago

      Hi Pursh, I'm not sure if I understand exactly what you are trying to accomplish.
      Are you trying to obtain the mean and count on groups based on multiple columns/attributes?
      df.groupby(['pclass','sex'], as_index=False)['survived'].agg(['mean','count'])
      If this is the case, I'm not sure what the purpose of creating a function for this would be.
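If a reusable function is still wanted, a minimal sketch of the `get(data, attr1, attr2, statistic)` signature from the question could simply forward `statistic` to `agg()`, which accepts either one function name or a list of them. The data below is made up.

```python
import pandas as pd

def get(data, attr1, attr2, statistic):
    """Group `data` by column `attr1` and apply `statistic` to column `attr2`.

    `statistic` may be one name such as 'mean' or a list like ['mean', 'count'].
    """
    return data.groupby(attr1)[attr2].agg(statistic)

df = pd.DataFrame({"pclass":   [1, 1, 2, 3, 3, 3],
                   "survived": [1, 0, 1, 0, 0, 1]})  # made-up values

stats = get(df, "pclass", "survived", ["mean", "count"])
print(stats)
```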

  • @Abdullah_Alhathloul
    @Abdullah_Alhathloul 6 months ago

    nice

  • @srideviponmalarp
    @srideviponmalarp 10 months ago

    Can you send the dataset?

  • @azrflourish9032
    @azrflourish9032 3 years ago

    Why is '?' needed when reading the CSV file??

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      Good question, I should have explained this in the video. In the CSV file, missing data is represented with '?'. When we read the data into pandas we can tell it that missing data is represented by '?' (read_csv's na_values parameter), and then pandas will treat it as a missing value rather than getting confused.

    • @azrflourish9032
      @azrflourish9032 3 years ago

      @@ChartExplorers oh, thank you (^ ^)

  • @apz9022
    @apz9022 3 years ago

    I have a dataframe that has around 20 columns and 800 rows. One column contains multiple duplicate information that I am using as the group, and based on one of the other columns I want to filter the dataframe to show unique values based on the highest number of this column using max(). I still want to retain all of the other columns and end up with a dataframe that contains these unique values including the original columns.
    group = df_UE5_Compatability_info.groupby('lookup')['Function Count'].max()
    where "lookup" is the column I want to group by (containing multiples of the same value) and filter to show the rows with the highest number for "Function Count", how do I make the dataframe contain the other remaining columns associated with the resultant rows determined by the groupby? I am struggling. Difficult to describe in words.. sorry

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      Hi Alan, you did a great job explaining, thanks for providing me an example of what you have done. 😀 If I'm understanding correctly (please correct me if I'm wrong), you have 1 column that contains categories and you want to get the max value for each of those categories in every column that you have (using groupby).
      Here is a simple example I made that will get the max value for every column in the dataframe based on the groups in Col_4.
      import pandas as pd
      # Create practice df
      df = pd.DataFrame({'Col_1':[1,2,3,4,5],
      'Col_2':[6,7,8,9,10],
      'Col_3':[11,12,13,14,15],
      'Col_4':['Group_1','Group_2','Group_1','Group_1','Group_2']
      })
      # groupby Col_4 (in your case use lookup)
      group = df.groupby('Col_4').max()
      group.head()
      You will notice here, instead of adding a list of columns to perform the groupby function on I excluded it. This will perform the operation on all the columns. In your example, you should be able to do the following to get your answer:
      group = df_UE5_Compatability_info.groupby('lookup').max()

    • @apz9022
      @apz9022 3 years ago

      @@ChartExplorers Thanks for the reply. Below is a sample dataset (made up) to try and better explain, and one that is more representative of my actual dataset.
      df = pd.DataFrame({'lookup':['abc123','abc124','abc123','abc125','abc125'],
      'Supported':['no','yes','no','yes','yes'],
      'Percentage':[0.9,0.6,0.6,0.7,0.6],
      'Number of features':[1,6,10,8,11],
      'Platform':['Release 1.0','Release 1.0','Release 2.0','Release 1.0','Release 2.0']
      })
      Printed, this DataFrame looks like the following:
         lookup Supported  Percentage  Number of features     Platform
      0  abc123        no         0.9                   1  Release 1.0
      1  abc124       yes         0.6                   6  Release 1.0
      2  abc123        no         0.6                  10  Release 2.0
      3  abc125       yes         0.7                   8  Release 1.0
      4  abc125       yes         0.6                  11  Release 2.0
      In column "lookup", rows 0 and 2 have the same value, as do rows 3 and 4.
      My goal is to have one row per value in column "lookup", selected by the highest value in column "Number of features", with all the other column values of that row shown in the output DataFrame.
      Using group = df.groupby('lookup').max() creates:
             Supported  Percentage  Number of features     Platform
      lookup
      abc123        no         0.9                  10  Release 2.0
      abc124       yes         0.6                   6  Release 1.0
      abc125       yes         0.7                  11  Release 2.0
      But the percentage is wrong for abc123 and abc125, as it has taken the highest percentage in each group. My desired result is:
      abc123  no   0.6  10  Release 2.0
      abc124  yes  0.6   6  Release 1.0
      abc125  yes  0.6  11  Release 2.0
      where the values for columns 'Supported' and 'Percentage' are taken as-is from the row with the highest 'Number of features'.
      In my script I am using group = df.groupby('lookup')['Number of features'].max() which returns the following, but I am missing the other columns, in this example Supported, Percentage and Platform.
      lookup
      abc123 10
      abc124 6
      abc125 11
      Also, if I try to save the dataframe to csv, I only get the following
      Number of features
      10
      6
      11
      I would have expected to have this csv output?
      lookup Number of features
      abc123 10
      abc124 6
      abc125 11
      Thanks again.. and I hope this is more descriptive?

    • @ChartExplorers
      @ChartExplorers  3 years ago +1

      @@apz9022 thanks for providing the example, that clarifies things a lot. If you use the same dataframe you created in your example you should be able to use the following code:
      new_df = pd.DataFrame(columns=df.columns)
      for item in df['lookup'].unique():
          temp_df = df[df['lookup'] == item]
          row = temp_df[temp_df['Number of features'] == temp_df['Number of features'].max()]
          alist.append(row)
          new_df = pd.concat([new_df, row], ignore_index=True)
      new_df
      Sadly, this uses a for loop. There might be another way to do this that would avoid the for loop (I need to work on it a little more to get it to work; I'll let you know if I get it to work). I'm also going to look into groupby a little more. There are some cool things you can do with groupby, but this has several constraints that I do not think groupby will support. With 800 rows and 20 columns performance should not be an issue (but it's always nice to squeeze as much performance out as possible just for fun!).
      Hope this works. Let me know.

    • @apz9022
      @apz9022 3 years ago

      @@ChartExplorers Thanks.. what is "alist.append" ? I get an error stating "alist" is not defined?

    • @apz9022
      @apz9022 3 years ago +1

      @@ChartExplorers Thanks, I updated my code and it's working like a charm! One point: alist.append(row) did not work for me. I have left it out and it still seems to work. What does this do?
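On the `alist.append(row)` question: `alist` is never defined, and the loop builds its result with `pd.concat` alone, so that line can be dropped. For a loop-free version, `idxmax()` returns, for each group, the index label of the row holding the maximum, and `.loc` then pulls those complete rows with every column intact. A sketch on the example DataFrame from this thread; one difference is that on ties `idxmax()` keeps only the first matching row, while the loop keeps all tied rows.

```python
import pandas as pd

df = pd.DataFrame({'lookup': ['abc123', 'abc124', 'abc123', 'abc125', 'abc125'],
                   'Supported': ['no', 'yes', 'no', 'yes', 'yes'],
                   'Percentage': [0.9, 0.6, 0.6, 0.7, 0.6],
                   'Number of features': [1, 6, 10, 8, 11],
                   'Platform': ['Release 1.0', 'Release 1.0', 'Release 2.0',
                                'Release 1.0', 'Release 2.0']})

# Per lookup value, the index label of the row with the most features
best_rows = df.groupby('lookup')['Number of features'].idxmax()

# .loc pulls those whole rows, so Supported, Percentage and Platform come along as-is
result = df.loc[best_rows].reset_index(drop=True)
print(result)
```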

  • @shekharmandal4569
    @shekharmandal4569 1 year ago

    goat

  • @xowp.
    @xowp. 1 year ago

    i love u

  • @NextVersionOfYou
    @NextVersionOfYou 2 years ago

    wow

  • @ericfayhuynh
    @ericfayhuynh 1 year ago

    looks like the data set is outdated