Tidyverse in R - tips & tricks

  • Added on 2 Aug 2024
  • 🔔 Subscribe for weekly R videos: / @tomhenry-datasciencew...
    Here are 18 ways to speed up data cleaning, tidying, and exploration with the tidyverse packages in R. They'll help you to work with data more efficiently, simplify your R code, and surprise your friends!!
    🎉 Enjoyed this video? Leave a comment below to share what you liked the most!
    0:00 Intro
    1:04 Create new columns in a count or group_by
    2:11 Sample and randomly shuffle data with slice_sample()
    3:05 Create a date column specifying year, month, and day
    3:25 Parse numbers with parse_number()
    4:07 Select columns with starts_with, ends_with, etc.
    4:56 case_when to create or change a column when conditions are met
    6:36 str_replace_all to find and replace multiple options at once
    7:15 Transmute to create or change columns and keep only those columns
    7:48 Use pipes everywhere including inside mutates
    9:11 Filter groups without making a new column
    10:04 Split a string into columns based on a regular expression
    11:10 semi_join to pick only rows from the first table which are matched in the second table
    12:20 anti_join to pick only rows from the first table which are NOT matched in the second table
    12:48 fct_reorder to sort bar charts
    14:06 coord_flip to display counts more beautifully
    14:32 fct_lump to lump some factor levels into "Other"
    15:26 Generate all combinations using crossing
    16:00 Create functions that take column names with double curly braces
    18:00 The end
    Code:
    gist.github.com/larsentom/727...
    #rstats #rstudio #datascience #tidyverse
  • Science & Technology

Comments • 53

  • @tomhenry-datasciencewithr6047

    ▶️ Top 7 R packages that are less well known - czcams.com/video/V-EssPrGPHg/video.html
    🎉 *Subscribe* if you want more videos like this! - czcams.com/channels/b5aI-GwJm3ZxlwtCsLu78Q.html
    😃 *Comment* below to share which tricks you liked the most!!

  • @s-sugoi835
    @s-sugoi835 26 days ago

    Thanks, I work in a bank we migrated from SAS to R. This is so helpful.

  • @tjaeg
    @tjaeg 1 year ago +2

    Please keep on doing these kinds of videos!

  • @djangoworldwide7925
    @djangoworldwide7925 2 years ago +1

    Super informative and advanced! Thank you. It's hard to find advanced tutorials on YouTube these days.

  • @bridgettsmith7206
    @bridgettsmith7206 2 years ago

    Great tips. I appreciate that you have an index of time stamps for the content. I will be more easily able to reference this video later.

  • @spikeydude114
    @spikeydude114 1 year ago

    Great video! Very dense with information and straight to the point!

  • @ZuluMonk
    @ZuluMonk 4 years ago +1

    Great tips! Always nice to see better ways of doing things.

  •  2 years ago

    This is great, so useful. Thanks!

  • @tpflowspecialist
    @tpflowspecialist 3 years ago

    Fantastic tidyverse data processing tips. Thank you!

  • @dataslice
    @dataslice 4 years ago +2

    Great tips, Tom! I'm definitely saving this video!

  • @GimboCodCommentaries
    @GimboCodCommentaries 4 years ago +1

    Fantastic, Tom! Just subscribed, so helpful

  • @aliramadan7425
    @aliramadan7425 3 years ago +1

    Thank you. Learned so much!

  • @MarcelloNesca
    @MarcelloNesca 4 years ago +1

    Great tips! Always looking for new ways of coding for datasets. Subscribed!

  • @shreyaroraa2234
    @shreyaroraa2234 3 years ago

    Very nice video, Tom. Future video idea: moving from SQL to R, covering common issues and function comparisons.

  • @ecarlosbc
    @ecarlosbc 2 years ago

    Great tutorial, sir!

  • @timmytesla9655
    @timmytesla9655 2 years ago

    This is awesome. Thank you!

  • @manohar-kg
    @manohar-kg 3 years ago +1

    Very helpful video... Thanks

  • @QuentinAndres06
    @QuentinAndres06 3 years ago

    Tom, you are a boss.

  • @patricklogan6089
    @patricklogan6089 2 years ago

    Good stuff

  • @nuk3man
    @nuk3man 2 years ago +1

    Great video. Question: in tip no. 9, what does "\\.?$" do in the first str_replace_all?

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 2 years ago +1

      Good question! It's a regular expression, and its purpose is to remove an optional '.' at the end of the string of text.
      For example, ' INC.' or ' CO.' or ' INC' or ' CO' would all be matched and replaced with the empty string (i.e. removed from the text). So 'QUANTAS CO.' (hypothetical) and 'QUANTAS CO' would both become 'QUANTAS'
      We can break down the "\\.?$" like this:
      \\. translates into \. - this says to match an actual '.' character. If we didn't have the '\\', it would match _any_ character because '.' is the regular expression code for any character.
      ? means 'optional' - so the actual '.' may or may not be present - if it is present, it will be matched.
      $ means the 'end of the string of text'.
      So putting it together, this means:
      'Replace
      ' ' # a space
      followed by 'INC' or 'CO' # (INC|CO)
      followed by an optional '.' # \\.?
      if all at the end of the string # $
      with
      the empty string # ""
      '
      More info on regular expressions here:
      r4ds.had.co.nz/strings.html#matching-patterns-with-regular-expressions
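
The breakdown above can be tried directly. This is a minimal sketch rather than the code from the video: the pattern " (INC|CO)\\.?$" is assembled from the pieces listed in the reply, and the `companies` vector is made up for illustration. The last two lines show what changes when the dot is left unescaped.

```r
library(stringr)

# Hypothetical company names, invented for illustration
companies <- c("QUANTAS CO.", "QUANTAS CO", "ACME INC.", "ACME INC", "COCO.INC LTD")

# A space, then INC or CO, then an optional literal '.', all at the end of the string
cleaned <- str_replace_all(companies, " (INC|CO)\\.?$", "")
cleaned

# With the dot escaped, the trailing 'X' is not a literal '.', so nothing matches
escaped <- str_replace_all("ACME INCX", " (INC|CO)\\.?$", "")

# With the dot unescaped, '.' matches any character, so ' INCX' is removed
unescaped <- str_replace_all("ACME INCX", " (INC|CO).?$", "")
```

The suffix is removed only when it sits at the very end of the string, so "COCO.INC LTD" is left untouched.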

  • @DM-py7pj
    @DM-py7pj 3 years ago +3

    2:06 what purpose does ungroup() serve in this case?

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 3 years ago +1

      Technically it's not necessary! But I have gotten into the habit of 'ungroup()'ing every time after a group_by() because in other contexts - e.g. when the pipe continues with further mutates, summarizes, etc. - forgetting to ungroup() can result in the wrong outcomes. That's because summarize() by default only peels off the last variable in the group_by(). So I have developed the habit of always ungroup()ing after a group_by(), even when it's not necessary!
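
A small sketch of the pitfall described in this reply, assuming a toy `sales` table invented for illustration (it is not the data from the video). After summarize(), the result is still grouped by the first variable, so a later mutate() computes within each group unless ungroup() is called first.

```r
library(dplyr)

# Toy data, invented for illustration
sales <- tibble(
  region = c("N", "N", "N", "S", "S", "S"),
  year   = c(2019, 2019, 2020, 2019, 2020, 2020),
  amount = c(10, 20, 30, 40, 50, 60)
)

# summarize() peels off only the last grouping variable (year);
# the result is still grouped by region
totals <- sales %>%
  group_by(region, year) %>%
  summarize(total = sum(amount))

remaining_groups <- group_vars(totals)

# Without ungroup(): sum(total) is taken within each region
still_grouped <- totals %>%
  mutate(share = total / sum(total))

# With ungroup(): sum(total) is taken over the whole table
ungrouped <- totals %>%
  ungroup() %>%
  mutate(share = total / sum(total))
```

In `still_grouped` the shares add up to 1 within each region, while in `ungrouped` they add up to 1 across the whole table. That silent difference is what the ungroup() habit guards against.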

    • @DM-py7pj
      @DM-py7pj 3 years ago

      @tomhenry-datasciencewithr6047 Thanks. Great explanations. Subscribed. :-)

  • @ahmed007Jaber
    @ahmed007Jaber 2 years ago

    Wow! Mate, love this one. Keep it up.
    For tip #18, how would you exclude some columns from this? I actually need to write a similar function.

  • @AkashMathur-yc9nu
    @AkashMathur-yc9nu 3 years ago +1

    Power Pack !

  • @clono1984
    @clono1984 4 years ago +2

    Hi Tom, do you have the script available for download anywhere? Would love to revisit a few of the tips here. Really like your work. Thanks for sharing!
    -- Juan

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 4 years ago +1

      Sure! I've put a link at the end of the description. Here it is: gist.github.com/larsentom/727da01476ad1fe5c066a53cc784417b

    • @clono1984
      @clono1984 4 years ago +1

      @tomhenry-datasciencewithr6047 ahh! can't believe I missed it. Thank you Tom.

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 4 years ago

      @clono1984 Glad you liked the tips! Let me know if you have others to share too!

    • @melissawong4125
      @melissawong4125 2 years ago

      Thanks. Great tips! The github link is no longer working. Is there a new link?

  • @alihashemian225
    @alihashemian225 2 years ago

    I am having trouble accessing the script. Can someone help me?

  • @heartheart5543
    @heartheart5543 2 years ago

    The link for the code cannot be accessed: 404

  • @jaritos675
    @jaritos675 4 years ago +8

    light RStudio theme not acceptable

  • @educationulx
    @educationulx 2 years ago

    My data (CSV) is about historical heights for both genders at different ages.
    It contains heights for every year (1986-2019), the age groups 2, 8, 16, 19 and 22, and male and female sex. I just want to work with age 19 only (male, or both genders) to see the heights between 1986 and 2019. How can I do it? Please let me know.

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 2 years ago

      You can work with something like this:
      heights %>%
        filter(year %>% between(1986, 2019)) %>% # same as year >= 1986 & year <= 2019
        filter(age_group %in% c(19)) %>% # or just 'age_group == 19'; add more ages inside c() to compare age groups
        ggplot(aes(year, height, color = sex)) +
        geom_line() +
        facet_wrap(~age_group, ncol = 1)
      this assumes your data looks like this:
      year | age_group | sex | height
      2015 | 16 | Female | 150
      etc

    • @tomhenry-datasciencewithr6047
      @tomhenry-datasciencewithr6047 2 years ago

      Then try removing / changing parts of this to see the effects!
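
To experiment with the filtering step from the reply without the plot, here is a minimal sketch. The `heights` tibble below is invented for illustration and only follows the column layout the reply assumes (year, age_group, sex, height). It checks that piping into between() selects the same rows as the explicit comparisons.

```r
library(dplyr)

# Toy data, invented for illustration; only the column layout follows the reply
heights <- tibble(
  year      = c(1980, 1986, 2000, 2019, 2021, 2000),
  age_group = c(19, 19, 19, 19, 19, 16),
  sex       = c("Male", "Male", "Female", "Male", "Female", "Male"),
  height    = c(170, 171, 160, 175, 162, 165)
)

# Piped form: between() is inclusive at both ends
piped <- heights %>%
  filter(year %>% between(1986, 2019)) %>%
  filter(age_group == 19)

# Equivalent explicit form
explicit <- heights %>%
  filter(year >= 1986, year <= 2019, age_group %in% c(19))

same_rows <- isTRUE(all.equal(piped, explicit))
```

Adding `sex == "Male"` to either filter() would narrow the result to one gender.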

  • @nkuatedivinely7369
    @nkuatedivinely7369 3 years ago +2

    I just started watching a few of your videos for school. They are great, but you are so fast 😅. It would be great if you could speak a little more slowly and even repeat a few things. Thanks!

  • @Jelieto
    @Jelieto 3 years ago +1

    You sound like Daniel Ricciardo

  • @JamesJosephMcPherson
    @JamesJosephMcPherson 3 years ago

    FLIGHTS DOESN'T WORK WITH NEW R