Predict injuries for Chicago traffic crashes with tidymodels

  • Added 18 August 2024

Comments • 21

  • @avnavcgm 3 years ago +5

    Thank you yet again for this exceptional material Ms. Silge.

  • @sidharthadaggubati438 3 years ago +1

    This channel deserves more views. High quality content. Thank you Julia

  • @brendenmorley2643 3 years ago +1

    Once again your tutorial is sooo insightful. My R knowledge continues to explode, thanks to your time and work.

  • @hesamseraj 3 years ago +1

    Once again thank you very much Julia. I watched and worked the coding of all your videos and will be following you as long as you share these fantastic videos.

  • @HamJeong 3 years ago +3

    Incredibly useful, thanks so much, I really learn a lot from you sharing like this!

  • @ochiwar 3 years ago +3

    Another excellent tutorial! I love your plot theme/aesthetics. Would it be possible for you to share your ggplot template theme? Thanks!

    • @JuliaSilge 3 years ago +4

      I have it in a little personal package here -- theme_plex(): github.com/juliasilge/silgelib
      But there are some very similar themes in the hrbrthemes package (the one that uses IBM Plex):
      cinc.rud.is/web/packages/hrbrthemes/

  • @mattm9069 3 years ago +2

    thanks Julia!!!

  • @datasciencenerd3263 3 years ago +1

    I learn a lot from you, thank you.

  • @prod.kashkari3075 3 years ago +2

    Hello Julia! Thanks so much for these tutorials and your book on tidymodels. I’m an undergrad who wanted to learn machine learning in R, and you had great resources to help me get started. I wanted to ask you about a few things I’ve noticed recently when working with tidymodels.
    1. I’ve been getting errors when calling the tune_grid() function. I have my workflows and recipe set up (I even prep and bake the recipe to check that it’s good), and I create my cross-validation folds and tuning grids. Yet when I call tune_grid() and pass in my workflow, resamples, and grid, it says that my models have failed on every fold. Do you know what the source of this could be? It is also very slow and tends to freeze.
    2. When I try to fit my workflow object by calling fit(), I get a message that says “error could not find fit function from workflow”. I solved the problem by putting parsnip:: in front of it and it worked fine, but this error came up randomly one day when I had never experienced it the day before.
    I’m sure these issues are because tidymodels is so new and in development.
    Also, as a request, could you make more videos on the stacks package and building ensemble learners in tidymodels?
    Thanks!

    • @JuliaSilge 3 years ago +1

      In general, I'd recommend making sure your packages are up to date with the latest CRAN versions. If you can create a reprex with your problem and post on RStudio Community, we are happy to help find the solution:
      rstd.io/tidymodels-community

  • @terrencerussell1999 3 years ago +1

    Hey Julia! Great stuff again here as always. I look forward to each one of your posts and follow along in R.
    When I do this one with my own Canadian lat/longs, I don't get a map like the one you produced for Chicago. Is that a limit of the function for Canadian coordinates, or am I missing something?

    • @JuliaSilge 3 years ago +1

      Hmmmmm, I haven't looked at data from Canada so I can't say for sure. If you can put together a small, self-contained reprex demonstrating the issue and post on RStudio Community, I bet folks will be eager to help. There is even a spatial tag where you can get interested folks to see: community.rstudio.com/tag/spatial

    • @terrencerussell1999 3 years ago

      @JuliaSilge Ok great, will do! Thanks again

  • @UndecidedFellow 3 years ago +1

    Thank you for the video Dr Silge! Quick question, how are `bag_tree()` and `vfold_cv()` functions accounting for the time series nature in the data? I'm reading the documentation and it looks like your current pipeline treats the dates as non ordinal and categorical, using the dates as factors with line `step_date(crash_date) %>%`. Is my reading correct? In short, why did you choose `vfold_cv()` over `rolling_origin()` and how is seasonality/autocorrelation modeled in your pipeline?

    • @JuliaSilge 3 years ago +2

      So this isn't time series in the sense that I want to predict the next crash(es). Instead it is a classification model where some of the predictors are date features. You can look at another example of this kind of model here: www.tidymodels.org/start/recipes/
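
To illustrate what "date features as predictors" means in the reply above, here is a minimal sketch. The `crashes` data frame is made-up toy data, not the Chicago dataset, and the choice of `features` is an assumption for illustration; the recipe in the video may use different settings.

```r
library(recipes)

# Toy data (hypothetical): one crash per day for a week
crashes <- data.frame(
  crash_date = as.Date("2021-01-01") + 0:6,
  injuries = factor(c("none", "injuries", "none", "none",
                      "injuries", "none", "none"))
)

# step_date() derives ordinary predictor columns from the date,
# here day of week and month
rec <- recipe(injuries ~ crash_date, data = crashes) %>%
  step_date(crash_date, features = c("dow", "month"))

# Estimate the recipe and return the processed training data
date_features <- bake(prep(rec), new_data = NULL)
names(date_features)
```

The derived columns `crash_date_dow` and `crash_date_month` are plain factors, so the model sees them like any other categorical predictor and the order of the rows carries no information. That is why a resampling scheme such as `vfold_cv()` can be a reasonable choice for this kind of classification problem, whereas `rolling_origin()` is aimed at forecasting future values.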

  • @mattm9069 3 years ago

    Julia, can you please elaborate on what step_downsample() does once we get to the resampling steps? I wanted to see what I would get out of this code:
    train_preprocessed %
    prep(crash_train) %>%
    juice()
    I get a balanced dataset of the outcome variable, and it has ~45,000 rows. Yet, one cross validation fold has 138,000 rows for analysis. So, I want to understand what's happening conceptually. I've seen other people build the recipe from the original dataset, but we use the training set
    (i.e. recipe(injuries ~ ., data = crash_train))

    • @JuliaSilge 3 years ago +1

      Reading this section might help clear some things up for you: www.tmwr.org/recipes.html#skip-equals-true
      As well as the section a little bit further about row sampling steps like downsampling.
      A subsampling step like `step_downsample()` will downsample the analysis set of a CV fold but not the assessment set.
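
The behavior described in the reply above can be seen in a small sketch. The `toy` data frame is invented for illustration (20 rows with injuries, 80 without), and the sketch assumes the themis package, which provides `step_downsample()` with `skip = TRUE` as its default.

```r
library(recipes)
library(themis)

set.seed(123)

# Toy imbalanced data (hypothetical): 20 "injuries" rows, 80 "none" rows
toy <- data.frame(
  injuries = factor(rep(c("injuries", "none"), times = c(20, 80))),
  x = rnorm(100)
)

rec <- recipe(injuries ~ x, data = toy) %>%
  step_downsample(injuries)

prepped <- prep(rec)

# Data the recipe was trained on: downsampling is applied,
# so both classes end up with the minority-class count
balanced <- bake(prepped, new_data = NULL)
table(balanced$injuries)

# Any other data passed to bake(): the step is skipped,
# so the original class counts are kept
untouched <- bake(prepped, new_data = toy)
table(untouched$injuries)
```

During resampling, the recipe is estimated on the analysis set of each fold, so the model is fit on a downsampled version of that analysis set, while the assessment set goes through `bake()` with the downsampling step skipped. This is consistent with seeing far fewer rows after `prep()` and `juice()` than in the analysis set of a fold.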