Modeling crop yields with tidy data principles

  • Added 1 Sep 2020
  • Use the tidyverse and tidymodels to build many models at once and understand change in #TidyTuesday crop yields over time. Check out the code here on my blog: juliasilge.com/blog/crop-yields/
  • Science & Technology

Comments • 30

  • @Kosa884
    3 years ago +1

    Thank you! I wait impatiently for every episode!

  • @jansenai6764
    3 years ago

    Wow this is a very appealing way of explaining the rate of change of any feature over time. Superb work! Appreciate it!

  • @user-ld6rv4gu2t
    3 years ago +1

    Thank you so much for these valuable lectures!

  • @robalan9975
    3 years ago

    These are always fantastic to watch. Thank you.

  • @THEDRAWINGSTUDIO1
    2 years ago

    This is very useful for my portfolio. I am an agronomist seeking to combine my current biology, biometry and calculus skills with data science to pursue a career in plant breeding.

  • @brynhumberstone
    3 years ago

    These videos are extremely helpful; thanks for posting them! There are two shortcuts in RStudio that I think you might like if you didn't already know about them: Alt-hyphen (for <-) and Ctrl-Shift-M (for %>%). Apologies if this is patronising and you did already know about them!

  • @FieldsDynamic
    3 years ago

    Thanks for the screencast! pretty nice and helpful... best

  • @davidjackson7675
    3 years ago

    Another helpful video.

  • @Ilproff77
    3 years ago

    Hi Julia, very nice videos. I really appreciate your effort for the R community and all the valuable materials you have produced (like the tidymodels course). I wonder if you could make a video on how to save and re-use a model across different scripts or notebooks.

  • @mohsinramay
    3 years ago

    Great presentation, Julia, and thanks also for sharing the code. Do you have any plans to give us a demo on Functional Data Analysis using tidy functions?

  • @JamesLee1
    3 years ago +3

    Thank you for the video and blog! Do you mind letting us know if you taught yourself data science with R? Your background is in astrophysics but you became a full-blown data scientist. I work with a lot of physics PhDs but not all of them have good coding/scripting skills.

    • @JuliaSilge
      3 years ago +10

      During my time in astrophysics, I worked with a lot of real-world messy data (not in R, but with different coding languages) so I do have a good bit of background in data munging, cleaning, plotting, etc. When I transitioned to data science, I used lots of books and courses to update my skills, learn R and a bit of Python (very rusty these days!), and dig into modern machine learning. I talk a bit about that process here: ropensci.org/blog/2018/06/08/rprofile-julia-silge/

  • @zegpi1821
    3 years ago

    Excellent as always! I have a ggplot question: how would you make the log scale on the y-axis less radical? I am trying to show p-value = c(.001, .01, .05, .10).

    • @JuliaSilge
      3 years ago +2

      It doesn't work great for this particular dataset, but you can control the scaling with the arguments to the scale function, like: scale_y_log10(limits = c(0.001, 0.1), breaks = c(0.001, 0.01, 0.05, 0.1))
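
      A minimal sketch of that suggestion, assuming ggplot2 is installed. The data frame `toy_slopes` and its values are made up for illustration and are not the video's data:

```r
library(ggplot2)

# Hypothetical slopes and p-values, one row per fitted model
toy_slopes <- data.frame(
  estimate = c(-0.02, 0.01, 0.03, 0.05),
  p.value = c(0.001, 0.01, 0.05, 0.1)
)

# Restrict the log10 axis to the range of interest and label only
# the p-value thresholds from the question
p_plot <- ggplot(toy_slopes, aes(estimate, p.value)) +
  geom_point() +
  scale_y_log10(
    limits = c(0.001, 0.1),
    breaks = c(0.001, 0.01, 0.05, 0.1)
  )
```

      Note that with `limits` set this way, points outside the range are dropped from the plot, which may be part of why it does not suit every dataset.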

  • @lukemiller8976
    3 years ago

    Great video, and a brilliant introduction to using tidy principles in R. I have a question on the use of adjusted p-values in the video. Is this adjustment still required given that each p-value is calculated based on a different subset of the data? In other words, the data used to estimate the time coefficient for each nested country/crop combo is independent of the data used in the other nests.
    Thanks

    • @JuliaSilge
      3 years ago

      That's a great question, but not one that I understand to have a clear cut answer in terms of "good" statistical practice. It probably does depend a lot on whether we want to make a single claim about the overall relationship between time and crop yield.
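
      For readers unfamiliar with the adjustment being discussed, here is a small sketch using base R's `p.adjust()`. It assumes the default Holm method (the thread does not say which method the video uses), and the p-values are made up:

```r
# Hypothetical raw p-values, one per country/crop model
p_raw <- c(0.001, 0.01, 0.02, 0.04)

# p.adjust() defaults to method = "holm"; adjusted values are never
# smaller than the raw ones
p_adj <- p.adjust(p_raw)

# Side-by-side comparison of raw and adjusted values
comparison <- data.frame(p_raw, p_adj)
print(comparison)
```

      The adjustment depends on how many tests are counted as one family, which is why the question of whether the nested models form a single claim matters.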

  • @transportation-talk
    3 years ago

    This was a very useful tutorial. Thank you. Quick question: I noticed that you didn't use 'group_by()' or 'group_nest'. What is the difference between group_nest and nest?

    • @JuliaSilge
      3 years ago +1

      I think the main difference is that group_nest() expects already grouped data; it can be helpful in certain situations, like when you need to apply some operation to groups and then nest: dplyr.tidyverse.org/reference/group_nest.html
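
      A rough side-by-side sketch of the two calls, assuming dplyr and tidyr are installed. The tibble `toy_yields` and the column name `yields` are made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two country/crop combinations, two years each
toy_yields <- tibble(
  entity = c("A", "A", "B", "B"),
  crop = c("wheat", "wheat", "maize", "maize"),
  year = c(2000, 2001, 2000, 2001),
  yield = c(1.0, 1.2, 2.0, 2.1)
)

# tidyr::nest(): name the columns that go *into* the list-column
with_nest <- toy_yields %>%
  nest(yields = c(year, yield))

# dplyr::group_nest(): group first, then nest everything else
with_group_nest <- toy_yields %>%
  group_by(entity, crop) %>%
  group_nest(.key = "yields")
```

      Both results have one row per entity/crop pair with a `yields` list-column. The dplyr reference page linked above also shows group_nest() being given the grouping columns directly on ungrouped data.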

    • @transportation-talk
      3 years ago

      @JuliaSilge Thank you.

  • @mathewvarghese7471
    3 years ago

    Fantastic video. This might sound trivial, but what RStudio theme do you use? It didn't strain my eyes at all.

    • @JuliaSilge
      3 years ago

      It's one of the ones available through rsthemes, I believe Oceanic Plus! github.com/gadenbuie/rsthemes

    • @mathewvarghese7471
      3 years ago

      Julia Silge thanks a ton!

  • @sitendugoswami1990
    3 years ago

    Very nice video, but can we really use p-values as a surrogate for effect sizes when sample sizes vary?

    • @JuliaSilge
      3 years ago

      No, we definitely do not want to confuse what a p-value measures with what an effect size measures. I show how to plot both (and explain a bit) in this blog post: juliasilge.com/blog/crop-yields/

  • @ronit8067
    3 years ago

    Hey Julia, I am having a really hard time understanding the flow of R. I have been working with Python until now, and I usually take an OOP approach, or something close to OOP using functions. I have been trying to understand R by following these kinds of projects from the #tidytuesday community. Any advice would be much appreciated!

    • @JuliaSilge
      3 years ago +1

      Most R programming for data analysis is not based on an OOP approach, but more on a Lisp-like, functional programming approach. It might help to realize that there are some fundamental differences in how R overall approaches data analysis; IMO these differences are a strength! A resource that might help you understand an R-like take on data analysis is R for Data Science: r4ds.had.co.nz/
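
      As a small illustration of that functional style, here is a base R sketch that fits one linear model per group by passing a function to `vapply()`. The data frame `toy` and the helper `fit_slope()` are made up for illustration; in the tidyverse, `purrr::map()` plays a similar role:

```r
# Hypothetical data: yearly yields for two entities
toy <- data.frame(
  entity = rep(c("A", "B"), each = 4),
  year = rep(2000:2003, times = 2),
  yield = c(1.0, 1.1, 1.2, 1.3, 2.0, 1.9, 1.8, 1.7)
)

# A plain function: data frame in, slope of yield over time out
fit_slope <- function(df) {
  coef(lm(yield ~ year, data = df))[["year"]]
}

# Split the data by entity, then apply the function to each piece;
# no classes or methods are defined, only functions and data
slopes <- vapply(split(toy, toy$entity), fit_slope, numeric(1))
print(slopes)
```

      The result is a named numeric vector with one slope per entity, which is the same "many models at once" idea the video applies to country/crop combinations.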

  • @alexandroskatsiferis
    3 years ago +3

    Amazing video, but for your statistician friends, the p-value phrase should end with "assuming the null hypothesis is true"!

  • @bnouadam
    3 years ago

    Poor modeling