Modeling crop yields with tidy data principles
- Published 1 Sep 2020
- Use the tidyverse and tidymodels to build many models at once and understand change in #TidyTuesday crop yields over time. Check out the code here on my blog: juliasilge.com/blog/crop-yields/
- Science & Technology
Thank you! I wait impatiently for every episode!
Wow this is a very appealing way of explaining the rate of change of any feature over time. Superb work! Appreciate it!
Thank you so much for these valuable lectures!
These are always fantastic to watch. Thank you.
This is very useful for my portfolio. I am an agronomist seeking to combine my current biology, biometry and calculus skills with data science to pursue a career in plant breeding.
These videos are extremely helpful; thanks for posting them! There are two shortcuts in RStudio that I think you might like if you didn't already know about them: Alt-hyphen (for <-) and Ctrl-Shift-M (for %>%). Apologies if this is patronising and you did already know about them!
Thanks for the screencast! pretty nice and helpful... best
Another helpful video.
Hi Julia, very nice videos. I really appreciate your effort for the R community and all the valuable materials you have produced (like the tidymodels course). I wonder if you could make a video on how to save and reuse a model across different scripts/notebooks.
Great presentation Julia, and thanks also for sharing the code. Do you have any plans to give us a demo on Functional Data Analysis using tidy functions?
Thank you for the video and blog! Do you mind letting us know whether you are self-taught in data science with R? Your background is in astrophysics but you became a full-blown data scientist. I work with a lot of physics PhDs but not all of them have good coding/scripting skills.
During my time in astrophysics, I worked with a lot of real-world messy data (not in R, but with different coding languages) so I do have a good bit of background in data munging, cleaning, plotting, etc. When I transitioned to data science, I used lots of books and courses to update my skills, learn R and a bit of Python (very rusty these days!), and dig into modern machine learning. I talk a bit about that process here: ropensci.org/blog/2018/06/08/rprofile-julia-silge/
Excellent as always! I have a ggplot question: how would you make the log scale on the y-axis less radical? I'm trying to show p-value = c(.001, .01, .05, .10).
It doesn't work great for this particular dataset, but you can control the scaling with the arguments to the scale function, like: scale_y_log10(limits = c(0.001, 0.1), breaks = c(0.001, 0.01, 0.05, 0.1))
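For anyone who wants to try the suggestion above, here is a minimal sketch of that scale_y_log10() call in a full plot. The data frame and its column names are made up purely for illustration; they are not the crop yields data from the video.

```r
library(ggplot2)

# Hypothetical p-values, invented for illustration only
pvals <- data.frame(
  term    = c("a", "b", "c", "d"),
  p.value = c(0.001, 0.01, 0.05, 0.1)
)

# Log-scaled y-axis restricted to the range of interest,
# with breaks placed at the conventional thresholds
p <- ggplot(pvals, aes(term, p.value)) +
  geom_point() +
  scale_y_log10(
    limits = c(0.001, 0.1),
    breaks = c(0.001, 0.01, 0.05, 0.1)
  )
```

Note that values falling outside limits are dropped from the plot (with a warning), so the limits need to cover every point you want to show.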
Great video, and a brilliant introduction to using tidy principles in R. I have a question on the use of adjusted p-values in the video. Is this adjustment still required given that each p-value is calculated based on a different subset of the data? In other words, the data used to estimate the time coefficient for each nested country/crop combo is independent of the data used in the other nests.
Thanks
That's a great question, but not one that I understand to have a clear-cut answer in terms of "good" statistical practice. It probably does depend a lot on whether we want to make a single claim about the overall relationship between time and crop yield.
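To make the adjustment under discussion concrete, here is a small sketch using base R's p.adjust(). The raw p-values are invented, and which method (if any) is appropriate for the crop yields models is exactly the open question in this thread, so treat this as an illustration of the mechanics only.

```r
# Hypothetical raw p-values, one per fitted model (made up for illustration)
p_raw <- c(0.001, 0.01, 0.02, 0.04)

# Holm is the default method of p.adjust(); it controls the family-wise error rate
p_holm <- p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg controls the false discovery rate instead
p_bh <- p.adjust(p_raw, method = "BH")

# Adjusted values are never smaller than the raw ones
cbind(p_raw, p_holm, p_bh)
```

Either adjustment makes each individual claim more conservative, which matters most when the many per-group results are read together as one overall statement.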
This was a very useful tutorial. Thank you. Quick question: I noticed that you didn't use 'group_by()' or 'group_nest'. What is the difference between group_nest and nest?
I think the main difference is that group_nest() expects already grouped data; it can be helpful in certain situations, like when you need to apply some operation to groups and then nest: dplyr.tidyverse.org/reference/group_nest.html
@JuliaSilge Thank you.
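A quick side-by-side sketch of the two approaches, assuming dplyr and tidyr are installed. The toy tibble and its column names are hypothetical, not the dataset from the video.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for illustration
toy <- tibble(
  crop  = c("maize", "maize", "wheat", "wheat"),
  year  = c(2000, 2001, 2000, 2001),
  yield = c(1.0, 1.2, 2.0, 2.1)
)

# tidyr::nest(): say which columns go into the list-column (everything except crop)
nested_a <- toy %>% nest(data = -crop)

# dplyr::group_nest(): group first, then nest by the existing groups
nested_b <- toy %>% group_by(crop) %>% group_nest()
```

Both results have one row per crop plus a list-column named data holding the remaining columns, so either can feed a map-over-models step.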
Fantastic video. This might sound trivial, but what RStudio theme do you use? It didn't strain my eyes at all.
It's one of the ones available through rsthemes, I believe Oceanic Plus! github.com/gadenbuie/rsthemes
Julia Silge thanks a ton!
Very nice video, but can we really use p-values as the surrogate of effect sizes with variable sample numbers?
No, we definitely do not want to confuse what a p-value measures with what an effect size measures. I show how to plot both (and explain a bit) in this blog post: juliasilge.com/blog/crop-yields/
Hey Julia, I am having a really hard time understanding the flow of R. I have been working with Python until now, and I usually take an OOP approach, or something close to OOP using functions. I have been trying to understand R by following these kinds of projects from the #tidytuesday community. Any advice would be much appreciated!
Most R programming for data analysis is not based on an OOP approach, but more on a Lisp-like, functional programming approach. It might help to realize that there are some fundamental differences in how R overall approaches data analysis; IMO these differences are a strength! A resource that might help you understand an R-like take on data analysis is R for Data Science: r4ds.had.co.nz/
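As a rough illustration of that functional style, here is a base R sketch of fitting many models at once by mapping a function over pieces of a data frame. The data are made up, and this is not the code from the video, which uses tidyverse and tidymodels tools for the same idea.

```r
# Hypothetical toy data: yearly yields for two made-up crops
yields <- data.frame(
  crop  = rep(c("maize", "wheat"), each = 5),
  year  = rep(2000:2004, times = 2),
  yield = c(1, 2, 3, 4, 5, 10, 9, 8, 7, 6)
)

# Split into one data frame per crop, then map a model-fitting function over the pieces
by_crop <- split(yields, yields$crop)
models  <- lapply(by_crop, function(d) lm(yield ~ year, data = d))

# Map again to pull the slope on year out of each fitted model
slopes <- vapply(models, function(m) coef(m)[["year"]], numeric(1))
```

There are no classes or methods to define here: the data stay in plain data frames and the work is done by passing small functions to lapply() and vapply().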
Amazing video, but for your statistician friends the p-value phrase should be finished with the following: 'assuming the null hypothesis is true'!
Very true; thanks!
@JuliaSilge Looking forward to the next video 😊😊
Poor modeling