Combining Random Forests and GLMs in R

  • Date added: 23 Feb 2021
  • Learning objectives:
    General strategy for RF
    How to model RF in R
    How to compute importance/OOB
    How to visualize RF in flexplot
    Different uses of RF
    Here's the link to the paper I referenced: psyarxiv.com/ebsmr
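The modeling and importance/OOB objectives listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the variable names (`x1`, `x2`, `x3`, `y`) are made up, and the settings (`ntree = 200`, `mtry = 2`) are arbitrary. It uses `cforest()` from the party package, which commenters mention as the function used with flexplot; the exact code from the video is not reproduced here.

```r
# Simulated data; all variable names here are illustrative, not from the video.
set.seed(123)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1.5 * d$x1 + d$x2^2 + rnorm(n)

vi <- NULL      # variable importance (filled in only if party is installed)
oob_r2 <- NULL  # out-of-bag R^2 (filled in only if party is installed)

if (requireNamespace("party", quietly = TRUE)) {
  # Fit a conditional inference forest (assumed settings, not tuned)
  fit <- party::cforest(
    y ~ x1 + x2 + x3, data = d,
    controls = party::cforest_unbiased(ntree = 200, mtry = 2)
  )

  # Permutation-based variable importance
  vi <- party::varimp(fit)

  # Out-of-bag predictions: each case is predicted only by trees that did not see it
  oob_pred <- as.numeric(predict(fit, OOB = TRUE))
  oob_r2 <- 1 - sum((d$y - oob_pred)^2) / sum((d$y - mean(d$y))^2)

  print(sort(vi, decreasing = TRUE))
  print(oob_r2)
}
```

If party is not installed, the block simply skips the forest and leaves `vi` and `oob_r2` as `NULL`.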

Comments • 21

  • @decimus6 1 year ago +3

    Thanks a lot for this video! It's awesome as usual. A question arose in my mind: RFs are good for selecting predictors, yet how do they tackle multicollinearity?
    Is it possible that two or three good predictors are multicollinear?
    Thanks a lot.

  • @mohamedrefaat197 3 years ago +1

    Thanks for the quality content! I wonder what you mean by transportable at the beginning?

    • @QuantPsych 3 years ago +1

      Have you checked out this video? czcams.com/video/VqKExZG1caI/video.html I believe that explains what "transportable" means.

  • @OnLyhereAlone 1 year ago +1

    Very informative as usual. The case for linear mixed models (LMM) brought me to your channel. Question: could random forest be used to determine which variables to include in an LMM too? Thanks again for the great work you do.

    • @QuantPsych 1 year ago

      Yes. There was a paper I saw recently that builds random forest atop mixed models: onlinelibrary.wiley.com/doi/abs/10.1002/sam.11505
      Alternatively, I've averaged the scores within cluster, used RF to find variables, then used mixed models on those variables.
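The three-step workaround described in this reply (average within cluster, use RF to find variables, fit a mixed model on those variables) might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the simulated data, the variable names, the choice of the randomForest and lme4 packages, and the rule of keeping the top two predictors. It is not the author's actual code.

```r
# Simulated clustered data; names and effect sizes are illustrative only.
set.seed(42)
n_clusters <- 30
n_per <- 10
d <- data.frame(
  cluster = factor(rep(seq_len(n_clusters), each = n_per)),
  x1 = rnorm(n_clusters * n_per),
  x2 = rnorm(n_clusters * n_per),
  x3 = rnorm(n_clusters * n_per)
)
cluster_effect <- rnorm(n_clusters)[as.integer(d$cluster)]
d$y <- 0.8 * d$x1 + cluster_effect + rnorm(nrow(d))

# Step 1: average the scores within each cluster (one row per cluster)
d_avg <- aggregate(cbind(y, x1, x2, x3) ~ cluster, data = d, FUN = mean)

# Step 2: use a random forest on the cluster means to rank predictors
selected <- c("x1", "x2")  # placeholder used only if randomForest is not installed
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(y ~ x1 + x2 + x3, data = d_avg,
                                   importance = TRUE)
  imp <- randomForest::importance(rf, type = 1)  # permutation importance
  # Assumed selection rule: keep the two highest-ranked predictors
  selected <- head(rownames(imp)[order(imp[, 1], decreasing = TRUE)], 2)
}

# Step 3: fit a mixed model on the original (unaveraged) data
if (requireNamespace("lme4", quietly = TRUE)) {
  f <- reformulate(c(selected, "(1 | cluster)"), response = "y")
  mm <- lme4::lmer(f, data = d)
  print(lme4::fixef(mm))
}
```

Averaging first sidesteps the non-independence of observations within a cluster, at the cost of discarding within-cluster variation during the screening step.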

  • @gimanibe 3 years ago +1

    Thanks for the videos you make. I learn a lot! Is this R script available somewhere?

    • @QuantPsych 2 years ago

      The code in the video should work. If I find time, I'll put it in the description.

  • @nikidiogou4203 6 months ago

    Another very useful video for stats, thank you for all of them! The estimates from the rf using flexplot seem not to align with the variable importance score (vi). Shouldn't we have the same ranking of variables when we look at the estimates and when we look at the vi?

  • @tatjanajak 2 years ago +1

    cforest() from the party package takes a loooong time. But when I try to pass the result of randomForest() from the randomForest package to estimates(), I get the following error: Error in x$r.squared : $ operator is invalid for atomic vectors. I guess the results of these two functions are different and only cforest() output can be used within flexplot functions. I hope in the future you will introduce randomForest() into this whole process. I think it's worth it because cforest is just too memory- and time-consuming.

  • @francisolsson9728 2 years ago

    Can you use random forests with both categorical and numeric variables?

  • @christoph3933 6 months ago

    What do you do in case of missing values? Do you recommend doing multiple imputation beforehand?

  • @scottnelson7841 1 year ago

    No matter how many times I install the package and load the library, I get this error message: Error in variable_dropout(explained_rf, type = "raw") :
    could not find function "variable_dropout". Any help?

  • @tatjanajak 2 years ago +1

    @QuantPsych it seems you did not use GLMs but rather a standard lm.

    • @QuantPsych 2 years ago

      Possibly. I haven't watched this video for a while :) But I used to use GLM to refer to general linear models and GLIM to refer to general*ized* linear models. I switched the notation somewhat recently. I might have meant to refer to LMs instead of GLMs.

    • @tatjanajak 2 years ago

      @QuantPsych at 20:58 is where I believe the error occurs. It is really just a small mistake and not a big deal. The whole presentation is awesome as usual, and the only thing to do is to say at 20:58 "I meant glm instead of lm".

    • @woosterjeeves 1 year ago +1

      @tatjanajak GLM is variously used to refer to the General Linear Model (which is fit using lm() in R) or the Generalized Linear Model, which is what you are referring to (and which is fit using glm() in R). So he is talking about the General Linear Model. Hence not a "mistake".
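The lm()/glm() distinction made in this reply can be checked with a small base R sketch. The data and variable names below are simulated for illustration only. With a Gaussian family and identity link, glm() fits the same model as lm(); a non-Gaussian family such as binomial is where the generalized linear model goes beyond what lm() offers.

```r
# Simulated data; variable names are illustrative only.
set.seed(1)
n <- 100
d <- data.frame(x = rnorm(n))
d$y <- 2 + 0.5 * d$x + rnorm(n)

# General linear model via lm()
fit_lm <- lm(y ~ x, data = d)

# Generalized linear model via glm(); the Gaussian family with identity link
# reduces to the same model as lm()
fit_glm <- glm(y ~ x, data = d, family = gaussian())

# The coefficient estimates agree up to numerical tolerance
coef_match <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(coef_match)

# A generalized linear model with a binomial family (logistic regression)
d$hit <- rbinom(n, size = 1, prob = plogis(d$x))
fit_logit <- glm(hit ~ x, data = d, family = binomial())
print(coef(fit_logit))
```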

    • @tatjanajak 1 year ago

      @woosterjeeves thanks.

  • @Martyr022 3 years ago +1

    still waiting on that paper!

    • @QuantPsych 2 years ago

      Here it is! psyarxiv.com/ebsmr

    • @Martyr022 2 years ago

      @QuantPsych Huzzah! Thank you! Your channel has been super helpful!