Propensity Score Analysis in R with Nearest Neighbor, Optimal Pair, and Optimal Full Matching

  • Date added: 26 July 2024
  • For tutoring/consultation services email: statsguidetree@gmail.com
    I offer one-on-one tutoring/consultation services for many topics related to statistics/machine learning.
    For tutoring/consultation prices:
    guide-tree-statistics-consult...
    For rcode and dataset: gist.github.com/musa5237
    Tutorial video going over Propensity Score Analysis and Matching in R. A general description of causal inference with Propensity Score Analysis and how it compares to Randomized Controlled Trials is provided. Also reviewed is how to generate propensity scores and use the matching methods Nearest Neighbor Matching, Optimal Pair Matching, and Optimal Full Matching (also known as Full Matching) with the MatchIt package. In addition, propensity-score-weighted regression models (i.e., ANCOVA) are reviewed.
    The dataset used is the College Scorecard dataset from the U.S. Department of Education for academic year 2015. The link to the data source is provided below. Download the zip file next to the access URL row. The file used is for the 2014-15 academic year. However, the data may have changed since I last downloaded it.

Comments • 41

  • @rockleeroy 11 months ago +2

    To load the dataset into R use the following code:
    coll

  • @lanredaodu945 1 day ago

    excellent tutorial i watched 3x

  • @muhammedhadedy4570 1 year ago +1

    I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen.
    Well done, sir. You helped me a lot.
    ❤❤❤❤

  • @basser1995 2 years ago +6

    I am pretty desperate because I need to perform a propensity-matched analysis and have never used R (I have used SPSS). But 15 minutes into this video I can already tell it's going to be extremely helpful!

    • @rockleeroy 2 years ago

      Thank you so much for the compliment.

  • @sanjanakhondaker887 7 months ago

    What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.

  • @analyticspipeline2526 2 years ago +1

    Great video, thank you for that!

  • @francyy-ug1qr 1 month ago

    thank you sm!!

  • @fleurestethique 2 years ago +2

    This was extremely helpful thank you so much!
    When working with subsets, should I calculate the propensity scores on the whole dataset first and then apply them on the subset or directly calculate the propensity scores only for observations in my subset?
    Also, the dataset I am working with requires me to incorporate additional weights because of the way the sampling was done. How can I apply both the propensity score weights and the other weights in my regression? Thank you

    • @rockleeroy 2 years ago

      I may need some more information on the nature of the dataset. But, generally, you could calculate PS for the whole dataset. For your other question about weights, not all PS matching methods produce weights. For example, if 1:1 matching without replacement is used, all the weights =1. But, if you are using a PS matching method that does produce weights and you already have a set of weights you need to apply -- there are a few things you can do. The issue is the 'weights' argument in the lm() function only allows you to use a vector. Now you may have a reason depending on the nature of your dataset to not use the whole dataset and consider subsets -- if that makes sense. Or you may want to consider combining the two sets of weights by multiplying; however, you would need to look at the weights produced and see whether they make sense, before carrying out your regression analysis. Ultimately, my suggestions are just general statements, you may want to consult with some other sources (e.g., previous PS analyses using your dataset or a similar dataset, content experts, etc.).
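
A minimal sketch of the "multiply the two sets of weights" idea from the reply above. The data are simulated, and every name here (`md`, `samp_wt`, `combined_wt`) is hypothetical; in a real analysis the `weights` column would come from the matching step and `samp_wt` from the survey design. Whether multiplying is appropriate depends on the dataset, so this only illustrates the mechanics.

```r
# Simulated stand-in for a matched data set; all names are hypothetical.
set.seed(1)
n <- 200
md <- data.frame(
  treat   = rbinom(n, 1, 0.4),
  x       = rnorm(n),
  weights = runif(n, 0.5, 2),  # stand-in for weights produced by matching
  samp_wt = runif(n, 1, 5)     # hypothetical survey sampling weights
)
md$y <- 1 + 0.5 * md$treat + 0.3 * md$x + rnorm(n)

# Combine the two sets of weights by multiplying them
md$combined_wt <- md$weights * md$samp_wt

# Inspect the combined weights before using them (look for extreme values)
summary(md$combined_wt)

# lm() accepts a single weight vector, so pass the combined one
fit <- lm(y ~ treat + x, data = md, weights = combined_wt)
coef(summary(fit))
```

Checking `summary(md$combined_wt)` first follows the advice in the reply: a few very large combined weights can dominate the regression.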

  • @manonkinaupenne2090 1 year ago +1

    Thank you very much for this clear explanation!
    I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity expressed in minutes per day (dependent variable) between these two groups.
    thank you!

    • @rockleeroy 11 months ago

      Yes, PSM should always work when you have a control group.

  • @praveena6095 2 years ago +1

    Great video. If I want to include some additional covariates in my analysis that were not used for matching, how can I get them into my data after using match.data()?

    • @rockleeroy 2 years ago

      If you want to use additional variables in the analysis phase that were not included in the matching process, you can enter them in the final regression model.
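
A short sketch of the reply above, assuming the MatchIt package is installed and using the `lalonde` example data that ships with it (not the College Scorecard data from the video). The choice of covariates is purely illustrative. `match.data()` returns the matched rows with all of the original columns, so variables left out of the matching formula are still available for the outcome model.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Match on a subset of covariates only (1:1 nearest neighbor matching)
m_out <- matchit(treat ~ age + educ + race, data = lalonde,
                 method = "nearest")

# Matched data: original columns plus distance, weights, and subclass
md <- match.data(m_out)
names(md)

# Outcome model that also adjusts for covariates NOT used in matching
# (married and re74 were left out of the matching formula above)
fit <- lm(re78 ~ treat + age + educ + race + married + re74,
          data = md, weights = weights)
summary(fit)
```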

  • @amalalkalbani4572 2 years ago +2

    Thank you for the comprehensive explanation. I have an issue with my PSA: the variance ratio doesn't appear when I use the summary function. I get dots only! Could you please tell me why? Thank you. (All my covariates are categorical and binary.)

    • @fleurestethique 2 years ago +1

      I had the same problem when I entered my covariates as factors into the formula, but variance ratios appeared once I converted them with as.numeric(). I don't know what that means in terms of interpretation, though.

    • @rockleeroy 2 years ago +2

      @fleurestethique I noticed that the function to visualize the overall imbalance, love.plot(), does not allow for categorical variables. However, you can still inspect the covariate imbalance when you use the summary() function.

  • @festusattah8612 1 year ago +1

    Great video!!! What would you advise if I have more treated than control units, and which matching approach should I use if treatment is not randomized (take, for example, state legislation)?

    • @rockleeroy 1 year ago

      You can try using K to 1 matching and optimization or you can try full matching. You can run both and compare which gives you better balance across your covariates.

  • @vikasmishra4485 2 years ago +2

    This video is pretty informative. I have one question.
    In the covariate balance plot using cobalt, do we need to match both the mean and variance stats?
    In my case the mean is balanced within the threshold but the variance is not. Can I say that matching is balanced with mean balancing only?

    • @rockleeroy 2 years ago

      It is good to have both. I presented only one set of criteria to use, but there have been other suggested criteria. Also, recommendations in the literature are always changing. I would try some techniques to see if I get a better balance. But if I cannot do a better job, I would just report it in the methods and discussion/limitations. Balancing the covariates will be a big part of the challenge of PS matching.

  • @sharmilibalarajah1940 2 years ago +2

    Thank you, this was really helpful!
    Do you have any ideas about how I can approach this if I want to match three groups i.e. non-binary??

    • @rockleeroy 2 years ago

      I can say that generally PS analyses can be conducted with non-binary treatment groups (i.e., treatment variable with more than 2 levels). But, I do not think the MatchIt package supports it (I could be wrong because it could have been updated). There is another package available if your treatment variable has 3 levels instead of 2 levels called TriMatch. I am not too familiar with the package but here is the general documentation: cran.r-project.org/web/packages/TriMatch/TriMatch.pdf

  • @user-fm6ih6sb6u 1 year ago +1

    Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching. But the means, standard deviations, and p score did not change at all compared to the unmatched data. How can I solve this problem?

    • @rockleeroy 1 year ago

      That is a good question, I assume you are talking about p-values in your final model post matching -- if that is the case, ultimately with PS matching you are attempting to just balance the data between your treatment and control groups to make more reliable interpretations of your final model. It could be that after balancing your data you find no average treatment effect.

  • @alexwisniewski7105 8 months ago +1

    Do you include both the quadratic and non quadratic terms in your propensity match? For example, if my quadratic term had a lower SDM, should I remove the non quadratic term and just include the quadratic one in my final model?

    • @rockleeroy 8 months ago

      This depends on your data, the type of relationships you want to capture, and what makes sense specifically for the data you are working with. If you have both a linear term and a quadratic term for your explanatory variable in the model, you are saying that the relationship between your response and the explanatory variable is quadratic and linear (i.e., your model captures both), but by keeping just the quadratic term you are saying the relationship is just quadratic. Generally, if you want to capture a wider scope of relationships you can leave both, but be mindful that this could lead to overfitting.
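
The difference between the two specifications can be sketched in base R as follows. The data are simulated and the variable names are hypothetical; the example uses a logistic propensity model, but the same formula syntax applies to an outcome model. `I(x^2)` is the standard way to put a squared term in an R formula.

```r
# Simulated data in which treatment depends on x both linearly and quadratically
set.seed(7)
n <- 300
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * x + 0.6 * x^2))

# Linear and quadratic term together
ps_both <- glm(treat ~ x + I(x^2), family = binomial)

# Quadratic term only (forces the curve to be symmetric around x = 0)
ps_quad <- glm(treat ~ I(x^2), family = binomial)

coef(ps_both)
coef(ps_quad)

# One possible way to compare the two specifications
AIC(ps_both, ps_quad)
```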

  • @SCaRaB6288 2 years ago +1

    can we use categorical covariates e.g. 1 = male 2 = female or should they be dummy coded? Thank you

    • @rockleeroy 2 years ago +1

      Yes. Categorical covariates can be included.
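
As a small illustration (toy data, hypothetical variable names): when a covariate coded 1/2 is converted to a factor, formula-based model functions such as glm() create the dummy columns themselves, so manual dummy coding is usually unnecessary. `model.matrix()` shows the design matrix a formula produces.

```r
# Toy data with sex coded 1 = male, 2 = female
dat <- data.frame(
  sex = c(1, 2, 2, 1, 2, 1),
  age = c(23, 35, 41, 29, 52, 38)
)

# Convert the numeric codes to a labelled factor
dat$sex <- factor(dat$sex, levels = c(1, 2), labels = c("male", "female"))

# Design matrix built from the formula: "male" is the reference level,
# and a 0/1 column named sexfemale is created automatically
mm <- model.matrix(~ sex + age, data = dat)
mm
```

If the variable is left as numeric 1/2 instead, R treats it as a continuous covariate, which for a two-level variable gives the same fit but a less readable coefficient name.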

  • @hasanhash12 6 months ago +1

    Hi, thank you for the video. I loaded the dataset coll from the link that you have pinned and then ran the script from "identify field names" to "adjust units for continuous variables". After running, it makes all values NULL in coll and makes coll2 0 obs. of 6 variables. What should I do?

    • @hasanhash12 6 months ago

      and also at line 136 #no psa, just regression if i run mod_test1

    • @hasanhash12 6 months ago

      I suppose problem is here at line 22:
      coll

  • @user-iq2qr8lb2y 8 months ago

    I did the first step (design phase: selecting covariates) but only 3 out of 14 are significant. I want to know whether it is considered balanced or not, and what to do.

    • @rockleeroy 8 months ago

      Whether covariates are significant is not related to whether the values of those covariates are balanced across treatment conditions. To check balance you have to look at the standardized mean difference and/or variance ratio values to see whether they fall within the threshold you decide to use.
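
A base-R sketch of the two balance statistics named in the reply above, on simulated data with hypothetical variable names. The pooled-SD form of the standardized mean difference and the cutoffs used here (|SMD| below 0.1, variance ratio between 0.5 and 2) are assumptions chosen for illustration, not necessarily the criteria used in the video; packages such as MatchIt and cobalt report these statistics directly.

```r
# Simulated data: age differs in mean, income differs in spread
set.seed(42)
n <- 500
dat <- data.frame(treat = rbinom(n, 1, 0.3))
dat$age    <- rnorm(n, mean = 40 + 3 * dat$treat, sd = 10)
dat$income <- rnorm(n, mean = 50, sd = 8 + 4 * dat$treat)

# Standardized mean difference (pooled SD) and variance ratio for one covariate
balance_stats <- function(x, treat) {
  xt <- x[treat == 1]
  xc <- x[treat == 0]
  smd <- (mean(xt) - mean(xc)) / sqrt((var(xt) + var(xc)) / 2)
  vr  <- var(xt) / var(xc)
  c(smd = smd, var_ratio = vr)
}

# One row per covariate
bal <- as.data.frame(
  t(sapply(dat[c("age", "income")], balance_stats, treat = dat$treat))
)

# Flag covariates that meet the (assumed) thresholds
bal$within_threshold <- abs(bal$smd) < 0.1 &
  bal$var_ratio > 0.5 & bal$var_ratio < 2
bal
```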

  • @priyankaroy7243 1 year ago +2

    While I'm installing "MatchIt" it shows "There is no package called MatchIt". How do I solve it?

    • @rockleeroy 1 year ago

      Hello, just saw your post. Did you run the code library(MatchIt) first without running install.packages("MatchIt")? I did not install it again because I had already installed it before. I kept that line in the code but put the hash sign # first so it was there as a note. Try running it without the hash sign.
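
One common pattern for this, shown as a sketch: install the package only if it is missing, then load it. The explicit `repos` URL is an assumption added so the call also works in a non-interactive session where no CRAN mirror has been chosen.

```r
# Install MatchIt only if it is not already available
if (!requireNamespace("MatchIt", quietly = TRUE)) {
  install.packages("MatchIt", repos = "https://cloud.r-project.org")
}

# Load the package for the current session
library(MatchIt)
```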

    • @priyankaroy3686 1 year ago

      @rockleeroy Yes that's solved. Thanks!

  • @katieweir4166 1 year ago

    The data doesn't work anymore!

    • @rockleeroy 11 months ago

      My apologies for the delayed response. You can use the following code to load it into R: coll

  • @maddybond007 2 years ago +1

    Please confirm whether this link has the same data that you posted initially, since your link is no longer accessible:
    LINK: ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04262022.zip

    • @rockleeroy 2 years ago +1

      I will try to find a way to load the dataset on my GitHub. But until then, I can email it to you. Just send me an email at statsguidetree@gmail.com