Ordinal Logistic Regression or Proportional Odds Logistic Regression with R

  • Added: 21. 07. 2024
  • R file: drive.google.com/file/d/1B8lp...
    TIMESTAMPS
    00:00 Ordinal Logistic Regression with R
    00:06 Read Data
    02:36 Partition Datasets
    03:28 Ordinal Logistic Regression Model
    05:47 Calculating p-values
    07:25 Prediction
    09:08 Equations for Calculating the Probabilities
    12:29 Model Building with all Variables
    16:18 Confusion Matrix for Training Dataset
    17:12 Confusion Matrix for Test Dataset
    Time-Series videos: goo.gl/FLztxt
    Machine Learning videos: goo.gl/WHHqWP
    Becoming Data Scientist: goo.gl/JWyyQc
    Introductory R Videos: goo.gl/NZ55SJ
    Deep Learning with TensorFlow: goo.gl/5VtSuC
    Image Analysis & Classification: goo.gl/Md3fMi
    Text mining: goo.gl/7FJGmd
    Data Visualization: goo.gl/Q7Q2A8
    Playlist: goo.gl/iwbhnE
    R is a free software environment for statistical computing and graphics, and is widely used by both academia and industry. R software works on both Windows and Mac-OS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user friendly environment for R that has become popular.

Comments • 131

  • @gnomzb5070
    @gnomzb5070 5 years ago +2

    While I was looking for an example project on the ordered logit model in R, I came across this superb video. Thanks a lot, Bharatendra!

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @flamboyantperson5936
    @flamboyantperson5936 6 years ago +2

    Really great tutorial. Thank you Sir.

  • @gabriellamartinez7985
    @gabriellamartinez7985 2 years ago

    Hello, thank you for this video, it's been super helpful!
    I have a question regarding the dependent variables. How would you interpret the polr function output for dependent variables that are factors? For example, Tendency (levels: -1,0,1) was used as a dependent variable, how would you interpret each of the coefficients?

  • @hermanhyde7000
    @hermanhyde7000 7 years ago +4

    Absolute genius. I would pay a million bucks to be your student.

  • @datascience1274
    @datascience1274 2 years ago

    Hello Professor. Great lesson. Quick question: I was wondering if we could have used as.ordered(data$Tendency) instead of as.factor. Can you please shed some light on this? Thanks a lot in advance.
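
A minimal sketch of the difference, assuming a small made-up vector in place of data$Tendency (the values below are hypothetical). Both calls create a factor; as.ordered() additionally records that the levels have an order. For the response in polr() an ordered factor is the natural choice. For a predictor, R codes an ordered factor with polynomial contrasts by default, so its coefficients are labelled and interpreted differently from the treatment (dummy) coding that an unordered factor gets.

```r
# Hypothetical values standing in for data$Tendency (levels -1, 0, 1).
tendency <- c(0, 1, -1, 0, 1)

f <- as.factor(tendency)    # unordered factor
o <- as.ordered(tendency)   # ordered factor with levels -1 < 0 < 1

is.ordered(f)   # is the unordered factor flagged as ordered?
is.ordered(o)   # is the ordered factor flagged as ordered?
levels(o)       # levels are sorted numerically before being stored as labels
```

Whether to treat a predictor as ordered, unordered, or numeric is a modelling choice; the sketch only shows what R stores in each case.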

  • @euphorockz
    @euphorockz 4 years ago +3

    This video really helps a lot with my project! Thank you!!!!!

    • @bkrai
      @bkrai  4 years ago +1

      Thanks for the feedback!

  • @jc.nogueira
    @jc.nogueira 2 years ago +1

    Great video! Many thanks for sharing this wonderful material. I will subscribe to your channel.
    Greetings from Uruguay, South America!
    All the best,
    jc

    • @bkrai
      @bkrai  2 years ago +1

      Thanks and welcome!

  • @victorhenostroza1871
    @victorhenostroza1871 4 years ago +1

    Thank you so much
    for this contribution...congratulations from Peru

    • @bkrai
      @bkrai  4 years ago

      Thanks for comments!

  • @yubarsubedi2781
    @yubarsubedi2781 3 years ago

    Hello Sir, thank you so much for this tutorial. I learned a lot. However, I encountered a problem. When I ran the summary command, I got the message: Error in svd(X) : infinite or missing values in 'x'. How can I fix this problem?

  • @dr.bheemsainik4316
    @dr.bheemsainik4316 2 years ago

    Hi Sir... can you please explain the Ordered Probit model for the same data with a tendency with 3 levels as the dependent variable?

  • @hayonimengi4171
    @hayonimengi4171 5 years ago

    How would you interpret the predicted probabilities from a reference category of a categorical predictor? In other words I’m trying to present the probabilities which I get in my model however I’m confronted with my reference category and hence what would be the best way to derive these? Thanks

  • @aadvikpanda3339
    @aadvikpanda3339 7 years ago

    Hello Sir,
    Great video.
    I did not understand how you calculated the probability from the t-stat using this formula:
    pnorm(abs(ctable[ ,"t value"]),lower.tail=FALSE)*2. Could you please explain each term you have used in this formula and why?
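
A short sketch of what each piece of that expression does, using two t values quoted in a summary table later in this comment thread (H = 1.6381, NP2 = -3.3186). It assumes, as the formula itself does, that the "t value" can be treated as approximately standard normal in large samples, so the result is a two-sided Wald-type p-value.

```r
# t values copied from a summary table quoted elsewhere in this thread.
t_value <- c(H = 1.6381, NP2 = -3.3186)

abs_t      <- abs(t_value)                      # the sign does not matter for a two-sided test
upper_tail <- pnorm(abs_t, lower.tail = FALSE)  # P(Z > |t|) under a standard normal
p_value    <- 2 * upper_tail                    # double it to count both tails

round(p_value, 4)
```

The table quoted in this thread reports p-values of 0.1014 and 0.0009 for these two rows, which is what this calculation reproduces.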

  • @user-mo4gb2xb2h
    @user-mo4gb2xb2h 2 years ago +1

    Thank you so much!! This video is extremely helpful and clear!!!

    • @bkrai
      @bkrai  2 years ago

      You're so welcome!

  • @wasafisafi612
    @wasafisafi612 2 years ago +1

    Big thanks for your video. It helps a lot

    • @bkrai
      @bkrai  2 years ago

      You are welcome!

  • @1612kanika
    @1612kanika 6 years ago

    How do you calculate bias and variance for ordinal regression?

  • @MKmadhurima
    @MKmadhurima 2 years ago

    Is there any way to do an ordinal logistic regression for panel data?

  • @alainataylor4181
    @alainataylor4181 1 year ago

    Is there a way to add nested effects into the model?

  • @abhishekbansal5182
    @abhishekbansal5182 4 years ago

    Thanks for making this video, it's very helpful for us.
    Please, sir, can you explain how we get the alpha values for the categories? Is there a formula to calculate the alpha (@)? Please explain it.

  • @Sandra-tq6yb
    @Sandra-tq6yb 2 years ago +1

    Very helpful video. Thank you very much!

    • @bkrai
      @bkrai  2 years ago +1

      You're welcome!

  • @DAMGood73
    @DAMGood73 5 years ago +1

    Perfect, thanks for sharing!

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @internetjunkie247
    @internetjunkie247 3 years ago

    Thanks for the video. To calculate probabilities, why did you use alpha-b1x1+... and not the conventional alpha+b1x1+...? It seems different software uses different forms of the equation (?). I believe it is the former in R, and perhaps in SPSS too.
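
For reference, the sign convention asked about here is the one documented for polr() in the MASS package: the cumulative logit is the cutpoint minus the linear predictor. Written out, with the symbols ζ_j (cutpoints, labelled 1|2, 2|3, ... in the output), η (linear predictor), and J (number of response levels) chosen here for illustration:

```latex
\begin{aligned}
\eta &= \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \\
\operatorname{logit} P(Y \le j \mid x) &= \zeta_j - \eta, \qquad j = 1, \dots, J-1 \\
P(Y \le j \mid x) &= \frac{1}{1 + e^{-(\zeta_j - \eta)}} \\
P(Y = 1 \mid x) &= P(Y \le 1 \mid x) \\
P(Y = j \mid x) &= P(Y \le j \mid x) - P(Y \le j-1 \mid x), \qquad 1 < j < J \\
P(Y = J \mid x) &= 1 - P(Y \le J-1 \mid x)
\end{aligned}
```

With the minus sign, a positive coefficient means that larger values of that predictor shift probability toward the higher response categories. Software that writes the model with a plus sign reports coefficients of the opposite sign for the same fit.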

  • @88MSRobby
    @88MSRobby 7 years ago +2

    Very good video!

  • @WillIsGoodAtStatistics
    @WillIsGoodAtStatistics 4 years ago +1

    Excellent video. Thank you

    • @bkrai
      @bkrai  4 years ago

      You are welcome!

  • @leliaglass1568
    @leliaglass1568 5 years ago +1

    thanks for the video! very helpful

    • @bkrai
      @bkrai  5 years ago

      Thanks for comments!

  • @shoumicshahid9315
    @shoumicshahid9315 4 years ago +1

    Hello Professor, how can I rank the significant variables from an ordinal logit model? I previously performed dominance analysis on the binary logit model but in case of an ordinal logit model that seems inappropriate.

    • @bkrai
      @bkrai  3 years ago

      One way could be to use the p-values.

  • @fileniaantoniou8649
    @fileniaantoniou8649 5 years ago +1

    Hello and great video!
    Would you suggest this model for modelling the results of a football game where the points earned in the end are 0,1 or 3?

    • @bkrai
      @bkrai  5 years ago

      Yes, it should work for such data.

  • @mangaikalai82
    @mangaikalai82 4 years ago +1

    Sir, this video was helpful. Can you make a video on the Brant test for the proportional odds assumption?

  • @ganneesh
    @ganneesh 6 years ago +1

    Indeed, it's a great video on ordinal logistic regression. Thanks, Professor. I am trying to create a model for my data set and I am facing an issue. When I ran the predict command on my training data set, I got very small probability values (the probabilities do not sum to one). What could be the reason?

    • @bkrai
      @bkrai  3 years ago

      Seeing this today. Probably resolved by now.
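
One way to check this, sketched here under the assumption that MASS is installed and using the housing example from the polr() help page rather than the data from the video: predict() with type = "probs" returns one column per response level, and each row (not each column) should sum to 1.

```r
library(MASS)  # provides polr() and the housing data set

# Example model from the polr() documentation, not the video's data.
fit <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing, Hess = TRUE)

# One row per observation, one column per level of Sat.
p <- predict(fit, newdata = housing, type = "probs")

head(round(p, 3))
range(rowSums(p))  # both ends of the range should be 1 up to floating-point error
```

If the rows of your own prediction matrix do not sum to 1, it is worth checking that type = "probs" was used and that the sum is taken across the columns of a single row.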

  • @miccoligno1
    @miccoligno1 5 years ago +1

    Hi Bharatendra, my response variable is the score of a Likert scale from 0 (the worst condition) to 4 (the best). Should I use the function as.ordered? If yes, should I keep 4 as the best condition and zero as the worst? Thanks

    • @bkrai
      @bkrai  5 years ago

      Yes, that would work fine.

  • @smitagupta1771
    @smitagupta1771 5 years ago +1

    What should change in the input file if the independent variables have 3-4 levels of an ordinal category? Should the independent variable be coded as 1,2,3,4 and then converted to an ordinal factor like you did for NSP?

    • @bkrai
      @bkrai  5 years ago

      You can use ordered() for an independent ordinal variable. Some researchers also recommend changing them to numeric variables, as it leads to a much simpler model.

  • @nageshgoud4266
    @nageshgoud4266 6 years ago

    Hi Sir, it's a nice video. I always follow your other videos, they are very good.
    I am running the ordinal LR on my own data i.e., insurance to find the EMlevel and this dependent variable contains 6 levels i.e., 1,2,3....6. So as per your instructions I converted EMlevel variable to ordered and str is appearing as "EMLevel : Ord.factor w/ 6 levels "1"

  • @parthshah9451
    @parthshah9451 4 years ago +1

    Great video, Dr. Rai. Could you also help with the partial proportional odds model?

    • @bkrai
      @bkrai  4 years ago

      Thanks, I've added it to my list.

  • @lauualb
    @lauualb 7 years ago

    hi sir, how do you know the variable Max is causing the warning?

    • @bkrai
      @bkrai  7 years ago

      +lauualb it was based on trial and error.

  • @taniamendoza9247
    @taniamendoza9247 4 years ago +1

    Dr, thanks a lot for your example. Could you help me with a question: what is the difference between clm and polr? I was trying to use polr on financial rates to estimate a rating, but when I use it I get this warning:
    Warning message:
    glm.fit: fitted probabilities numerically 0 or 1 occurred
    But if I use clm, that does not happen.
    Could you help me understand these two functions?
    Thanks a lot
    Regards from Ecuador

    • @bkrai
      @bkrai  4 years ago

      Note that warning messages in R are ok. It's not an error.

  • @khadijabenmoussa8064
    @khadijabenmoussa8064 4 years ago +1

    Hello, thanks a lot for your video, it is very helpful. Could you please explain the meaning of the confusion matrix error? Also, how can we compute the R-square of our model?

    • @bkrai
      @bkrai  4 years ago +1

      For confusion matrix you may refer to:
      czcams.com/play/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG.html
      Also note that when response is a factor variable, we do not use R-square.
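
As a rough illustration of the "confusion matrix error" asked about above, here is a sketch with made-up actual and predicted classes (hypothetical values, not the video's results). The error is taken here as one minus the share of cases on the diagonal of the table, that is, the misclassification rate.

```r
# Hypothetical actual and predicted classes for 8 cases, three levels each.
actual    <- factor(c(1, 1, 2, 2, 3, 3, 1, 2), levels = 1:3)
predicted <- factor(c(1, 2, 2, 2, 3, 1, 1, 2), levels = 1:3)

# Rows are predictions, columns are actual classes.
tab <- table(Predicted = predicted, Actual = actual)
tab

accuracy <- sum(diag(tab)) / sum(tab)  # share of correct predictions
error    <- 1 - accuracy               # misclassification rate
error
```

In this toy example 6 of the 8 cases sit on the diagonal, so the misclassification error is 0.25.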

  • @AddisuYohannes-h8p
    @AddisuYohannes-h8p 13 days ago

    My dependent variable is anaemia status, for a thesis on "mixed effect ordinal logistic regression".
    1. How can I obtain a table of the percentage of anaemia status by region in R?
    2. How can I obtain a table of the prevalence of anaemia status by predictors for anaemia among women of reproductive age in R?
    3. How can I obtain a table of adjusted odds ratios (AOR) and their 95% CIs for mixed effect ordinal logistic regression in R?

  • @sunilbobb
    @sunilbobb 6 years ago +1

    Sir, can you show how we interpret the abalone data from Kaggle or UCI?

    • @bkrai
      @bkrai  4 years ago

      I saw this today, hope it's taken care of.

  • @Astronoom
    @Astronoom 4 years ago +1

    Is this approach equal to the CatReg function in SPSS with ranking?

    • @bkrai
      @bkrai  4 years ago +1

      I've not checked it in SPSS. But I guess results should be same.

  • @subashghimire1604
    @subashghimire1604 7 years ago +2

    Do you have any tutorial for goodness of fit test for ordinal logistic regression?

    • @tariqawanish
      @tariqawanish 6 years ago +1

      Is the goodness-of-fit test applied in Stata?

    • @bkrai
      @bkrai  4 years ago

      It already includes test of significance.

  • @Pinky-pb6od
    @Pinky-pb6od 6 years ago +1

    Hi sir. Can you please code support vector learning with ordinal regression?

    • @bkrai
      @bkrai  6 years ago

      Thanks for the suggestion, I'm adding it to my list for future.

  • @AymanTurkistani
    @AymanTurkistani 2 years ago +1

    Thank you!

    • @bkrai
      @bkrai  2 years ago

      You are welcome!

  • @drkim2
    @drkim2 6 years ago +1

    excellent

  • @mdtanimhasan3312
    @mdtanimhasan3312 3 years ago +1

    The video is really helpful.
    I am struggling to understand the dependent variable's factor outcomes combined by |
    Could anyone please explain?
    TIA

    • @bkrai
      @bkrai  3 years ago

      1|2 is the intercept (cutpoint) between level 1 and level 2, and 2|3 is the cutpoint between level 2 and level 3.

    • @mdtanimhasan3312
      @mdtanimhasan3312 3 years ago

      Dr. Bharatendra
      Could you please explain how to interpret the outcome of the dependent variable combined with |
      For example, here is the summary and p-value of my model. I am struggling to interpret the dependent variable outcome, TIA.
      Coefficients:
              Value  Std. Error  t value
      H     0.10955     0.06687   1.6381
      AGR   0.05929     0.06825   0.8687
      NP2  -1.00909     0.30407  -3.3186
      NP3  -1.69956     0.40289  -4.2184
      NP4  -0.28106     0.44589  -0.6303
      Intercepts:
              Value  Std. Error  t value
      1|2   -1.1571      0.6301  -1.8363
      2|3   -0.0505      0.6090  -0.0829
      3|4    0.9036      0.6022   1.5005
      4|5    2.2627      0.7164   3.1584
      5|6    5.1148      1.5859   3.2253
      6|7   16.5213      9.1049   1.8145
      Residual Deviance: 631.3888
      AIC: 653.3888
                 Value  Std. Error     t value  p-value
      H     0.10954539  0.06687426   1.6380799   0.1014
      AGR   0.05928751  0.06825109   0.8686676   0.3850
      NP2  -1.00909459  0.30407139  -3.3186107   0.0009
      NP3  -1.69956102  0.40288860  -4.2184390   0.0000
      NP4  -0.28105858  0.44589078  -0.6303306   0.5285
      1|2  -1.15712803  0.63014735  -1.8362817   0.0663
      2|3  -0.05048673  0.60902379  -0.0828978   0.9339
      3|4   0.90356996  0.60219631   1.5004575   0.1335
      4|5   2.26273192  0.71641548   3.1584073   0.0016
      5|6   5.11484231  1.58585762   3.2252847   0.0013
      6|7  16.52126027  9.10488998   1.8145480   0.0696

  • @nimeshcheedella8124
    @nimeshcheedella8124 6 years ago +1

    Sir, very nicely explained. I tried with my data by following your video step by step, but I have one issue. My independent variables are also ordinal in nature, and I made them categorical. Is that correct? Which regression do you suggest for predicting an ordinal variable when the independent variables are also ordinal?

    • @bkrai
      @bkrai  3 years ago +1

      The method depends on the dependent variable and not much on the independent variable.

  • @subashghimire1604
    @subashghimire1604 7 years ago

    Hello, could you please tell me how you got the equations for probability, at 9:31/19:21 in the video above?

    • @bkrai
      @bkrai  7 years ago

      It is similar to steps shown in the link below at 4:13,
      czcams.com/video/fDjKa7yWk1U/video.html

    • @yujiaoli947
      @yujiaoli947 6 years ago

      I have the same question. Only a z-statistic's p-value can be calculated by pnorm(), while here it is a t-statistic.
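
On the original question in this thread (where the probability equations come from), here is a minimal sketch using the parameterization documented for polr(), logit P(Y <= j) = zeta_j - eta. The cutpoints and the linear predictor below are made-up illustrative numbers, not values from the video.

```r
# Hypothetical cutpoints for a three-level response and a hypothetical
# linear predictor eta = b1*x1 + b2*x2 + ... for one observation.
zeta <- c("1|2" = -1.0, "2|3" = 1.5)
eta  <- 0.4

# Cumulative probabilities P(Y <= 1) and P(Y <= 2); plogis() is the inverse logit.
cum <- plogis(zeta - eta)

# Category probabilities are differences of consecutive cumulative probabilities,
# padded with 0 below the first level and 1 above the last.
probs <- diff(c(0, cum, 1))
names(probs) <- c("1", "2", "3")

probs
sum(probs)  # the three category probabilities add up to 1
```

The same differencing works for any number of levels: J levels give J - 1 cutpoints and J category probabilities.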

  • @landersebastian7886
    @landersebastian7886 1 year ago +1

    Good day, Professor. How can I use ordinal logistic regression with BMI?

    • @bkrai
      @bkrai  1 year ago

      See if this research paper helps:
      www.researchgate.net/publication/260273192_Does_Consumer_Behaviour_on_Meat_Consumption_Increase_Obesity_-_Empirical_Evidence_from_European_Countries

  • @nicolasaguirre8170
    @nicolasaguirre8170 4 years ago +1

    How can I fit a model with an ordinal response without proportional odds?

    • @bkrai
      @bkrai  4 years ago

      You can try this:
      czcams.com/video/dJclNIN-TPo/video.html

  • @hayonimengi4171
    @hayonimengi4171 5 years ago +1

    Superb!!!!

  • @dearcollynn3498
    @dearcollynn3498 7 years ago

    Hello, thank you for your great video. I have a question. Is AIC important here? Isn't AIC here big for the model since it is larger than 1000 already?

    • @bkrai
      @bkrai  7 years ago

      Yes it is high. In the same example when we made a model with three variables, it was over 1700. By adding more variables it came down to about 1038, which is a significant improvement.

    • @Astronoom
      @Astronoom 4 years ago

      When you add more variables the AIC goes down, but then you select variables which have a significant level >0.1 and the AIC goes back up, isn’t it? Wouldn’t you use the model with the lowest AIC, and if not why use the AIC at all? Can I compare models with the AIC as well when in some models variables are log transformed as in others they are not log transformed?

  • @kaapiglass
    @kaapiglass 4 years ago +1

    I'm getting this kind of error. Do you know what it means?
    Warning message:
    In polr(AccessOnlineRecord ~ ., trainHint, Hess = TRUE) :
    design appears to be rank-deficient, so dropping some coefs..........

    • @bkrai
      @bkrai  4 years ago

      It is just a warning message, not an error.
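
For context on what this warning usually points to: some predictor columns are exact linear combinations of others, so not all coefficients can be estimated separately. A hedged way to check a design for this, sketched with a made-up data frame (x1 and x2 are hypothetical predictors, with x2 built as an exact multiple of x1):

```r
# Hypothetical predictors: x2 carries no information beyond x1.
d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6))
d$x2 <- 2 * d$x1

# Design matrix with an intercept column plus the two predictors.
X <- model.matrix(~ x1 + x2, data = d)

qr(X)$rank   # numerical rank of the design matrix
ncol(X)      # number of columns; a rank below this signals redundant columns
```

Dropping or recoding the redundant predictors (here, removing x2) is the usual remedy; with real data, constant columns and duplicated dummy variables are common culprits.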

  • @alfredkik3675
    @alfredkik3675 3 years ago +1

    Excellent tutorial!

    • @bkrai
      @bkrai  3 years ago

      Thanks!

    • @alfredkik3675
      @alfredkik3675 3 years ago

      @bkrai Hello again Dr Rai, I tried to perform an OLR but the Brant test assumption did not hold. The p-values for the omnibus test and for other variables were less than 0.05. What else should I do? Is there any alternative test for ordinal dependent variables? Your kind advice will be greatly appreciated.

  • @SandeepKumar-me6qr
    @SandeepKumar-me6qr 5 years ago +1

    Very Nice explanation sir. Can you please upload the Cardiotocographic.csv file?

    • @bkrai
      @bkrai  5 years ago

      Here is the link: goo.gl/Xc4G7J

  • @seant7907
    @seant7907 4 years ago +1

    What does it mean to be 'rank deficient'?

    • @bkrai
      @bkrai  3 years ago

      Which part of the video are you referring to?

  • @nasamumusa5044
    @nasamumusa5044 7 years ago

    Thank you Bharatendra Rai. I get your explanation and have adapted my work well following the steps shown in your video.
    I have one issue please. Where columns with independent categorical data having 3 or more levels like the column of "Tendency" shown in your video; the model gives different "Value", "Std. Error", "t value" and "p value" for each level of such variable.
    This seems challenging and confusing to interpret and write out the equation of the model as some of the p values of the levels may not be significant, which should be removed while the other levels been significant are left.
    How can such a model be clearly written out and explained?
    Gracias!

    • @bkrai
      @bkrai  7 years ago +1

      When an independent variable is categorical and takes three values, the correct way to represent it in a regression-based model is with the help of 3-1=2 dummy variables. That's what you see here. When Tendency0 & Tendency1 are both zero, then Tendency = -1. When Tendency0 = 1 & Tendency1 = 0, then Tendency = 0. When Tendency0 = 0 & Tendency1 = 1, then Tendency = 1. Note that in the equation Tendency0 & Tendency1 can only be 0 or 1.
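
The dummy coding described above can be inspected directly with model.matrix(). This is a small sketch with a three-row made-up data frame, one row per level of Tendency; the first level (-1) becomes the reference and gets no column of its own.

```r
# One hypothetical row per level of Tendency.
d <- data.frame(Tendency = factor(c(-1, 0, 1)))

# Design matrix R would build for a model with Tendency as a predictor.
X <- model.matrix(~ Tendency, data = d)
X

colnames(X)  # intercept plus one dummy column for each non-reference level
```

Each row shows which dummy is switched on: neither for Tendency = -1, Tendency0 for Tendency = 0, and Tendency1 for Tendency = 1. In polr() output there is no single intercept row, but the predictor coefficients carry the same names.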

    • @nasamumusa5044
      @nasamumusa5044 7 years ago

      Bharatendra Rai, in my case I used dummy variables of 1,2,3 for the three levels of my independent categorical data. (Probably I should start with zero?)
      I converted them to factors. With some independent variables which were continuous or categorical and the dependent variable, I ran the model using polr.
      The output always gave me one coefficient value for each continuous independent variable, whereas the categorical ones had a different coefficient for each level. Like with yours, Tendency0 had a different coefficient and p-value from Tendency1, and both were significant.
      However, when I checked the significance of my data from their p-values, I observed that the p-values of the various levels differ within some variables (e.g. edu with levels 1,2,3: R chose level 1 as the reference level, and level 3 had a p-value greater than 0.05 while level 2 had a p-value less than 0.05). I suppose I should remove level 3 too, as I remove the non-significant variables from the equation.
      How can I do so, and what would the resulting interpretation be?
      Thanks for your kind offer to help.

    • @bkrai
      @bkrai  7 years ago +1

      For categorical variables, if even one level is significant, do not drop the variable from the model.

    • @nasamumusa5044
      @nasamumusa5044 7 years ago

      Bharatendra Rai, I sincerely appreciate your explanation. It is noted.

    • @bkrai
      @bkrai  7 years ago

      +Nasamu Bawa great 👍

  • @zahradidarali5804
    @zahradidarali5804 4 years ago +1

    What are your thoughts on AIC?

    • @bkrai
      @bkrai  4 years ago

      AIC estimates the relative prediction error of a model while penalizing the number of parameters. Lower is better, and it is used to compare or select among candidate models.
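
As a small worked check of how the AIC relates to the other numbers in a polr() summary, here is a sketch using the figures quoted in another comment on this page (residual deviance 631.3888, AIC 653.3888, 5 coefficients and 6 intercepts). It assumes the reported residual deviance is minus twice the log-likelihood, which is consistent with those quoted figures.

```r
# Figures copied from the summary output quoted elsewhere in the comments.
residual_deviance <- 631.3888
n_coefficients    <- 5   # H, AGR, NP2, NP3, NP4
n_intercepts      <- 6   # 1|2 through 6|7

k   <- n_coefficients + n_intercepts   # total number of estimated parameters
aic <- residual_deviance + 2 * k       # AIC = -2 * logLik + 2 * k

aic  # matches the AIC of 653.3888 reported in that output
```

Because the 2 * k term grows with every added parameter, a larger model only gets a lower AIC when the drop in deviance outweighs the penalty.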

  • @Nientjuh22
    @Nientjuh22 4 years ago +1

    Does anyone know if there is a maximum of independent factors R can handle for this model? I have 6 factors and it gives me an error. However, if I only use 5 of them, no matter which of them, R works perfectly normal

    • @bkrai
      @bkrai  4 years ago

      It must be some other issue. In this example I've used 21 variables without any problem.

    • @Nientjuh22
      @Nientjuh22 4 years ago

      @bkrai Thanks! But the error I get is: attempt to find suitable starting values failed
      In addition: Warning messages:
      1: glm.fit: algorithm did not converge
      2: glm.fit: fitted probabilities numerically 0 or 1 occurred

  • @adarsha1981
    @adarsha1981 6 years ago +1

    Sir, are Ordinal Regression and Ordinal Logistic Regression one and the same, or are they different?

    • @bkrai
      @bkrai  6 years ago

      Ordinal logistic regression is one type of ordinal regression.

    • @adarsha1981
      @adarsha1981 6 years ago

      OK. What kind of ordinal regression would you suggest for a situation where I have 15 features: 3 integer, 3 numeric, 8 categorical (binary), and 1 count variable (dependent)? I tried ordinal logistic regression but did not get a good result. My count is zero-inflated, so I tried a ZIP model too, which was not that great, and a cumulative link model (clm) is not fitting well either. Kindly suggest.

    • @bkrai
      @bkrai  6 years ago

      What is your response variable?

    • @adarsha1981
      @adarsha1981 6 years ago

      @bkrai It's a count, and I also tried ranking it. I have mostly zeros.

  • @micheleannarumma4690
    @micheleannarumma4690 5 years ago +1

    thank you :)

    • @bkrai
      @bkrai  5 years ago

      Thanks for your comment!

  • @R.K.3010
    @R.K.3010 7 years ago

    Hello sir,
    I am getting the following error
    "Error in optim(s0, fmin, gmin, method = "BFGS", ...) :
    initial value in 'vmmin' is not finite
    In addition: Warning message:
    glm.fit: fitted probabilities numerically 0 or 1 occurred"
    can you explain this?

    • @bkrai
      @bkrai  7 years ago

      Send the code that you used so I can take a look.

    • @R.K.3010
      @R.K.3010 7 years ago

      mod

    • @natasabajic7072
      @natasabajic7072 7 years ago

      @Rahul Kadge, could you find a solution for this error? I got the same one and would like to know how you solved it. Thanks,

  • @thejuhulikal6290
    @thejuhulikal6290 3 years ago +1

    Sir, which model should I use if all the variables, both dependent and independent, are categorical? Please help me with this.

    • @bkrai
      @bkrai  3 years ago +2

      Try Random Forest:
      czcams.com/video/dJclNIN-TPo/video.html

    • @thejuhulikal6290
      @thejuhulikal6290 3 years ago +1

      @bkrai thanks again sir, grateful forever

    • @bkrai
      @bkrai  3 years ago

      You are welcome!

    • @thejuhulikal6290
      @thejuhulikal6290 3 years ago +1

      @bkrai Sir, I am getting many errors. Can I have your mail id, please?

    • @bkrai
      @bkrai  3 years ago

      seemabharat@gmail.com

  • @kmahim82
    @kmahim82 4 years ago +1

    What if the intercept is insignificant?

    • @bkrai
      @bkrai  4 years ago +1

      That's ok, we should still keep it.