Logistic Regression with R: Categorical Response Variable at Two Levels (2018)

  • Added 8 Oct 2017
  • Uses a student college-application example to carry out logistic regression analysis with R. Logistic regression is among the top 10 must-know machine learning methods.
    R code and data files: github.com/bkrai/Top-10-Machi...
    Machine Learning videos: goo.gl/WHHqWP
    Includes,
    - use of a categorical binary output variable
    - data partition
    - logistic regression model
    - prediction
    - equation for prediction
    - misclassification errors for training and test data
    - confusion matrix for training and test data
    - goodness-of-fit test
    Becoming Data Scientist: goo.gl/JWyyQc
    Introductory R Videos: goo.gl/NZ55SJ
    Deep Learning with TensorFlow: goo.gl/5VtSuC
    Image Analysis & Classification: goo.gl/Md3fMi
    Text mining: goo.gl/7FJGmd
    Data Visualization: goo.gl/Q7Q2A8
    Playlist: goo.gl/iwbhnE
    R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Comments • 331

  • @earlymorningcodes6100
    @earlymorningcodes6100 4 years ago +11

    2:46 two-way table of factor variables
    3:23 data partition
    5:21 logistic regression model
    8:36 prediction
    10:05 probability calculation
    14:24 interpretation of coefficients
    15:03 error, training data
    16:04 confusion matrix
    17:32 error, test data
    18:28 goodness of fit

  • @diliniherath1299
    @diliniherath1299 Před 3 lety +4

    No words to express my gratitude for you. Found your channel days before submitting the project and you saved me !

    • @bkrai
      @bkrai  Před 3 lety

      Great to hear!

  • @saipri
    @saipri Před 2 lety +4

    Extremely crisp and accurate! Hope you get many more views! By far the best on this topic...

    • @bkrai
      @bkrai  Před 2 lety

      Thanks for the comments!

  • @youroldmangaming8150
    @youroldmangaming8150 Před 3 lety +1

    Thanks mate. Been struggling to find a practical person to show how to do this. Very clear and well thought out. Thank you.

    • @bkrai
      @bkrai  Před 3 lety

      You're very welcome!

  • @wardhereadan1187
    @wardhereadan1187 Před 4 lety +2

    DR. Bharatendra Rai that video was amazing!! I hope you continue to post more videos like this! seriously amazing!!!!!!!!!!!!!!!

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for comments!

  • @crossray974
    @crossray974 Před 3 lety +2

    Thank you Mr. Dr. Bharatendra - your stuff and method are on top of youtube, greets from Europe!

    • @bkrai
      @bkrai  Před 3 lety +1

      Most welcome!

  • @mnorberta24
    @mnorberta24 Před 3 lety +2

    Thank you for helping save my grades for this module!!!!! I might just watch all your videos because they're so helpful!!!!

    • @bkrai
      @bkrai  Před 3 lety

      Glad to hear it!

  • @allabtlyf
    @allabtlyf 4 years ago +2

    Wonderful video. I was struggling to calculate the probability from the estimates in my notebook, but you made it quite simple.
    Thanks a lot

    • @bkrai
      @bkrai  Před 4 lety +1

      Thanks for comments!

  • @victorhenostroza1871
    @victorhenostroza1871 4 years ago +2

    Thanks man, another amazing job. You're the teacher we all want at univ.

    • @bkrai
      @bkrai  Před 4 lety +1

      Thanks for comments!

  • @shaikhalishams4065
    @shaikhalishams4065 Před 4 lety +10

    Finally I've got a perfect video on this topic.

    • @bkrai
      @bkrai  Před 4 lety +2

      Thanks for comments!

  • @lindanidube5714
    @lindanidube5714 Před 4 lety +2

    This was amazing... you explain everything step by step nicely :-)

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for your feedback!

  • @rajeshtukdeo
    @rajeshtukdeo Před 3 lety +1

    Amazing video to understand the logistic regression concepts thoroughly !!!

    • @bkrai
      @bkrai  Před 3 lety

      Thanks for comments!

  • @debashishchatterjee5198
    @debashishchatterjee5198 Před 5 lety +2

    I love your videos .... concise and to the point. Superb .... keep it up

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments!

  • @viigeminalegio
    @viigeminalegio 4 years ago +1

    Thank you very much, Dr. Bharatendra. I was looking to resolve some of my doubts and I finally did. Thanks for sharing your knowledge. I wish I could have the opportunity to help you on some occasion. Thanks for everything, great job.

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for your comments and feedback!

  • @flamboyantperson5936
    @flamboyantperson5936 Před 6 lety +1

    Excellent video Sir. You are a great statistician and expert in R. Thank you for the video Sir

  • @kir66846037
    @kir66846037 Před 3 lety +2

    the best teaching of logistic regression!!!! Thanks a lot

    • @bkrai
      @bkrai  Před 3 lety

      Most welcome!

  • @ayush612
    @ayush612 Před 6 lety +1

    This is an awesome video Sir...thanks for uploading this!!

    • @bkrai
      @bkrai  Před 6 lety

      Thanks for comments!

  • @jared1122
    @jared1122 Před 4 lety +2

    Thank you Dr Rai for the wonderful explanation👍 God bless you 🙏

  • @pranatim
    @pranatim Před 2 lety +1

    Best tutorial on logistic regression. Thank you so much for sharing.

    • @bkrai
      @bkrai  Před 2 lety

      You're very welcome!

  • @sagaranvekar856
    @sagaranvekar856 Před 5 lety +2

    Great Explanation.Thank you Sir!

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments!

  • @philippdegens6776
    @philippdegens6776 Před 4 lety +1

    Thanks. You gave a clear and concise explanation and a bonus was that it was in R which I am learning.

    • @bkrai
      @bkrai  Před 4 lety

      You're very welcome!

    • @bkrai
      @bkrai  Před 4 lety

      You may also find this useful:
      czcams.com/play/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG.html

  • @nth.education
    @nth.education 4 years ago +3

    Amazing explanation, loved the way you went through the code and how to proceed step by step.
    I have a doubt about the p-value calculation at the end. Can you explain a bit more the "with" command you used? I couldn't understand the parameters used in it. The interpretation of the p-value is fine, but I would like to know how the command works so I can use it elsewhere as well.
    Thanks

    • @bkrai
      @bkrai  Před 3 lety +1

      you can run ?with in the console, it will give you all details and also examples.
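
For readers with the same question, here is a minimal sketch of what `with` does: it evaluates an expression using the components of a list or data frame as if they were ordinary variables. The data are simulated, and the names `toy` and `m` are hypothetical, not the objects used in the video.

```r
# Simulated data (hypothetical; not the data set used in the video)
set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(0.5 * toy$x))

# A fitted glm is a list, so its components can be reached by name
m <- glm(y ~ x, data = toy, family = "binomial")

# with() looks the names up inside the object, so these two lines are equivalent
a <- m$null.deviance - m$deviance
b <- with(m, null.deviance - deviance)
```

The second form is only a shorthand; it saves repeating `m$` in front of every component.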

  • @jbei9981
    @jbei9981 Před 3 lety +1

    Thank you so much. Excellent video. I was really thinking I would fail my assignment until I found this.

    • @bkrai
      @bkrai  Před 3 lety +1

      You're very welcome!

  • @rizwanghulamhussain7309

    Excellent Video! Could you please guide how to fit panel logistic regression in R. I want to make confusion matrix / ROC curve using pglm library but could not find fitting probabilities in pglm library

  • @OrcaChess
    @OrcaChess Před 5 lety +1

    Thank you very much!
    Your videos create high value.
    Kind regards from Karlsruhe
    Jonathan

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for your comments!

  • @harishnagpal21
    @harishnagpal21 Před 5 lety +1

    Hi Bharatendra, I saw your linear regression video also. The explanation on results was fantastic. I got to learn new things. One query - when to use linear and when to use logistic regression? Thanks

    • @bkrai
      @bkrai  5 years ago

      When the y variable is a factor, logistic regression is used. For a numeric y, linear regression is used.

    • @harishnagpal21
      @harishnagpal21 Před 5 lety +1

      thanks :)

  • @yogeshdhar5825
    @yogeshdhar5825 Před 4 lety +2

    Very well explained!

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for comments!

  • @alphar85
    @alphar85 Před 3 lety +1

    You are just amazing 👏. You made my life easier with the codes.

    • @bkrai
      @bkrai  Před 3 lety

      Happy to hear that!

  • @ramp2011
    @ramp2011 Před 6 lety +1

    Great video. rank is a factor variable and looks like logistic regression has auto converted that in to dummy variables internally (from the summary model). Is there a way to find which algorithms auto converts categorical variable to dummy variables automatically and the ones one has to convert manually? Thank you for your help

    • @bkrai
      @bkrai  Před 6 lety

      Many algorithms do not need conversion of categorical variables to dummy variables. However, when using regression-based methods, R does so automatically.

  • @MemphianSounds
    @MemphianSounds Před 3 lety +1

    Great as always! What do you do when you have so many rows and variables that your computer can't compute the vector in R?

    • @bkrai
      @bkrai  Před 3 lety +1

      You can take a sample.

  • @valeriasanchez4910
    @valeriasanchez4910 Před 3 lety +1

    Excellent video Dr.!, I just have one question: Why it is necessary to do the data partition for the estimation?

    • @bkrai
      @bkrai  3 years ago +2

      It can help to avoid overfitting, which happens when results are good with training data but not so good on test data.
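
A data partition of this kind can be sketched in a few lines. The 80/20 split, the seed, and the simulated data frame `toy` are assumptions made for illustration only.

```r
# Simulated data (hypothetical)
set.seed(1234)
toy <- data.frame(id = 1:400, x = rnorm(400))

# Assign each row to group 1 (about 80%) or group 2 (about 20%)
ind   <- sample(2, nrow(toy), replace = TRUE, prob = c(0.8, 0.2))
train <- toy[ind == 1, ]   # used to fit the model
test  <- toy[ind == 2, ]   # held back to check the model on unseen rows
```

Because each row is assigned independently, the split is only approximately 80/20, and every row lands in exactly one of the two sets.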

  • @jarrelldunson
    @jarrelldunson Před 3 lety +3

    Thank you for sharing, very helpful

    • @bkrai
      @bkrai  Před 3 lety

      You are so welcome!

  • @genevieveemefaasare8352
    @genevieveemefaasare8352 Před rokem +1

    thanks so much. very precise and concise explanations. Thank you Sir.

    • @bkrai
      @bkrai  Před rokem

      You are very welcome!

  • @rohitkamble1737
    @rohitkamble1737 Před 3 lety +1

    Very clear explanation. Understand all things

    • @bkrai
      @bkrai  Před 3 lety

      Thanks for comments!

    • @rohitkamble1737
      @rohitkamble1737 Před 3 lety

      @@bkrai sir, I am working on project on Real estate and banking model to predict prizes of house, could you plz help me on that?

  • @soumikchatterjee3996
    @soumikchatterjee3996 4 years ago +1

    Excellent video. Just a few things to mention. In the glm result, the residual deviance is greater than the residual degrees of freedom, which means the data have overdispersion. It is better to use the quasibinomial family rather than binomial; otherwise the p-values would show false significance.
    The second thing to mention: backward variable selection without Monte Carlo permutation has type II error, so it is better to use it cautiously or to use the information-theoretic approach proposed by Burnham et al., with model weight as a criterion.
    Thanks for this beautiful video, sir

    • @soumikchatterjee3996
      @soumikchatterjee3996 Před 4 lety +1

      Although you created seed and resample which can reduce the error but it is extremely difficult to find proper seed size without understanding model weight (wi). Thanks

    • @bkrai
      @bkrai  Před 4 lety +1

      Thanks for the feedback and comments!

  • @anuchowdarybds
    @anuchowdarybds Před 5 lety +1

    Very clear explanation . Thank you . Do you have any more videos on logit regression ?

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments! The link below also has multinomial logistic regression and other regression based methods.
      czcams.com/play/PL34t5iLfZddtKNwFNic3HWNV2qMsQ9AjD.html

  • @kumarvarma942
    @kumarvarma942 5 years ago

    Hi Sir, great videos and easy-to-learn topics. I have a small doubt, if you don't mind. Before dividing data into train and test sets, we need to remove null values, find outliers, scale, do EDA, and then sample. Could you please share a video on linear or logistic regression that combines these steps? We need to check all of the above to predict the best output. I am a bit confused about finding (or removing) outliers, removing null values, and scaling (min-max or z-score). Any such video would be helpful to us. Thanks in advance.

  • @AnaPTedim
    @AnaPTedim 6 years ago +2

    Great video, it really helped a lot. I have a question though: can I use the same model if one of my categorical variables has cells in the two-way table that equal zero? If not, is there any alternative? How can I solve this?

    • @bkrai
      @bkrai  Před 6 lety

      Let's say your categorical variable has 10 levels and the last one has frequency below 5. You can combine last two levels into one and then do the analysis.
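
Merging two sparse levels, as suggested above, might look like the following sketch. The factor `f` and the merged label `i_or_j` are hypothetical.

```r
# Hypothetical factor with ten levels in which level "j" occurs only twice
f <- factor(c(rep(letters[1:9], each = 6), "j", "j"))
table(f)

# Assigning the same name to two levels merges them into one
levels(f)[levels(f) %in% c("i", "j")] <- "i_or_j"
table(f)
```

After the merge the factor has nine levels, and the combined level holds the observations of both original levels.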

    • @AnaPTedim
      @AnaPTedim Před 6 lety +1

      Thank you very much! That might work :)

  • @abdulazeez9863
    @abdulazeez9863 Před 5 lety +2

    Excellent explanation... please make a video of Boosted Regression Tree model with R. Thank you sir.

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments and suggestion! I've added it to my list.

  • @yaweli2968
    @yaweli2968 Před rokem

    If you have two or more categorical variables which are strings, how do you decide which one to make a factor of 0 or 1. Like how do you assign them specific factors ?

  • @williamstan1780
    @williamstan1780 1 year ago +1

    Very informative video, explained in a manner that is easy to understand.
    I have a question though: what is the difference between logistic regression and multinomial logistic regression?

    • @bkrai
      @bkrai  1 year ago

      The response variable has more than 2 levels in multinomial logistic regression. See this for details:
      czcams.com/video/ftjNuPkPQB4/video.html

  • @mueezwaq
    @mueezwaq Před 8 měsíci +1

    Hi there, thanks for this. I don't like how R displays the results for factors with more than 2 level - is there any way to get output like SPSS (which supplies a single odds ratio, 95% CI and p-value for each variable in the model). I have tried both the logistic.display and exp() commands but they do not provide an overall value like this. Any ideas?

    • @bkrai
      @bkrai  Před 8 měsíci

      You can use the output and customize it.
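
One possible way to customize the output is sketched below. It does not reproduce SPSS output; it builds an odds-ratio table with Wald 95% intervals and adds one likelihood-ratio test per variable, so a multi-level factor gets a single overall p-value. The data are simulated, and the variable names `gpa`, `rank`, and `admit` are used only for illustration.

```r
# Simulated data (hypothetical)
set.seed(3)
n <- 500
toy <- data.frame(gpa  = rnorm(n, 3.4, 0.4),
                  rank = factor(sample(1:4, n, replace = TRUE)))
lp <- -4 + 1.0 * toy$gpa - 0.4 * as.integer(toy$rank)
toy$admit <- factor(rbinom(n, 1, plogis(lp)))

m <- glm(admit ~ gpa + rank, data = toy, family = "binomial")

# Odds ratios with Wald 95% confidence intervals, one row per coefficient
or_table <- exp(cbind(OR = coef(m), confint.default(m)))

# One likelihood-ratio test per variable; 'rank' gets a single p-value on 3 df
overall <- drop1(m, test = "Chisq")
```

`or_table` still has one row per dummy variable, because each non-baseline level has its own odds ratio; only the test in `overall` summarizes the factor as a whole.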

  • @dmukherjee4049
    @dmukherjee4049 Před 6 lety +3

    Sir can you explain "goodness of fit test". What is df.null-df.residual, lower tail & why it is 'F'?
    Thank You

    • @bkrai
      @bkrai  Před 6 lety

      When in RStudio, you can run ?glm. This will provide you with more details.
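
As a hedged sketch of the calculation asked about: the test compares the fitted model with an intercept-only model. The drop in deviance is treated as approximately chi-squared, with degrees of freedom equal to the number of added coefficients (`df.null - df.residual`), and `lower.tail = FALSE` (abbreviated `F`) returns the upper-tail probability, which is the p-value. The data and object names are simulated and hypothetical.

```r
# Simulated data (hypothetical)
set.seed(2)
toy <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
toy$y <- rbinom(300, 1, plogis(-0.3 + 0.8 * toy$x1))

m <- glm(y ~ x1 + x2, data = toy, family = "binomial")

dev_drop <- m$null.deviance - m$deviance   # improvement over the intercept-only model
df_drop  <- m$df.null - m$df.residual      # number of added coefficients (here 2)

# lower.tail = FALSE returns P(chi-square > dev_drop), i.e. the p-value
p_value <- pchisq(dev_drop, df_drop, lower.tail = FALSE)

# The same test written as a comparison of two nested models
m0      <- glm(y ~ 1, data = toy, family = "binomial")
p_anova <- anova(m0, m, test = "Chisq")[2, "Pr(>Chi)"]
```

A small p-value suggests that the model with predictors fits significantly better than the model with an intercept alone.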

  • @PA_hunter
    @PA_hunter Před 3 lety +1

    Thank you Dr. Bharatendra Rai. Can you explain more why Rank 1 is not included in the model, please?

    • @bkrai
      @bkrai  3 years ago +1

      For factor independent variables, we convert them to dummy variables. For more detailed coverage see:
      czcams.com/video/s23CMIjfwHk/video.html
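
The dummy-variable coding can be inspected directly with `model.matrix`. In this sketch the small data frame `d` is hypothetical; it shows that the first level of a factor becomes the baseline absorbed into the intercept, which is why no separate coefficient for rank 1 appears.

```r
# Hypothetical factor with four levels, standing in for 'rank'
d <- data.frame(rank = factor(c(1, 2, 3, 4, 2, 3)))

# Default: level 1 is the baseline, so the columns are rank2, rank3, rank4
default_cols <- colnames(model.matrix(~ rank, data = d))

# Choosing another baseline changes which level is absorbed into the intercept
d$rank2base   <- relevel(d$rank, ref = "2")
releveled_cols <- colnames(model.matrix(~ rank2base, data = d))
```

Each remaining coefficient is then interpreted relative to the baseline level.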

  • @reubenmarfo9855
    @reubenmarfo9855 Před 2 lety +1

    Professor, can you please comment on why in your previous video on logistic regression, you trained the model and predicted on the same data without splitting.

    • @bkrai
      @bkrai  2 years ago

      I mainly wanted to show how to run logistic regression. But after getting feedback, I created this one, which is more complete.

  • @fernandoflores3161
    @fernandoflores3161 Před 2 lety +1

    Excellent explanation! How do you deal with ordinal and nominal categorical variables?

    • @bkrai
      @bkrai  Před 2 lety

      If response variable is ordinal, refer to this:
      czcams.com/video/qkivJzjyHoA/video.html

  • @shyamchaurasiya1069
    @shyamchaurasiya1069 Před 5 lety +2

    Love You Sir Very Useful videos

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments! For recent Python video, see this link:
      czcams.com/video/mKb5hRJmtCU/video.html

  • @femiakinmade4077
    @femiakinmade4077 Před 4 lety +2

    I enjoyed your video, thank you! Can I get some clarity on why you used the "train" dataset in your prediction instead of "test"? dataset: ## p1

    • @bkrai
      @bkrai  Před 4 lety

      After 'train', I also use 'test'. Note that if you get good results with 'train' but not with 'test', it will suggest over-fitting problem.
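
The train-versus-test comparison described above can be sketched as follows. The data are simulated, the 0.5 cutoff is an assumption, and `evaluate` is a hypothetical helper, not a function from the video.

```r
# Simulated data (hypothetical)
set.seed(4)
n <- 600
toy <- data.frame(x = rnorm(n))
toy$admit <- factor(rbinom(n, 1, plogis(-0.5 + 1.2 * toy$x)))

ind   <- sample(2, n, replace = TRUE, prob = c(0.8, 0.2))
train <- toy[ind == 1, ]
test  <- toy[ind == 2, ]

m <- glm(admit ~ x, data = train, family = "binomial")

# Hypothetical helper: confusion matrix and error rate at a 0.5 cutoff
evaluate <- function(model, data) {
  p    <- predict(model, newdata = data, type = "response")
  pred <- factor(ifelse(p > 0.5, 1, 0), levels = c(0, 1))
  tab  <- table(Predicted = pred, Actual = data$admit)
  list(table = tab, error = 1 - sum(diag(tab)) / sum(tab))
}

res_train <- evaluate(m, train)
res_test  <- evaluate(m, test)
```

If `res_test$error` is much larger than `res_train$error`, that gap is the sign of overfitting mentioned in the reply.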

    • @femiakinmade4077
      @femiakinmade4077 Před 4 lety +1

      @@bkrai Thanks for your response. Appreciated

    • @bkrai
      @bkrai  Před 4 lety

      Welcome!

  • @maheswarivemula141
    @maheswarivemula141 Před 2 lety +1

    Thank you sir for the wonderful video.
    Sir, I have a doubt that I'm not getting value while running on the test dataset. Could you please help me out of this error.
    It is showing ' all arguments must have same length '

    • @bkrai
      @bkrai  Před 2 lety

      Check your data again.

  • @priyadarsinisamal1779
    @priyadarsinisamal1779 Před 2 lety

    sir how can i use one data set for training and another different dataset(having similar variables like training set) for testing?

  • @yousif533
    @yousif533 Před 2 lety +1

    Hi,
    Dear Dr.Bharatendra Rai
    What are the best models for fitting the binary data? I know that the logistic regression model is one of the models.
    What is the other model to make a comparison with the logistic model to find the best model?
    I would be grateful if you could assist me with this.
    I look forward to hearing from you soon
    Best regards,

    • @bkrai
      @bkrai  Před 2 lety

      You can use tree based methods for comparison, especially random forest and extreme gradient boosting. See this link for details:
      czcams.com/video/hCLKMiZBTrU/video.html

    • @yousif533
      @yousif533 2 years ago +1

      @@bkrai Thank you, Prof.
      Can these methods (tree-based methods) be used for regression, or only for classification? My concern is regression (predicting disease status), and I think these methods are used only for classification. Kindly confirm. Best regards

    • @bkrai
      @bkrai  2 years ago

      They do both regression and classification. I have included examples for both.

    • @yousif533
      @yousif533 Před rokem

      @@bkrai Thank you, Prof.

  • @ezechielamoussou7409
    @ezechielamoussou7409 Před 2 lety +1

    Thank you for the video Sir.
    If I were running a logistic regression with categorical predictor variables, should I change them to factors?

  • @sayamnandy5855
    @sayamnandy5855 Před 5 lety +1

    I have a question sir..Should i check multicollinearity (Vif) while performing the logistic regression? If any of the variable's vif value is greater than 2 then i will remove this variable from my model. Can i do that?

    • @bkrai
      @bkrai  Před 3 lety

      Yes, you should be able to do it.

  • @arunshowri7829
    @arunshowri7829 Před 6 lety +1

    Hi Sir,
    I have a question. how to predict the target variable if we have many independent variables( eg: around 60). what we have to do if most of the values in independent variable are NA's. Please suggest me Sir.

    • @bkrai
      @bkrai  6 years ago

      60 independent variables should be fine. But before applying the method, you need to take care of missing values so that your data are ready for analysis.

    • @sayamnandy5855
      @sayamnandy5855 Před 5 lety

      Apart from sir's suggestion..you can go for information value concept if you have plenty independent variable.

  • @rohinipatil2929
    @rohinipatil2929 Před 4 lety +1

    Very well explained

    • @bkrai
      @bkrai  Před 4 lety

      Glad it was helpful!

  • @adtx11
    @adtx11 Před 4 lety +1

    One quick question: model 1 : a is the output variable, b and c are covariates, and both have significant p value. Model 2 : same output variable, b and d are covariates, and both have significant p values after we run the summary command. Finally Model 3: same output variable and all three b, c, d are covariates. Here if we see that only b and c are significant, but d doesn't have a significant p value , - then how do you interpret the result ? Can we say that adding covariate d doesn't add value to the model , even though it was significant in the previous bivariate scenario? Thank you.

    • @bkrai
      @bkrai  Před 3 lety +2

      Check relationship between c and d, that may help clarify.

  • @InfinitesimallyInfinite
    @InfinitesimallyInfinite Před 5 lety +1

    Brilliant video professor. I have question... so it is always that you convert categorical integer variables into factor variables before performing logistic regression? At the other places, like the algorithm XGB, I haven't seen you convert 'Admit' variable into a factor variable, why is it so? Thanks.

    • @bkrai
      @bkrai  Před 5 lety +1

      Different methods require data to be prepared in certain way. For example, XGB and neural networks require response to have numeric format.

    • @InfinitesimallyInfinite
      @InfinitesimallyInfinite Před 5 lety +1

      Thanks professor for the quick response. Really appreciate. 😀

    • @bkrai
      @bkrai  Před 5 lety

      Thanks!

  • @kartt100
    @kartt100 Před 4 lety +1

    Excellent explanation sr

  • @uhsay1986
    @uhsay1986 Před 5 lety +1

    Hi Sir , i have a retail train data set where i need to predict if a store should be opened or not in a respective location. I removed NAs from the train set , trying to apply glm function ( store~. , data=train, family='binomial' ) .. even after waiting 5-10 min i dont get any output .. the data set consist of character , int columns.

    • @bkrai
      @bkrai  Před 3 lety

      You will have to look at the structure of your data and make sure response variable is of factor type.

  • @rupeshbharadwaj
    @rupeshbharadwaj Před 5 lety +1

    Really great! Thanks!

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for comments!

  • @rohankulkarni8613
    @rohankulkarni8613 Před 4 lety +1

    Please let me know if we have data visualization on this data ? like in tableau or any other software ?

    • @bkrai
      @bkrai  Před 4 lety

      For data visualization, you can try this link:
      czcams.com/video/niB5A8qa88I/video.html

  • @mujeebrahman5282
    @mujeebrahman5282 Před 4 lety +2

    Absolutely brilliant thank you

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for comments!

    • @mujeebrahman5282
      @mujeebrahman5282 Před 4 lety +1

      Dr. Bharatendra Rai could you please make videos on Machine learning using python

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for the suggestion, I'll work on this.

  • @dhanielaritonang3273
    @dhanielaritonang3273 Před 4 lety +1

    Sir I have a question, how if we have three levels of categorical response variable.. what 'family' should I use ?

    • @bkrai
      @bkrai  Před 4 lety

      For 3 or more, use multinomial logistic regression:
      czcams.com/play/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG.html

  • @myakaramyakrishna4400
    @myakaramyakrishna4400 Před 4 lety +1

    Why do you use xtabs
    ?
    How we do find a dependent variable in data set?

    • @bkrai
      @bkrai  Před 4 lety +1

      xtabs is for cross tabulation. A dependent variable is based on the context of data. In the example I have used, it is obvious.

  • @BBkhadka
    @BBkhadka Před 10 měsíci +1

    thank you for great lecture

    • @bkrai
      @bkrai  Před 10 měsíci

      Most welcome!

  • @jitendratrivedi7889
    @jitendratrivedi7889 Před 6 lety +2

    Great video.

  • @dipanjanroy589
    @dipanjanroy589 Před 4 lety +1

    sir can you please provide the code for testing accuracy of this example. I'm a new learner & i find it pretty interesting & simple by the way you teach.

    • @bkrai
      @bkrai  Před 4 lety

      It's in the description.

  • @chandrasekharkona6462
    @chandrasekharkona6462 Před 4 lety +1

    excellent work

    • @bkrai
      @bkrai  Před 4 lety

      Thanks for comments!

  • @mohammadj.shamim9342
    @mohammadj.shamim9342 Před 6 lety

    sorry dear teacher, what is the difference between this and common lm? I mean can we use the common model with dummy variables to solve this problem?

    • @bkrai
      @bkrai  Před 6 lety +1

      If response variable is continuous, lm is the right method. But here response is categorical or factor and has only two levels. Logistic regression is used in such situation.
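
A small sketch of the difference, on simulated data with hypothetical names: `lm` fits a straight line to the 0/1 response, while `glm` with the binomial family models the probability on the logit scale.

```r
# Simulated data (hypothetical)
set.seed(8)
toy <- data.frame(x = rnorm(300, sd = 1.5))
toy$y <- rbinom(300, 1, plogis(1.5 * toy$x))

lin  <- lm(y ~ x, data = toy)                       # linear regression on a 0/1 response
logi <- glm(y ~ x, data = toy, family = "binomial") # logistic regression

range(fitted(lin))   # a straight line is not confined to [0, 1]
range(fitted(logi))  # fitted probabilities stay within [0, 1]
```

This is one practical reason to prefer logistic regression for a two-level response: its fitted values can always be read as probabilities.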

  • @amritthapa9315
    @amritthapa9315 Před 5 lety +1

    Could you please explain the importance of "xtabs" command in logistic regression? You said we should not get zero. Could you explain more on this.

    • @bkrai
      @bkrai  Před 5 lety +1

      Key idea is to have sufficient number of samples in each cell. If there are too few or zero samples, then the prediction model may not be stable or consistent.
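
The cell-count check can be sketched like this. The data are simulated, and the threshold of 5 for a sparse cell is an assumption taken as a rule of thumb.

```r
# Simulated data (hypothetical)
set.seed(6)
toy <- data.frame(admit = factor(rbinom(400, 1, 0.3)),
                  rank  = factor(sample(1:4, 400, replace = TRUE)))

# Two-way table of the response against a factor predictor
tab <- xtabs(~ admit + rank, data = toy)
tab

# Flag empty or sparse cells before fitting the model
any(tab == 0)
which(tab < 5, arr.ind = TRUE)
```

An empty result from `which` means every combination of response and predictor level has at least five observations.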

    • @JidduVillarin
      @JidduVillarin Před 4 lety

      @@bkrai Thank you for this video. It is very concise and understandable. I'd like expand on this question slightly. If you did have a zero value in the xtab, what would have been the appropriate course of action?

  • @guruji885
    @guruji885 Před 2 lety +1

    Sir
    Outstanding 👍✍️

  • @lutfyabdulah1321
    @lutfyabdulah1321 3 years ago +1

    Thanks for sharing. It is very helpful.

    • @bkrai
      @bkrai  Před 3 lety

      You are welcome!

  • @mohamedbousarout6515
    @mohamedbousarout6515 Před 2 lety +2

    Thank you sir keep up the good work ;)

    • @bkrai
      @bkrai  Před 2 lety +1

      You are welcome!

  • @narasimhapuvalla3211
    @narasimhapuvalla3211 Před 5 lety +1

    1.) Suppose we have categorical fields in our data. Is it mandatory to always change to numeric factors ?
    2.) If the answer for question 1 is correct, then what if we have too many unique values in each category columns?
    Let us take for example : I have a dataset of 100,000 records. There are a few columns with categorical data in it. Each of these categorical columns may have 1000 or more unique values. So if I convert them into factors, then "labels = c(1:1000 or more)".
    Is this ok to do it this way?
    3.) Is there a way to not convert categorical data into numeric values and still use them in the machine learning model?
    4.) How do we deal with Date fields?
    5.) The conversion of categorical variables into dummy variables --> should we do this in all cases or is this something we need to consider only if the unique values in the categorical fields are limited to a lesser number?

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago +1

      1. No, it is not mandatory to change them. You can set the family parameter to “binomial”.
      2. Answered in no. 1.
      3. Answered in no. 1.
      4. Convert them into factor variables.
      5. Try to consider it in all cases.

  • @s9438679525
    @s9438679525 Před 5 lety +1

    Hi sir,
    Please explain the use of type='response' in line number 23
    Thanks

    • @bkrai
      @bkrai  Před 3 lety

      The type="response" option tells R to output probabilities of the form P(Y = 1|X), as opposed to other information such as the logit.
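
To make the two scales concrete, here is a minimal sketch on simulated data (all names hypothetical). It compares `type = "link"`, `type = "response"`, and the probability computed by hand from the coefficients.

```r
# Simulated data (hypothetical)
set.seed(5)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(1 + 2 * toy$x))
m <- glm(y ~ x, data = toy, family = "binomial")

new <- data.frame(x = c(-1, 0, 1))

logit <- predict(m, newdata = new, type = "link")      # b0 + b1 * x
prob  <- predict(m, newdata = new, type = "response")  # P(y = 1 | x)

# The probability is the inverse logit of the linear predictor
by_hand <- 1 / (1 + exp(-(coef(m)[1] + coef(m)[2] * new$x)))
```

`prob`, `plogis(logit)`, and `by_hand` all hold the same three probabilities.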

  • @yekhtiari
    @yekhtiari Před 5 lety +1

    Loved all of your videos.I've learned lots of good tricks in R with your videos.Could you tell what I did wrong the following code returns the same index?(I am thinking this way is not safe for splitting dataset into training and testing)
    x

    • @bkrai
      @bkrai  Před 5 lety

      You have a very small sample. Try
      sample(2,length(x),replace = T,prob = c(.5,.5))

    • @yekhtiari
      @yekhtiari Před 5 lety

      @@bkrai How about using :
      idx

  • @kavyayd3577
    @kavyayd3577 5 years ago +2

    Great explanation, sir. Can you upload a video of logistic regression with more than 10 variables? It would be a great help.

    • @bkrai
      @bkrai  Před 5 lety

      The process will work same with any number of variables.

  • @nevmiku
    @nevmiku Před 5 lety +2

    You sir are a god send. i love you.

    • @bkrai
      @bkrai  Před 5 lety

      Thanks for your comments!

  • @C0pyC4tt
    @C0pyC4tt Před 6 lety

    I tried the script in editor and summary() does not give me p-values and so on, whats the reason?

    • @bkrai
      @bkrai  Před 6 lety

      If you used same code as shown in the video, p-value will be the last column when you run summary.

  • @akashprabhakar6353
    @akashprabhakar6353 Před 3 lety +1

    Thanks for this video sir...
    Kindly tell how can we increase the accuracy of this model...as error rate is quite high..

    • @bkrai
      @bkrai  Před 3 lety

      You can try other methods to improve accuracy:
      czcams.com/play/PL34t5iLfZddsQ0NzMFszGduj3jE8UFm4O.html

  • @nidhijoshi1532
    @nidhijoshi1532 Před 4 lety +1

    Sir, I have data set on food security and I want to apply logistic regression model. Sir, but I am not getting how to apply the model.

    • @bkrai
      @bkrai  Před 4 lety

      Make sure you have a categorical response variable just as I have 'admit' variable in this video.

    • @shanicemohanlal1605
      @shanicemohanlal1605 4 years ago

      @@bkrai Hi Dr, could you kindly explain how to do this when the categorical response variable, in my case presence/absence, is accompanied by other variables that are also categorical? I have changed them to factor variables, but when the logistic regression model runs I get the warning message glm.fit: fitted probabilities numerically 0 or 1 occurred.

  • @laxmanbisht2638
    @laxmanbisht2638 Před rokem +2

    Sir, thanks a lot!

  • @merumomo
    @merumomo Před 5 lety +1

    Great video! What do we do if we do have "0"(zero) in factor variables?

    • @bkrai
      @bkrai  Před 5 lety +1

      Do you mean missing values?

    • @merumomo
      @merumomo Před 5 lety

      Bharatendra Rai yes, I meant missing values. We fill in missing values with mean/median in numeric variables but I guess we need to remove missing values if it is in categorical variables?

    • @bkrai
      @bkrai  Před 5 lety +1

      For categorical variables you can go with category with highest frequency.
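
Filling missing values of a categorical variable with its most frequent level might be sketched as follows; the factor `f` is hypothetical.

```r
# Hypothetical factor with two missing values
f <- factor(c("a", "b", "b", NA, "c", "b", NA))

# Most frequent level; table() leaves NA out by default
mode_level <- names(which.max(table(f)))

# Replace the missing entries with that level
f[is.na(f)] <- mode_level
table(f)
```

If two levels tie for the highest frequency, `which.max` picks the first of them, so ties may deserve a manual look.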

    • @merumomo
      @merumomo Před 5 lety +1

      Bharatendra Rai thank you!!

  • @shivamparashar...9536
    @shivamparashar...9536 Před 6 měsíci +1

    Sir when I uploaded a data set then it doesn't take all data ..it is leaving a few rows.
    Please tell me how I can upload a dataset.
    Thank you

    • @bkrai
      @bkrai  Před 5 měsíci

      How many rows your original data has?

  • @manasarath4146
    @manasarath4146 Před 6 lety +1

    If my response variable is categorical but present as "Yes" or "No" instead of 1 or 0. Is that a problem?

    • @bkrai
      @bkrai  Před 6 lety

      That should be fine.
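
A short sketch of why a "Yes"/"No" response works, assuming the column is stored as a factor. The data and names are simulated and hypothetical.

```r
# Simulated data (hypothetical)
set.seed(7)
toy <- data.frame(x = rnorm(300))
toy$y <- factor(ifelse(rbinom(300, 1, plogis(toy$x)) == 1, "Yes", "No"))

levels(toy$y)   # "No" "Yes": alphabetical, so "No" is the first level

# For a factor response, glm() treats the first level as failure,
# so the fitted probabilities are P(y == "Yes")
m <- glm(y ~ x, data = toy, family = "binomial")

# Same coefficients as coding the response 0/1 by hand
m01 <- glm(as.integer(y == "Yes") ~ x, data = toy, family = "binomial")
```

It is worth checking `levels()` first, because the level order decides which outcome the model treats as the event.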

  • @YatiChoudhary
    @YatiChoudhary 3 years ago +1

    Sir, do we change integers into factors when the variables are categorical in multinomial logistic regression too, or is it done only in logistic regression?

    • @bkrai
      @bkrai  Před 3 lety

      For Multinomial Logistic Regression you can refer to this:
      czcams.com/video/S2rZp4L_nXo/video.html

    • @YatiChoudhary
      @YatiChoudhary Před 3 lety +1

      @@bkrai Sir actually I came to this video after watching your video on Multinomial Logistic Regression. But now I am confused if we should always change all categorical variables into factors or it just happens in logistic regression. Because in Multinomial Regression you changed only response variable into a factor.

    • @bkrai
      @bkrai  Před 3 lety +1

      For response variable I would say yes. But for others you can go case by case.

    • @YatiChoudhary
      @YatiChoudhary Před 3 lety +1

      @@bkrai Thank you Sir

    • @bkrai
      @bkrai  Před 3 lety

      You are welcome!

  • @navneetjain2507
    @navneetjain2507 Před 4 lety +1

    What about the case when we have a lot of independent variables that have zero as a response or missing values?

    • @bkrai
      @bkrai  Před 4 lety

      For missing values refer to this link:
      czcams.com/video/An7nPLJ0fsg/video.html

  • @shajibkumarguha234
    @shajibkumarguha234 3 years ago +1

    Hello Sir! Why did you choose rank as a factor and not as an ordered factor?

    • @bkrai
      @bkrai  3 years ago

      You are right, ordinal would be more correct.
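
A brief sketch of the difference between an unordered and an ordered factor in R, using hypothetical values. With R's default options, unordered factors get treatment (dummy) contrasts and ordered factors get polynomial contrasts.

```r
rank_raw <- c(1, 3, 2, 4, 2, 1, 3, 4)   # hypothetical values

rank_nominal <- factor(rank_raw)                   # unordered factor
rank_ordinal <- factor(rank_raw, ordered = TRUE)   # ordered factor

is.ordered(rank_nominal)
is.ordered(rank_ordinal)

# Contrast columns that a model formula would use for each version
# (under the default contrasts option).
colnames(contrasts(rank_nominal))
colnames(contrasts(rank_ordinal))
```

In a `glm()` with the variable as the only change, both codings span the same model space, so the fitted probabilities agree; what changes is how the coefficients are parameterized and interpreted.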

  • @hyunjungariuka1686
    @hyunjungariuka1686 3 years ago +1

    Can anyone help? My question is about NSP: in the regression only 2 rows appear, which are suspect and pathological, but in my regression there are 4 lines like that. I think two of them are suspect and pathological; what could the other 2 be?

    • @bkrai
      @bkrai  3 years ago

      For a response with more than 2 levels, you need to apply multinomial logistic regression. Here is the link:
      czcams.com/play/PL34t5iLfZddvv-L5iFFpd_P1jy_7ElWMG.html

  • @irfaneditzstatus9760
    @irfaneditzstatus9760 4 years ago +1

    Hi Sir, I am analyzing data from a traffic survey. I have Age, Gender, TripDist, TravelMode, TravelTime, DepartureTime, and LectureTime information. What is the meaning of factor and margin in regression modelling? Can you help me with that? Thanks in advance.

    • @bkrai
      @bkrai  3 years ago

      'factor' is another name for a categorical or qualitative variable.

  • @NHYIRABAful
    @NHYIRABAful 6 years ago +2

    Thank you so much

  • @valeriasanchez4910
    @valeriasanchez4910 3 years ago +2

    Thank you!

    • @bkrai
      @bkrai  3 years ago +2

      You are welcome!

  • @juanurrego796
    @juanurrego796 6 years ago +1

    I want to create a data frame of covariates in order to be able to use the predict function. However, to use the predict function, I must include all my variables in this new data frame, which seems really hard because I have a variable that identifies countries by name. I've tried using the following code:
    data_for_fitted_values3

    • @bkrai
      @bkrai  3 years ago

      Sorry, I only saw this today. Hope it is resolved by now.
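
A hedged sketch of building a data frame for `predict()` when one predictor is a factor such as a country name. The data, the model, and the names `train`, `new_rows`, `country`, `x`, and `y` are all hypothetical; this is not the commenter's code, which is truncated above.

```r
set.seed(3)
train <- data.frame(
  country = factor(sample(c("Chile", "Peru", "Spain"), 150, replace = TRUE)),
  x = rnorm(150)
)
train$y <- rbinom(150, 1, plogis(0.3 * train$x + (train$country == "Spain")))

fit <- glm(y ~ x + country, data = train, family = binomial)

# New rows need every predictor used in the formula; the factor should
# carry the same levels as in the training data.
new_rows <- data.frame(
  country = factor(c("Peru", "Spain"), levels = levels(train$country)),
  x = c(0, 1)
)

p_new <- predict(fit, newdata = new_rows, type = "response")
p_new
```

Only the countries of interest need rows in `new_rows`; the full list of country names is carried by the factor levels rather than by extra rows.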

  • @JamTik734
    @JamTik734 3 years ago +2

    Perfect Sir

  • @fafeabo8681
    @fafeabo8681 5 years ago

    Thanks a lot. Can I ask about the goodness-of-fit test, please? I got a value of 0; is that correct?

    • @bkrai
      @bkrai  5 years ago

      Very low p-values are rounded to 0; they are not exactly zero.
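
The rounding point above can be seen in a small sketch. It assumes the goodness-of-fit test is the usual chi-square test on the drop from null to residual deviance, and it uses simulated data rather than the data from the video.

```r
set.seed(4)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))   # simulated response
fit <- glm(y ~ x, family = binomial)

# Overall model test: drop in deviance relative to the null model.
p_value <- with(fit, pchisq(null.deviance - deviance,
                            df.null - df.residual,
                            lower.tail = FALSE))

round(p_value, 4)                    # a tiny p-value displays as 0 here
format(p_value, scientific = TRUE)   # shows the actual magnitude
format.pval(p_value, eps = 1e-4)     # reports it as below a threshold
```

Reporting such a result as "p < 0.0001" is usually clearer than reporting "p = 0".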

  • @vairachilai3588
    @vairachilai3588 3 years ago +1

    In logistic regression, how do we check the linear relationship between the logit of the outcome and each predictor?

    • @bkrai
      @bkrai  3 years ago

      That's not needed.

    • @vairachilai3588
      @vairachilai3588 3 years ago +1

      @bkrai A linear relationship between the logit of the outcome and each predictor is assumed.
      If this condition is not met, logistic regression is invalid:
      log(p / (1 - p)) = b0 + b1 * X
      I have read this in many articles. If possible, can you explain it for this case study?
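
One informal way to probe the linearity-of-the-logit question raised in this thread is to add a curvature term and compare the two models with a likelihood-ratio test. This is a sketch under assumptions, not the method used in the video: the data are simulated so that the logit really is linear in `gpa`, and the quadratic term is just one of several possible checks.

```r
set.seed(5)
n <- 400
gpa <- runif(n, 2, 4)
admit <- rbinom(n, 1, plogis(-6 + 2 * gpa))   # simulated: logit is linear in gpa

fit_linear <- glm(admit ~ gpa, family = binomial)
fit_curved <- glm(admit ~ gpa + I(gpa^2), family = binomial)

# Likelihood-ratio test: a small p-value would suggest that a
# straight-line logit is not adequate for this predictor.
lr_check <- anova(fit_linear, fit_curved, test = "Chisq")
lr_check
```

The check applies to numeric predictors only; factor predictors have no linearity assumption to test.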

  • @Merjy01
    @Merjy01 8 months ago +1

    Thank you a lot 🙏

    • @bkrai
      @bkrai  8 months ago

      You're welcome 😊

  • @me3jab1
    @me3jab1 4 years ago +1

    Hello. Before you removed gre, the residual deviance was 369.99; when you reran the model without gre, it became 371.81, that is, it increased. In this case, shouldn't we keep gre even though it is not significant? Or is the change negligible?

    • @bkrai
      @bkrai  4 years ago +1

      That change is negligible. When a variable is not statistically significant, we should remove it.
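
The two deviances quoted in the question can be compared with a likelihood-ratio test. The sketch below uses only those two numbers and assumes that dropping `gre` removes exactly one parameter, since it is a single numeric predictor.

```r
dev_with_gre    <- 369.99   # residual deviance quoted above, model with gre
dev_without_gre <- 371.81   # residual deviance quoted above, model without gre

# Likelihood-ratio statistic: increase in deviance after dropping gre.
lr_stat <- dev_without_gre - dev_with_gre

# Assumption: one degree of freedom, because one coefficient was removed.
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

lr_stat
p_value
```

The statistic is 1.82 and the p-value is roughly 0.18, so the increase in deviance is not statistically significant at the usual 0.05 level.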

    • @me3jab1
      @me3jab1 4 years ago +1

      @bkrai Thank you, Boss

    • @bkrai
      @bkrai  4 years ago

      welcome!

    • @me3jab1
      @me3jab1 4 years ago +1

      @bkrai If we have only one Y outcome (not 2 as in this example), which family type must we choose?

    • @bkrai
      @bkrai  4 years ago +1

      If Y has only one value then that doesn't need a classification model.

  • @evansumido6191
    @evansumido6191 2 years ago +1

    Hi sir. Do you have code for cross-validation? Thank you.

    • @bkrai
      @bkrai  2 years ago +1

      Refer to this for various ways to use CV:
      czcams.com/video/GmkHvDs0GG8/video.html
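
For readers who want a quick starting point, here is a minimal k-fold cross-validation sketch in base R for a logistic regression. It is not the code from the linked video; the data are simulated, and the 0.5 cutoff and `k = 5` are arbitrary choices for illustration.

```r
set.seed(6)
n <- 300
dat <- data.frame(gpa = runif(n, 2, 4))
dat$admit <- rbinom(n, 1, plogis(-6 + 2 * dat$gpa))   # simulated data

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold label for each row

errors <- sapply(1:k, function(i) {
  # Fit on all folds except i, then predict the held-out fold.
  fit <- glm(admit ~ gpa, data = dat[fold != i, ], family = binomial)
  p <- predict(fit, newdata = dat[fold == i, ], type = "response")
  pred <- ifelse(p > 0.5, 1, 0)
  mean(pred != dat$admit[fold == i])   # misclassification rate on fold i
})

mean(errors)   # cross-validated misclassification error
```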

  • @nishadseeraj7034
    @nishadseeraj7034 5 years ago +1

    Good day sir, your video was very helpful, but I still have some questions. Whenever I run glm on my dataset, I get an error stating:
    glm.fit: fitted probabilities numerically 0 or 1 occurred and
    glm.fit: algorithm did not converge
    My dataset is the Swiss bank notes dataset and the data is organised (i.e. all of the Y=0 rows come first, followed by the Y=1 rows). Any help on how to proceed with solving these problems would be great!

    • @bkrai
      @bkrai  5 years ago

      Convert your response variable to factor type.
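
As background on the two messages quoted in the question: they are warnings from `glm.fit`, and they commonly appear when a predictor separates the two classes completely or almost completely. Whether that is what happened with the commenter's data is an assumption, not something confirmed in the thread. The sketch below reproduces the situation on a tiny made-up data set in which `x` separates the classes perfectly and the response is already a factor.

```r
# Hypothetical data where x separates the two classes completely.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- factor(c(0, 0, 0, 0, 1, 1, 1, 1))

# Collect the warnings instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs        # warnings raised by glm.fit
coef(fit)   # very large coefficients are a typical symptom of separation
```

If separation turns out to be the cause, options that are often suggested include using fewer predictors or a penalized fit; which one suits a given data set is a judgment call.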