Categorical Variables in Stata

  • Added 28 June 2017
  • More information on categorical variables in Stata: www.stata.com/f...

Comments • 89

  • @wkatieivey
    @wkatieivey 5 years ago +4

    Thank you so much! I am a grad student trying to work with survey data and this helped immensely! Your video saved me a solid three hours of time I would have spent making mistakes in Stata.

  • @bevbeautifulhealing
    @bevbeautifulhealing 1 year ago

    Thank you so very much. 🙏🏾 Helping a friend with masters and this has really helped me understand and am able to move forward after completing this section to help her to move on. Cheers mucho

  • @LouisaOsei-Bonsu
    @LouisaOsei-Bonsu 6 days ago

    Thank you, Sebastian, this is so helpful and you made it easy

  • @DavidMwangi
    @DavidMwangi 3 years ago +1

    Sebastian sir, thank you so much for this! You went straight to the point with no time wasted.

  • @GM__user
    @GM__user 15 days ago

    THANK YOU🙏🏼 This video has been extremely helpful

  • @anushkanegi5220
    @anushkanegi5220 3 years ago

    This was so helpful! I spent hours looking for this on the internet. Thank you so much!!!

  • @anastaciamatlebjane1040
    @anastaciamatlebjane1040 4 years ago +2

    I think you just saved my life😢

  • @danilorodriguez1638
    @danilorodriguez1638 5 years ago +1

    Thank you very much. Greetings from Colombia.

  • @anlanhnguyen-ly9vi
    @anlanhnguyen-ly9vi 3 months ago

    Excellent! Thank you for saving me tons of time :)

  • @wharrison2010
    @wharrison2010 5 years ago +1

    Thank you for this very precise and succinct lecture. A quick question: I saw positive coefficients when the base was "No Diploma" and negative coefficients when it was "Graduate". Could you kindly interpret one of the coefficients from either set of regression results?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago +2

      The reason for this is that "no diploma" is the lowest income group. Any departure from that group would be associated with an increase in income. For example, the estimate for the bachelor's degree tells us how much more income bachelor's degree holders have above those with no diploma.
      Graduate, on the other hand, is the highest income group (people with PhDs, MBAs, and so forth). Here, people with (only) bachelor's degrees have less income on average than the base group.
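
The sign flip described in this reply can be reproduced with any categorical regressor. The sketch below is only an illustration: it assumes the `nlsw88` example dataset that ships with Stata and uses its `race` variable in place of the video's education variable, which is not available here.

```stata
* Illustration only: nlsw88 and race stand in for the video's data
sysuse nlsw88, clear

* Default: the lowest-coded category (race == 1) is the base group,
* so each coefficient is a difference from category 1
regress wage i.race

* Same model with category 3 as the base group: each coefficient is
* now a difference from category 3, so signs and sizes change
regress wage ib3.race
```

The `ib3.` prefix only changes which group the others are compared with; the fitted model is the same.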

  • @ViralVidz21
    @ViralVidz21 1 year ago

    You are the best man.

  • @koketsomokoditoa3255
    @koketsomokoditoa3255 3 years ago +1

    Hi. If I have values such as "grade 1, grade 2, grade 3, grade 10, grade 11, etc." under a variable named "Education", but I want all lower grades (grade 1 to grade 7) to be called "Primary School" and higher grades (grade 8 to grade 11) to be called "High School", how do I code that in Stata?

    • @keri-annfacey6794
      @keri-annfacey6794 3 years ago

      I came on here trying to find answers to the same question. I was able to group my grade levels, but not in the order that I wanted. I have grade levels 7-11 and I want to group grades 7-9 as "lower school" and grades 10-11 as "upper school". The command I have grouped grades 7-8 as "lower school" and grades 9-11 as "upper school". I am trying to figure out how to get grade 9 into the lower school category.
      Anyway,
      try this and see if it works.
      gen schoollevel = recode(education, 1, 2)
      label define schoollevels 1 "Primary School" 2 "High School"
      As I mentioned, this command worked for me but not in the order I wanted. Hope it helps in some small way.
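
For a numeric grade variable, one way to control exactly which grades land in which group is the `recode` command with explicit ranges. This is a sketch under assumptions: the tiny dataset and the variable name `grade` are made up for illustration and are not the commenter's data.

```stata
* Hypothetical data: a numeric variable "grade" with values 7-11
clear
input grade
7
8
9
10
11
end

* Grades 7-9 become 1 "lower school", grades 10-11 become 2 "upper school";
* generate() leaves the original variable untouched
recode grade (7/9 = 1 "lower school") (10/11 = 2 "upper school"), generate(schoollevel)

* Cross-tabulate to check that grade 9 ended up in the lower group
tabulate grade schoollevel
```

Writing the ranges out makes the cut point explicit, which is where the grade 9 problem described above would show up.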

    • @keri-annfacey6794
      @keri-annfacey6794 3 years ago

      Try this video czcams.com/video/XWVaXN2KwmA/video.html

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      You can do this with logical operators, in this case the "pipe" (|), which means "or." For your example, gen primaryschool = Education == "grade 1" | Education == "grade 2" and so on. Keep adding pipes and statements for each grade.
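
A chain of pipes works, and for a string variable the same test can be written more compactly with `inlist()`. The sketch below assumes made-up data and that the values are spelled exactly "grade 1", "grade 2", and so on; the `///` line continuation works in a do-file, not when typed interactively.

```stata
* Hypothetical data matching the question's description
clear
input str8 Education
"grade 1"
"grade 7"
"grade 8"
"grade 11"
end

* Long form from the reply: one comparison per grade, joined with |
generate primary_long = Education == "grade 1" | Education == "grade 2" | ///
    Education == "grade 3" | Education == "grade 4" | Education == "grade 5" | ///
    Education == "grade 6" | Education == "grade 7"

* Compact form: inlist() is 1 when the first argument equals any of the others
* (with strings it accepts at most 10 arguments in total)
generate primaryschool = inlist(Education, "grade 1", "grade 2", "grade 3", ///
    "grade 4", "grade 5", "grade 6", "grade 7")

* Both versions should agree on every observation
assert primary_long == primaryschool
```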

  • @shannonbarnes1888
    @shannonbarnes1888 2 years ago

    Hi, how can I apply this information if my categorical variable is already in numbers (0, 1, 2) and I need to regress it with a continuous variable? Stata doesn't know what the values correspond to, but 0 = conservative, 1 = labour and 2 = other.

  • @mussahemed1153
    @mussahemed1153 3 years ago

    I only bought Stata yesterday and am using it for the first time for my MSc dissertation. This was so helpful. When using the label define command, I assume you have to type in the category names exactly as they were shown by tab. What if a name has a space in it, as in NO DIPLOMA instead of NODIPLOMA? The variable in my data had a space in it and I got a syntax error when I tried to use label define on it.

    • @mussahemed1153
      @mussahemed1153 3 years ago

      Found the answer to my own question by playing around with Stata. When entering a name such as NO DIPLOMA, make sure you enter it as 1 "NO DIPLOMA"

    • @sebastianwaiecon
      @sebastianwaiecon 3 years ago

      @mussahemed1153 Yes, you got it. The reason you had this problem is that spaces are used as delimiters in Stata commands, similar to how commas are generally used in Excel functions. So, any time you have text with a space you need quotation marks.
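
As a sketch of the quoting rule in this thread, the example below defines a value label whose text contains spaces and then uses it to fix the order in `encode`. The variable names `educcategory` and `educcodes` come from another comment on this page; the label name `educorder` and the three categories are assumptions made for illustration.

```stata
* Hypothetical string data with a space inside a category name
clear
input str12 educcategory
"NO DIPLOMA"
"GRADUATE"
"BACHELORS"
end

* Quotation marks keep "NO DIPLOMA" together as a single label
label define educorder 1 "NO DIPLOMA" 2 "BACHELORS" 3 "GRADUATE"

* encode uses the codes already defined in the label instead of alphabetical order
encode educcategory, generate(educcodes) label(educorder)

* Show the underlying numeric codes next to the original strings
list educcategory educcodes, nolabel
```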

  • @user-ty8tk5hg6r
    @user-ty8tk5hg6r 2 years ago

    How about dummy variables?
    I have coded a nominal variable, e.g. status. How would that be handled?

  • @theReal_Mimi
    @theReal_Mimi 3 years ago

    What's the value of x if you want to calculate the estimated value of wage for each category?

  • @inestnewdocile1646
    @inestnewdocile1646 2 months ago

    How do you check correlation for categorical variables?

  • @markvanderlinde30
    @markvanderlinde30 3 years ago

    hi, is it true that the i. command for categorical variables does not work when using the Oaxaca regression command?

  • @simonetaddeo1935
    @simonetaddeo1935 2 years ago

    How could I see if even NODIPLOMA is significant in regression? Is there any command which shows all the variables without any base reference?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      You can force Stata to omit the constant using the noconstant option, combined with the ibn. prefix on the categorical variable so that no base level is dropped. This will remove the collinearity and allow all categories to be in the regression. However, be careful about the interpretation and what statistical significance would mean in this new context. I personally wouldn't advise doing what I'm suggesting, but that is the answer to your question.
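
A minimal sketch of a regression with no base category, assuming the `nlsw88` example dataset shipped with Stata and its `race` variable rather than the video's data:

```stata
* Illustration only: nlsw88 and race stand in for the video's data
sysuse nlsw88, clear

* ibn. requests "no base level"; noconstant drops the intercept,
* so every category gets its own coefficient
regress wage ibn.race, noconstant

* For comparison: the group means of wage
tabulate race, summarize(wage)
```

With only this one categorical regressor, each coefficient is that group's mean wage, so its t-test asks whether the mean differs from zero, not whether the group differs from another group. That is the change in interpretation the reply warns about.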

  • @ceciliadelvi2724
    @ceciliadelvi2724 2 years ago

    Hi, at the end of the video you say "we have the same exact numbers" when you compare the last two regressions you run. But the coefficients changed, and not only from positive to negative.
    I tried with other data, and my coefficients also change when I change the reference category, but most importantly the significance changes too. I have a variable with 4 categories. When I choose category 1 as the reference, I get some significant (p-value) results. But when I choose category 3 as the reference, I have no significant results. Could you help me understand why? And should we then use the reference that gives us the most interesting results?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      The coefficients change because the base group changed. The coefficients always give you the difference from the base group. Any predictions you make will be mathematically identical. The significance would change because you're doing a different test when you change the base group - it's a comparison between different groups now.
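
The claim that predictions are identical under any base group can be checked directly. The sketch below assumes the `nlsw88` example dataset shipped with Stata; the names `yhat_base1` and `yhat_base3` are made up for illustration.

```stata
* Illustration only: nlsw88 and race stand in for the video's data
sysuse nlsw88, clear

* Fit the same model with two different base groups and store predictions
regress wage i.race
predict yhat_base1

regress wage ib3.race
predict yhat_base3

* The fitted values agree up to floating-point rounding
assert reldif(yhat_base1, yhat_base3) < 1e-6

* A comparison between two non-base groups can be tested without refitting
regress wage i.race
test 2.race = 3.race
```

The `test` line is the same comparison that a coefficient's t-test would report if one of those two groups were the base, which is why significance appears to move when the base changes.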

  • @jayanthsaishiva
    @jayanthsaishiva 3 years ago

    Great video. Thanks a lot

  • @asmasultana2732
    @asmasultana2732 3 years ago

    @SebastianWaiEcon I am a student on an MRes course and I need to understand Stata and how it works. I find it difficult. Can you train me?

  • @keeks4914
    @keeks4914 2 years ago

    Hi I have an issue. I am trying to convert my string variables (in a group called stage) to numeric variables. I used the encode option and created a new variable called stage_cat. But when I tabulate stage_cat I get no observations. The list of stage_cat also seems to be empty. It looks like the encoding option didn't work. How can I fix this?

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      I'm not sure about recode, but did you try using encode like I showed in the video?
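
For reference, a minimal `encode` run with a quick check of the result might look like the sketch below. The data are made up; only the names `stage` and `stage_cat` come from the question, and this is not a diagnosis of what went wrong there.

```stata
* Hypothetical string data; the last value is an empty string
clear
input str10 stage
"early"
"late"
"early"
""
end

* Create a labeled numeric copy of the string variable
encode stage, generate(stage_cat)

* The missing option also shows observations that encode left as missing
tabulate stage_cat, missing

* Count how many source values were empty to begin with
count if missing(stage)
```

Empty strings are encoded as missing, so a result with no observations is worth checking against the source variable first.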

  • @redface4444
    @redface4444 4 years ago

    I am running a regression using Stata with the dependent variable being R.O.A and the independent variable being green-house-gas emissions. I also have 4 control variables. I also want to control for each industry. For example, firms that operate in the industry sector will typically have higher GHG emissions than firms in the health care sector. Would this be the way to control for each industry? If not is there a way to do so? Thanks

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      Sounds like dummy variables indicating each industry would be appropriate. You can do this using the methods outlined in this video.
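
A sketch of industry dummies as controls, assuming the `nlsw88` example dataset shipped with Stata: `wage`, `ttl_exp` and `tenure` merely stand in for the question's ROA, emissions and control variables, which are not available here.

```stata
* Illustration only: these variables stand in for ROA, GHG emissions and controls
sysuse nlsw88, clear

* i.industry adds one dummy per industry (minus the base group)
regress wage ttl_exp tenure i.industry

* Joint test of whether the industry dummies matter as a set
testparm i.industry
```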

  • @n10f98
    @n10f98 4 years ago

    How would I adjust the education variable to comprise fewer categories before doing a moderation analysis?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      If you want just two categories, then you could just generate a dummy variable indicating the subcategories you wanted.

  • @ericli6027
    @ericli6027 4 years ago

    Thanks this is really helpful!
    But what should I do if the original variable is numeric??

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You can put the numeric variable directly in the regression. If you want to use a numeric variable as a categorical variable, you can still do that. You skip the step of encoding, and just use the "i." structure.

    • @ericli6027
      @ericli6027 4 years ago

      SebastianWaiEcon ok, thank you!
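
The two options in this thread can be compared side by side. The sketch assumes the `auto` example dataset shipped with Stata, where `rep78` is already numeric with values 1 to 5.

```stata
* Illustration only: auto and rep78 stand in for the asker's data
sysuse auto, clear

* Numeric variable used directly: one slope, assuming a linear effect
regress price rep78

* Same variable treated as categorical: one coefficient per level
* (minus the base), with no encode step needed
regress price i.rep78
```

Factor-variable notation needs nonnegative integer values, so a variable with decimals or negative codes would have to be recoded first.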

  • @saurabhsahu175
    @saurabhsahu175 5 years ago

    Thank you so much Sir.! Really helped me..!

  • @tahmidfaysal8315
    @tahmidfaysal8315 2 years ago

    Thanks a lot

  • @ralphnestorpadero950
    @ralphnestorpadero950 4 years ago

    What about postestimation? Is postestimation the same for continuous and categorical variables?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      Any postestimation commands can still be used, since this is all contained within the regress command.
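
A few common postestimation commands after a regression with a categorical regressor, sketched on the `nlsw88` example dataset shipped with Stata (an assumption; the variable name `resid` is made up):

```stata
* Illustration only: nlsw88 and race stand in for the video's data
sysuse nlsw88, clear
regress wage i.race

* Joint test that all category coefficients are zero
testparm i.race

* Predicted mean wage for each category
margins race

* Residuals work exactly as with continuous regressors
predict resid, residuals
```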

  • @thidachawhlaing3494
    @thidachawhlaing3494 5 years ago

    Thanks so much for this clear explanation. I am now doing a PhD and your videos help me a lot with my analysis.
    However, can I use "i." in multiple regression? Are there any differences between Stata 14 and 13 in this "i." command for creating dummy variables, placed in front of the variables we are going to use? Please help me.

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago +2

      You can use the "i." structure in any regression. It definitely works the same in Stata 14 and 15. I haven't used 13 in a very long time, so I can't remember for sure.

    • @thidachawhlaing3494
      @thidachawhlaing3494 5 years ago

      @sebastianwaiecon Thanks so much for the prompt reply. I tested it in a logistic regression in Stata 14 and it works, but that command from the do-file (Stata 14) did not work in Stata 13.
      I tested it on a friend's computer, as I only have Stata 13. When I ran the do-files from Stata 14 in Stata 13, this "i." structure did not work!
      Again, is this "i." structure the same in logistic regression, and is there no difference in do-files between 13, 14 and 15?
      I appreciate this free online teaching!

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago +1

      It shouldn't make any difference using a logit. The oldest reference to this I can find in my own do-files is from late 2015, after Stata 14 came out, but factor-variable notation ("i.") has been part of Stata since version 11, so it should also work in Stata 13; if a do-file fails there, something else in it is the more likely cause. From version to version, usually not a whole lot changes, but I'd recommend upgrading to Stata 15 at this point.

    • @thidachawhlaing3494
      @thidachawhlaing3494 5 years ago

      @sebastianwaiecon Thanks so much. I think the point is that an old version of Stata needs to be upgraded.

  • @habtamudoe8868
    @habtamudoe8868 4 years ago

    Thank you so much, sir. If the dependent and independent variables are both categorical, how do I run it in Stata?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      See my video on binary choice models for basic ways to handle categorical dependent variables.

  • @201120sebastian
    @201120sebastian 5 years ago

    Hi, thank you for the video, very helpful! I was wondering if, now that you have the "educcodes" variable, you can drop "educcategory"? Or should you keep it?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago +1

      There is no particular reason you would drop it, but it's not needed for the "educcodes" variable to function properly once it's been created. You might want to keep it in case you want to do something else with it later, though.

    • @201120sebastian
      @201120sebastian 5 years ago

      @sebastianwaiecon thank you so much!

  • @emilbinny
    @emilbinny 5 years ago

    Can I use a dummy variable as my dependent variable with the i. command? I tried it, but I am getting an error message saying "depvar may not be a factor variable". What can I do?

    • @sebastianwaiecon
      @sebastianwaiecon 5 years ago

      You can use a dummy variable (i.e., the values must all be zeros or ones) as the dependent variable. See my video on this: czcams.com/video/vRKesKWMCsg/video.html
      There are special regression tools, such as the multinomial logit, that allow for more complex categoricals as the dependent variable, but I haven't covered those in videos.
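
A sketch of both cases mentioned in this reply, assuming the `nlsw88` example dataset shipped with Stata: `union` is a 0/1 variable and `race` has three categories. The error in the question comes from putting `i.` on the dependent variable, so the left-hand side is written without it.

```stata
* Illustration only: nlsw88 stands in for the asker's data
sysuse nlsw88, clear

* Binary dependent variable: no i. prefix on the left-hand side
logit union i.race

* Linear probability model as an alternative
regress union i.race, vce(robust)

* A dependent variable with more than two categories needs a different
* estimator, for example a multinomial logit
mlogit race wage
```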

    • @emilbinny
      @emilbinny 5 years ago

      thanks Sebastian...

  • @emotionalstories8152
    @emotionalstories8152 2 years ago

    Hi, I need your help. I have only two educational categories, 1 = PhD and 2 = MPhil. How can I estimate the return to these two education levels? I have no third category to use as the base, so how can I find the returns to these two? My problem is different in that I want to estimate the private return to education for these two levels, i.e., how much earnings change after one more year of education.

    • @sebastianwaiecon
      @sebastianwaiecon 2 years ago

      Short answer is that you can't. You cannot estimate a treatment effect if you do not have any observations without the treatment. The best you can do is estimate the difference between the PhD and the MPhil.

    • @emotionalstories8152
      @emotionalstories8152 2 years ago

      @sebastianwaiecon I have cross-sectional data on 800 respondents.

  • @saaakill
    @saaakill 4 years ago

    Can you please make a video about how we can get the frequency of a category as a new variable?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You can do that with egen:
      egen frequency = count(category), by(category)
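
An alternative that does not depend on how `egen` evaluates the expression is to count observations within each group using `_N`. The sketch assumes the `auto` example dataset shipped with Stata, with `rep78` standing in for the hypothetical `category`.

```stata
* Illustration only: rep78 stands in for the category variable
sysuse auto, clear

* Within each value of rep78, _N is the number of observations in that group
bysort rep78: generate frequency = _N

* Compare against the tabulated frequencies
tabulate rep78, missing
```

Observations where the category is missing form their own group here and get that group's count, so they may need to be set to missing afterwards.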

  • @user-sl5ds8kn5x
    @user-sl5ds8kn5x 3 years ago

    Thank you!

  • @sunrose68
    @sunrose68 4 years ago

    I entered the command 'label define .......' but the result is invalid syntax. I just replicated your steps. Why does this happen?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You probably have a typo somewhere. Did you forget to put all the numbers in?

    • @sunrose68
      @sunrose68 4 years ago

      @sebastianwaiecon I'm not sure what I did wrong😭 Could you have a look at it: sm.ms/image/hjFQGbX1kpwaz9N

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago +1

      I'm not sure, but my guess is that it didn't like the slash in Others/Uncertainty.

  • @Alex-sy4gg
    @Alex-sy4gg 9 months ago

    legend!

  • @godwinnerarwill2885
    @godwinnerarwill2885 4 years ago

    Pls I need help with my data analysis especially creating a composite variable

    • @thedatahall
      @thedatahall 4 years ago

      what specifically are you looking for?

  • @domillima
    @domillima 4 years ago

    Are the coefficients here your R^2?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      In the upper-right area of the Stata regression output, you can see where it says "R-squared."

  • @mahbubhasan8102
    @mahbubhasan8102 4 years ago

    It's concise.

  • @reyaa8593
    @reyaa8593 4 years ago

    What if I want the education but only females, or only males?

    • @sebastianwaiecon
      @sebastianwaiecon 4 years ago

      You can add an "if" statement to most commands in Stata if you want to limit your analysis to a certain group.
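
A sketch of the `if` qualifier, assuming the `nlsw88` example dataset shipped with Stata. That dataset has no gender variable, so `married` stands in for it; with your own data the condition might be something like `if female == 1`, where `female` is a hypothetical 0/1 variable.

```stata
* Illustration only: married stands in for a gender indicator
sysuse nlsw88, clear

* Restrict the regression to one subgroup
regress wage i.race if married == 1

* The same qualifier works with most other commands
tabulate race if married == 0
```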

  • @256hzart
    @256hzart 2 months ago

    Hi I have a string of numbers but it's red. So I use this command to encode but all my numbers have been changed to others. I dont know why, pls help me 😭 🙏🙏🙏

    • @sebastianwaiecon
      @sebastianwaiecon 2 months ago +1

      The command you probably need is "destring." Most likely what happened is that you have at least one value that is not a number, which is why Stata read it as a string. You first need to make sure all the values are valid numbers, then use destring to generate a numerical version of your variable.

    • @256hzart
      @256hzart 2 months ago

      @sebastianwaiecon thank u, I can fix it now
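
The steps in this reply can be sketched as follows, on made-up data with one non-numeric entry. The variable names `x` and `x_num` are assumptions for illustration.

```stata
* Hypothetical string variable holding numbers plus one bad value
clear
input str5 x
"12"
"7"
"n/a"
"30"
end

* real() returns missing for text that is not a valid number,
* so this lists the offending observations
list x if missing(real(x))

* Convert to numeric; force turns the non-numeric entries into missing
destring x, generate(x_num) force

list x x_num
```

By contrast, `encode` numbers the distinct strings 1, 2, 3, ... in alphabetical order and attaches the original text as labels, which is why the numbers appeared to change.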

  • @hazelw
    @hazelw 1 year ago

    I just want to turn the variable into real numbers after encoding, like 1 for Someone, 2 for Anotherone, ... without declaring them one by one, because there are just too many of them. No one, literally no one, can tell me how to do this. I am wondering why we are still using Stata rather than R, which is much more direct.

    • @sebastianwaiecon
      @sebastianwaiecon 1 year ago

      When you encode, the new variable is just numbers (with labels on top). You can see this by clicking on the values in the data browser. Just make sure to set the order to how you want it. From there, just put the encoded variable into the regression without the "i." structure. You can see me do this in the video at 5:40.

    • @hazelw
      @hazelw 1 year ago

      @sebastianwaiecon Thanks mate:) Just found out that egen group() can also do this.
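
Both routes mentioned in this thread, sketched on the `auto` example dataset shipped with Stata, where `make` is a string variable with many distinct values. The names `make_code` and `make_id` are made up for illustration.

```stata
* Illustration only: make stands in for the asker's string variable
sysuse auto, clear

* encode: numeric codes with the original text attached as value labels
encode make, generate(make_code)

* egen group(): plain numeric IDs, one per distinct value, no labels
egen make_id = group(make)

* nolabel shows the numbers that sit underneath the encode labels
list make make_code make_id in 1/5, nolabel
```

Neither command needs the categories to be listed by hand.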

  • @zerohero109
    @zerohero109 5 years ago

    What is the interpretation?

  • @manpreetuk4277
    @manpreetuk4277 2 years ago

    Is there anyone who can help me with a Stata exam?

  • @nuranidonesru9092
    @nuranidonesru9092 3 years ago

    10q very much

  • @YahyaMarei
    @YahyaMarei 3 years ago

    encode <the variable you want to change>, gen(<new name>)

  • @vitriawaode6302
    @vitriawaode6302 6 years ago

    Thank you... this video is quite helpful.