Identifying Multivariate Outliers with Mahalanobis Distance in SPSS

  • Added 18 Aug 2024
  • This video demonstrates how to identify multivariate outliers with Mahalanobis distance in SPSS. The probability of the Mahalanobis distance for each case is calculated using the “Compute Variable” function in SPSS.
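
For readers who want to reproduce the probability column outside SPSS, here is a minimal Python sketch of the same idea: take the upper-tail chi-square probability of each distance and compare it against .001. The function name `chi2_sf`, the example distances, and the case labels are illustrative assumptions, not values from the video's data file. The sketch also assumes the saved distances are squared Mahalanobis distances; in SPSS the matching Compute Variable expression would typically be built on `CDF.CHISQ`.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # Q(1/2, z) = erfc(sqrt(z))
    # Step up with Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

ALPHA = 0.001   # cutoff discussed in the comments
DF = 3          # one degree of freedom per variable entered
# Hypothetical squared Mahalanobis distances (assumed to play the role of MAH_1)
mah = {"case 1": 2.31, "case 2": 9.80, "case 3": 21.40}

flags = {}
for case, d2 in mah.items():
    p = chi2_sf(d2, DF)
    flags[case] = p < ALPHA   # True means the case would be flagged
    print(f"{case}: D2 = {d2:6.2f}  p = {p:.5f}  outlier = {flags[case]}")
```

The recurrence is meant for the modest df values typical of this kind of screening; for very large df a statistics library would be the safer choice.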

Comments • 133

  • @scarlettthorn9060
    @scarlettthorn9060 3 years ago +24

    Honestly at this point I want to acknowledge you in my thesis thank you notes. Thank you Dr Grande, you are a gem.

  • @mimimcgee5512
    @mimimcgee5512 2 years ago +4

    Thank you for another helpful video. I am just a month or so away from receiving my doctorate and your videos have greatly assisted me in that! I'm brushing up in prep for my final defense and appreciate all of your videos. Thank you!

  • @Swityie
    @Swityie 2 years ago

    Dr Todd, you've saved my life! I was dying with the Mahalanobis!!! Was crying at midnight while getting stuck at this.
    Thank you again!

  • @yacinehajji1784
    @yacinehajji1784 8 years ago +1

    I would like to thank you for speaking loudly and slowly, very useful for a non-native English speaker like me.

  • @shapsgh
    @shapsgh 3 years ago

    Just realized that the values of MD and the chi-square test exactly match the output of AMOS's outlier table. Thanks Dr. Grande

  •  2 years ago

    This video was very helpful! Thanks for sharing your knowledge for free on YouTube!

  • @thomasbarnes5703
    @thomasbarnes5703 2 years ago

    Thank you Dr. Grande, I have no background in statistics... yet had to take a course as a portion of my degree requirements. Your videos have really helped me understand this very difficult subject!!

  • @ThatFellowOnline
    @ThatFellowOnline 7 years ago

    Fabulous video, explained clearly, concisely. I like how you have also shown the importance of labelling data properly and presentation (decimals) etc as this is really important when keeping data organised i.e. not just focusing on having a tidy output.

    • @DrGrande
      @DrGrande 7 years ago

      I am glad you found this video useful - thanks for watching.

  • @arnelferaer6486
    @arnelferaer6486 4 years ago +6

    Dude you're a legend. Thank you for this.

  • @oscarespinozaparra6840

    Thank you Todd Grande for this extraordinary how-to video. This was a prayer answered and I feel so much better listening and following your instructions. I want to express how sincerely grateful I am for the detailed analysis and steps you indicated in this video.

  • @krunal699
    @krunal699 2 years ago

    Dr Grande you are a saviour! Thank You!

  • @efrestein
    @efrestein 4 years ago +1

    Your videos add a ton of value!

  • @naftalibendavid
    @naftalibendavid 2 years ago

    This has proven so helpful again and again! Thanks.

  • @CoreFocusCoaching
    @CoreFocusCoaching 3 years ago +1

    Amazing!! You should do a separate video for the chi-square distribution. Nowhere on YouTube is the second part to the explanation, and because it is not overtly flagged in the title it does not show up.
    Either way thank you so much!!

  • @zarifbaihaqi8538
    @zarifbaihaqi8538 4 years ago +1

    Thank you very much Dr Tod..you helped me a lot.....

  • @denniscraggs8393
    @denniscraggs8393 6 years ago

    I liked your presentation. SPSS has evolved from the old text script product. I am a current user of both Minitab and Matlab.
    I am studying the Mahalanobis Distance and see that it has many applications. The SAE and ZVEI published a standard where electronics were judged to be fit for use in a temperature x voltage environment defined by a potato shape. However, they never provided a method of dealing with the different unit scale distances. I am thinking the Mahalanobis Distance would be a more technically correct means of classifying a component's fitness for use in a temperature x voltage environment.
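
The unit-scale concern raised in this comment touches on the property Mahalanobis distance is usually chosen for: because deviations are weighted by the inverse covariance matrix, changing a variable's units should leave the distances unchanged. A small Python sketch for the two-variable case follows. The temperature and voltage readings, the function name `mahalanobis_sq_2d`, and the unit conversions are all made-up illustrations, not data from any standard.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample variances and covariance (n - 1 in the denominator)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d written out for a 2x2 covariance matrix S
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out

# Hypothetical (temperature in deg C, supply voltage in V) readings
readings = [(25.0, 3.3), (40.0, 3.1), (60.0, 3.6), (85.0, 3.0), (-10.0, 3.4), (0.0, 5.0)]
# The same readings re-expressed in deg F and millivolts
rescaled = [(t * 9.0 / 5.0 + 32.0, v * 1000.0) for t, v in readings]

d2_original = mahalanobis_sq_2d(readings)
d2_rescaled = mahalanobis_sq_2d(rescaled)
for a, b in zip(d2_original, d2_rescaled):
    print(f"{a:8.4f}  {b:8.4f}")
unchanged = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(d2_original, d2_rescaled))
print("distances unchanged by the change of units:", unchanged)
```

A plain Euclidean distance on the same two sets of readings would change with the units, which is the difficulty the comment describes.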

  • @voltisathartori6451
    @voltisathartori6451 6 years ago

    Thank you Dr Todd, for such an awesome explanation. It was very beneficial for my study to move on.

  • @fasamad6730
    @fasamad6730 7 years ago

    Wonderful explanation. Enjoyed the session. Thank u Todd Grande it was a great help

    • @DrGrande
      @DrGrande 7 years ago

      You're welcome, thanks for watching -

  • @lyrahazel2079
    @lyrahazel2079 4 years ago

    Omg thank you, I was so frustrated. My data wouldn't meet the multivariate normality assumption until I stumbled onto this!

  • @payonrayaneh
    @payonrayaneh 8 years ago

    Very useful......Thanks a lot professor Grande.

  • @thankyou6555
    @thankyou6555 1 year ago

    Thank you! Very helpful.

  • @hafizahusairi
    @hafizahusairi 5 years ago +2

    Thank You!! I understand more after watching your video =)

  • @mohammedimam3651
    @mohammedimam3651 2 years ago

    Wooooow! This is extremely useful! Thank you! 👌

  • @ammaarkidwai2732
    @ammaarkidwai2732 3 years ago +1

    Hi Todd! Great video as usual. Why was the cut off for the probability_MD column .001? Is that the norm cut off or based on your data?

  • @St0rytell3r
    @St0rytell3r 6 years ago

    Thanks for the video, very thorough.

  • @thewaterhub
    @thewaterhub 8 years ago

    Thank you, very useful video and clear explanation.

  • @jaynastics2
    @jaynastics2 4 years ago +1

    Very helpful video!

  • @Thejubeabides24
    @Thejubeabides24 4 years ago +1

    Excellent video!

  • @muhammadfaisal9918
    @muhammadfaisal9918 4 years ago +4

    Thank you Dr. Todd for your awesome work. This is a very useful video. I am wondering if you could mention the reference for this process (or a reference for the significance value - is it by Tabachnick & Fidell 2007?). Many thanks

    • @sebastiankruse4981
      @sebastiankruse4981 2 years ago

      Hair et al 2010 also recommend this process. They suggest dividing MD by the number of predictors and then designating outliers in small samples if these values surpass 2.5 and in large samples if they surpass 4. I think the 2.5 cutoff point corresponds very closely to the .001 p-value used by Dr. Grande.

    • @Lello991
      @Lello991 2 years ago

      @@sebastiankruse4981 Hi! Could you please provide the full reference for Hair et al 2010? Is it this one?
      Hair, J.F., Black, W.C., Babin, B.J., & Anderson, R.E. (2010). Multivariate Data Analysis. Seventh Edition. Prentice Hall, Upper Saddle River, New Jersey

    • @sebastiankruse4981
      @sebastiankruse4981 2 years ago

      @@Lello991 yes, that's the one
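
The two cutoffs discussed in this thread can be put on the same scale with a short calculation. The Python sketch below converts the D²/df thresholds of 2.5 and 4 into upper-tail chi-square probabilities for the three-variable case. It assumes the chi-square reference with df = 3 used in the video; the function name `chi2_sf_df3` is illustrative, and the closed form it uses applies only to df = 3.

```python
import math

def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

K = 3  # number of variables, so df = 3
results = {}
for ratio in (2.5, 4.0):
    d2 = ratio * K                      # D2 value implied by the D2/df threshold
    results[ratio] = chi2_sf_df3(d2)
    print(f"D2/df = {ratio}: D2 = {d2:.1f}, chi-square upper-tail p = {results[ratio]:.4f}")
```

Under these assumptions, D²/df = 2.5 with three variables corresponds to D² = 7.5, which is well below the tabled .001 critical value of about 16.27, so the two rules need not flag the same cases.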

  • @alibezzaa809
    @alibezzaa809 4 years ago

    I really appreciate the effort you are putting into making concepts easy to understand. Do you have a video on transforming a multivariate outlier to a dummy variable?

  • @jahanzaibalvi2010
    @jahanzaibalvi2010 3 months ago

    That's great. Thank you so much, sir

  • @ibrahimmkheimer5311
    @ibrahimmkheimer5311 3 years ago +1

    awesome video dr

  • @harithfarhan5535
    @harithfarhan5535 3 years ago +1

    thanks for this

  • @HughMupfunya
    @HughMupfunya 5 years ago +1

    Awesome... Thank you very much

  • @thoshsamanthar4815
    @thoshsamanthar4815 5 years ago +1

    Dr Todd, the video helped me a lot. I have 2 questions
    1) I have an integrated framework, where analysis is done in 2 stages. Should I check MD for each stage? One of my variables will look like a mediator but it is not. It will be a DV in the first stage and subsequently an IV in the 2nd stage of the analysis. Stage 1 and stage 2 do not have any connection. I have done each test and got different Prob_MD / outliers to be deleted.
    2) Should I include demographic questions as part of the df, as the prob outlier results are different when I omit or include them?

  • @jongsuksong7493
    @jongsuksong7493 8 years ago

    Thank you so much for your great explanation! It really helped me a lot!

    • @DrGrande
      @DrGrande 7 years ago

      I'm glad you found the video useful. Thanks for watching.

  • @HarerimanaAlexis
    @HarerimanaAlexis 5 years ago

    Dear Dr Todd, Thank you very much for this wonderful video. I have the same question about how you decide on the degrees of freedom, and whether .001 is the absolute rule. Thank you

  • @guitaqui
    @guitaqui 2 years ago

    Perfect !!! Thank you!!!

  • @Elianaco
    @Elianaco 8 months ago

    Hello, thank you for your helpful videos. Quick one, I'm running a moderation with multiple mediators. Are mediator variables independent variables? I'm trying to run the Mahalanobis distance but unsure if I should add my mediators to the IV box. Thank you

  • @kamrannawaz
    @kamrannawaz 7 years ago +1

    Thanks, very helpful... I understand why you used 3 as the DF; however, please explain what chi-square is.

  • @henkpiet1908
    @henkpiet1908 11 months ago

    What do I do if there’s a missing value in one of the scales when I use pairwise deletion for my regression? In that case the Mahalanobis distance returns a missing value as well.

  • @wongjanice7753
    @wongjanice7753 8 years ago +1

    Thanks for sharing! I would like to ask a question: if I detected 8 outliers with Mahalanobis distance, is it necessary for me to delete all of them? Or are 8 outliers out of 200 respondents still in an acceptable range? Is there any reference that mentions this?

    • @j.a.o.5535
      @j.a.o.5535 8 years ago

      +Wong Janice According to Meade and Craig (2012, Identifying Careless Responses in Survey Data), you may have up to 20 careless responders, especially if you used web-based questionnaires, so I would eliminate those 8 outliers to improve the quality of the data, although it is not always a straightforward rule.

  • @felipemcse
    @felipemcse 8 years ago +1

    Thanks for the video, Todd. Do you have any references that explain why the number of degrees of freedom should be the same as the number of variables?

  • @herix7342
    @herix7342 3 years ago

    Great contribution! Is there any reference for the described procedure?

  • @GeeWhit
    @GeeWhit 7 years ago +1

    Thanks for the great video!
    Does this method expose two-tailed outliers? If not, how can this be achieved?

  • @evannadhim6631
    @evannadhim6631 7 years ago

    Todd, thank you so much for this clear explanation, but you've done the identification of multivariate outliers with Mahalanobis distance for the cases.
    My question: is there any difference if we do it for variables?
    As the variables have their own distributions while they are affected by the outliers

  • @patfennell
    @patfennell 7 years ago

    Great video - thanks for posting!

    • @DrGrande
      @DrGrande 7 years ago

      You're welcome - thanks for watching.

  • @rahimbehrad63
    @rahimbehrad63 8 years ago +1

    Thanks Dear Todd. great !

  • @MrFoganholo
    @MrFoganholo 9 years ago +4

    Todd, great explanation! Thanks. One question: Why did you use 3 as the degrees of freedom? Why did you use .001 as the reference? Can I use it for any sample? Thanks again.

    • @DrGrande
      @DrGrande 9 years ago +9

      +André Foganholo Three degrees of freedom were used because there were three variables in the analysis. Using the probability of .001 is a common practice when identifying multivariate outliers.

    • @n.einstein6088
      @n.einstein6088 8 years ago +18

      +André Foganholo as a reference for the .001 threshold I used Tabachnick, B.G., & Fidell, L.S. (2007). Using Multivariate Statistics (5th Ed.). Boston: Pearson. (p. 74). according to www-01.ibm.com/support/docview.wss?uid=swg21480128. just in case anyone needs that.

    • @wenyuanliu4602
      @wenyuanliu4602 6 years ago

      Thanks everyone!

    • @rich_guides
      @rich_guides 5 years ago

      He used 3 because of the number of independent variables. Thank you
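
One way to see why the degrees of freedom equal the number of variables: if the variables are standardized, independent, and normal, the squared distance of a case from the centroid is a sum of one squared z-score per variable, which is how a chi-square variable with that many degrees of freedom is defined. The Python simulation below sketches this for three variables. The sample size, the seed, and the use of known (rather than estimated) means and variances are simplifying assumptions; with means and covariances estimated from the data, the chi-square reference is only approximate.

```python
import random

random.seed(0)
K = 3            # number of variables entered, as in the video
N = 200_000      # simulated cases (arbitrary, chosen for a stable estimate)
CRIT = 16.266    # tabled chi-square critical value for df = 3, upper-tail p = .001

# With independent standard-normal variables, the squared distance from the
# centroid reduces to the sum of K squared z-scores.
d2 = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(K)) for _ in range(N)]

mean_d2 = sum(d2) / N                           # a chi-square(K) variable has mean K
share_flagged = sum(v > CRIT for v in d2) / N   # should sit near .001

print(f"mean squared distance: {mean_d2:.3f} (chi-square mean would be {K})")
print(f"share of cases beyond the .001 cutoff: {share_flagged:.5f}")
```

Changing `K` without changing `CRIT` would shift the flagged share away from .001, which is why the degrees of freedom have to follow the number of variables.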

  • @jameslebron9412
    @jameslebron9412 6 years ago

    Dear Todd, nice video clip. I have a question: in your video I think you are using 3 independent variables and 1 dependent variable, so you are actually using 4 variables in total.
    I guess the degrees of freedom in this case are 4-1 = 3, since you are measuring distance on 4-dimensional scales.

  • @omidmahdieh7882
    @omidmahdieh7882 2 years ago

    Hello Dr. Grande. Thanks for your helpful demonstration. Can items be used to calculate Mahalanobis distances? Or should I use variables. I mean composite variables.

  • @ljubomirpupovac2009
    @ljubomirpupovac2009 7 years ago

    Hi Todd. Thanks for the video. Just one question: your main independent variable is program? Shouldn't we compare the MAH_1 value for samples that received treatment and ones that didn't? The thing is, the main independent variable is not used in the analysis, so whatever value I put there, the results (removed cases) will be the same. Regards

  • @frajtervivien
    @frajtervivien 8 years ago +1

    Thank you so much it was a lifesaver!

  • @RichardMcCrory_Neph
    @RichardMcCrory_Neph 7 years ago +1

    +Todd Grande - could I check the degrees of freedom for the Chi-Square distribution is n or n-1. e.g. for 20 variables, is the d.f. 20 or 19?

  • @farhanselfatan
    @farhanselfatan 7 months ago

    Thank you dr

  • @karimatouati5256
    @karimatouati5256 3 years ago

    Thank you for this useful video. I have a question please: what should I do with ordinal variables when checking for these outliers? Which method is the adequate one, Mahalanobis distance or Cook's distance?
    Does it make sense to apply this method when my data is composed only of ordinal variables and not continuous ones?

  • @94bfm
    @94bfm 6 years ago

    Great explanation! Thank you so much!

  • @marinacuk1400
    @marinacuk1400 8 years ago

    Thank you for this very helpful video. Can this method be applied to lognormal datasets? Is it necessary for the data to follow a normal distribution?

  • @chinchinhoh7893
    @chinchinhoh7893 6 years ago

    Dr Grande, 1 question. Frequently, the examples of identifying & handling outliers are about independent variables. Does it mean that we don't have to identify & handle the outliers of dependent variables? TQ!

  • @moroomario4007
    @moroomario4007 2 years ago

    Sir, if I used a Likert scale, should the DV be the mean score of all the items and the IV be the score of each item?

  • @ravindarmadishetty736
    @ravindarmadishetty736 7 years ago

    Dear Todd, good explanation. Are the outliers we got similar to the residual (actual - predicted) outliers that would be removed from the data?

  • @xunzhou962
    @xunzhou962 8 years ago

    Exactly what i need! Thank you!

  • @zohalh14
    @zohalh14 2 years ago

    Thanks for the video! Can you use Mahalanobis distance if your IVs are categorical in a mixed anova?

  • @polomarco1256
    @polomarco1256 4 years ago

    Hi Dr. Todd. Thanks for sharing knowledge. May I ask you something? Can I use Mahalanobis distance to identify multivariate outliers with ordinal data?

  • @cecyliaadamczak4301
    @cecyliaadamczak4301 2 years ago

    Hi Dr. Grande, can we include the outcome variable (DV) with the IV in the mahalanobis distance analysis?

  • @prof.thakshilakumari7847

    Thank you so much. I followed your video and did the test with my sample. But I have a question on the degrees of freedom: why did you consider it 3?

  • @sskshats6453
    @sskshats6453 7 years ago

    Thanks a lot. May Allah bless you

  • @chinhankim
    @chinhankim 5 years ago

    Dr. Grande, I have two independent variables and three mediation variables of one dependent variable. The question is: should I put in five variables (independent plus mediation variables) to figure out outliers, or should I put in only the two independent variables? Thanks.

  • @moeshams4504
    @moeshams4504 4 years ago +1

    Excellent!

  • @rashidsaid-ti3jz
    @rashidsaid-ti3jz 4 years ago

    Thank you Dr. Todd for these useful lessons. Could you please mention the reference for the formula you wrote in Compute Variable?
    1-..chi(mahalanobis, df).
    Thanks a lot

    • @nahk-lx2tn
      @nahk-lx2tn 4 years ago

      rashid said he is not replying to actual questions. That’s sad

    • @rashidsaid-ti3jz
      @rashidsaid-ti3jz 4 years ago

      @@nahk-lx2tn hi wasim, I found the reference (Hair, 2014)

  • @maheshvykuntam2809
    @maheshvykuntam2809 7 years ago

    +Todd Grande - Thanks a lot for the great explanation. Could you please help me understand: 1. Will this process work even if we have missing values? Why do we use DF as 'n', why not n-1? Thanks a lot for the help.

  • @khaledlahlouh6944
    @khaledlahlouh6944 4 years ago

    Dear Dr. Todd, what should we do when we have a model with many IVs, two mediators and two DVs? Should we consider the mediators as IVs?

  • @annabelleatkin1884
    @annabelleatkin1884 6 years ago

    Would you include control variables as predictors in the regression? And if you're testing a latent interaction in MPlus, do you simply input the observed variables into the regression in SPSS to do this test?

  • @ninab6136
    @ninab6136 7 years ago

    So I guess Mahalanobis can't be calculated when you have missing values somewhere in the items. Any other way I can include those cases?

  • @micahgardner7836
    @micahgardner7836 2 years ago

    What if one of your variables was excluded by SPSS when calculating Mahalanobis distance? Are the degrees of freedom the same, or would you subtract one? Example: 5 variables entered but one was excluded. Would the degrees of freedom be 5 or 4?

  • @KristinColletteScott
    @KristinColletteScott 6 years ago

    Hi Dr. Grande,
    I've got 7 constructs (3 IVs, 3 intermediary, and 1 DV) each with multiple items. How do you recommend handling these when searching for D2? I also need to test for multivariate normality using the Wald statistic on the same data set. Do you have a video on that?

  • @barbaratoson6455
    @barbaratoson6455 7 years ago

    Great video. Could you recommend a method to identify outliers in an RM ANOVA set up? I am looking for something similar to INFLUENCE option in SAS MIXED procedure but for SPSS

  • @shafeekafadlikhzamri7068

    Hello Dr. Todd. Your video helped a lot and the steps are easy to follow. But I seem to have too many outliers, and I would like to have your contact details to ask you about this matter.

  • @kathrinho9136
    @kathrinho9136 8 years ago

    Hi, I have one question on the method. Hope you can help me :). In your data set, you have your manipulations, described as "program", and then you said that you have your independents named "functioning, severity, motivation". Why do additional metric independents exist in your file? In my data set I have 2 independents but they are on a nominal scale. So, what do I put in the text box of the linear regression where it says "independents"? Thanks in advance!!

  • @godnkr236
    @godnkr236 5 years ago

    thanks for this amazing video!

  • @loversloss101
    @loversloss101 5 years ago

    So what happens when you follow these instructions and every number you get for the MAH_1 is the same?

  • @madiharazzam1098
    @madiharazzam1098 6 years ago

    i have a sample of 300 and 2 predictors. what would be the Mahalanobis Distance for it???

  • @chriskeran4480
    @chriskeran4480 9 years ago

    Dr. Grande--thank you kindly. Awesome demonstration. The question I have relates to the number of independent variables (IV) chosen when calculating a Mahalanobis Distance (MD). Should the particular IVs chosen be related in some way or can you throw all of your numeric variables into the one regression when attempting to find multivariate outliers using MD?

  • @Oz4rmEg
    @Oz4rmEg 3 years ago

    Best vid ever

  • @drarsen33
    @drarsen33 2 years ago

    "Bloody hell. What is this Mahalanobis distance. I have never done it before...I am in bit of a pickle. Let me check youtube."
    clicks on first link. Video starts
    "Wait...I know this voice...is it...." scroll down.
    Well, thank you once again Dr Grande :D

  • @ainannur5836
    @ainannur5836 6 years ago

    Mr Todd, I have 4 variables: AsliG, AsliB, GreenBP, and BlueBP. I want to know the value of the Mahalanobis distance between (AsliG AsliB) and (GreenBP BlueBP). Can I calculate it using Mahalanobis distance in SPSS? Why can't I input two variables as dependent and the other two as independent in SPSS?

  • @wpadilla72
    @wpadilla72 4 years ago

    Dear Dr. Grande, my variables are measured on a Likert scale... how should the Mahalanobis test be applied in this case?... thanks

  • @oliviasimms3897
    @oliviasimms3897 3 years ago

    Hi, does anyone know why it won't give me output when I add two variables to the 'Independents' box? I can get output for each of them separately but cannot get one output for both.

  • @alexandrafiedler3113
    @alexandrafiedler3113 4 years ago

    Do I use for CLP-Analysis (2-waved longitudinal design) the dependent variable at time 1 or time 2? Sorry, but I am confused about whether I compute Mahalanobis d for the regression term in my CLP model with: dependent variable (t2) regressed ON --> dependent variable (t1), independent variable (t1), moderator (t1). Or won't it matter if I do the Mahalanobis for a simple regression at time 1: Y1 regressed ON --> X1, M1 (and what about my second independent variable? Should I put it into the regression for timepoint 1, too?)
    I would be very glad if anybody could help me with this confusion!! :D

  • @abdulmoeed4661
    @abdulmoeed4661 2 years ago

    If we have more than one independent latent variable, mediators and a final dependent variable, how would we place them in the 'Independent' and 'Dependent' variable list boxes while doing this test? Thanks. Waiting for a response.

  • @next_trip_loading
    @next_trip_loading 6 years ago

    Can we apply ANOVA for a factor with 2 levels? I have seen a lot of studies using 2 levels and testing them with ANOVA. Secondly, I don't know how they check normality when they use a single-item Likert scale. Could you please explain this concept to me?

  • @devildman3128
    @devildman3128 8 years ago

    hi, are there any changes to be made if I find negative values for the probability_MD?

  • @desterward
    @desterward 6 years ago

    Hi. Is it possible to use it in non-linear multivariate as well? Thanks

  • @adrianfajar323
    @adrianfajar323 4 years ago

    Prof, I have 3 dependent variables and 6 independent variables. How do I see the Mahalanobis distance?

  • @zubairawan9088
    @zubairawan9088 1 year ago

    Why have you selected the p-value to be 0.001?

  • @priyas8052
    @priyas8052 8 years ago

    What if you get zero as a result for one of the rows?

  • @karolinakonciute9299
    @karolinakonciute9299 1 year ago

    God bless

  • @selamawitweldegebriel3421

    This was very helpful, how do we contact you. Cause I have an urgent problem

  • @sskshats6453
    @sskshats6453 4 years ago

    what if we have 5 dependent variables and just one independent ??

  • @nehakeshri933
    @nehakeshri933 1 year ago

    loved it