K-Means Cluster Analysis in SPSS (SPSS Tutorial Video #30)

Data Demystified

39,996 views

Added on 26 July 2024
In this video I describe how to conduct and interpret K-Means Cluster Analysis in SPSS. I especially emphasize first using Hierarchical Cluster Analysis to determine the number of clusters in your data, and then using that result as an input to the K-Means algorithm.
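The k-means step can be sketched with Lloyd's algorithm: assign each case to its nearest cluster center, move each center to the mean of its cases, and repeat until no case changes cluster. Below is a minimal pure-Python illustration, assuming Euclidean distance and that k and the starting centers have already been chosen (for example, k from the hierarchical step). The function `kmeans` and the toy data are hypothetical, and the starting centers are picked by hand; this is not how SPSS selects its initial centers.

```python
from math import dist


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means (hypothetical helper). Returns (labels, centers).

    points:  list of equal-length numeric tuples, one per case
    centers: list of k starting centers with the same dimension
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins the cluster whose center is nearest
        new_labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no case changed cluster, so the solution has converged
        labels = new_labels
        # Update step: move each center to the mean of its member cases
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical two-variable data with two obvious groups
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(data, centers=[data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because nothing in this sketch is random, the same starting centers always produce the same final solution.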
This SPSS tutorial series is designed to teach you the basics of analyzing data and interpreting the results using SPSS. I will cover everything from the very basics of the main windows within SPSS, to manipulating data, to running and interpreting meaningful analyses like t-tests, ANOVA, regression, and many more, to visualizing results.
Link to Hierarchical Cluster Analysis Video: • Hierarchical Cluster A...
Link to K-Means Cluster Analysis Video: • K-Means Cluster Analys...
Link to Two Step Cluster Analysis Video: • Two Step Cluster Analy...
The data file used in this video can be found here: drive.google.com/file/d/1-Bbn...
Video tutorial and walkthrough of the data file used in this video: • Introduction to Data F...
Playlist of videos covering INTUITION for statistics and data science: • Data Intuition
All the SPSS tutorial videos are in this playlist: • SPSS Tutorials
Learn more about who I am and why I'm doing this here: • Data Demystified - Who...
Follow me at:
LinkedIn: / jeff-galak-768a193a
Patreon: / datademystified
Website: www.jeffgalak.com/datademystified
Equipment Used for Filming:
Nikon D7100: amzn.to/320N1FZ
Softlight: amzn.to/2ZaXz3o
Yeti Microphone: amzn.to/2ZTXznB
iPad for Teleprompter: amzn.to/2ZSUkNh
Camtasia for Video Editing: amzn.to/2ZRPeAV

Comments • 68

@nawilliam2754 2 years ago
After a long search, finally something easy to understand
@jessicamartin1446 2 years ago
Great! I was able to complete my entire assignment using only this video
@dsavkay 4 months ago
Great advanced info, subscribed!
@anass2243 1 year ago
I really thank you for this great series of videos; they have been so useful in my research
@tacs3 8 months ago
Thank you so much, for this one and the hierarchical one!
@bernardoluca6613 3 years ago ⁺³
Fantastic explanation! Nothing like all those other videos out there! Keep going like this!
@DataDemystified 3 years ago
Thanks!
@martinpeikert6746 1 year ago
So clear, thank you so much!
@xeniavlasenko9830 3 years ago ⁺²
This is the 5th video I've watched on K-Means and it FINALLY made sense. Thank you so much!
@DataDemystified 3 years ago
I'm so glad to hear that! Is there something in particular that made the content here more understandable? I ask so that I can make sure to incorporate that type of teaching in my other videos. Thanks!
@xeniavlasenko9830 3 years ago ⁺¹
@@DataDemystified I guess commenting along the way on how to interpret the results, and on how all these program steps and numbers in the tables are part of the "story", was particularly helpful :)
@DataDemystified 3 years ago ⁺¹
@@xeniavlasenko9830 Thank you for the feedback! I will make sure to incorporate it into new tutorial videos!
@StevenWang82 1 year ago
Thank you very much, this video is very easy to understand!!
@abdullahisani9746 1 year ago
Thanks for the demonstration
@miakirk7010 2 years ago
Very clear explanations. Thank you.
@DataDemystified 2 years ago
Thanks!
@transitionperf_MPO 2 years ago ⁺¹
Thank you for a great explanation! I was wondering how to view demographic characteristics for each established cluster. For example, viewing percentage breakdowns of age, gender, etc. in each cluster. Thanks!
@LXiao33 2 years ago ⁺¹
Brilliant! Thank you for uploading this video!
@DataDemystified 2 years ago
My pleasure!
@LXiao33 2 years ago
@@DataDemystified I wonder whether I should choose cluster analysis in SPSS or perform latent class analysis using Mplus to identify the underlying groups in my data; I am still a bit confused. Can you kindly provide some advice? Thank you.
@DataDemystified 2 years ago
@@LXiao33 That entirely depends on your research question. Without knowing that, I really can't answer your question. Sorry!
@GenuineReciprocity 2 years ago ⁺⁴
Your videos are so easy to understand, and it's amazing how many people your kindness has been helping! I have a small question and was wondering if you could share your insight on it if you have time. A study that I am trying to replicate categorized individuals based on whether they score above or below the mean on two variables (i.e., high-high, high-low, low-high, low-low: 4 categories). I was advised that this technique is crude and that I should instead use a cluster analysis to categorize the groups. Why would cluster analysis be a better statistical analysis than what the original authors did in categorizing the variables? Sorry to trouble you! I look forward to more of your incredibly helpful videos!
@erikailles9598 2 years ago ⁺¹
You are a hero!
@DataDemystified 2 years ago ⁺¹
Ha. Thank you!
@lydialim1993 3 years ago ⁺¹
Wonderful series! Keep it up!
@DataDemystified 3 years ago
Thank you! Any topics you'd specifically like to see covered?
@lydialim1993 3 years ago
@@DataDemystified Any chance you'll do one on Structural Equation Modelling? I know it's a bunch of regressions under the hood, but it would be nice to see a proper demo of how to use one in real life.
@DataDemystified 3 years ago ⁺¹
@@lydialim1993 Great idea, but I don't know if that'll happen any time soon. The challenge is that you need the AMOS package for SPSS, which most people don't have (including me, at the moment). That said, I'll look into how much demand there is for something like this! Thanks for the suggestion!
@aarinwood4522 2 years ago
Great series of videos -- thank you! I do have one follow-up question: What are the sample size requirements for Cluster Analysis? Thank you!
@ezeugochukukere1538 2 years ago
This is very helpful. Oddly enough, the reason I came across this video was that I was searching for how to calculate the initial cluster centers in SPSS.
I need them for my R script to perfectly replicate the K-means cluster analysis I ran in SPSS. Inputting the initial cluster centers calculated in SPSS gives exactly the same final cluster solution in R.
...it was the first thing you said we don't need, but I am pretty desperate to find out how those initial cluster centers are calculated. Any help you could provide would be huge.
@tracyquetzal9477 2 years ago
Hi Professor, very good presentation. I would like to know how you interpret your clusters in order to label them. What patterns do you look for to classify your clusters?
@user-ry2pb8zg7w 1 year ago
Thank you for the great video. Would you please explain how to apply the elbow method to find the number of clusters?
@katiesharp8080 2 years ago ⁺⁴
Hi, I love your videos; they're really helping me analyse my dissertation data :) I was wondering if you had any videos that touch on how to identify the characteristics of your clusters, i.e. age, gender, that sort of thing?
@DataDemystified 2 years ago ⁺²
I don’t, but basically you’re just going to run either t-tests/ANOVA or cross tabs. You’d use the cluster number as the independent variable and your demographic as the dependent variable. I have a bunch of videos on those techniques in the SPSS playlist on this channel. Good luck!
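The cross-tab approach in the reply boils down to counting each demographic category within each cluster and converting the counts to within-cluster percentages, plus per-cluster means for a continuous demographic such as age. Here is a minimal stdlib sketch, assuming the cluster memberships have been saved as a variable and exported together with the demographics; the function names, variable names, and values are hypothetical. The significance tests themselves (chi-square for the cross tab, t-test/ANOVA for the means) are left to SPSS.

```python
from collections import Counter, defaultdict


def crosstab_pct(clusters, category):
    """Percentage of each category within each cluster (column percentages)."""
    counts = defaultdict(Counter)
    for c, cat in zip(clusters, category):
        counts[c][cat] += 1
    table = {}
    for c, ctr in sorted(counts.items()):
        n = sum(ctr.values())  # cluster size
        table[c] = {cat: round(100 * k / n, 1) for cat, k in sorted(ctr.items())}
    return table


def mean_by_cluster(clusters, values):
    """Mean of a continuous variable (e.g. age) within each cluster."""
    groups = defaultdict(list)
    for c, v in zip(clusters, values):
        groups[c].append(v)
    return {c: sum(vs) / len(vs) for c, vs in sorted(groups.items())}


# Hypothetical saved cluster memberships and demographics for 8 cases
cluster = [1, 1, 1, 2, 2, 2, 2, 1]
gender = ["F", "M", "F", "M", "M", "F", "M", "F"]
age = [34, 41, 29, 52, 47, 58, 61, 36]

print(crosstab_pct(cluster, gender))  # {1: {'F': 75.0, 'M': 25.0}, 2: {'F': 25.0, 'M': 75.0}}
print(mean_by_cluster(cluster, age))  # {1: 35.0, 2: 54.5}
```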
@GhadeerShm 10 months ago
Hi, could I get references for the way you selected the variables? Or what is that called?
@mahdifareghi3916 9 months ago
Hello, is there any video that analyzes k-means results in more depth?
@zahraalinam62 3 months ago
Which method, hierarchical or K-means, is most appropriate for dichotomous variables with binary coding (0, 1) indicating the presence or absence of a variable?
@lingkan1984 7 months ago
For a cluster analysis of multimorbidity, is there any special format for arranging the data?
@musiknation7218 2 years ago
I need to do assignments comparing k-means and improved k-means cluster analysis. Can you please tell me how to do that?
@zahraalinam62 3 months ago
In case the Sig. for some variables is bigger than .001, what should we do? Should we screen and remove them and do the cluster analysis again?
@deborahhaile4191 2 years ago
How can I run the k-means clustering algorithm on 40 samples with four variables to group the samples into two clusters?
@rabeeyafarooq2788 3 months ago
How do we decide the names, as in what is increasing and what is not?
@vindaflyfox 2 months ago
Hello, I want to follow this process by doing a hierarchical cluster analysis to determine the k for my k-means analysis. My question is: my variables are not all on the same scale, so in the hierarchical cluster analysis I will need to convert them into z-scores or something similar so they are comparable. How does this impact the k-means cluster analysis? Do I need to do an extra step here, or will my variables already be converted and usable again after the hierarchical analysis?
@musiknation7218 2 years ago
How do I do an improved k-means cluster analysis?
@divyajaiswal4330 11 months ago ⁺¹
Can k-means clustering data be represented graphically? If yes, how?
@tacs3 8 months ago
How can we plot this data in SPSS the way R does? Is there a way?
@mehmettolgataner8878 3 months ago
Is it the same in SPSS 29?
@mariabecker1803 3 years ago ⁺¹
Dear Jeff, I was wondering if I could ask you one more question. As I am working with z-scores and comparing the means (of z-scores) at the end of the cluster analysis in order to show the differences in variables within and between the clusters, I encountered very high mean z-scores, ranging up to 4 or 5. Could this be an indication of outliers? Would you suggest removing all the outliers before the analysis, or would this change the dataset too much, so that you would just report it as it is? Thank you!!
@DataDemystified 3 years ago
4-5 on a z-score is pretty high. We typically consider statistical outliers as being more than 3 standard deviations from the mean (which translates to a z-score of 3 or more). The choice to remove data based on outliers, however, is a lot more complex. Did you pre-specify that you would do so? Are you doing it because your results, inclusive of the outliers, don't "look good"? The point is to make sure that your exclusion isn't going to artificially inflate Type 1 error (p-hacking). Good luck!
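The 3-standard-deviation rule of thumb from the reply can be checked case by case. A minimal stdlib sketch, assuming z-scores computed with the sample standard deviation (n - 1 denominator) and hypothetical scores on a single clustering variable; the helper names are hypothetical too. It only flags cases; as the reply stresses, whether to remove them is a separate decision that should ideally be pre-specified.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


def flag_outliers(values, cutoff=3.0):
    """Indices of cases whose absolute z-score is at or above the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) >= cutoff]


# Hypothetical scores on one clustering variable; the last case is extreme
scores = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 40]
print(flag_outliers(scores))  # [11]
```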
@mariabecker1803 3 years ago ⁺¹
@@DataDemystified I did not pre-specify that I would do that. It's just that, compared to other cluster analyses with other data and their results (mean z-scores), mine are very high, so I thought that I might have made a mistake and that it would be best to remove the outliers. However, I do not want to manipulate my data. Maybe it is enough to just mention the high z-scores but leave them in the data? Thank you!
@DataDemystified 3 years ago
@@mariabecker1803 I don't know what context you're reporting in (academic paper, school assignment, etc...) but transparency is always a good thing. At minimum, add a footnote with the explanation. Better yet is a robustness check that is explicitly exploratory: see what happens when you drop those outliers. Do the results meaningfully change? If they do, report that and speculate as to why. If they don't, report that as well with a note about how your results are robust to their removal.
@mariabecker1803 3 years ago
@@DataDemystified Dear Jeff, it's part of my dissertation, so I really want to do a thorough job. I will definitely do a robustness check and am curious to see what will change. So thank you for your advice!
@joycethegreat9259 3 months ago
During my conjoint analysis, there are no importance values or utilities because SPSS stated "no analysis is performed because there are no valid cases". How do I solve this? I did a cluster analysis to get the utilities and std. error of each cluster, but after running conjoint on one of my clusters, conjoint won't show results. Please help. I have no missing values, no duplicates, nothing of the sort.
@sachikogaming1137 2 years ago
Is it necessary to check the correlations between the variables before proceeding to clustering? Is it important to select only variables that are correlated for the analysis?
@DataDemystified 2 years ago
Nope. Clustering does not require variables to be correlated.
@mariabecker1803 3 years ago ⁺¹
Hi, I was wondering how to read in cluster centers from an external file (after having done the hierarchical clustering), as SPSS always shows error messages (incorrect format, or a variable name is incorrect). Do you have a video on that, or any solution to my problem?
@DataDemystified 3 years ago
Sorry you're having trouble with that. I don't have a video on the topic and don't often import cluster centers from an external file. Is there a reason you are doing it that way rather than natively running the analysis on the data?
@mariabecker1803 3 years ago
@@DataDemystified Yes, I am using k-means clustering in order to validate the cluster centers/number of clusters that I have calculated with hierarchical clustering. Therefore, I want to use the cluster centers that I have (from the hierarchical clustering) as a starting point and see what changes when I do the k-means clustering. However, no matter what I do (even when I do everything according to the literature), I get error messages and SPSS has trouble reading in the cluster centres from an external file. Would you know what I could do to avoid the error messages and get my results?
@DataDemystified 3 years ago
@@mariabecker1803 Got it. One option is to just re-run your hierarchical clustering with the original data and then, in the same data file, run the k-means clustering. Save the cluster membership for both analyses, and then do your comparison. If that's not possible and the import isn't working, you can always do it manually. As in, sort the data by some identifier and copy and paste the column of data from your original data (where the hierarchical analysis is) into the new data file (where you plan to run k-means). I hope that helps!
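Once both memberships are saved in one file, they can be compared with a cross tabulation of the two membership variables. Because cluster numbers are arbitrary (cluster 1 in one solution may be cluster 2 in the other), a label-free agreement score such as the Rand index is handy; the Rand index is a standard measure brought in here for illustration, not something from the video. A minimal stdlib sketch with hypothetical function names and memberships:

```python
from collections import Counter
from itertools import combinations


def membership_crosstab(a, b):
    """Count cases for each (solution A cluster, solution B cluster) pair."""
    return Counter(zip(a, b))


def rand_index(a, b):
    """Share of case pairs on which two clusterings agree (both place the
    pair together, or both place it apart); 1.0 means identical partitions."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical saved memberships for 6 cases from each analysis
hier = [1, 1, 1, 2, 2, 2]
km_a = [2, 2, 2, 1, 1, 1]  # same groups, different cluster numbers
km_b = [2, 2, 1, 1, 1, 1]  # the third case has moved

print(membership_crosstab(hier, km_b))
print(rand_index(hier, km_a))  # 1.0
print(rand_index(hier, km_b))
```

Relabeled but otherwise identical solutions score 1.0, so a low score points to genuinely different groupings rather than just different cluster numbers.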
@mariabecker1803 3 years ago
@@DataDemystified Thank you! I have tried that already, and comparing the two in the same data file works. That is not the problem. However, I saw that the cluster memberships are completely different (hierarchical and k-means), so I wanted to run the k-means clustering with the same cluster centers I found in the hierarchical analysis, in order to see where the difference comes from when both have the same starting point, if that makes sense? It's just that there is no way to enter the starting points (cluster centers) manually other than reading them in, I guess? And that is not working in my case, so I do not know how to proceed.
@DataDemystified 3 years ago
@@mariabecker1803 My only suggestion at this point is to make sure you are using Ward's Method in your hierarchical clustering. That tends to give results closest to k-means. Good luck!
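Ward's method merges, at each step, the pair of clusters whose union adds the least to the total within-cluster sum of squares, which is the same quantity k-means tries to minimize; that shared objective is a plausible reason the two tend to agree. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, the added sum of squares is n_A·n_B/(n_A + n_B)·||c_A - c_B||². A naive pure-Python sketch follows, with a hypothetical function `ward` and toy data; SPSS's own implementation, and the coefficients it prints in its agglomeration schedule, may be scaled differently.

```python
def sqdist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def ward(points, k):
    """Naive agglomerative clustering with Ward's criterion.

    Every case starts as its own cluster; at each step the pair whose merger
    adds the least within-cluster sum of squares is joined, until k clusters
    remain. Returns (labels, merge_costs).
    """
    clusters = [[i] for i in range(len(points))]  # member indices per cluster
    centroids = [tuple(p) for p in points]
    costs = []
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster SS caused by merging a and b
                delta = na * nb / (na + nb) * sqdist(centroids[a], centroids[b])
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        delta, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # New centroid is the size-weighted mean of the two old centroids
        centroids[a] = tuple((na * x + nb * y) / (na + nb)
                             for x, y in zip(centroids[a], centroids[b]))
        clusters[a] += clusters[b]
        del clusters[b], centroids[b]  # b > a, so index a is unaffected
        costs.append(delta)
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels, costs


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, _ = ward(data, k=2)
_, costs = ward(data, k=1)  # full merge history down to one cluster
print(labels)  # [0, 0, 0, 1, 1, 1]
print(costs)
```

Running it down to one cluster gives the full list of merge costs; a sharp jump in those costs (here, the final merge) is a common heuristic for how many clusters to keep.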
@aviralbhatt1664 2 years ago
Hello, I have a doubt and I would really appreciate it if you could clarify it. So do we use Hierarchical Cluster Analysis to identify the potential clusters and then K-Means to understand how those clusters are different from each other?
@DataDemystified 2 years ago ⁺¹
We use Hierarchical Cluster Analysis to identify the most likely # of clusters. We then use k-means to actually create those clusters and explore them. Hope that helps!
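Beyond the number of clusters, the hierarchical solution can also supply k-means starting centers: the mean of each clustering variable within each hierarchical cluster. A minimal stdlib sketch, assuming the hierarchical memberships have been saved and the variables already standardized; the function name and data are hypothetical. The resulting centers would then be handed to a k-means routine as its initial centers.

```python
from collections import defaultdict


def cluster_centroids(points, labels):
    """Mean of each variable within each cluster, ordered by cluster label.

    The result can serve as k initial centers, so that k-means starts
    from the hierarchical solution.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return [tuple(sum(col) / len(members) for col in zip(*members))
            for _, members in sorted(groups.items())]


# Hypothetical standardized scores on two variables and saved hierarchical memberships
points = [(-1.0, -1.0), (-1.5, -0.5), (-0.5, -1.5), (1.0, 1.0), (1.5, 0.5), (0.5, 1.5)]
hier_labels = [1, 1, 1, 2, 2, 2]

initial_centers = cluster_centroids(points, hier_labels)
k = len(initial_centers)  # number of clusters suggested by the hierarchical step
print(k, initial_centers)  # 2 [(-1.0, -1.0), (1.0, 1.0)]
```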
@aviralbhatt1664 2 years ago
@@DataDemystified Yes it does, thanks a lot 🙌
@Netsi-ed6ee 3 months ago
Prof, I need your support. Can you help me?
