StatQuest: Hierarchical Clustering

  • added 3 Jul 2024
  • Hierarchical clustering is often used with heatmaps and with machine learning type stuff. It's no big deal, though, and based on just a few simple concepts. If you want to draw a heatmap using R, I've put some sample code on my website: statquest.org/statquest-hiera...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    #statquest #ML #clustering

Comments • 361

  • @statquest
    @statquest 2 years ago +10

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @Aemilindore
    @Aemilindore 3 years ago +146

    You're a person who saved me lots of time and pain. Thank you. I wish you the best

  • @anamulmbdu
    @anamulmbdu 6 years ago +182

    The intro song removed my fear of clustering. Thanks for the awesome video.

  • @kristinomalley4519
    @kristinomalley4519 1 year ago +23

    You are, and I cannot stress this enough, a national treasure!! The ease in how you explain things that have eluded me for over a decade and make it click is truly a gift. Thank you so freaking much!!!

  • @julieboissiere4553
    @julieboissiere4553 2 years ago +14

    I used to watch your videos while I was a student. It’s been 3 years since my graduation and I’m still here (I’m changing jobs and need to review some stuff).
    Thank you a lot for your incredible work

    • @statquest
      @statquest 2 years ago +6

      Congratulations on the new job! BAM! :)

  • @davidescobar4449
    @davidescobar4449 5 years ago +3

    I have to congratulate you for this video, it gives the basic notions of the hierarchical cluster easy and fast. Bravo!

  • @rajshrestha9484
    @rajshrestha9484 4 years ago +55

    I can't thank you enough. Such clear and helpful explanations. Great.

  • @fadikhattar290
    @fadikhattar290 1 year ago +3

    I still don't believe how this content is free. Thank you sir!

  • @chikken007
    @chikken007 4 years ago +4

    I already watched some of your videos. This one I watched because I want to apply hierarchical clustering in my thesis. It is about time I buy one of your sweaters. I hope this supports you. Thanks for all the truly great explanations. THANK YOU!

  • @stephenwood9252
    @stephenwood9252 1 year ago +3

    Love your videos. The fact that you make it so simple shows the depth of your understanding.

  • @brunomartel4639
    @brunomartel4639 4 years ago +123

    this video proved that "hard" stuff = badly explained stuff

    • @sindhujas7807
      @sindhujas7807 3 years ago +1

      so fuckin true. Not sorry for swearing. Happy learning guys

    • @gummybear8883
      @gummybear8883 3 years ago +4

      if you can't explain something in simple terms, then you don't understand it that well.

    • @julius4858
      @julius4858 3 years ago +6

      @@gummybear8883 or you've been a professor for 20 years and are so deep into a topic that you completely forgot how people approach new problems. Your sentence really only applies to novices trying to be teachers.

    • @Joreselin
      @Joreselin 3 years ago +5

      @@julius4858 We could just change it to: if you can't explain something in simple terms, then you can't teach it that well.

    • @julius4858
      @julius4858 3 years ago

      @@Joreselin Yeah, that is absolutely true. Many of my professors for theoretical computer science are experts on various fields but man do their explanations suck. That's why I have to watch youtube videos for stuff like this.

  • @yamikag8363
    @yamikag8363 2 years ago +4

    your videos help me see the "big picture" of concepts. after your videos, I can actually understand what is going on and why we are doing something. Thank you!

  • @scraps7624
    @scraps7624 2 years ago +7

    This channel is a treasure! Absolutely incredible job my man

  • @jingsilu5568
    @jingsilu5568 2 years ago +1

    Thank you for clearly explaining the details at a moderate speed! You save me lots of time!

  • @davidcartwright337
    @davidcartwright337 5 years ago +2

    great videos, I like the way you explain these topics

  • @pragyamishra9083
    @pragyamishra9083 2 years ago +5

    The visualizations and simplicity of explanations as well as great examples motivate me to keep learning. Thank you so much for making it so interesting. I'll try to do my bit by buying a t-shirt. 😊

  • @abhayjoshi2121
    @abhayjoshi2121 2 years ago +1

    You are simply amazing !! I love your style and simplicity and the word is BAM! .. your videos are very informative and worth going through... thanks for all your hard work in simplifying the complex topics

  • @99harshini
    @99harshini 4 years ago +6

    Absolutely brilliant..Thank you sooo much for your time and effort!

  • @urjaswitayadav3188
    @urjaswitayadav3188 7 years ago

    Great explanation. Thanks StatQuest!

  • @calebsawe8307
    @calebsawe8307 2 years ago +1

    I am super grateful for this video. You are such an excellent teacher! Thank you for being such a "you"

  • @fellsantfernandoargentin2072

    Congratulations from Brazil!

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +19

    Very nice.
    I use this in Python and it's a really good way to cluster.
    Another thing - from coding aspect, it's only 1 line of code in Seaborn, very easy.

  • @gurkanyesilyurt4461
    @gurkanyesilyurt4461 3 years ago +1

    you saved yet another day Josh. Thank you

  • @congchen170
    @congchen170 7 years ago

    Joshua's video is always helpful. Next time, probably k-means clustering.

  • @anastasiyakuznetsova8797
    @anastasiyakuznetsova8797 2 years ago +1

    The best as always! Love this channel! It's super easy to understand

  • @websciencenl7994
    @websciencenl7994 1 year ago +1

    StatQuest is the Best! Teaching is an art...and these are master pieces.

  • @LBsCuriosity
    @LBsCuriosity 5 years ago

    really awesome video! This will help me with my test. Thank you!

  • @rodrigohaasbueno8290
    @rodrigohaasbueno8290 5 years ago +1

    I love this channel so much

  • @farzanaferdousi9885
    @farzanaferdousi9885 3 years ago +1

    Your explanation is very clear to me and i see all your video, you are very friendly to me. I like you very much.

  • @eamiller12
    @eamiller12 2 years ago +1

    THANK YOU! This has been SO HELPFUL!

  • @proggenius2024
    @proggenius2024 2 months ago +1

    awesome content and delivery

  • @robertogff
    @robertogff 3 years ago

    Congratulations! Your video is so great! You explain in a very clear and simple way.

  • @LetWorkTogether
    @LetWorkTogether 4 years ago +3

    I love this. Your video is wonderful!

  • @jonathanlam7204
    @jonathanlam7204 7 months ago +1

    Thank you. Better than university teaching

  • @user-vg8dp5tb9w
    @user-vg8dp5tb9w 1 year ago

    This channel is truly a treasure trove! I was wondering if you could do a video on consensus clustering? I.e. how to evaluate clustering across multiple models and parameters. You are awesome!

  • @HiasHiasHias
    @HiasHiasHias 11 days ago +1

    StatQuest never disappoints

  • @vishk123
    @vishk123 6 months ago +1

    Thank you for allowing me to ascend the stats hierarchy!

  • @shamanthrajreddy1230
    @shamanthrajreddy1230 2 years ago +1

    Excellent explanation!

  • @saikiranjajula2033
    @saikiranjajula2033 4 years ago +1

    Thank You Sir, It was awesome to learn from you.

  • @12bjab
    @12bjab 5 years ago +2

    just beautiful!

  • @nnnyin6967
    @nnnyin6967 1 year ago +1

    I am preparing my actuarial exam and you saved me a lot❤

  • @alyssawang144
    @alyssawang144 3 years ago +1

    fantastic explanation, thank you so much for this video.

  • @loftyTHEOWNER
    @loftyTHEOWNER 2 years ago +1

    I would like to add that:
    - single-linkage (comparing the closest points of 2 clusters) tends to form more elliptic clusters;
    - complete-linkage tends to form more globular clusters.
    So, that means that not scaling your data, scaling with a StandardScaler, or with a MinMaxScaler will affect your clustering.
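
A minimal pure-Python sketch of the two linkage rules mentioned in this comment, assuming Euclidean distance and a naive agglomerative loop. The function names (`cluster_distance`, `agglomerate`) and the toy points are made up for illustration; they are not from the video or from any library.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters (lists of points).

    "single"   -> distance between the closest pair of points
    "complete" -> distance between the farthest pair of points
    """
    pairwise = [euclidean(p, q) for p in c1 for q in c2]
    return min(pairwise) if linkage == "single" else max(pairwise)


def agglomerate(points, n_clusters, linkage="single"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j], linkage)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical toy data: points on a line whose gaps widen from left to right.
points = [(0.0, 0.0), (1.0, 0.0), (2.1, 0.0), (3.3, 0.0), (4.6, 0.0), (6.0, 0.0)]

single = agglomerate(points, 2, linkage="single")
complete = agglomerate(points, 2, linkage="complete")
print(sorted(len(c) for c in single))
print(sorted(len(c) for c in complete))
```

On these points, single linkage chains five points together and leaves one singleton, while complete linkage produces a 4/2 split. Because both rules are built on raw distances, rescaling a feature changes the distances and can change the merges.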

  • @jovanmampusti4025
    @jovanmampusti4025 2 years ago +1

    Thank you so much sir! This is very helpful and very informative.

  • @balajicanchi5538
    @balajicanchi5538 6 years ago

    Explained in a simple manner.

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you very much for this video! It was really well done :)

  • @mojtabasardarmehni453
    @mojtabasardarmehni453 3 years ago +1

    Great as always! Thanks.

  • @saipanchajanya5980
    @saipanchajanya5980 4 years ago +1

    This is Awesome......
    Please Make a session on K Modes, KNN and K Prototypes

    • @statquest
      @statquest 4 years ago

      Here's a complete list of my videos so far: statquest.org/video-index/

  • @lukehebert6207
    @lukehebert6207 4 years ago +1

    Very helpful, thank you!

  • @oliviagallupova9199
    @oliviagallupova9199 4 years ago +1

    You saved me a week

  • @python_information601
    @python_information601 2 years ago +1

    Nice explanation 👍👍

  • @isha996
    @isha996 6 years ago +1

    Please add a video on Latin Square design, Joshua!
    I am going to pass my stats final tomorrow, only because of your videos :D
    your students are lucky.

    • @isha996
      @isha996 6 years ago +1

      The CPA and clustering question was worth 30% of total marks on my exam today, and I managed to write them so well only because of your videos. you're a savior. Thank you!!

  • @CapoeiraPiper
    @CapoeiraPiper 3 years ago +1

    Man your videos are soo super helpful! THANK YOU (ps consider the color library viridis to make it easier for the colorblind)

  • @fabiomaia3433
    @fabiomaia3433 3 years ago +3

    Hey Josh! Your videos are great! Thank you for the effort you've put on it!
    If you allow me... have you considered making videos explaining DBSCAN and HDBSCAN?

    • @statquest
      @statquest 3 years ago +2

      Yes, I've thought about those topics and may make a video about them.

  • @subhabrataghosh9831
    @subhabrataghosh9831 3 years ago +1

    Excellent Sir

  • @Paulamiz
    @Paulamiz 3 years ago +2

    Watching this after watching your more recent videos. Missed your 'BAM's a lot!!! You should remake these old videos again! Thanks :)

    • @statquest
      @statquest 3 years ago +2

      bam! :)

    • @Paulamiz
      @Paulamiz 3 years ago +2

      @@statquest 😍

    • @vakarthi4
      @vakarthi4 2 years ago

      Found this gem of a channel today. Agreed on the fun rhymes and puns.

  • @Sean-lz2dh
    @Sean-lz2dh 1 year ago +1

    great video. thank you very much

  • @ardaugurlu8673
    @ardaugurlu8673 5 years ago +2

    Good job mr josh.

  • @sonakshigarg4273
    @sonakshigarg4273 4 years ago

    You can explain the same concept with may be some other datasets and better visualisation other than heatmap

  • @yyma8037
    @yyma8037 4 years ago

    Great video!
    Do you have any plans to talk about co-clustering, look forward to it.

  • @raghavmoar3211
    @raghavmoar3211 5 years ago

    Thanks for the video

  • @2327853
    @2327853 4 years ago +2

    @StatQuest please explain probability and Naive Bayes. Thanks in advance! I am a huge fan of your way of teaching and your small songs creations. Keep up the good work!

  • @preranadas4037
    @preranadas4037 4 years ago +4

    Hello Josh! The videos are soooooooo goooood! These are BAMMMMM Good!!
    1 request - Could you please create a video on LCA - Latent Class Analysis? Maybe by comparing it to k-means clustering? I cannot be more thankful!

  • @italosayan4747
    @italosayan4747 6 years ago

    beautiful BRO!

  • @maikfranke2303
    @maikfranke2303 1 year ago +1

    Amazing! Your videos are so comprehensible. I really enjoy watching!!!*_*

  • @AdnanGora
    @AdnanGora 5 years ago +1

    Awesome video

  • @cfonsecaparis812
    @cfonsecaparis812 2 years ago +1

    Hi Josh, I am really enjoying your videos specially the wha whas and bam !! , you make stats sound easy but also fun! Thank you! I wonder if you could please do a video to explain the different uses of PCA and HCA, when do you use one or the other? In the mean time I will watch your videos on PCA and HCA :) hooray!

    • @statquest
      @statquest 2 years ago

      BAM! Thank you very much! I'll keep that topic in mind.

  • @Argho555
    @Argho555 2 months ago +1

    Thank You

  • @setareht7546
    @setareht7546 2 years ago

    Thank you for all your videos clearly explaining complex concepts. Can you also make video(s) on different bi-clustering methods?

  • @MrKingoverall
    @MrKingoverall 4 years ago +2

    I LOVE YOU JOSH !

  • @iranziemiler8135
    @iranziemiler8135 4 years ago +1

    Thank you

  • @locdaikathegreat3689
    @locdaikathegreat3689 4 years ago +1

    So cool the video!

  • @marahakermi-nt7lc
    @marahakermi-nt7lc 11 months ago +1

    ohh my god thanks josh u are so brilliant i think marvel should add another new superhero "josh starmer the life saver"

  • @khawlaou5385
    @khawlaou5385 1 year ago +1

    You're THE BEST

  • @samhobbs4996
    @samhobbs4996 4 years ago +1

    Great video

  • @snay6869
    @snay6869 1 year ago +1

    thank you so much!

  • @muhammadiqbalmarzuki
    @muhammadiqbalmarzuki 4 years ago +1

    This video is super duper bam bam double double bam!
    Will you cover more advanced clustering techniques such as model-based clustering (MCLUST) and weighted gene co-expression network analysis (WGCNA)? I'm learning about these things now for my research, and will be very grateful if you can cover these topics for me. Thanks! :)

  • @surbhardwaj1721
    @surbhardwaj1721 3 years ago

    Amazing explanation. Please make a video on Cluster evaluation. :)

  • @chuxbouch2793
    @chuxbouch2793 3 years ago +1

    You're amaaaaaazing

  • @mountainsunset816
    @mountainsunset816 1 year ago +1

    The opening is always funny

  • @ankitabhavsar886
    @ankitabhavsar886 20 days ago +1

    the intro.......nice one bro🖐

  • @veloisamascarenhas7531
    @veloisamascarenhas7531 5 years ago +1

    how can clustering be applied on spectral data?

  • @soffapute
    @soffapute 3 years ago +1

    Love the song!

  • @hamidkiangaikani
    @hamidkiangaikani 2 years ago +1

    4.4 K likes, zero dislikes! You're awesome. Thanks very much

  • @oasdfe1691
    @oasdfe1691 4 years ago +1

    thank you

  • @user-ib9lp8zx6x
    @user-ib9lp8zx6x 6 years ago +1

    Hi, Joshua. Do you know the basics of pseudotime analysis in single-cell RNA-seq. Can you make a short video talking about the basics? Thanks!

    • @statquest
      @statquest 6 years ago +1

      I'll put that on the to-do list!

  • @zzzluke8906
    @zzzluke8906 8 months ago +1

    Hi Josh, amazing video as always. Think you can come up with a video on how to determine the best number of clusters to have? I get the Elbow method, but I really struggle with the inconsistent method. I was looking at the inconsistency coefficients, and I am confused about whether they include singleton clusters or whether singleton clusters are excluded. I am also confused about what exactly the "jump" in the inconsistency coefficient is that we are supposed to look out for.

    • @statquest
      @statquest 8 months ago

      I'll keep that topic in mind.

  • @kanacaredes
    @kanacaredes 3 years ago +1

    Hi Josh!! We need a DBSCAN tutorial please!!!!

  • @MihirSriramVadali
    @MihirSriramVadali 4 days ago

    Great channel. Clearly explained almost all the topics I watched on ML. One question here: what does "gene" stand for? Is it the features of the data?

  • @yvonnemadegwa967
    @yvonnemadegwa967 5 years ago +1

    Thank you very much! Can you teach software? Like a basic introduction to R, the basics of how to arrange data with various commands?

    • @statquest
      @statquest 5 years ago +1

      I have a handful of videos that teach you how to do certain things in R. They don't start at the very beginning, but I still go one step at a time. You can find these videos on the index page: statquest.org/video-index/

    • @yvonnemadegwa967
      @yvonnemadegwa967 5 years ago

      @@statquest Thank you very much.

  • @danielwikstromshemer5947
    @danielwikstromshemer5947 11 months ago +1

    You are amazing

  • @naturelove9396
    @naturelove9396 3 years ago

    Hey you explain this very well and in very simple form thanks for this, I request you could you please make one video on DEGseq2, means finding DEG gene between the time points and then drawing the heatmap, volcano plot and cluster lines.
    Thanks

    • @statquest
      @statquest 3 years ago

      I'll keep that in mind. I already have a few videos on DESeq2 here: statquest.org/video-index/

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago +1

    finished watching

  • @user-gd2zf9ym4h
    @user-gd2zf9ym4h 5 months ago +1

    You saved my life😇 Thank you very much.
    And I think the link for the sample code in R isn't available right now...

    • @statquest
      @statquest 5 months ago

      Yep, that's a really old link. Here's a new one: statquest.org/statquest-hierarchical-clustering/

  • @solibozorgmehr6524
    @solibozorgmehr6524 3 years ago

    Thanks for the explanation. Can you please make a video about consensus NMF clustering?

  • @daminithandele7237
    @daminithandele7237 4 years ago +1

    Hi Josh! Can you please make a video on DBSCAN, if possible? Especially the parameter tuning part of it, I'm sure that would be of great help to lots of people.

  • @anthonychan4478
    @anthonychan4478 5 years ago +3

    Hi Joshua, can you do a video on Gaussian Mixture Models? Also, your videos are awesome! Keep it up.

    • @statquest
      @statquest 5 years ago +6

      The good news is that is already on the To-Do list. I'll bump it up a notch since you requested it as well.

    • @jordanmakesmaps
      @jordanmakesmaps 5 years ago +2

      @@statquest, make that two requests! Thanks!

    • @statquest
      @statquest 5 years ago

      @@jordanmakesmaps Cool! It's in the top 10 things for me to do, so hopefully I'll get to it soon.

    • @jacobmoore8734
      @jacobmoore8734 5 years ago +1

      @@statquestYes! Anytime people start talking about gaussian mixture models, EM, "sampling the posterior", and MCMCs - I get cold sweats.

  • @shandra5923
    @shandra5923 5 years ago

    Thank you!!!!!!!!!!!!!! :)

  • @the_data_panda
    @the_data_panda 5 years ago +2

    @StatQuest with Josh Starmer, in this video you are clustering and combining genes (the attributes of data), aren't you supposed to cluster and combine the samples? that's the inverse of the approach shown

    • @statquest
      @statquest 5 years ago +5

      You can cluster the samples or the genes, or both! It all depends on the question you are asking. For example, if I have some healthy people and some sick people, I might be interested in clustering the people (to see if healthy people form one cluster and unhealthy people form another) or I might be interested in clustering the genes. In this case I would find out which genes are correlated and up-regulated in healthy people compared to unhealthy people. Or I could do both. Does that make sense?
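
One way to picture the reply above: if the data sit in a matrix with genes as rows and samples as columns, clustering samples instead of genes only requires transposing the matrix first. The sketch below is a hedged illustration in plain Python; the matrix values and the helper names (`transpose`, `closest_pair`) are invented for this example, and `closest_pair` only finds the first merge that a distance-based hierarchical clustering would make.

```python
import math

# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [5.0, 5.2, 1.0, 0.9],  # gene A
    [4.8, 5.1, 1.2, 1.0],  # gene B
    [0.5, 0.7, 3.9, 4.1],  # gene C
]


def transpose(matrix):
    """Swap rows and columns, so that columns (samples) become rows."""
    return [list(column) for column in zip(*matrix)]


def euclidean(p, q):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_pair(rows):
    """Return the indices (i, j) of the two most similar rows.

    This is the first pair that agglomerative clustering would merge.
    """
    best = None
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            d = euclidean(rows[i], rows[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


genes_first_merge = closest_pair(expression)               # compares genes
samples_first_merge = closest_pair(transpose(expression))  # compares samples
print(genes_first_merge, samples_first_merge)
```

With this toy matrix, the first gene merge joins genes A and B, and the first sample merge joins the last two samples; the same code answers both questions depending on which axis is treated as rows.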

  • @manuelsokolov
    @manuelsokolov 1 year ago +1

    Dear StatQuest! Thank you for the explanation.
    1. What is the best way to evaluate the algorithm (silhouette score, ...) to decide which clustering method and distance to use? (I understand that the silhouette score is good for choosing the number of clusters k, but not for deciding between algorithms.)
    To decide on the best algorithm, I have been plotting PCA and coloring the points by the clusters created, to see whether the clusters make sense or not. (However, it is known from the literature that PCA does not work well for evaluating binary data.)
    2. In the case that the data is binary (e.g. genomic alteration data instead of expression data), what kind of distance would you use?
    Best Regards, Manuel

    • @statquest
      @statquest 1 year ago

      1) I guess it depends. If I had "training" data, with known categories, I would compare how many times the data were correctly and incorrectly grouped. Otherwise, it really just boils down to subjective preference.
      2) If you measure a lot of things, the Euclidean distance will still work in this situation.
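
A small sketch of what Euclidean distance does on binary (0/1) data, as background for point 2 above: for 0/1 vectors, the squared Euclidean distance equals the number of positions where the two vectors differ (the Hamming distance). The vectors and function names below are hypothetical and are only meant to illustrate that relationship.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamming(p, q):
    """Number of positions at which two equal-length vectors differ."""
    return sum(1 for a, b in zip(p, q) if a != b)


# Hypothetical binary profiles, e.g. presence/absence of five alterations.
a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]

print(hamming(a, b))         # the profiles differ at 2 positions
print(euclidean(a, b) ** 2)  # squared Euclidean distance, equal up to rounding
```

So on binary data, ranking pairs by Euclidean distance gives the same order as ranking them by the number of mismatches.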