Data Analysis 7: Clustering - Computerphile

  • Added on 8 July 2019
  • Grouping similar things together - either users with similar habits, or products in an online shop. Dr Mike Pound on Clustering. This is part 7 of the Data Analysis Learning Playlist: • Data Analysis with Dr ...
    This Learning Playlist was designed by Dr Mercedes Torres-Torres & Dr Michael Pound of the University of Nottingham Computer Science Department. Find out more about Computer Science at Nottingham here: bit.ly/2IqwtNg
    This series was made possible by sponsorship from Google.
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 44

  • @Computerphile
    @Computerphile 5 years ago +10

    Check out the full Data Analysis Learning Playlist: czcams.com/play/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba.html

  • @heyandy889
    @heyandy889 4 years ago +48

    13:58 "It's worked pretty well; it's not perfect."
    I feel like that should be the slogan for this course and for data science in general.

  • @AubreyBarnard
    @AubreyBarnard 5 years ago +41

    To hopefully make it clearer to everyone, the iris labels were hidden from the clustering algorithms and then only used after the fact to see how well the clusters recovered the true labels. So the task was still unsupervised because the supervision was hidden. This is a standard technique for evaluating unsupervised machine learning algorithms.
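
A minimal Python sketch of that evaluation step, using made-up cluster assignments rather than results from the video. The labels are only consulted after clustering, to cross-tabulate clusters against species and compute a simple purity score. The helper names `contingency` and `purity` are illustrative, not from any library.

```python
from collections import Counter

def contingency(cluster_ids, true_labels):
    """Count how often each (cluster, true label) pair occurs."""
    return Counter(zip(cluster_ids, true_labels))

def purity(cluster_ids, true_labels):
    """Fraction of points whose true label is the majority label of their cluster."""
    table = contingency(cluster_ids, true_labels)
    best = {}  # largest single-label count seen in each cluster
    for (cluster, _label), count in table.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(true_labels)

# Hypothetical output of a clustering run (labels were NOT given to the algorithm).
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
# Held-back true labels, used only for this after-the-fact check.
true_labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 4

table = contingency(cluster_ids, true_labels)
score = purity(cluster_ids, true_labels)
```

Here one virginica point lands in the mostly-versicolor cluster, so the purity is below 1. Purity is only one of several possible scores; it is used here because it is easy to read off the table.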

  • @LittleLightCZ
    @LittleLightCZ 1 year ago +2

    I love videos like these. I believe that one day people won't say "I learned that in college", but "I went on YouTube and it was all there" instead.

  • @sajadmalik9097
    @sajadmalik9097 3 years ago +5

    I love this guy! He is amazing at these things... Thanks, friend, this is a lifetime charity. Everyone is going to learn from this free of cost.

  • @Wolves2314
    @Wolves2314 5 years ago +30

    I always click on a new Mike Pound Computerphile video at first sight.

  • @PaulMuston
    @PaulMuston 4 years ago +8

    How meta. This WAS the video that was recommended for me to watch.

  • @lumine2205
    @lumine2205 5 years ago +8

    You're gonna be so happy after Google recommends you the Saw movie! A whole movie dedicated to a SAW!

  • @kieranklaassen
    @kieranklaassen 5 years ago +5

    best series ever! :) thanks so much for this

  • @neptunefinance
    @neptunefinance 6 days ago

    The title of the video should be: Clustering (with k-means and PAM). I thought we would have some insights on graph theory (PMFG), on hierarchical (with complete and Ward linkage), on fuzzy, etc. Great video on k-means anyway!

  • @ec92009y
    @ec92009y 2 years ago

    Wood turning videos are fascinating. Can’t wait to see what the YouTube algorithm recommends next for me.

  • @HighlyShifty
    @HighlyShifty 3 years ago +1

    Thanks so much for this, a great look at clustering

  • @MartinMaat
    @MartinMaat 2 years ago +1

    If you ever wondered about the expression "high brow", here's your example.

  • @jacques5301
    @jacques5301 5 years ago +10

    Just submitted my last paper for my master's degree that HEAVILY relies on clustering algorithms. I really wish this video had been released 2 years ago.

    • @jacques5301
      @jacques5301 5 years ago +1

      Also, please consider making a video on Big O complexity. I think it would go really well with this topic.

  • @JohnsonLobster
    @JohnsonLobster 5 years ago +5

    I watch Computerphile videos the same way Mike watches woodturning videos.

  • @rnarith855
    @rnarith855 2 years ago

    Very clear explanation

  • @vsandu
    @vsandu 1 year ago

    Brilliant, cheers!

  • @williamchamberlain2263

    Depending on which language(s) you're using, DBSCAN libraries can be worth a look

  • @Andrewsarcus
    @Andrewsarcus 5 years ago +7

    I purchased a wood turning lathe

  • @hans-edwardhoene8333
    @hans-edwardhoene8333 3 years ago +2

    If you use the PAM clustering algorithm with an outlier, as in your example, is it possible that PAM would assign the outlier to its own group? In other words, is it possible that the lowest error would be achieved by assigning the outlier to one group and everything else to another group?
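
One way to explore this question is a small experiment on made-up 1-D data. The sketch below does not implement PAM's swap procedure; it exhaustively tries every pair of points as medoids and keeps the pair with the lowest total distance, which is the objective PAM tries to minimise. The data and the helper names (`medoid_cost`, `best_medoids`) are assumptions for illustration only.

```python
from itertools import combinations

def medoid_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid (the k-medoids objective)."""
    return sum(min(abs(p - m) for m in medoids) for p in points)

def best_medoids(points, k):
    """Try every set of k points as medoids and keep the cheapest one."""
    return min(combinations(points, k), key=lambda ms: medoid_cost(points, ms))

# Made-up data: five nearby points and one far-away outlier.
points = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]

medoids = best_medoids(points, k=2)
# Group every point under its nearest medoid.
clusters = {
    m: [p for p in points if min(medoids, key=lambda c: abs(p - c)) == m]
    for m in medoids
}
```

For this toy data the cheapest choice makes the outlier its own medoid, so it ends up alone in a cluster. Whether that happens in general depends on how far the outlier is and how spread out the remaining points are.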

  • @adamcetinkent
    @adamcetinkent 5 years ago +2

    Is there a degree to which the dimensions are weighted when you cluster? Or would you apply the weighting to your data before clustering them?

    • @Jupiter__001_
      @Jupiter__001_ 5 years ago

      I assume you would do this with the dimensions that come out of PCA, which implies that they have already been weighted.
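
As a rough illustration of weighting the data before clustering (one possible approach, not something shown in the video): standardise each column so that no dimension dominates just because of its units, then multiply columns by weights of your choosing. The helper names and the weights below are assumptions.

```python
import math

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]

def apply_weights(rows, weights):
    """Multiply each column by its weight before distances are computed."""
    return [tuple(x * w for x, w in zip(r, weights)) for r in rows]

# Made-up rows: the second column is on a 100x larger scale than the first.
rows = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
scaled = standardize(rows)
weighted = apply_weights(scaled, weights=(1.0, 2.0))  # hypothetical weights
```

With Euclidean distance, a weight of 2 on a column makes differences in that column count four times as much in the squared distance, so the weights should be chosen with that in mind.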

  • @Ma8t
    @Ma8t 5 years ago +8

    Hi, thanks for these great videos.
    At 13:35, I'm missing the maximise_diag function. Which package is it in?

    • @sebastiangilbert9105
      @sebastiangilbert9105 5 years ago +1

      I am having the same problem. Did you get an answer?

    • @jasper939393
      @jasper939393 4 years ago +1

      I can't find any function for this either. I'll come back if I find it.

    • @user-hq8tl4oc9o
      @user-hq8tl4oc9o 4 years ago +1

      Hello. I'm having the same issue. Did you get an answer?

    • @xfactor7923
      @xfactor7923 2 years ago +1

      Same question... No answers yet.

  • @Peter-fy3zj
    @Peter-fy3zj 1 year ago

    Could I use k-means for hyperspectral images?

  • @alecksandrborovkov7602

    :) thanks

  • @guitarislife01
    @guitarislife01 3 years ago

    This video was recommended to me after watching an MIT lecture on clustering lol

  • @SM-vo5gj
    @SM-vo5gj 1 year ago

    We all end up watching the wood turning videos, lol

  • @ArunKumar-yb2jn
    @ArunKumar-yb2jn 2 years ago +2

    If YouTube can't properly subtitle a British accent, I have no hope for artificial intelligence.

  • @rabidbigdog
    @rabidbigdog 5 years ago +3

    The joy of wood? Dr Mike could become the Nick Offerman of the UK.

  • @darceysinclair8929
    @darceysinclair8929 2 years ago

    PSA: YouTube uses a mix of association analysis and clustering analysis.

  • @ElPasoJoe1
    @ElPasoJoe1 4 years ago

    K nearest neighbors...

  • @vivekupadhyay6663
    @vivekupadhyay6663 2 years ago

    Jared aka Donald Dunn spotted

  • @SixTough
    @SixTough 3 months ago

    PCA got an almost equally wrong result 😂

  • @ramixnudles7958
    @ramixnudles7958 5 years ago +1

    Only 11 min in, but erm, confused. You're clustering, but you don't have any idea of what you're clustering on? You aren't clustering on a dimension, e.g. music genre?
    What gives you a red/blue division in the first place? "It just looks like here's a cluster... And that, there, is a cluster. Now, let's make 'em fit..."? I'm not understanding.

    • @AubreyBarnard
      @AubreyBarnard 5 years ago +5

      Clustering is just defining groups based on similarity. That similarity could be based on one or on any number of attributes. It is unsupervised which means there are no labels in the data. Now, the algorithms label each point with the same label as the closest cluster center, thereby labeling each cluster. Then the cluster centers are adjusted to better reflect the cluster members, and then the labels are updated based on the new cluster centers (which can assign points to different clusters compared to the previous round). Repeat until convergence, which is when no points are assigned to a different cluster and when the cluster centers stay the same. This is when the "best fit" is achieved.
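
The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the toy points, the starting centers, and the function name `kmeans` are made up here, and the code is not taken from the video.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Basic k-means on a list of coordinate tuples.

    `centers` holds the initial cluster centers (chosen by the caller).
    Returns (labels, centers) once no label changes between rounds.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with its closest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point moved to a different cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's center where it is
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

Once the labels stop changing, the centers stop changing too, because they are recomputed from the same memberships. The result still depends on the starting centers, so in practice the algorithm is often run several times from different starts.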

  •  5 years ago +1

    I remember firefly.