UMAP explained | The best dimensionality reduction?

  • Published 25. 07. 2024
  • UMAP explained! The great dimensionality reduction algorithm in one video with a lot of visualizations and a little code.
    Uniform Manifold Approximation and Projection for all!
    ➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak.creator-spring....
    📺 PCA video: • PCA explained with int...
    📺 Curse of dimensionality video: • The curse of dimension...
    💻 Babyplots: interactive 3D visualization in R, Python, and JavaScript, with a PowerPoint add-in! Check it out at bp.bleb.li/
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
    Patreon: / aicoffeebreak
    Ko-fi: ko-fi.com/aicoffeebreak
    ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Outline:
    * 00:00 UMAP intro
    * 01:31 Graph construction
    * 04:49 Graph projection
    * 05:48 UMAP vs. t-SNE visualized
    * 07:31 Code
    * 08:12 Babyplots
    📚 Coenen, Pearce | Google PAIR blog: pair-code.github.io/understan...
    📄 UMAP paper: McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arxiv.org/abs/1802.03426
    📺 Leland McInnes talk @enthought : • UMAP Uniform Manifold ...
    🎵 Music (intro and outro): Dakar Flow - Carmen María and Edu Espinal
    -------------------------------
    🔗 Links:
    YouTube: / aicoffeebreak
    Twitter: / aicoffeebreak
    Reddit: / aicoffeebreak
    #AICoffeeBreak #MsCoffeeBean #UMAP #MachineLearning #research #AI
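
    P.S. For those who want to try UMAP right away, the basic usage is only a few lines. A minimal sketch, assuming the umap-learn and scikit-learn packages (illustrative, not the exact notebook from the video):

    import umap
    from sklearn.datasets import load_digits

    # 1797 digit images, each flattened to a 64-dimensional pixel vector
    X, y = load_digits(return_X_y=True)

    # n_neighbors=15 and min_dist=0.1 are the library defaults, spelled out here
    reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
    embedding = reducer.fit_transform(X)   # shape: (1797, 2)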

Comments • 103

  • @lelandmcinnes9501
    @lelandmcinnes9501 3 years ago +96

    Thanks for this -- it is a very nice short succinct description (with good visuals) that still manages to capture all the important core ideas. I'll be sure to recommend this to people looking for a quick introduction to UMAP.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +13

      Wow, we feel honoured by your comment! Thanks.

  • @gregorysech7981
    @gregorysech7981 3 years ago +17

    Wow, this channel is a gold mine

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +10

      I beg to differ. It is a coffee bean mine. 😉

  • @CodeEmporium
    @CodeEmporium 2 years ago +3

    This is really good. Absolutely love the simplicity 👍

  • @bosepukur
    @bosepukur 3 years ago +3

    Didn't know about babyplots... thanks for sharing!

  • @dengzhonghan5125
    @dengzhonghan5125 3 years ago +4

    That baby plot really looks amazing!!

  • @fleurvanille7668
    @fleurvanille7668 3 years ago +5

    I wish you were the teacher of all subjects in the world! Many thanks!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Wow, this is so heartwarming! Thanks for this awesome comment! 🤗

  • @ShubhamYadav-xr8tw
    @ShubhamYadav-xr8tw 3 years ago +8

    I didn't know about this before! Thanks for this video Letitia!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Glad it was helpful! UMAP is a must-know for dimensionality reduction nowadays.

  • @Shinigami537
    @Shinigami537 2 years ago +2

    I have seen and 'interpreted' so many UMAP plots and have not understood its utility until today. Thank you.

  • @dexterdev
    @dexterdev 3 years ago +10

    Wow! That is a very well dimensionally reduced version of the UMAP algo.

  • @dzanaga
    @dzanaga 3 years ago +4

    Thanks for making this clear and entertaining! I love the coffee bean 😂

  • @ImbaFerkelchen
    @ImbaFerkelchen 1 year ago +1

    Hey Letitia, really amazing video on UMAP. Love your easy-to-follow explanations :D Keep up the good work!

  • @python-programming
    @python-programming 3 years ago +3

    This is incredibly helpful. Thanks!

  • @willsmithorg
    @willsmithorg 2 years ago +3

    Thanks. I'd never heard of UMAP. Now I'll definitely be trying it as a replacement the next time I reach for PCA.

  • @20Stephanus
    @20Stephanus 2 years ago +2

    First video I saw. Loved it. Subscribed.

  • @capcloud
    @capcloud 1 year ago +1

    Love it, thanks Ms. Coffee and Letitia!

  • @gurudevilangovan
    @gurudevilangovan 2 years ago +2

    2 videos in and I’m already a fan of this channel. Cool stuff! 😎

  • @denlogv
    @denlogv 3 years ago +7

    Great work, Letitia! Needed this kind of introduction to UMAP :) And thanks for the links!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      Glad it was helpful, Denis!
      Are you interested in UMAP for word embedding visualization? Or for something entirely different?

    • @denlogv
      @denlogv 3 years ago +3

      @@AICoffeeBreak Yeah, something similar. Actually, I found its use in BERTopic very interesting, where we reduce the dimensionality of document embeddings (which leverage sentence-transformers) to later cluster and visualize different topics :)
      towardsdatascience.com/topic-modeling-with-bert-779f7db187e6
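
      For the curious, the core of that pipeline can be sketched in a few lines. A rough sketch, assuming the sentence-transformers, umap-learn and hdbscan packages (the model name and parameters here are illustrative, not BERTopic's exact defaults):

      from sentence_transformers import SentenceTransformer
      from sklearn.datasets import fetch_20newsgroups
      import umap
      import hdbscan

      # a small sample corpus
      docs = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes")).data[:500]

      # 1. document embeddings via sentence-transformers
      emb = SentenceTransformer("all-MiniLM-L6-v2").encode(docs)

      # 2. UMAP reduces the embedding dimensionality before clustering
      low = umap.UMAP(n_neighbors=15, n_components=5, metric="cosine").fit_transform(emb)

      # 3. HDBSCAN finds the topic clusters (-1 marks outliers)
      labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(low)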

  • @emanuelgerber
    @emanuelgerber 3 months ago +1

    Thanks for making this video! Very helpful

  • @ehtax
    @ehtax 3 years ago +4

    Very fun and educational explanation of a difficult method! Keep the vids coming, Ms. Coffee Bean!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      Thank you! 😃 There will be more to come.

  • @DerPylz
    @DerPylz 3 years ago +8

    I finally understand!

  • @marianagonzales3201
    @marianagonzales3201 2 years ago +1

    Thank you very much! That was a great explanation 😊

  • @damp8277
    @damp8277 3 years ago +2

    Fantastic! Such a good explanation, and thanks for the babyplot tip. Awesome channel!!!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      So glad you like it! ☺️

    • @damp8277
      @damp8277 3 years ago +1

      @@AICoffeeBreak It'll be very helpful. In geochemistry we usually work with 10+ variables, so having a complement to PCA will make the analysis more robust.

  • @ylazerson
    @ylazerson 2 years ago +1

    Awesome as always!

  • @rohaangeorgen4055
    @rohaangeorgen4055 3 years ago +3

    Thank you for explaining it wonderfully 😊

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      So nice of you to leave this lovely comment here! 😊

  • @vi5hnupradeep
    @vi5hnupradeep 3 years ago +3

    Thank you so much!

  • @floriankowarsch8682
    @floriankowarsch8682 3 years ago +3

    Very nice explanation!

  • @babakravandi
    @babakravandi 9 months ago +1

    Great video!

  • @arminrose4946
    @arminrose4946 2 years ago +5

    This is really fantastic stuff! Thanks for teaching it in such an easy-to-grasp way. I must admit I didn't manage to get through the original paper, since I am "just" a biologist. But this video helped a lot.
    I do have a question: I wanted to project the phenological similarity of animals at certain stations, to see which stations were most similar in that respect. For each day at each station there is a value of presence or absence of a certain species. Obviously there is also temporal autocorrelation involved here. My first try with UMAP gave a very reasonable result, but I am unsure if it is a valid method for my purposes. What do you think, Letitia or others?

  • @DungPham-ai
    @DungPham-ai 3 years ago +5

    Love you so much.

  • @user-vg3qj1cv8h
    @user-vg3qj1cv8h 2 years ago +3

    Found a great channel! Thanks for sharing!

  • @sumailsumailov1572
    @sumailsumailov1572 2 years ago +2

    Very cool, thanks for it!

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 3 years ago +8

    The visuals are amazing

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +4

      You're amazing! *Insert Keanu Reeves meme here* 👀

  • @dionbridger5944
    @dionbridger5944 1 year ago +1

    Very nice explanation. Do you have any other videos with more information about UMAP? What are its limitations compared with, e.g., deep neural nets?

  • @HighlyShifty
    @HighlyShifty 1 year ago +1

    Great introduction to UMAP, thanks

  • @BitBlastBroadcast
    @BitBlastBroadcast 2 years ago +2

    great explanation!

  • @jcwfh
    @jcwfh 2 years ago +2

    Amazing. Reminds me of Gephi.

  • @HoriaCristescu
    @HoriaCristescu 3 years ago +4

    Congratulations on an excellent channel!

  • @ChocolateMilkCultLeader
    @ChocolateMilkCultLeader 3 years ago +3

    Great vid

  • @talithatrost3813
    @talithatrost3813 3 years ago +5

    Wow! Wow! I like it!

  • @AmruteshPuranik
    @AmruteshPuranik 3 years ago +2

    Amazing!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      You're amazing! [Insert Keanu Reeves meme here] 👀
      Thanks for watching and for dropping this wholesome comment!

  • @furek5
    @furek5 2 years ago +2

    Thank you!

  • @shashankkumaryerukola
    @shashankkumaryerukola 8 months ago +1

    Thank you

  • @MachineLearningStreetTalk

    Hello 😎

  • @kiliankleemann4251
    @kiliankleemann4251 8 months ago

    Very nice :D

  • @klammer75
    @klammer75 1 year ago

    This almost sounds like an extension of k-NN to the unsupervised domain... very cool 🥳🧐🤓

  • @pl1840
    @pl1840 2 years ago +7

    I would like to point out that the statement around 6:44, that changing the hyperparameters of t-SNE completely changes the resulting embedding, is very likely an artifact of random initialisation in t-SNE, whereas the UMAP implementation you are using reuses the same initialisation for each set of hyperparameters. It is good practice to initialise t-SNE with PCA; had that been done in the video, the results across hyperparameter changes would be comparable between t-SNE and UMAP.
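
    A minimal sketch of the PCA-initialised variant with scikit-learn (the video's exact settings are not shown, so these values are illustrative):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, _ = load_digits(return_X_y=True)

    # init="pca" replaces the random initialisation, so the embedding stays
    # far more stable when perplexity or other hyperparameters change
    emb = TSNE(n_components=2, init="pca", perplexity=30,
               random_state=42).fit_transform(X)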

  • @cw9249
    @cw9249 1 year ago +1

    Interesting how the 2D graph of the mammoth looks kind of like the mammoth lying on its stomach with its limbs spread out.

  • @arnoldchristianloaizafabia4657

    Hello, what is the complexity of UMAP? Thanks for the video.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +3

      I think the answer to your question is here 👉 github.com/lmcinnes/umap/issues/8#issuecomment-343693402

  • @thomascorner3009
    @thomascorner3009 3 years ago +2

    Great introduction! What is your background if I may ask?

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      I'm from physics and computer science. 🙃 Ms. Coffee Bean is from my coffee roaster.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      What is your background if we may ask? And what brings you to UMAP?

    • @thomascorner3009
      @thomascorner3009 3 years ago +1

      @@AICoffeeBreak Hello :) I thought as much. My background is in theoretical physics, but I am making a living analyzing neuroscience (calcium imaging) data. It seems that neuroscience is now very excited about using the latest data reduction techniques, hence my interest in UMAP. :) I really like the "coffee bean" idea: friendly, very approachable and to the point.

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +1

      Theoretical physicist in neuroscience! I'm impressed.

  • @terryr9052
    @terryr9052 2 years ago +1

    I am curious if anyone knows whether it is possible to use UMAP (or other projection algorithms) in the other direction: from a low-dimensional projection -> a spot in high-dimensional space?
    An example would be picking a spot between clusters in the 0-9 digit example (either 2D or 3D) and seeing what the resulting new "number" looks like (in pixel space).

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +3

      What you are asking for is a generative model. But let's start from the bottom.
      I don't want to say that dimensionality reduction is easy, but let's put it like this: summarizing stuff (dim. reduction) is easier than inventing new stuff (going from low to high dimensions), because the problem you are asking about is a little more loosely defined: all these new dimensions have to be filled *meaningfully*.
      Luckily, there are methods that do this kind of generation. In a nutshell, one trains them on lots and lots of data to generate the whole data sample (an image of a handwritten digit) from summaries. Pointer -> you might want to look into (variational) autoencoders and generative adversarial networks.

    • @terryr9052
      @terryr9052 2 years ago +1

      @@AICoffeeBreak Thank you for the long response! I am moderately familiar with both GANs and VQ-VAEs but did not know whether a generated sample could be chosen from the low-dimensional space of a UMAP projection.
      For example, a VAE takes images, compresses them to an embedded space and then restores the originals. UMAP could take that embedded space and further reduce it to represent it in a 2D graph.
      So what I want is 2D representation -> embedding -> full reconstructed new sample. I was uncertain whether that 1st step is permitted.

    • @AICoffeeBreak
      @AICoffeeBreak  2 years ago +2

      ​@@terryr9052 I would say yes, this is possible and I think you are on the right track, so I'll push further. :)
      With GANs this is only minimally different, so I will focus on VAEs for now:
      *During training* a VAE does exactly as you say: image (I) -> low-dim. embedding (E) -> image (I), hence the name AUTOencoder. What I think is relevant for you is that E can be 2-dimensional. The dimensionality of E is actually a hyperparameter, and you can adjust it as flexibly as the rest of your architecture. Choosing such a low dimensionality for E just means that the whole I -> E -> I round trip is lossy. I -> E (the summary, the encoder) is simple. But E -> I, the reconstruction or, in a sense, the re-invention of information (the decoder) in many dimensions, is hard to achieve from only 2 dimensions. It is therefore easier when the dimensionality of E is bigger (something like 128-ish in "usual" VAEs).
      In a nutshell, the I -> E step I just described is what any other dimensionality reduction algorithm does too (PCA, UMAP, t-SNE), only this time it's implemented by a VAE. The E -> I step is what you want, and here it comes for free, because what you need is the *testing step*.
      Once you have trained a VAE that can take any image, encode it (to 2 dims) and decode it, you can drop the I -> E part, position yourself somewhere in the E space (i.e., give it an E vector), and let the E -> I routine run.
      I do not know how far I should go, because I also have thoughts for the case where you really, really want I -> E to be forcibly the UMAP routine and not a VAE encoder; in that case you would need to train only a decoder architecture. Or a GAN. Sorry, it gets a little too much to put into a comment. 😅

    • @terryr9052
      @terryr9052 2 years ago +2

      @@AICoffeeBreak Thanks again! I'm going to read this carefully and give it some thought.

  • @TooManyPBJs
    @TooManyPBJs 3 years ago

    I think it is import umap-learn instead of import umap. Great video. Just weird that I cannot get it to run on Google Colab. When I run the cell with the bp variable, it is just blank. No errors. Weird.
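
    For anyone hitting the same confusion: the package name on PyPI is umap-learn, but the import name is plain umap (import umap-learn is not even valid Python syntax). A minimal sketch:

    # pip install umap-learn       <- install name on PyPI
    import umap                    # <- import name in Python

    reducer = umap.UMAP(n_components=2)
    # embedding = reducer.fit_transform(X)   # X: (n_samples, n_features)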

  • @hannesstark5024
    @hannesstark5024 3 years ago +2

    Nice video! And 784 :D

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Thank you very much! Did Ms. Coffee Bean say something wrong with 784? 😅

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      Ah, now I noticed. She said 764 instead of 784. Seems like Ms. Coffee Bean cannot be trusted with numbers. 🤫

  • @renanmonteirobarbosa8129
    @renanmonteirobarbosa8129 1 month ago

    I am afraid you did not fully understand the mechanism of information geometry behind UMAP and how the KL-divergence acts as the "spring-dampener" mechanism. Keenan Crane and Melvin Leok have great educational materials on the topic.

  • @divergenny
    @divergenny 1 year ago

    Will t-SNE be covered here?

  • @nogribin
    @nogribin 1 year ago +1

    wow.

  • @lisatrost7486
    @lisatrost7486 3 years ago +4

    Hopefully many friends trust! I'm bringing my girlfriend to buy her house!

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +2

      It looks like we have a strong Cold Mirror fanbase here. Ms. Coffee Bean is also a fan of hers, btw.

  • @quebono100
    @quebono100 2 years ago +1

    ValueError: cannot reshape array of size 47040000 into shape (60000,784)

    • @quebono100
      @quebono100 2 years ago +1

      What's the matter with this xD

    • @quebono100
      @quebono100 2 years ago +2

      OK, I solved it: I had 6k samples instead of 60k.
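
      For reference, a shape-agnostic flatten avoids this class of error entirely. A minimal NumPy sketch (the zero array is a stand-in for the real dataset):

      import numpy as np

      images = np.zeros((60000, 28, 28))            # e.g. the Fashion-MNIST training set
      flat = images.reshape(images.shape[0], -1)    # -> (60000, 784); -1 infers 784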

  • @pvlr1788
    @pvlr1788 2 years ago

    Is the babyplots library still supported? It does not work for me in any of the envs I've tried... :(

    • @DerPylz
      @DerPylz 2 years ago +4

      Hi! I'm the creator of babyplots. Yes, the library is still actively supported. If you're having issues with getting started, please join the babyplots Discord server, which you'll find on our support page: bp.bleb.li/support or write an issue on one of the GitHub repositories. I'll be sure to help you there.

  • @Skinishh
    @Skinishh 2 years ago

    How do you judge the performance of UMAP on your data? In PCA you can look at the explained variance, but what about UMAP?

  • @search_is_mouse
    @search_is_mouse 3 years ago

    Bookmarking this.

  • @luck3949
    @luck3949 3 years ago

    You can't say that PCA "can be put in company with SVD". SVD is one of the available implementations of PCA. PCA means "a linear transformation that transforms data into a basis whose first component is aligned with the direction of maximum variation, whose second component is aligned with the direction of maximum variation of the data projected onto the hyperplane orthogonal to the first component, etc." SVD is a matrix factorization method. It turns out that when you perform SVD you get PCA. But that doesn't mean SVD is a dimensionality reduction algorithm: SVD is a way to represent a matrix. It can be used for many different purposes (e.g. for quadratic programming), not necessarily reduction of dimensionality. Same for PCA: it can be performed using SVD, but other numerical methods exist as well.
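
    The equivalence is easy to check numerically. A minimal sketch, assuming NumPy and scikit-learn (random data for illustration):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))

    # PCA "by hand": SVD of the mean-centered data matrix
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores_svd = U * S                    # projections onto the principal axes

    scores_pca = PCA(n_components=5).fit_transform(X)

    # identical up to per-component sign flips
    print(np.allclose(np.abs(scores_svd), np.abs(scores_pca)))   # True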

    • @AICoffeeBreak
      @AICoffeeBreak  3 years ago +6

      You make some good observations, but we do not entirely agree: we think there are important differences between SVD and PCA. In any case, by "put into company" we did not mean to go into the specific details of the relationship between these algorithms. It was meant more like "if you think about PCA, you should think about matrix factorization methods like SVD or NMF". That is what we understand by "put into company"; we do not say "it is" or "it is absolutely and totally *equivalent* to".

  • @joelwillis2043
    @joelwillis2043 1 year ago

    I saw no proof of "best", so you failed to answer your own question.

  • @MrChristian331
    @MrChristian331 3 years ago

    That coffee bean looks like a "shit".