
StatQuest: MDS and PCoA

  • Published Dec 10, 2017
  • MDS (multi-dimensional scaling) and PCoA (principal coordinate analysis) are very, very similar to PCA (principal component analysis). There's really only one small difference, but that difference means you need to know what you're doing if you're going to use MDS effectively. This video makes sure you learn what you need to know to use MDS and PCoA.
    There is a minor error at 4:14: the difference for gene 3 should be (2.2 - 1)². Instead, the difference for gene 2 was repeated.
    For a complete index of all the StatQuest videos, check out:
    statquest.org/...
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumr...
    Paperback - www.amazon.com...
    Kindle eBook - www.amazon.com...
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshi...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer....
    ...or just donating to StatQuest!
    www.paypal.me/...
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
    / joshuastarmer
    #statquest #MDS #PCoA
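The squared-difference terms behind the correction note above can be sketched in a few lines of Python. This is a hypothetical illustration: only the (2.2 - 1) term for gene 3 comes from the correction; the other gene values are made up.

```python
import math

# Hypothetical expression values for two cells across three genes.
# Only the (2.2 - 1) term for gene 3 is taken from the correction note;
# the rest of the numbers are made up for illustration.
cell1 = {"gene1": 1.5, "gene2": 3.0, "gene3": 2.2}
cell2 = {"gene1": 2.0, "gene2": 1.0, "gene3": 1.0}

# Euclidean distance: square root of the sum of squared per-gene differences.
squared_diffs = {g: (cell1[g] - cell2[g]) ** 2 for g in cell1}
distance = math.sqrt(sum(squared_diffs.values()))
print(squared_diffs["gene3"])  # the corrected term, (2.2 - 1)**2
```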

Comments • 191

  • @statquest
    @statquest  2 years ago +3

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @taoyang563
    @taoyang563 4 years ago +18

    This is such a great video.
    Answering a student's question in one sentence demonstrates the teacher's complete understanding of the material.
    The more a teacher talks when answering, the less they understand what you are asking, and the more confused you become.

  • @dsagman
    @dsagman a year ago +13

    Honestly the best machine learning and stats videos available. How did we live before Statquest?

  • @lade_edal
    @lade_edal 7 months ago +1

    I ran around all over the internet none the wiser, then came across this channel and bam! It all fits together so easily. Why do some people overcomplicate such simple things? Thanks Josh!

  • @son681
    @son681 4 years ago +7

    Thank you so much for such easy, bite-size content that I can understand to the fullest. It's much better visualized and more informative than other videos I've seen!!!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

    • @MSuriyaPrakaashJL
      @MSuriyaPrakaashJL 4 years ago

      @@statquest This is a great video, but where can I find the maths behind it?

    • @statquest
      @statquest  4 years ago

      @@MSuriyaPrakaashJL Start here: en.wikipedia.org/wiki/Multidimensional_scaling

  • @Ivaniushina
    @Ivaniushina 6 years ago +5

    Brilliant! So clear. Now I understand (at last!) the relationship between PCA and MDS.

  • @초롱초록
    @초롱초록 4 years ago +3

    Thank you so much! I was confused about the difference between PCA and MDS. Thanks to your explanation, I understand now.

  • @AlonKedem1000
    @AlonKedem1000 6 months ago

    I love your videos. Just want to mention that at 4:18 you calculated the Euclidean distance term for gene 2 twice while saying it's gene 3. :)

  • @nikhiljoyappa687
    @nikhiljoyappa687 2 years ago +1

    Very helpful in a world of people who are always helpfool.

  • @MrZanvine
    @MrZanvine 6 years ago +2

    Brilliant video, you're awesome! Thanks for taking the time to make these :)

  • @malteneumeier3274
    @malteneumeier3274 5 years ago +1

    @Josh Starmer: at 4:14 there is a tiny mistake in the formula: the difference for gene 3 should be (2.2 - 1)². Instead, the difference for gene 2 was repeated.

    • @statquest
      @statquest  5 years ago

      Thanks a lot for pointing that out. I've added this to the "Errata" page that I maintain so that one day, when I create new editions of these videos, I can correct all the little mistakes.

  • @rlh4648
    @rlh4648 2 years ago +1

    Thanks Josh
    You're feckin awesome.

  • @takethegaussian7548
    @takethegaussian7548 4 years ago +3

    Thank you very much! This is a really really good explanation.

  • @ahmetlacin5748
    @ahmetlacin5748 2 years ago +1

    I just have no idea how to thank you. Viva Josh!

  • @alejandrotenorio2327
    @alejandrotenorio2327 4 years ago +3

    In MDS, where does the minimization of the raw stress go? I'm not getting how you can do that while performing EVD to reduce the dimensions.

  • @Stephanbitterwolf
    @Stephanbitterwolf 6 years ago +1

    Very helpful. Not sure if this has been pointed out yet, but at around 4:17 you talk about the distance for gene 3, but the numbers aren't accurate for that gene's difference.

  • @poojakunte6865
    @poojakunte6865 6 years ago +7

    The difference for gene 3 should be (2.2 - 1)², right?

    • @statquest
      @statquest  6 years ago +4

      Yes! That's just a typo in the video.

  • @abcd123456789zxc
    @abcd123456789zxc 3 years ago +2

    Thanks so much for your video, but I still have a question: I really don't understand the difference between PCoA and MDS.
    It would be a great help if anyone could explain it.

    • @statquest
      @statquest  3 years ago +2

      MDS has two versions: "Classical" and "Non-Metric". This video shows how "Classical" MDS works. Classical MDS is the exact same thing as PCoA. There is no difference. However, there is a difference between PCoA and "Non-Metric" MDS. Maybe one day I'll make a video on "Non-Metric" MDS.

    • @abcd123456789zxc
      @abcd123456789zxc 3 years ago +1

      @@statquest Thank you so much for your time and consideration.

  • @sofiagreen9742
    @sofiagreen9742 4 years ago +2

    Hello Josh and thank you for your videos, they are really helpful. Would you mind making a video on Canonical Correlations please?

  • @Mako0123
    @Mako0123 6 years ago

    Nice explanation as always!

  • @dist321
    @dist321 5 years ago +1

    Hi Josh! I've been here many times and love your channel. I have a question about the axes. I understand that each one accounts for some percentage of the variation in the dataset, with axis one accounting for the highest percentage. However, if I look at samples along PC1, can I assume any biological meaning for the samples far to the right or far to the left?

  • @HOMESTUDY247
    @HOMESTUDY247 2 years ago +1

    Great video

  • @madihamariamahmed8727
    @madihamariamahmed8727 2 years ago

    Please make videos on Deep clustering methods!

  • @medazzouzi2649
    @medazzouzi2649 a year ago +1

    Hey Josh, I'm confused by the PCA statement "correlations among samples". Isn't it supposed to be correlation among variables? Since we are reducing the dimension of the variables (in this case, genes), not the samples?

    • @statquest
      @statquest  a year ago +1

      The goal of the plot is to show correlations among the samples. Each sample has a lot of gene measurements, and correlation among samples means that a lot of those measurements are similar (or the exact opposite of similar), and we want to preserve those relationships. We want things that are highly correlated to appear close to each other in the graph.

    • @medazzouzi2649
      @medazzouzi2649 a year ago +2

      @@statquest ahhh okayyyy i gettt itt 😍😍😍

    • @medazzouzi2649
      @medazzouzi2649 a year ago +2

      @@statquest thanks josh
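The "correlations among samples" idea in the exchange above can be illustrated with a tiny sketch (the gene measurements are hypothetical): each sample is a vector of gene values, and the correlation is computed between sample vectors, not between genes.

```python
import numpy as np

# Hypothetical gene measurements: each array is one sample's values
# across the same four genes.
sample1 = np.array([1.0, 2.5, 3.0, 0.5])
sample2 = np.array([1.2, 2.4, 3.1, 0.6])  # similar profile to sample1
sample3 = np.array([3.0, 0.4, 0.2, 2.9])  # roughly opposite profile

# Similar profiles give a high positive correlation; opposite profiles
# give a negative correlation. Correlated samples should end up close
# together in the plot.
print(np.corrcoef(sample1, sample2)[0, 1])
print(np.corrcoef(sample1, sample3)[0, 1])
```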

  • @shahbazsiddiqi74
    @shahbazsiddiqi74 4 years ago +4

    Unlike PCA, where we compared gene variation to weight the value calculated for each cell and then mapped the cells onto PC1 and PC2, here we are calculating the distance between cells with reference to each gene. What is the calculation for MDS1 and MDS2? I am confused because we are taking two cells at a time instead of one. Are we plotting the difference of each gene with respect to cell 1 along the x-axis and cell 2 along the y-axis? Could you please explain what to consider for MDS1 and MDS2? Thanks a ton.

    • @chrisjfox8715
      @chrisjfox8715 4 years ago

      If this is in reference to the log-fold-change graph, then I too agree that it isn't explained what those two axes distinctly represent. I get how the LFC was calculated (between every single pair of data points), but those axes could theoretically be anything at the discretion of the investigator... and what they are here hasn't been made clear.

  • @marchino1981
    @marchino1981 6 years ago +1

    Very nice and clear! Thank you!

  • @trinh123456
    @trinh123456 4 years ago +1

    Your videos are amazing!

  • @alexlee3511
    @alexlee3511 4 months ago

    Thank you for the effort! But I am wondering: if we are going to reduce the dimension of genomic data, do people prefer PCA or PCoA?

    • @statquest
      @statquest  4 months ago

      MDS with log fold change is the default for DESeq2 and possibly other programs. However, I feel like PCA is more commonly used.

  • @khajariazuddinnawazmohamme3092

    Hi Josh, I really like your videos; they are very intuitive. Could you do a StatQuest video on Partial Least Squares if possible? Thanks in advance :)

    • @statquest
      @statquest  5 years ago +3

      Partial Least Squares is on the to-do list, so, with your vote, I'll bump it up a notch so that it is closer to the top.

    • @khajariazuddinnawazmohamme3092
      @khajariazuddinnawazmohamme3092 5 years ago

      @@statquest thank you so much Josh 😊

    • @melaniee467
      @melaniee467 5 years ago +1

      @@statquest Can't wait for your Partial Least Squares explanation!

    • @statquest
      @statquest  5 years ago

      @@melaniee467 Sounds good! I'll bump it up another notch!

  • @simonhunter-barnett6616

    If MDS and PCA have the same outputs, why would you choose one over the other? What's the importance of correlation vs. distance? P.S. I've been trying to understand PCA and MDS for months now, and this was so much easier than reading articles and books :D

    • @statquest
      @statquest  3 years ago +1

      Starting at 4:48 I give examples of using MDS with different distance metrics, which result in outputs that are different from PCA.

  • @DungPham-ai
    @DungPham-ai 6 years ago

    Best video. Can you make a video explaining non-negative matrix factorization (NMF)?

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +1

    Hi Josh,
    have you ever encountered a clustering model with more than 3-4 clusters? I've done this many times, and it looks like the optimal number of clusters (3-4) is "natural".

    • @statquest
      @statquest  4 years ago

      Very interesting. I'll try to remember to keep track of these things in the future to see if I get similar results.

  • @rekhasharma4962
    @rekhasharma4962 a year ago

    How do you adjust overlapping labels in a PCA biplot?

  • @doremekarma3873
    @doremekarma3873 5 months ago

    Can someone please explain how we calculate MDS1 and MDS2 after obtaining the distance between each pair of cells?

    • @statquest
      @statquest  5 months ago

      You use eigendecomposition.
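For anyone curious what that eigendecomposition step looks like in practice, here is a minimal sketch of classical MDS/PCoA, assuming a small hypothetical distance matrix: square the distances, double-center them, eigendecompose, and scale the top eigenvectors to get MDS1 and MDS2.

```python
import numpy as np

# Hypothetical 3x3 matrix of pairwise distances between cells.
D = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
B = -0.5 * J @ (D ** 2) @ J           # double-centered squared distances

# Eigendecompose B; the top eigenvectors, scaled by the square roots of
# their eigenvalues, give the coordinates for MDS1, MDS2, ...
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
coords = vecs[:, :2] * np.sqrt(np.maximum(vals[:2], 0))

# Distances between the recovered coordinates reproduce the original
# distance matrix (up to numerical error).
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.round(recovered, 3))
```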

  • @manueltiburtini6528
    @manueltiburtini6528 3 years ago +1

    Hi Josh from Italy! Are the assumptions of these methods always the same (normality, independence, homoscedasticity, linearity)?

    • @statquest
      @statquest  3 years ago +1

      The same as PCA? I'm not sure. However, I do know that whatever assumptions there are are often ignored and people just try PCA or MDS and see what happens.

    • @manueltiburtini6528
      @manueltiburtini6528 3 years ago

      @@statquest Couldn't this lead to false interpretations? I'm using this technique and LDA to analyze taxonomic data, and I'm worried that my dataset is not independent due to common phylogenetic origin.

    • @statquest
      @statquest  3 years ago +1

      @@manueltiburtini6528 I don't really think that's a big problem for MDS or PCA. These methods are just designed to reduce dimensionality for drawing graphs or to plug into some other analysis (like regression).

  • @jxaskcijiaxhsic9943
    @jxaskcijiaxhsic9943 12 days ago

    How exactly do you find the axes of MDS? What do you do after you calculate the distances?

    • @statquest
      @statquest  12 days ago

      To get a sense of how it works, see: czcams.com/video/FgakZw6K1QQ/video.html

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 12 days ago

      @@statquest Is it the same process as calculating the PCs when finding the axes of MDS, like finding the best-fitting line by minimizing the SSR? If it is, what role does calculating the distances between points play?

    • @statquest
      @statquest  12 days ago

      @@jxaskcijiaxhsic9943 It's a related technique. It's not the same, but related. Based on the distances we can calculate variances and covariances, and from those we can find the directions in which the data has the most variation.

    • @jxaskcijiaxhsic9943
      @jxaskcijiaxhsic9943 11 days ago

      @@statquest Okay, so it still finds the best-fitting line, but keeps the distances between the points the same after dimension reduction.

  • @ranitchatterjee5552
    @ranitchatterjee5552 2 years ago

    To plot the data, do we select the cells with the maximum distances? For example, if cells 1 & 2 and cells 3 & 4 have the maximum distances, do we plot with respect to them?

    • @statquest
      @statquest  2 years ago

      To get a better understanding of how it works, check out the StatQuest on PCA: czcams.com/video/FgakZw6K1QQ/video.html

  • @swarnimkoteshwar
    @swarnimkoteshwar a year ago +1

    Thank you!

  • @mohsenvazirizade6334
    @mohsenvazirizade6334 4 years ago +1

    Hi, thank you so much for such a good explanation. Do you mind if I ask for the reference book/paper for the terminology? I am a little confused, since the same methods seem slightly different in various reference books. Thank you.

    • @statquest
      @statquest  4 years ago +2

      To be honest, I can't remember what my original sources are for this video. More recently I've been putting the sources in the description below the video, but this video is too old for that.

  • @adelutzaification
    @adelutzaification 6 years ago

    Wow. PCA and MDS really are very similar, just like the videos describing them (clearly explained and overall awesome ;) It seems to me that PCA is just a particular case of MDS, since with MDS one can adjust the distance metric to get various outputs, including the one given by PCA. If that is the case, why don't people use MDS more? It seems under-utilized. Is it trickier to implement?

  • @hannahnelson4569
    @hannahnelson4569 2 months ago

    Ok, I'm going to admit it: I don't understand what this video is saying. It says to just replace the dot product with other distance metrics, and that sounds fine. But it doesn't make sense that we are using the same mathematical computations for a distance matrix and a correlation matrix. The correlation matrix (dot-product distance) makes sense, because its special properties allow a decomposition with a diagonal component, which we can sort and then reduce in dimension to produce our PCA plot. It is not at all clear to me why an arbitrary distance matrix of the predictors will be diagonalizable in the same way, so the rest of the mathematical interpretation breaks down from there.
    Basically, the math and the interpretation feel a bit off to me. I'll have to do more research on the topic.

  • @BeateSukray
    @BeateSukray 5 years ago +2

    I love you, man

  • @SophieLemire
    @SophieLemire 11 months ago +1

    Thanks!

    • @statquest
      @statquest  11 months ago

      Hooray! Thank you so much for supporting StatQuest! TRIPLE BAM! :)

  • @KayYesYouTuber
    @KayYesYouTuber 4 years ago

    Are you saying we compute eigenvalues and eigenvectors on the distance matrix instead of the covariance matrix? Is that the only difference between PCA and MDS?

    • @statquest
      @statquest  4 years ago +1

      And you get your choice of distance metrics.

  • @urjaswitayadav3188
    @urjaswitayadav3188 6 years ago

    Great video!

  • @bitsajmer
    @bitsajmer 3 years ago

    Hi Josh,
    how do we plot the MDS values on the graph? With distances we only have a single value per pair.
    Do we plot them on a number line? But you showed a graph with two axes.

    • @statquest
      @statquest  3 years ago

      MDS converts a matrix of distances into different axes in much the same way that we do it for PCA. For details, see: czcams.com/video/_UVHneBUBW0/video.html

  • @siddheshb.kukade4685
    @siddheshb.kukade4685 11 months ago +1

    Thanks😊

  • @drzun
    @drzun 4 years ago

    Hi Josh, thanks for the video. I'm a bit confused: when you said "PCA starts by calculating the correlation among samples", did you mean plotting each sample in multiple dimensions like in your previous PCA video? If so, how about PCoA? Do we also "plot" the distances among samples first and then get the top 2 PCs? If so, how is the number of dimensions determined in the case of PCoA? I watched all of your PCA videos and I understand how to get a PCA, but somehow I still don't know how a PCoA is done... thank you!

    • @statquest
      @statquest  4 years ago +1

      There are two ways to do PCA - an old method that is based on covariances and correlations (described in this czcams.com/video/HMOI_lkzW08/video.html and this czcams.com/video/_UVHneBUBW0/video.html ) and a new method that uses Singular Value Decomposition (described in this czcams.com/video/FgakZw6K1QQ/video.html ). This video on PCoA/MDS references the older method (using covariances and correlations). To calculate the covariances and correlations among the samples, you follow the steps outlined in these videos on covariance statquest.org/2019/10/08/covariance-and-correlation-part-1-covariance/ and correlation statquest.org/2019/10/08/covariance-and-correlation-part-2-pearsons-correlation/ . That gives you a single number for every pair of samples. We then do eigendecomposition of those numbers to get the PCs. With PCoA, we calculate distances (using the Euclidean distance or some other metric) between each pair of samples and do eigendecomposition of those numbers to get the PCs.
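The equivalence between the two routes described above (eigendecomposition of covariances vs. eigendecomposition of Euclidean distances) can be checked numerically. This is a hypothetical sketch, not code from the video; the 4x3 data matrix is made up.

```python
import numpy as np

# Hypothetical data: 4 samples x 3 genes.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 0.5, 1.5],
              [0.0, 1.0, 2.5],
              [3.0, 2.5, 0.0]])

# PCA route: eigendecompose the covariance of the centered data and
# project the data onto the top-2 eigenvectors.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)
_, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
pca_coords = Xc @ eigvecs[:, ::-1][:, :2]

# PCoA route: eigendecompose the double-centered squared Euclidean
# distances and scale the top-2 eigenvectors.
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
n = X.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
pcoa_coords = vecs[:, order][:, :2] * np.sqrt(vals[order][:2])

# The two sets of coordinates match, up to the sign of each axis.
print(np.allclose(np.abs(pca_coords), np.abs(pcoa_coords), atol=1e-6))
```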

  • @whatyouwantyouare
    @whatyouwantyouare 3 years ago

    Hi Josh, thanks so much... Confusion: the new table of distances will have columns d12, d13, d14, ..., d23, ..., so when we plot, why would we still have clusters corresponding to cell1, cell2, ...? Wouldn't the colours correspond to d12, d13, etc.?

    • @statquest
      @statquest  3 years ago

      The first column in the distance matrix will be cell1, the second will be cell2, etc., and likewise the first row will be cell1, the second row cell2, etc. The distances are the values in the matrix. The distance between cell1 and cell1 (in the upper left-hand corner of the matrix) is 0, etc.
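The layout described in that reply can be sketched as a tiny lookup table (the distance values below are hypothetical): rows and columns are both indexed by cells, the diagonal is 0, and the matrix is symmetric.

```python
# Hypothetical pairwise distances between three cells.
cells = ["cell1", "cell2", "cell3"]
dist = {
    ("cell1", "cell2"): 2.4,
    ("cell1", "cell3"): 3.1,
    ("cell2", "cell3"): 1.7,
}

def d(a, b):
    """Look up a distance, using symmetry and a zero diagonal."""
    if a == b:
        return 0.0
    return dist.get((a, b), dist.get((b, a)))

# Rows and columns are both cell1, cell2, cell3; the upper-left entry is 0.
matrix = [[d(a, b) for b in cells] for a in cells]
for row in matrix:
    print(row)
```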

  • @ninakoch1799
    @ninakoch1799 a year ago +1

    THANK YOUU❤️

  • @raquelpurpleboxes
    @raquelpurpleboxes 4 years ago +1

    You're amazing!!!

  • @YasmineNazmy
    @YasmineNazmy 3 years ago +1

    Brilliant thank you

    • @statquest
      @statquest  3 years ago

      Wow! You're going through them all! BAM! :)

  • @kaynkayn9870
    @kaynkayn9870 9 months ago

    I like to learn using videos (mainly from your channel) and GPT for the math equations. I checked Wikipedia just to be sure, but it looks like you skipped the step about "Double Centering and Matrix Transformation" entirely.

    • @statquest
      @statquest  9 months ago

      I talk about that in my PCA videos: czcams.com/video/FgakZw6K1QQ/video.html and czcams.com/video/oRvgq966yZg/video.html

    • @kaynkayn9870
      @kaynkayn9870 9 months ago

      @@statquest I must have missed it; I'll review it again. Thank you.

    • @statquest
      @statquest  9 months ago +1

      @@kaynkayn9870 Those videos specifically talk about centering the data - how and why we need to do that. I don't talk about matrix transformations explicitly because those are just one of several ways to perform PCA.

  • @jihadrachid9044
    @jihadrachid9044 3 years ago

    Thank you for this great video, but I want to understand: for an nMDS graph, should I transform my values from percentages to square roots?
    I have about 28 species. Your help will be highly appreciated.

    • @statquest
      @statquest  3 years ago

      Unfortunately this video only covers classical MDS.

    • @jihadrachid9044
      @jihadrachid9044 3 years ago

      @@statquest Can I contact you by email to discuss my case in more detail?

  • @oliseh2285
    @oliseh2285 4 years ago

    Hi Josh, thanks a lot for your amazing videos!!!
    I have a question: with molecular markers (SSRs or SNPs), what would you personally choose,
    PCA or PCoA?

    • @statquest
      @statquest  4 years ago +1

      If you use the Euclidean distance, then they are the same.

    • @oliseh2285
      @oliseh2285 4 years ago

      Yes, I got that from the video. But I'm not sure which kind of distance I should use if I want to perform a PCoA with microsatellites in R, and also whether PCoA is better than PCA when you use a distance specific to microsatellites.
      It's weird, because when I used the Adegenet function [dudi.pca()] on my data frame of 5 SSRs with 23 alleles, the function took 23 variables (the 23 alleles) instead of 5 (the 5 SSRs), and for this reason the variance explained by PC1 and PC2 is quite low.
      I hope you can suggest something based on your experience as a geneticist.
      Thanks a lot.

    • @statquest
      @statquest  4 years ago +1

      PCA is the most commonly used method in genetics.

    • @oliseh2285
      @oliseh2285 4 years ago +1

      Thanks a lot for the answer, and for making statistics accessible to all and fun. Please continue your terrific work. We love you!!!

  • @yulinliu850
    @yulinliu850 6 years ago +1

    Excellent!

  • @CWunderA
    @CWunderA 5 years ago +2

    Good video, but it was not very clear to me why you would choose one over the other (MDS vs PCA)

    • @statquest
      @statquest  5 years ago

      If you're working with distances, then MDS is the way to go.

    • @CWunderA
      @CWunderA 5 years ago +1

      My question was more: why would someone choose to cluster/reduce dimensionality using distances rather than correlations?

    • @statquest
      @statquest  5 years ago +2

      At 6:20 in the video I mention that a biologist might choose to use MDS to show clustering using log-fold changes because, traditionally, gene measurements are analyzed in terms of log-fold changes.
      Alternatively, you might want to cluster locations in a city based on how far apart they are by taxi (so blocks and one-way streets are a factor) - MDS can do this.

    • @CWunderA
      @CWunderA 5 years ago +2

      Ah I see, so it is more that MDS allows you to cluster via any distance metric of interest, whereas PCA limits you to correlation/Euclidean distance. Thanks for taking the time to help me out!

    • @statquest
      @statquest  5 years ago +2

      You are correct - MDS lets you cluster stuff using any distance metric. The coolest thing about that, which I forgot to mention, is that, via Random Forests, you can use MDS to cluster any data, regardless of type. Check it out in "Random Forests Part 2:" czcams.com/video/nyxTdL_4Q-Q/video.html
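As a concrete illustration of swapping in a different metric (a hypothetical sketch, not code from the video): compute manhattan/taxicab distances instead of Euclidean ones and feed them into the same classical-MDS machinery.

```python
import numpy as np

# Hypothetical data: 3 samples with 2 measurements each.
X = np.array([[0.0, 0.0],
              [3.0, 1.0],
              [1.0, 4.0]])

# Manhattan ("taxicab") distance between every pair of rows, so blocks
# and one-way streets (axis-aligned moves) are what counts.
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)

# The rest of classical MDS is unchanged: double-center the squared
# distances and eigendecompose.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]
coords = vecs[:, order][:, :2] * np.sqrt(np.maximum(vals[order][:2], 0))
print(coords.shape)  # one 2-D point per sample
```

With a non-Euclidean metric the result is generally different from PCA (and some eigenvalues can even be negative, hence the `np.maximum` guard).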

  • @thourayaaouledmessaoud9223

    Thanks for this video. I just have one question: does MDS only accept a symmetric (square) matrix as input?

  • @bibinkalirakath
    @bibinkalirakath 3 years ago

    I have seen PCoA graphs with 3 dimensions; is there any video explaining them?

    • @statquest
      @statquest  3 years ago

      That would be a lot like seeing a 3-dimensional PCA plot. For more details, see: czcams.com/video/FgakZw6K1QQ/video.html

    • @bibinkalirakath
      @bibinkalirakath 3 years ago +1

      @@statquest Thank you very much. This helped me a lot.

  • @chrischoir3594
    @chrischoir3594 4 years ago

    Hi, what software do you use here?
    Thanks!

  • @jcb0trashmail
    @jcb0trashmail 4 years ago

    I still don't get why you would choose MDS over PCA or the other way around...

    • @statquest
      @statquest  4 years ago

      MDS can work with any distance metric, not just Euclidean. Here's a great example: czcams.com/video/sQ870aTKqiM/video.html

  • @mahdimohammadalipour3077

    Where can I find a numerical example? I googled but couldn't find anything :(

    • @statquest
      @statquest  2 years ago

      See: czcams.com/video/pGAUHhLYp5Q/video.html

  • @kartikmalladi1918
    @kartikmalladi1918 a year ago

    What value exactly is plotted in MDS?

    • @statquest
      @statquest  a year ago

      It depends on what metric you use.

    • @kartikmalladi1918
      @kartikmalladi1918 a year ago

      @@statquest If MDS is plotted between 2 genes, then the distance itself becomes a single variable. Any combination and its distance can be marked on a number line. So if this is the x-coordinate of the plot, what is the y-coordinate for a point?

  • @DaisyKB123
    @DaisyKB123 5 years ago

    What does it mean by the "percentage of variation each axis accounts for"?

    • @user-du8sc9kz6x
      @user-du8sc9kz6x 5 years ago

      It's the percentage of variation explained by principal component axes 1, 2, 3, 4, 5... in the PCA plot.

  • @Diegocbaima
    @Diegocbaima 5 years ago +1

    Great, dude!

  • @EmilyBoInvests
    @EmilyBoInvests 2 years ago

    Hi Josh, how do you choose among PCA, LDA and MDS methods?

    • @statquest
      @statquest  2 years ago +1

      LDA is supervised, so you can only use it when you know what groups you want to supervise. MDS is useful when you want to change the distance metric. And if you don't want to change the distance metric, MDS and PCA are the same.

    • @EmilyBoInvests
      @EmilyBoInvests 2 years ago +1

      @@statquest Thank you, Josh! very helpful!

  • @ketalesto
    @ketalesto 2 years ago +1

    Day 40 of #66DaysOfData
    Yeah baby! Let's go!

  • @Retko85
    @Retko85 2 years ago

    Hi Josh, I am a little confused regarding features and samples. For example, at 6:56 you say that PCA creates plots based on correlations among samples. The only concept of correlation I know is between features: when two features change together, the correlation is high. But I got confused here. I tried to search for sample correlations, and what I found was correlations of samples as part of a population, but here samples should be rows/instances/observations. Your computation of the Euclidean distance also confused me, since you have features as rows (gene1, gene2) and samples as columns (cell 1, cell 2). Can you please confirm my understanding: does PCA create the plot based on correlations among FEATURES, like a person's age, weight, etc., where each person is a sample? Thank you :)

    • @statquest
      @statquest  2 years ago

      To get a better sense of how PCA works, see: czcams.com/video/FgakZw6K1QQ/video.html

  • @yudiherdiana4979
    @yudiherdiana4979 3 years ago +1

    Thank you!!

  • @lalala90348
    @lalala90348 5 years ago +1

    “Reduce them to a 2-D graph”? How exactly?

  • @nutzanut9817
    @nutzanut9817 4 years ago

    How can we draw a 2D graph after calculating the distance for every pair?
    We've got nC2 values for n features.
    Thanks.

    • @statquest
      @statquest  4 years ago

      You do it just like PCA. For more details on how PCA does it, check out this video: czcams.com/video/FgakZw6K1QQ/video.html

  • @rrrprogram8667
    @rrrprogram8667 6 years ago

    Great video... I am currently elevating myself from Excel data analysis to machine learning, and right now I am at the stage of grabbing everything I can. What advice do you have for Excel users who are machine learning enthusiasts?

  • @user-ib9lp8zx6x
    @user-ib9lp8zx6x 6 years ago

    Hi, Joshua. I noticed that you mentioned "the data is not linear" in a reply to a comment. I have been confused about this concept for some time. What does non-linear data mean? (I guess it is not the same concept as a linear model, right? haha) A bioinformatician told me that single-cell data is non-linear and we'd better use tSNE rather than PCA. How can bulk RNA-seq data be linear while single-cell RNA-seq data is non-linear? I really hope you can answer my question, because it has confused me for quite a long time.

    • @user-ib9lp8zx6x
      @user-ib9lp8zx6x 6 years ago

      Haha, thank you Joshua. The spiral pattern is the so-called "Swiss roll" model, I think. Some say that linear dimension reduction focuses more on global patterns (like distance), while non-linear dimension-reduction methods focus more on local patterns.
      Why not talk about zero-inflation in single-cell data next time, and the normalization methods used in single-cell data analysis?

  • @trinh123456
    @trinh123456 4 years ago

    Hi Josh, it's me again. Thanks for the great video! I am wondering if you have a video on nMDS, because I see it quite often in biological studies but it is still quite blurry to me.

    • @statquest
      @statquest  4 years ago

      Unfortunately I don't have a video on non-metric MDS.

    • @trinh123456
      @trinh123456 4 years ago

      No worries. Are you going to do it any time soon? I am really looking forward to it because it is quite common in biology. Thanks Josh!

    • @statquest
      @statquest  4 years ago +2

      @@trinh123456 Unfortunately, I do not have plans to do it anytime soon. My to-do list is huge (it has 100s of items on it) and I can only make a few videos each month. I work as fast as I can, and I work all the time, but it's not enough to keep up with the requests.

    • @datenfritz9860
      @datenfritz9860 4 years ago

      Hi Tien, maybe I can provide some help with nMDS based on Josh's triple-BAM video! (As always, amazing job Josh!) To my knowledge, NMDS is a rank-based approach. Like MDS, you start by computing the distances between samples. These distance values then get ranked. After the ranking, you perform the "fancy math" to get the coordinates for a graph. Be aware that you lose quantitative information when clustering on ranks.
      You can check this website for more details: mb3is.megx.net/gustame/dissimilarity-based-methods/nmds
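The rank-based first step described above can be sketched in a few lines (the sample names and distances are hypothetical; the stress-minimization step that turns ranks into coordinates is omitted).

```python
# Hypothetical pairwise distances between three samples.
pairs = [("s1", "s2", 4.2), ("s1", "s3", 1.1), ("s2", "s3", 2.8)]

# Non-metric MDS keeps only the ordering of the distances: sort the
# pairs by distance and assign ranks 1, 2, 3, ...
ranked = sorted(pairs, key=lambda p: p[2])
ranks = {(a, b): i + 1 for i, (a, b, _) in enumerate(ranked)}
print(ranks)  # smallest distance gets rank 1
```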

  • @jamesayukayuk1151
    @jamesayukayuk1151 6 years ago

    Hey Joshua. I have not found anything on the non-metric version of MDS. Any videos, please?

    • @jamesayukayuk1151
      @jamesayukayuk1151 6 years ago

      Thank you. Will keep an eye out for it when done. Thanks for the good work.

  • @jameelahharbi2714
    @jameelahharbi2714 10 months ago

    I need more details on PCA.

    • @statquest
      @statquest  10 months ago

      For more details about PCA, see: czcams.com/video/FgakZw6K1QQ/video.html

  • @darkredrose7683
    @darkredrose7683 2 years ago

    Thank you! And how about the CAP analysis? I'm so confused >< Thank you in advance!

    • @statquest
      @statquest  2 years ago

      I'll keep that topic in mind.

  • @rncg0331
    @rncg0331 5 years ago

    Do you have a Python version of MDS?

  • @adelutzaification
    @adelutzaification 6 years ago

    One more comment: the fact that MDS uses a precomputed distance reminds me of hierarchical clustering. Does that mean MDS is a 2D representation of hierarchical clustering?

    • @adelutzaification
      @adelutzaification 6 years ago

      That would be cool. I am brewing something; I might have an idea, though I'm not sure how good it is at this moment. I need to write it up. I'll keep you posted to see if it is worth anything. Ta ta

    • @adelutzaification
      @adelutzaification 6 years ago

      I went down in flames :) It turns out I was thinking of re-inventing the wheel :) My inclination was to further dissect the PCA results/"clouds" and see the relationships between the comprising data points. I was deflated to see that this problem was solved many years ago by clustering (either k-means or hierarchical). ;(
      On the good side, I found a few useful things. A paper that confirms the relatedness between PCA and k-means, as you were anticipating: ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf
      I also found out about the HCPC package in R that can do hierarchical clustering after factor analysis. It seems kinda cool, as on the graphical side it does pseudo-3D hierarchical clustering. Imagine the first 2 PCs as a horizontal plane and the cluster roots coming from the top... www.r-project.org/conferences/useR-2009/slides/LeRay+Molto+Husson.pdf . In the usual 1D hierarchical clustering, I don't like the fact that some less related points are adjacent. This HCPC plotting is not perfect either, as it obscures some data points.
      I was thinking that 2D density could be used to further "cluster" the PC plot; for example with geom_density_2d()/stat_density_2d() in ggplot2. With the right arguments and aesthetics it might be able to pick up some "clusters", but not the relationships between the points inside a contour. Maybe adding relatedness by connecting the dots somehow on a zoomed-in plot (by adjusting the axes) may help to see further details...
      What other ways of showing relatedness besides hierarchical clustering and correlation matrices do people use?
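
The PCA-then-cluster idea discussed in this thread can be sketched with plain NumPy (the data, the SVD-based PCA, and the tiny 2-means loop are all made up for illustration; they are not from the video or the linked paper):

```python
import numpy as np

rng = np.random.default_rng(0)
# two made-up groups of 10 samples each in 5-dimensional "gene" space
X = np.vstack([rng.normal(0.0, 0.3, (10, 5)),
               rng.normal(3.0, 0.3, (10, 5))])

# PCA via SVD of the centered data, keeping the first two PCs
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T

# a tiny 2-means loop on the PC scores (init from the first and last point)
centroids = scores[[0, -1]].copy()
for _ in range(5):
    labels = np.argmin(((scores[:, None] - centroids) ** 2).sum(axis=2), axis=1)
    centroids = np.array([scores[labels == k].mean(axis=0) for k in range(2)])

print(labels)  # the two original groups fall into separate clusters
```

Because the two groups are far apart relative to their spread, PC1 captures the group separation and the clustering on the scores recovers the original grouping.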

  • @sathsarawijerathna9325

    Hi Josh. Do you have any videos for NMDS?

    • @statquest
      @statquest  5 years ago

      Not yet. You can find an organized listing of all of my videos here: statquest.org/video-index/

  • @marahakermi-nt7lc
    @marahakermi-nt7lc a year ago

    Hmmm, I guess the covariance matrix in this case is a matrix with 0 distances in the diagonal

    • @statquest
      @statquest  a year ago +1

      That would mean the variance was 0.

    • @marahakermi-nt7lc
      @marahakermi-nt7lc a year ago

      @@statquest Yessss, since subtracting the same distance = 0
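
The distinction in this exchange is easy to see numerically (a minimal demo with made-up numbers): a distance matrix has 0s on its diagonal because each sample is 0 away from itself, whereas a covariance matrix has variances on its diagonal, which would only be 0 if a sample's values never varied:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[10.0, 2.0],   # 3 samples measured on 2 genes
              [ 9.0, 3.0],
              [ 1.0, 8.0]])

D = squareform(pdist(X))  # sample-to-sample distance matrix
C = np.cov(X)             # sample-to-sample covariance matrix

print(np.diag(D))  # [0. 0. 0.]     (each sample is 0 away from itself)
print(np.diag(C))  # [32. 18. 24.5] (per-sample variances, not 0)
```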

  • @YooToobins
    @YooToobins 5 years ago +8

    Recommend speeding this up to 1.25x while viewing

  • @kathik595
    @kathik595 5 years ago +1

    Do complete statistical predictive modeling using Python & R

  • @noobshady
    @noobshady 5 years ago +1

    Where can we read about the fancy math involved?

    • @statquest
      @statquest  5 years ago +3

      Wikipedia is always a great place to start: en.wikipedia.org/wiki/Multidimensional_scaling

  • @neckar6006
    @neckar6006 a year ago

    4:15: maybe the distance for gene 3 is wrong

  • @alecvan7143
    @alecvan7143 4 years ago +1

    awesome :)

  • @Cuicui229
    @Cuicui229 3 years ago

    Hi Josh! Thanks for the video! I still don't get how we can do the same thing on the distance matrix that we do in PCA (czcams.com/video/FgakZw6K1QQ/video.html). I watched that video, and thanks to your wonderful explanation I could imagine that for several samples with 2 genes we can draw the dots on a 2-D plot (gene1 and gene2), find the best-fit line, which is PC1, and then a line perpendicular to PC1 as PC2, both maximizing the projected distances from the origin. But when it comes to the distance matrix, how can we draw the dots, since there are no genes, only sample1, sample2, et al.? I'm really confused. Truly thankful!

    • @statquest
      @statquest  3 years ago +1

      There are two methods for doing PCA. The one I present in that video is called "Singular Value Decomposition", and it works the way I showed there. Alternatively, we can do something called "Eigenvalue Decomposition", which is based on the covariance or correlation matrix of the data. It is through this second way that PCA ends up giving results similar to MDS. Unfortunately, I don't have a good video explaining how this second way works. :(
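
The eigendecomposition route mentioned in this reply can be sketched for classical MDS/PCoA: double-center the squared distance matrix and scale the top eigenvectors by the square roots of their eigenvalues (a minimal numpy illustration with made-up points; real data with non-Euclidean distances needs more care):

```python
import numpy as np

def pcoa(D, k=2):
    """Classical MDS / PCoA coordinates from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J          # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # take the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# three made-up samples; with Euclidean distances, PCoA recovers a
# configuration whose pairwise distances match the input exactly
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(axis=-1))
Y = pcoa(D)
D2 = np.sqrt(((Y[:, None] - Y[None]) ** 2).sum(axis=-1))
print(np.allclose(D, D2))  # True: the 3-4-5 triangle is reproduced
```

This is what makes MDS on Euclidean distances give results equivalent to PCA: both end up eigendecomposing the same centered inner-product structure.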

  • @fatihbaltac1482
    @fatihbaltac1482 5 years ago +1

    BAAAM !!