Importance Sampling - VISUALLY EXPLAINED with EXAMPLES!

  • Published 29 Aug 2024
  • This tutorial explains the Importance Sampling technique and its variant for unnormalized distribution functions called Self Normalized Importance Sampling.
    Notebook to go with this tutorial:
    colab.research...
    #sampling
    #statistics
    #montecarlo

Comments • 53

  • @sudhagarraghavan9696 · 3 years ago · +13

    Outstanding tutorial... probably the best on CZcams on this topic. Your knowledge is very impressive and your presentation of the subject is excellent! Thank you!

  • @malcolmyang5662 · 3 years ago · +5

    One of the best explanations I have found so far on the internet. GJ

  • @proxyme3628 · a year ago · +1

    Finally I can understand importance sampling. Great way to start with a uniform distribution p(x) and then introduce sampling from the important region with a normal distribution q(x). Great way to get the idea of importance sampling. Thank you so much.

  • @atomsmasher · 2 years ago · +1

    Incredibly well-done tutorial. I looked at 4 videos before landing on this one, and it is superb. Thank you for taking the time to enrich humanity by sharing knowledge clearly and intuitively.

    • @KapilSachdeva · 2 years ago

      🙏 Many thanks for these kind words. I am happy you found the tutorial helpful.

  • @prajwalchauhan6440 · 8 months ago · +1

    This series is the best.

  • @avishkarsaha8506 · 2 years ago · +1

    Best video on importance sampling I've seen.

  • @sanjaykrish8719 · 4 months ago · +2

    This is pure gold ❤

  • @schrodingerac · 3 years ago · +2

    This is a great tutorial... I really appreciate the effort that went into making it look like this.

  • @akapo98 · a year ago · +1

    Very high-quality video! Thank you very much!

  • @jonasschmitt3548 · 3 years ago · +2

    Great video, thanks a lot :)

  • @Matteo-uq7gc · 3 years ago · +1

    Another awesome video! Sequential Importance Sampling would be amazing also!!!

    • @KapilSachdeva · 3 years ago · +2

      🙏… will do it; most likely in conjunction with particle filtering.

    • @Matteo-uq7gc · 3 years ago

      @KapilSachdeva I will be on the lookout!!! Thank you.

  • @ajazhussain2919 · 4 months ago · +1

    Thanks

  • @mohsenvazirizade6334 · 10 months ago · +2

    Fantastic tutorial. Way to go! What other resources do you recommend for other importance sampling methods?

  • @yogeshmathpati1472 · 2 years ago · +1

    Keep up the good work, sir. Tysm!

    • @KapilSachdeva · 2 years ago

      🙏

    • @yogeshmathpati1472 · 2 years ago

      Can I get the code for importance sampling on some problem, to see how it's done? I'm new to coding and I'm not able to find it anywhere. It would be a huge favour if you could help. And something about Latin Hypercube Sampling too. Thanks.

    • @KapilSachdeva · 2 years ago

      Here is a notebook in which I have added the example that I showed in the tutorial.
      colab.research.google.com/drive/1Xm2e0MDVsoxgPIKIl-u60iu4fNARz3Kw?usp=sharing
      Hope this helps.
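
      For anyone who cannot open the notebook, a minimal importance sampling sketch in the same spirit (the target p, proposal q, and integrand h below are illustrative choices, not necessarily the ones used in the notebook):

        import numpy as np

        # Goal: estimate E_p[h(X)] for the target p = N(0, 1) and h(x) = x^2,
        # using samples drawn from a proposal q = N(1, 2) instead of p.
        rng = np.random.default_rng(0)

        def p_pdf(x):                            # target density N(0, 1)
            return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

        def q_pdf(x):                            # proposal density N(1, 2)
            return np.exp(-0.5 * ((x - 1) / 2) ** 2) / (2 * np.sqrt(2 * np.pi))

        def h(x):
            return x**2

        N = 100_000
        x = rng.normal(loc=1, scale=2, size=N)   # draw from the proposal q
        w = p_pdf(x) / q_pdf(x)                  # importance weights p(x)/q(x)
        print(np.mean(w * h(x)))                 # (1/N) * sum_i w_i * h(x_i), close to 1.0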

  • @vi5hnupradeep · 2 years ago · +1

    Thank you so much, sir!

  • @bodwiser100 · 8 months ago

    @KapilSachdeva, very nice video! Super helpful as always. Can you please shed light on one sticky point that has always eluded me when studying sampling methods (including MCMC): what did you mean when you said that one of the problems in computing the average is that "we can't sample from p"? I thought it meant that we do not know the PDF p(x). But as I continued watching the video, it seemed that was not the case: we _can_ have access to p(x). But then I wondered, if we know p(x), why can we not sample from it? In the same vein, I have always wondered that if we knew p(x), there would never be any need for the proposal function, the transforms, and all that machinery. Isn't the unavailability of p(x) the whole reason we are doing all this?

    • @KapilSachdeva · 8 months ago · +2

      Quite a few people have this confusion. Here are two things that should help:
      - Just because you know the functional form of the PDF does not mean you can sample from it. Most sampling algorithms are transformations of uniform sampling.
      - In the example I used a function whose analytical form I knew, but when you are doing Bayesian statistics and working with a posterior probability, most of the time you do not even know the functional form.
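
      As an illustration of the first point, a minimal sketch of "transforming uniform samples" via the inverse CDF; the exponential distribution here is just an illustrative choice:

        import numpy as np

        # Inverse transform sampling: if U ~ Uniform(0, 1) and F is the CDF of the
        # target distribution, then X = F^{-1}(U) follows that target.
        # For Exponential(rate): F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.
        rng = np.random.default_rng(0)
        rate = 2.0

        u = rng.uniform(size=100_000)        # uniform samples
        x = -np.log(1 - u) / rate            # transformed into Exponential(rate) samples

        print(x.mean())                      # close to 1 / rate = 0.5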

  • @chandreshmaurya8102 · 2 years ago · +1

    Nice info on importance sampling. Can you tell me which software you used to make the videos?

  • @yiweikwok8735 · a year ago · +1

    Thanks for the great video!
    I wanted to ask about the "variance" you plotted (timestamp 10:10) in Monte Carlo sampling. At first I didn't know which variance you were referring to, but by looking at the colab notebook, I realized you meant the variance of the random variable $10 \cdot h(X)$, rather than the variance of the Monte Carlo estimate $\frac{10}{N} \sum_{i=1}^N h(X_i)$.
    So here I want to share some of my thoughts from when I was learning this. If we are talking about the variance of $10 \cdot h(X)$, then it is always 4, and the plot only shows an estimate of this ground-truth variance as $N$ gets larger and larger. For a small $N$, the estimate of the variance can itself have high variance, resulting in a very inaccurate estimate, i.e. much larger or smaller than 4. With a sufficiently large $N$, we finally obtain an accurate estimate: 4. Running the program multiple times, one can observe the large fluctuations at the start of the plot.
    It is a different matter when it comes to the variance of the Monte Carlo estimate $\frac{10}{N} \sum_{i=1}^N h(X_i)$. For this random variable, the variance is $4/N$, which depends on the number of samples $N$. Hence Monte Carlo actually reduces the variance of its estimate of the mean by increasing the number of samples $N$.
    However, the more important point is that with the large variance of $10 \cdot h(X)$, one would require a very large $N$ to get the law of large numbers really working well. By Chebyshev's inequality, if we expect the Monte Carlo estimate to have an error less than $\epsilon$, the number of samples needed is inversely proportional to the "risk" (the probability that we fail to achieve that goal). With a large variance of $10 \cdot h(X)$, the number of samples $N$ needs to be even larger. That's why we need importance sampling to reduce the sampling variance, so that we don't need so many samples to perform Monte Carlo.
    That's my own opinion, hope it helps 😄
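
    A small numerical sketch of the distinction made above (the integrand h and the interval are placeholders, not the ones in the notebook, but the pattern is the same): the variance of the per-sample quantity $10 \cdot h(X)$ settles to a constant, while the variance of the Monte Carlo mean shrinks roughly like $1/N$.

      import numpy as np

      # X ~ Uniform(-5, 5); the per-sample quantity is 10 * h(X)
      # (the factor 10 is the interval length, as in the uniform-sampling case).
      rng = np.random.default_rng(0)
      h = lambda x: np.exp(-x**2)              # placeholder integrand

      for N in (10, 100, 1_000, 10_000):
          # 500 independent repetitions of an N-sample experiment, shape (500, N)
          reps = np.array([10 * h(rng.uniform(-5, 5, size=N)) for _ in range(500)])
          per_sample_var = reps.var()               # estimates Var[10 * h(X)]: roughly constant
          estimator_var = reps.mean(axis=1).var()   # variance of the MC mean: shrinks ~ 1/N
          print(N, per_sample_var, estimator_var)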

  • @eceserin · 2 years ago

    Very nice, thank you.

  • @animatedmath4307 · 2 years ago · +1

    At 14:41 you state that you are interested in computing P(X \geq 2), but this has nothing to do with h. Do you mean P(h(X) \geq 2)? In that case the rest makes sense.

    • @KapilSachdeva · 2 years ago

      Based on your comment, in hindsight I agree it is not proper to say we are interested in the probability of X >= 2. Indeed my sentence creates confusion, as it has done for you. Apologies.
      Here is how I should have explained it:
      In this scenario, we are interested in calculating the expectation of a function of a random variable X (i.e. h(X)), where X is normally distributed __but__ we only want to consider situations where X >= 2.
      I hope it makes sense now.

    • @animatedmath4307 · 2 years ago · +1

      @KapilSachdeva I see: then it is not even what I thought.
      P(X \geq 2) = \int Ind(x \geq 2) \, p(x) \, dx, where Ind is the indicator function and p(x) = pdf(X).
      P(h(X) \geq 2) = \int Ind(h(x) \geq 2) \, p(x) \, dx.
      E[restriction of h(X) to the domain [2, \infty)] = \int h(x) \, Ind(x \geq 2) \, p(x) \, dx,
      which is the second integral at 15:50.
      Now it is clear.
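
      For concreteness, a minimal sketch of estimating that last integral with importance sampling; the choice of h and of a proposal shifted towards the X >= 2 region are illustrative assumptions, not necessarily the video's exact setup:

        import numpy as np

        # Estimate E[h(X) * Ind(X >= 2)] for X ~ N(0, 1).
        # Naive sampling from N(0, 1) wastes most samples (X >= 2 is rare),
        # so use a proposal q = N(3, 1) concentrated in the region of interest.
        rng = np.random.default_rng(0)

        def p_pdf(x):                        # target density N(0, 1)
            return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

        def q_pdf(x):                        # proposal density N(3, 1)
            return np.exp(-0.5 * (x - 3)**2) / np.sqrt(2 * np.pi)

        h = lambda x: x                      # placeholder h

        N = 100_000
        x = rng.normal(loc=3, scale=1, size=N)    # samples from q
        w = p_pdf(x) / q_pdf(x)                   # importance weights
        print(np.mean(w * h(x) * (x >= 2)))       # close to the integral of x*p(x) over [2, inf), i.e. p_pdf(2) ~ 0.054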

    • @KapilSachdeva · 2 years ago

      🙏

    • @lakex24 · 8 months ago

      Thank you for raising this question. It's indeed quite confusing for me as well. IMO, the indicator function should be applied to h(x): set it to 1 for h(x) >= 2 and to 0 for h(x) < 2, since we are interested in the probability of h(X) >= 2, NOT the expectation of h(X)...?

  • @100rabhkr_India · 2 years ago · +1

    Dear Kapil Sir, thanks for this great video. At 4:10, you mention that P_theta is not normalized. I want to ask: why do we need P_theta to be normalized? What problem would it create if it's not normalized?

    • @KapilSachdeva · 2 years ago

      Thanks Saurabh.
      A function can only be considered a probability density function (PDF) if it is normalized. You use the PDF to compute the probability of a state of your random variable, and probability is a number between 0 and 1.
      In the expectation equation, the right-most term is the pdf p(x). Therefore, your distribution had better be normalized to use it.
      In the importance sampling case you see that we need to compute the ratio p(x)/q(x). This is a ratio of PDFs, and hence you need the normalizing constants for both of them.
      Now, in most real-world probabilistic modeling, computing this normalizing constant is the most difficult (computationally costly) part.
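
      This is also exactly where the self-normalized variant from the video title comes in: normalizing the weights cancels the unknown constant. A minimal sketch, where the unnormalized target p_tilde, the proposal q, and h are illustrative assumptions:

        import numpy as np

        # Self-normalized importance sampling: the target is known only up to a
        # constant, p(x) proportional to p_tilde(x). The unknown normalizing
        # constant cancels when the weights are normalized to sum to 1.
        rng = np.random.default_rng(0)

        def p_tilde(x):                      # unnormalized target: N(0, 1) times an arbitrary constant
            return 7.3 * np.exp(-0.5 * x**2)

        def q_pdf(x):                        # proposal density N(1, 2)
            return np.exp(-0.5 * ((x - 1) / 2) ** 2) / (2 * np.sqrt(2 * np.pi))

        h = lambda x: x**2

        N = 100_000
        x = rng.normal(loc=1, scale=2, size=N)   # samples from the proposal
        w = p_tilde(x) / q_pdf(x)                # unnormalized weights
        w = w / w.sum()                          # normalization: the constant 7.3 drops out
        print(np.sum(w * h(x)))                  # close to 1.0 = E[X^2] under N(0, 1)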

    • @100rabhkr_India · 2 years ago

      @KapilSachdeva Thanks for the quick response. Can you please provide an example of a PDF which is not normalized?

    • @KapilSachdeva · 2 years ago · +1

      Unnormalized PDFs arise when you multiply/mix/convolve two high-dimensional probability functions. Most of the time you see this happening when doing Bayesian statistics. Read this page: en.wikipedia.org/wiki/Normalizing_constant ... The posterior distribution is the result of combining the likelihood and the prior, divided by the normalizing constant. You also saw that in the Kalman Filter tutorial. If you use non-linear functions for the transforms, you end up with a function for which computing the normalizing constant is extremely costly.
      Also look at the first 5-10 minutes of this video, where I explain this problem of the normalizing constant in another (but related) context: czcams.com/video/68dJCDyzB0A/video.html

    • @100rabhkr_India · 2 years ago · +1

      @KapilSachdeva Understood. Thank you very much. Looking forward to your videos on the particle filter, bootstrap filter, etc. Loved your videos. Thank you very much.

    • @KapilSachdeva · 2 years ago

      🙏

  • @ArashSadr · 2 years ago

    Thanks for the great video. I was wondering whether this is a typo or not: at 13:17, at the bottom of the right column of text, shouldn't the last formulation be 10/N...? Is the 1/N missing, or am I wrong?

    • @KapilSachdeva · 2 years ago · +1

      🙏 No, you are correct. Thanks for catching it… someone is really paying attention :)

    • @medomed1105 · 2 years ago

      @KapilSachdeva I think the 10 should not be there; it was for the uniform case.
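
      For readers following this thread: assuming the quantity at that timestamp is the integral of h over an interval of length 10 (e.g. [-5, 5]) estimated with uniform samples, the factor comes from

        \int_{-5}^{5} h(x) \, dx = 10 \cdot E_{U(-5,5)}[h(X)] \approx \frac{10}{N} \sum_{i=1}^{N} h(x_i),

      because the uniform density on the interval is 1/10. With a non-uniform proposal q, the explicit factor of 10 is replaced per sample by the weight 1/q(x_i):

        \int_{-5}^{5} h(x) \, dx = E_q\left[\frac{h(X)}{q(X)}\right] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{h(x_i)}{q(x_i)}.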