Accept-Reject Sampling : Data Science Concepts

  • Added 29 Aug 2024
  • How to sample from a distribution WITHOUT the CDF or even the full PDF!
    Inverse Transform Sampling Video: • Inverse Transform Samp...

Comments • 155

  • @jochem2710
    @jochem2710 3 years ago +66

    Great explanation! A lot of professors just go through the formulas without providing an intuition for what's going on. I love that in the field of AI and Data Science there are so many great lectures and tutorials online. Makes you wonder how useful university really is. Keep up the good work!

    • @shivamverma3686
      @shivamverma3686 2 years ago

      Can you please name a few.. i am new to this and would very much appreciate your help. Thanks in advance. :)

  • @Peter1992t
    @Peter1992t 2 years ago +25

    I am so glad I found this channel right as I started my PhD program in biostatistics. You straddle the line between proof/mechanics and intuition perfectly. So many videos on these topics are either so surface-level that I can't immediately apply them, or so technical that I don't develop the intuition for what's going on. This is not the first video of yours that I have felt this way about.
    These videos are so good that sometimes after you reach a milestone in an explanation (like explaining how the acceptance probability drives the mechanics for this method), I just sit in awe at how good of an explanation it was that I lose track of the 15 seconds that has passed and I have to rewind the video. You are doing an incredible job; please keep it up.

    • @ritvikmath
      @ritvikmath 2 years ago +4

      Your words mean a lot, thank you. And I wish you best of luck in your PhD!

  • @swapnajoysaha6982
    @swapnajoysaha6982 2 months ago

    I can't thank you enough. Although I understood the concept in my class, still I wasn't able to visualize how this is working until I saw your video. You are helping thousands of us students. Thank you sooo much.

  • @vutsuak
    @vutsuak 3 years ago +9

    Huge fan! By taking the pain to explain the intuition (often ignored), application as well as maths, you've created an amazing series.

  • @jiayuzhou6051
    @jiayuzhou6051 2 years ago +2

    It's really nice for tutors to promise to come back later each time they want to introduce state-of-the-art solutions. It keeps students on track and motivates thinking.

  • @MarioAguirre-jr1pm
    @MarioAguirre-jr1pm 3 months ago

    i'm doing my internship in a pretty heavy statistics role where i have to sample from very weird custom distributions, thanks for saving my life with these sampling vids.

  • @yuanhu7264
    @yuanhu7264 3 years ago +2

    This should be the quality of all UCLA discussion sessions, great job!

  • @milescooper3322
    @milescooper3322 3 years ago +6

    The world needs people like you as teachers. Thanks.

  • @sinextontechnologies9484
    @sinextontechnologies9484 3 years ago +5

    A couple of tricks for sampling: if you need to sample from a normal distribution, you can take N uniformly distributed random numbers and add them up (rand + rand + rand ...), then scale the result horizontally and vertically as needed; the result will be approximately normally distributed. Also, I often need to sample from exponential distributions to get extreme behavior from the random variable; for this I take 1/rand or ln(rand)^2. These methods are pretty fast and robust.
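
A minimal sketch of the two tricks in the comment above, using only Python's standard library. The function names, the choice of 12 uniforms, and the rate of 2.0 are assumptions made for illustration. One caveat, stated as a standard fact rather than as the commenter's claim: 1/rand and ln(rand)^2 are heavy-tailed, but they are not exponentially distributed; the inverse-transform form of an exponential draw is -ln(U)/rate.

```python
import math
import random

def approx_standard_normal(rng, n=12):
    """Sum n Uniform(0,1) draws, then center and scale (central limit theorem)."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12.
    return (total - n / 2) / math.sqrt(n / 12)

def exponential(rng, rate=1.0):
    """Inverse transform sampling: -ln(U)/rate is Exponential(rate)."""
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - rng.random()) / rate

rng = random.Random(0)
normals = [approx_standard_normal(rng) for _ in range(50_000)]
exps = [exponential(rng, rate=2.0) for _ in range(50_000)]

normal_mean = sum(normals) / len(normals)
normal_var = sum(x * x for x in normals) / len(normals) - normal_mean ** 2
exp_mean = sum(exps) / len(exps)  # theoretical mean is 1/rate
```

The uniform-sum result is only approximately normal: its tails are cut off at a finite distance from the mean, which matters if extreme values are important.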

  • @brady1123
    @brady1123 3 years ago +8

    Very nice explanation!
    We use a similar technique in physics for molecular monte carlo simulations where we don't know the value of the partition function (i.e. normalization constant) but we do know a state's Boltzmann factor (i.e. the numerator value). So when a new molecular state is proposed during the MC sim, you take a ratio of the two states' Boltzmann factors and that gives you the accept/reject probability.

    • @ritvikmath
      @ritvikmath 3 years ago +5

      Hey, that's super cool! I'm clearly more of a math person so I always love hearing when people have an actual application of some of these topics. Thanks!
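
The rule described in this thread is usually called the Metropolis acceptance rule. A minimal sketch follows, assuming a symmetric proposal and energies measured in the same units as kT; the function name and the numbers are hypothetical. Note that this is a Markov chain method, so it is related to, but not the same as, the accept-reject sampler in the video.

```python
import math
import random

def metropolis_accept(energy_old, energy_new, kT, rng):
    """Accept a proposed state with probability min(1, exp(-(E_new - E_old) / kT)).

    The ratio of Boltzmann factors exp(-E_new/kT) / exp(-E_old/kT) does not
    involve the partition function, so it never has to be computed.
    """
    delta = energy_new - energy_old
    if delta <= 0:
        return True  # moves to lower energy are always accepted
    return rng.random() < math.exp(-delta / kT)

rng = random.Random(1)
trials = 100_000
accepted = sum(metropolis_accept(0.0, 1.0, kT=1.0, rng=rng) for _ in range(trials))
uphill_rate = accepted / trials  # should be close to exp(-1)
```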

  • @thegreatestshowstopper5860

    0:00 - 1:10 Why do we need Accept-Reject Sampling?
    1:10 - 4:00 The case problem
    4:00 - 6:25 How to sample from p(s) using g(s)
    6:25 - 8:20 Criterion for accepting/rejecting a sample from g(s)
    8:20 - 12:20 Mathematical explanation of why the acceptance criterion works for the samples from g(s), using Bayes' theorem
    12:20 - 15:08 Intuitive explanation of why the acceptance criterion works, by really understanding what f(s)/g(s) means
    15:08 - 17:20 Limitations of the Accept-Reject Sampling and importance of choosing the right g(s)
    17:20 - end of video Conclusions and ending
    Thanks for the video! I love your explanations of this concept especially the intuitive understanding part.

  • @patrick_bateman-ty7gp
    @patrick_bateman-ty7gp 6 months ago

    Many articles go through the algorithm, but it never really made sense to me why it works. This is a crazy good explanation of why it works (especially the Bayes' theorem part for accepting a sample).

  • @sheetalmadi336
    @sheetalmadi336 9 months ago

    You are very much underrated, man!! You deserve the same appreciation as any other highly rated channel like 3B1B or Veritasium. Maybe people nowadays only go for animated videos, but I can see that your words are very valuable. You are trying to teach us exactly the way you learned it from scratch, and that helps a lot.

  • @fengjeremy7878
    @fengjeremy7878 2 years ago

    Intuition is very important for understanding math. You make my learning journey much more comfortable!

  • @RECIPE4DISASTR
    @RECIPE4DISASTR 2 years ago

    Thank you! After looking at six other sources that all explained it the same way and coming up short, I really appreciated the effort you took to explain it differently and intuitively. And great pajamas!

  • @nayabkhan7564
    @nayabkhan7564 1 year ago

    the only person that knows how to teach data science

  • @accountantguy2
    @accountantguy2 10 months ago

    Thank you! This explanation is so much better than what I got in my masters program.

  • @asevlad
    @asevlad 2 months ago

    watching your 2nd video. Great explanation! The best thing is intuitive understanding. Thank you for help in learning)

  • @mino99m14
    @mino99m14 1 year ago

    Thank you for the amazing video. It's very useful when someone gives some intuition. Just one observation.
    Looking at Wikipedia, I can tell your proof is a bit misleading. You forgot to mention that the probability of acceptance is defined using a uniform distribution, instead of just getting there using the fact that
    P(A) = ∫ g(s) P(A|s) ds.
    With this you get to the same expression you use for P(A), but you also let your audience know that a uniform distribution is needed to decide whether to reject or accept a sample.
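
The point raised in this comment can be written out as a short derivation. This is a sketch in the notation used elsewhere in the thread: p(s) = f(s)/NC is the target, g(s) is the proposal, and M is assumed to satisfy f(s) ≤ M g(s) for every s, so that the ratio is a valid probability.

```latex
\begin{align*}
P(A \mid s) &= P\!\left(u \le \frac{f(s)}{M\,g(s)}\right) = \frac{f(s)}{M\,g(s)},
  \qquad u \sim \mathrm{Uniform}(0,1) \\
P(A) &= \int g(s)\,P(A \mid s)\,ds = \frac{1}{M}\int f(s)\,ds = \frac{NC}{M} \\
p(s \mid A) &= \frac{P(A \mid s)\,g(s)}{P(A)}
  = \frac{f(s)/M}{NC/M} = \frac{f(s)}{NC} = p(s)
\end{align*}
```

The first line is where the uniform draw enters: comparing u with the ratio is what turns the ratio into an acceptance probability.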

  • @prajwalomkar
    @prajwalomkar 3 years ago +1

    You're just brilliant. I wish my professors made it this easy. Thanks Ritvik

  • @azamatbagatov4933
    @azamatbagatov4933 2 years ago +1

    I just recently discovered your channel and I am glad I did! Clear, concise instruction. Thank you!

  • @seansteinle2950
    @seansteinle2950 5 months ago

    Thank you so much for these videos! You are a life-saver.

  • @ingenierocivilizado728
    @ingenierocivilizado728 6 months ago

    All your videos are very clear and useful. Thank you very much for your help and your effort!!!

  • @julialikesscience
    @julialikesscience 11 months ago

    The method is so well-explained. Thanks a lot!

  • @nicolebaker2902
    @nicolebaker2902 3 years ago +1

    This is what I needed! I have gone through video after video trying to understand this. Fantastic job -- thank you!

  • @bhujaybhatta3239
    @bhujaybhatta3239 2 years ago

    Truly Amazing Explanation

  • @hmingthansangavangchhia4913

    I was looking for accept/reject algorithm for generating rv's. So not actually what I was looking for but glad I stumbled on your channel. Subscribed.

  • @SarthakMotwani
    @SarthakMotwani 4 months ago

    Beautifully Explained.

  • @daveamiana778
    @daveamiana778 3 years ago +1

    Thanks for clarifying this to me. It greatly helped me get through.

  • @user-im7yo7hn5z
    @user-im7yo7hn5z 7 months ago +1

    definitely should have more followers!

  • @peterszyjka7928
    @peterszyjka7928 1 year ago +1

    Magnifique! Do another AR video with some (one or two) examples! Maybe you did and I just haven't seen it. I jumped on this one since it was very good, easy to follow, and, as you stress, intuitive! "Right On," as we used to say back in the 60's out there in LA.

  • @momotabaluga2417
    @momotabaluga2417 1 year ago

    such a good explanation. 10/10

  • @prasanthdwadasi6449
    @prasanthdwadasi6449 2 years ago

    Your video was a great help. Thanks for taking time and explaining the math and intuition clearly.

  • @maxgotts5895
    @maxgotts5895 1 year ago

    An excellent explanation of some really beautiful data science!! Thank you so much!

  • @qiguosun129
    @qiguosun129 2 years ago

    Clear explanation and the most intuitive ideas, cool!

  • @yaaaaaadiiiiiii
    @yaaaaaadiiiiiii 8 months ago

    Excellent! better than my teacher's 1 hour rambling words

  • @masster_yoda
    @masster_yoda 2 months ago

    Amazing insights! Thank you!

  • @yachtmasterfig
    @yachtmasterfig 1 year ago

    ur so good at explaining this concept! Wow

  • @Phosphophyllite-lz4mb
    @Phosphophyllite-lz4mb 10 months ago

    Great videos! Have been learning from them for a long time.👍👍👍

  • @RaviShankar-de5kb
    @RaviShankar-de5kb 1 year ago

    It's like magic!!! Thanks for explaining. 7:38 was a big key for me; I didn't get the magic at first.

  • @katieforthman3384
    @katieforthman3384 2 years ago

    Thank you for this great explainer! I would love to see a video on importance sampling.

  • @ec-wc1sq
    @ec-wc1sq 3 years ago

    great explanation, so much better than my professor....thanks for creating this video

  • @abroy77
    @abroy77 3 years ago +1

    thanks a ton for all your content. It's incredibly helpful and beautifully composed. Best wishes

  • @phuvuong9062
    @phuvuong9062 1 year ago

    Thank you very much. Great explanation.

  • @phoebesteel5874
    @phoebesteel5874 2 years ago

    love your videos bro they got me through my statistics paper xx

  • @zakariaaboulkacem7098
    @zakariaaboulkacem7098 3 years ago +3

    Nice, thank you

  • @Underwatercanyon
    @Underwatercanyon 2 years ago +4

    Great explanation! One question though: if we have an f(s) that is easy to sample from, why can't we just sample from it directly and be done with it, rather than going through the steps of sampling from g(s)?

  • @olivier306
    @olivier306 1 year ago

    Legendary explainer thanks!!

  • @neelabhchoudhary2063
    @neelabhchoudhary2063 16 days ago

    I get it now. Thank you

  • @Gabriel-oy5kw
    @Gabriel-oy5kw 2 years ago

    Happy Holidays my fellow! Your content is marvelous......

  • @ankushkothiyal5372
    @ankushkothiyal5372 2 years ago

    Thank you for these lectures.

  • @levmarcus8198
    @levmarcus8198 3 years ago +3

    I've been hooked and watching through your videos in the past week. Do you have any favorite books or resources that you use for reference on the mathematical side of data science?

    • @ritvikmath
      @ritvikmath 3 years ago +6

      Hey, thanks for the kind words. I get this question often and the admittedly unsatisfying answer is no. I've found that different resources out there do a really good job at different things or at least offer different ways of viewing the same problem. When I try and learn a new topic, or review an old topic when making a video, I'll look around at lots of different resources to see which path I want to take in explaining it. That said, I think the most important part for learning (in my opinion) is to write some basic code, doesn't have to be pretty, which implements the method. That way, you can do sanity checks to see if your understanding matches to how real data would behave. Plus, you get some coding experience out of it. Sorry to not have an answer to your initial question but I hope this helps regardless!

    • @levmarcus8198
      @levmarcus8198 3 years ago

      @@ritvikmath No problem. Thanks for the long response!

  • @andreveiga1
    @andreveiga1 2 years ago

    Great! Proof + intuition. Awesome!

  • @komuna5984
    @komuna5984 1 year ago

    Great explanation!

  • @sksridhar1018
    @sksridhar1018 2 years ago

    Great explanation!!

  • @luisrperaza
    @luisrperaza 2 years ago

    Great explanation, many thanks for the video.

  • @khalilibrahimi3807
    @khalilibrahimi3807 3 years ago +1

    Man you're good. Thanks

  • @RomanNumural9
    @RomanNumural9 3 years ago +2

    Amazing video. Keep up the amazing work :)

  • @timlonsdale
    @timlonsdale 2 years ago

    Thanks, this is great!

  • @MrTSkV
    @MrTSkV 3 years ago +1

    I think this looks kinda similar (-ish) to MCMC algorithm? Maybe it's a good idea to cover MCMC in one of the next videos, since they are related.
    Anyway, that was a great video, I really enjoyed it. Keep up the good work!

    • @ritvikmath
      @ritvikmath 3 years ago +3

      You're reading my mind. I put this video out first so that in the MCMC videos (releasing next week onward), we can compare it against this. Stay tuned :)

  • @andblom
    @andblom 4 months ago

    Well done!

  • @SpazioAlpha
    @SpazioAlpha 2 years ago

    WAO! Great explanation! thanks thanks thanks!

  • @Briefklammer1
    @Briefklammer1 3 years ago +6

    If you need a good g in ARS, why not use g for p? The aim is to find a good unknown density for your samples, right? So by finding a good enough g for ARS, you have found your good density approximation. By my intuition you don't need ARS at all. What is the advantage of ARS? Maybe to make the approximation even better?

    • @ritvikmath
      @ritvikmath 3 years ago +2

      You bring up a very interesting question. Usually, the distributions, p, that we want to sample from are not very nice looking (can have many peaks, noisy, etc.), so finding a distribution g that is "similar" to p can be challenging or impossible. So, instead we use a g that is "close enough" to the target and use ARS to actually sample from the target.

    • @Briefklammer1
      @Briefklammer1 3 years ago

      @@ritvikmath thx for answering my question. So ARS can smooth the potentially noisy density or find an easy alternative for p, if you have an approx/good candidate g for p, right? But what is the advantage against kernel density estimation KDE with special kernel k?

  • @maximegrossman2146
    @maximegrossman2146 3 years ago +1

    Excellent video!

  • @PatrickSVM
    @PatrickSVM 2 years ago

    Thanks, very good explanation!

  • @sneggo
    @sneggo 3 years ago

    Amazing explanation!! Thank you

  • @emiliaverdugovega7189
    @emiliaverdugovega7189 2 years ago

    thanks!! it was very helpful

  • @pri6515
    @pri6515 15 days ago

    Great video! Wouldn’t it make sense to always choose a uniform distribution as g(s)? Of course, it could be uniform within a large interval for practical purposes. What would be the reasons to choose any other g(s)?

  • @Jameshazfisher
    @Jameshazfisher 4 months ago

    Maybe we don't need f(s) to always be lower than Mg(s), if we allow outputting the sample multiple times. E.g. if f(s)/Mg(s) = 2, then we'd output s twice.

  • @FUCKOFFYOUTUBEWITHTHISBULLSHIT

    Life saver!

  • @zgbjnnw9306
    @zgbjnnw9306 2 years ago +1

    At 7:50, how do you decide which sample is accepted or rejected? Is it when f(x)/(Mg(x)) > 0.5?

    • @mino99m14
      @mino99m14 1 year ago

      The ratio you get from f(X)/Mg(X) gives you the probability of acceptance. Then you use a uniform distribution from 0 to 1 and if the value is less than the ratio, you accept. If it's bigger you reject.
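
The procedure in this reply can be sketched in a few lines of Python. The target f, the proposal g, and the constant M below are hypothetical choices made for illustration: f is an unnormalized bumpy density, g is the standard normal density, and M = 2·sqrt(2π) works because f(s)/g(s) = sqrt(2π)·(1 + sin²(3s)) never exceeds it.

```python
import math
import random

def f(s):
    """Unnormalized target density (hypothetical example)."""
    return math.exp(-0.5 * s * s) * (1.0 + math.sin(3.0 * s) ** 2)

def g_pdf(s):
    """Proposal density: standard normal, which is easy to sample from."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

# f(s) <= M * g(s) for all s, because 1 + sin^2(3s) <= 2.
M = 2.0 * math.sqrt(2.0 * math.pi)

def accept_reject(n, rng):
    """Return n accepted samples and the observed acceptance rate."""
    samples, proposals = [], 0
    while len(samples) < n:
        s = rng.gauss(0.0, 1.0)        # draw a candidate from g
        proposals += 1
        u = rng.random()               # uniform draw on [0, 1)
        if u < f(s) / (M * g_pdf(s)):  # accept with probability f / (M g)
            samples.append(s)
    return samples, n / proposals

rng = random.Random(42)
samples, rate = accept_reject(20_000, rng)
sample_mean = sum(samples) / len(samples)
```

There is no fixed threshold such as 0.5: each candidate gets its own acceptance probability, and the uniform draw decides. For this particular f the long-run acceptance rate works out to about 0.75.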

  • @MasterMan2015
    @MasterMan2015 1 year ago

    Amazing content! Maybe we need a video about diffusion models and particle filter 😀

  • @eliacharles5835
    @eliacharles5835 3 years ago +2

    Love the video. This may sound like a silly question, but do you use some sort of threshold to decide whether you accept or reject something? I get the intuition behind the ratio, but what's the process of actually accepting or rejecting?

    • @awangsuryawan7320
      @awangsuryawan7320 3 years ago

      Up

    • @zgbjnnw9306
      @zgbjnnw9306 2 years ago

      I have the same question.

    • @xinzhou4360
      @xinzhou4360 2 years ago

      Hi, the threshold is "f(s)/(Mg(s))", which is in (0,1). Since the larger result, say, f(s) greater, g(s) smaller, indicates g(s) can represent f(s) better, this sample should be accepted with greater probability. So now we can just generate u~U(0,1), and s~g(s), if u < the threshold, we accept. The process means, the more f(s)/(Mg(s)) close to 1, the higher probability u

  • @edwardmartin100
    @edwardmartin100 3 years ago

    Brilliant. Thanks so much

  • @sharmilakarumuri6050
    @sharmilakarumuri6050 3 years ago

    Ur explanation was awesome

  • @omidomatali4510
    @omidomatali4510 2 years ago

    13:10, dude, a high ratio doesn't mean that f(s) is high; it's a ratio. And you started with "we know f", which I think would be the problem, since we don't. Also, the modes and the whole curvature of f and g are kind of parallel; in the case of a multimodal g and a unimodal f, I think this is not a good way to calculate p(s), as it would damp the data beneath one of the modes of g. Still, good explanation.

  • @nudelsuppe2090
    @nudelsuppe2090 2 years ago

    Thank you!!

  • @jonathanparlett1172
    @jonathanparlett1172 1 year ago

    You say we need g(s) to be close to p(s), but you also say we don't know p(s) and in the illustration you show g(s) close to f(s), not p(s). Also in most of the other materials I see covering this method they seem to say that f(s) is in fact the known target distribution you want to sample from, just that it might be difficult to sample from directly.

  • @tianjoshua4079
    @tianjoshua4079 3 years ago +2

    This is a great explanation! I do have a specific practical question though. In the student score example, how do we practically get f(s)? Since the issue is we know f(s) yet we don't know but want to know p(s), it seems very curious to me how we could get f(s) in such an abstract math form or any math form?

    • @ritvikmath
      @ritvikmath 3 years ago +3

      This is a valid question and indeed something that I also had confusion over for a long time. This is a common case in Bayesian stats.
      For example, P(A|B) = P(B|A)P(A) / P(B) by Bayes' theorem. We might care about sampling from P(A|B), which is the posterior, but not know its full form, since the denominator P(B) might be difficult to compute. So, we can use Accept-Reject sampling to still sample from the posterior given only the numerator in Bayes' theorem.

    • @tianjoshua4079
      @tianjoshua4079 3 years ago

      @@ritvikmath That makes a lot of sense. P(B) is the normalizing constant. It is interesting that questions like this come up all over the place in engineering.
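
A small sketch of the Bayesian case discussed in this thread, under assumptions chosen for illustration: a coin with unknown bias theta, a flat prior on [0, 1], and hypothetical data of 7 heads in 10 flips. Only the numerator of Bayes' theorem is evaluated; the evidence P(B) is never computed. The proposal g is Uniform(0, 1), and M is the peak value of the numerator.

```python
import random

heads, flips = 7, 10  # hypothetical data

def unnormalized_posterior(theta):
    """Likelihood times a flat prior; this is f, the numerator only."""
    return theta ** heads * (1.0 - theta) ** (flips - heads)

# g(theta) = 1 on [0, 1], and f peaks at theta = heads / flips,
# so f(theta) <= M * g(theta) holds everywhere.
M = unnormalized_posterior(heads / flips)

def sample_posterior(n, rng):
    draws = []
    while len(draws) < n:
        theta = rng.random()                                # candidate from g
        if rng.random() * M < unnormalized_posterior(theta):  # u < f / (M g)
            draws.append(theta)
    return draws

rng = random.Random(3)
draws = sample_posterior(20_000, rng)
posterior_mean = sum(draws) / len(draws)
```

In this toy case the posterior is known in closed form, Beta(8, 4) with mean 8/12, which gives a sanity check on the sampler.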

  • @vs7185
    @vs7185 2 years ago

    Excellent explanation and mathematical proof! By the way, is it same as or related to "Importance sampling"?

  • @sukursukur3617
    @sukursukur3617 7 months ago

    Very good

  • @snp27182
    @snp27182 2 years ago

    So just to be sure, M*g(s) isn't technically a probability density because integrating M*g(s) over s would give a value greater than 1 right?
    ie: M scales probabilities of s, not the observable values of s?
    [edit] Actually the scaling makes sense I think, I was confusing your f(s) which isn't a pdf, for p(s)

  • @juanete69
    @juanete69 2 years ago

    When here you say "a sample" do you mean all the observations of a sample with a given size? Or do you mean the mean of that sample?

  • @rashedulalam2882
    @rashedulalam2882 1 month ago

    thanks

  • @shrill_2165
    @shrill_2165 7 months ago

    Thanks dawg

  • @Kazzintuitruaabemss
    @Kazzintuitruaabemss 1 year ago

    Thank you for the great explanation. I am studying this concept for an actuarial exam, and my textbook says the probability of accepting a sample is 1/M "on average." Is this just because they are assuming f(x) is a pdf already? The book doesn't mention normalizing constants at all.
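
A quick numerical check of the 1/M statement, as a sketch under the assumption that f is already a normalized pdf. The example is hypothetical: the target is the Beta(2, 2) density 6x(1 - x), the proposal g is Uniform(0, 1), and M = 1.5 is the maximum of the target. When f is unnormalized with integral NC, the same calculation gives an average acceptance probability of NC/M instead.

```python
import random

def beta22_pdf(x):
    """Normalized target density: Beta(2, 2) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

M = 1.5  # maximum of the pdf, reached at x = 0.5; g(x) = 1 on [0, 1]

rng = random.Random(7)
proposals = 200_000
accepted = 0
for _ in range(proposals):
    x = rng.random()                       # candidate from g
    if rng.random() < beta22_pdf(x) / M:   # accept with probability f / (M g)
        accepted += 1

acceptance_rate = accepted / proposals  # expected to be near 1 / M
```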

  • @_STEFFVN_
    @_STEFFVN_ 2 years ago

    Wouldn't the NC multiply to the integral of f(s)*ds to make it equal 1? Therefore it should be 1/NC = integral of f(s)*ds, no?

  • @renemanzano4537
    @renemanzano4537 1 year ago

    Superb

  • @Flaaazed
    @Flaaazed 4 months ago

    You're saying it's hard to integrate some difficult pdf f(x) from -inf to +inf, but that integral is equal to 1, right, since it's a pdf? So it's not hard?

  • @samwhite4284
    @samwhite4284 3 years ago +1

    Question - is it assumed that the threshold for classification (Accept vs Reject) from that probability function [f(x)/Mg(x)] is at 0.5?

    • @ritvikmath
      @ritvikmath 3 years ago +1

      So, [f(x)/Mg(x)] will be some number, say its 0.1. Then, we accept that sample with probability 0.1. That means, we generate some random number "u" from the Uniform distribution and if it is

  • @JakeGreeneDev
    @JakeGreeneDev 2 years ago

    Great video but I have a follow-up question: we were told to assume that our equation for P(A|s) can be interpreted as a probability. Why? Can you point to a proof for this?

  • @phy_dude
    @phy_dude 1 year ago

    Thanks a bunch

  • @neelabhchoudhary2063
    @neelabhchoudhary2063 4 months ago

    how do you know whether to accept or reject your probability?

  • @BruinChang
    @BruinChang 2 years ago +1

    I am a little bit confused about inverse sampling. If I already have a pdf, why do I still need to bother with the inverse transformation to simulate a random number from the pdf I already have?

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago +1

    Why do we know the pdf? Can you provide a real example of how we know the pdf, even though it may be hard to sample from it? Like the example you provided above, with exponentials. How did we even know that the pdf had that analytical form?

  • @soumyajitganguly2593
    @soumyajitganguly2593 1 year ago

    I don't get where the numerator assumption comes from. In real life I would just have the scores of the students, like 75, 63, 91, etc. Yes, I can create a histogram from them, but what is the numerator here?

  • @bennettcohen130
    @bennettcohen130 2 years ago

    Holy fuck this is so clear

  • @amithanina25
    @amithanina25 2 years ago

    Thanks for the great explanation!
    Do you have any references for books about Accept-Reject Sampling?

  • @Pruthvikajaykumar
    @Pruthvikajaykumar 2 years ago

    With this method, we're trying to find p(s) right? and p(s) is f(s)/constant. To use this method we need to know what f(s) is. Then don't we already know what p(s) is? Can someone explain?

  • @segevNr
    @segevNr 2 years ago

    Damn, that was fuckin-tastic. Really. You're definitely falling into Einstein's definition of genius. Thanks a lot!!!