Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications

  • Published 21 July 2024
  • This video presents our tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications. The tutorial was originally presented at CVPR 2022 in New Orleans and received a lot of interest from the research community. After the conference, we decided to record the tutorial again and share it broadly. We hope that this video can help you start your journey in diffusion models.
    Visit this page for the slides and more information:
    cvpr2022-tutorial-diffusion-m...
    Outline:
    0:00:00 Introduction (Arash)
    0:08:17 Part 1: Denoising Diffusion Probabilistic Models (Arash)
    0:52:14 Part 2: Score-based Generative Modeling with Differential Equations (Karsten)
    1:47:40 Part 3: Advanced Techniques: Accelerated Sampling, Conditional Generation (Ruiqi)
    2:37:39 Applications 1: Image Synthesis, Text-to-Image, Semantic Generation (Ruiqi)
    2:58:29 Applications 2: Image Editing, Image-to-Image, Superresolution, Segmentation (Arash)
    3:20:42 Applications 3: Discrete State Models, Medical Imaging, 3D & Video Generation (Karsten)
    3:35:20 Conclusions, Open Problems, and Final Remarks (Arash)
    Follow us on Twitter:
    Karsten Kreis: / karsten_kreis
    Ruiqi Gao: / ruiqigao
    Arash Vahdat: / arashvahdat
    #CVPR2022 #generative_learning #diffusion_models #tutorial #ai #research
  • Science & Technology

Comments • 67

  • @piyushtiwari2699
    @piyushtiwari2699 1 year ago +186

    It is amazing to see how one of the biggest companies in the world collaborates to produce tutorials but couldn't invest in a $10 mic.

    • @listerinetotalcareplus
      @listerinetotalcareplus 1 year ago +4

      lol cannot agree more

    • @Vanadium404
      @Vanadium404 10 months ago

      That was hard 💀

    • @TheAero
      @TheAero 10 months ago

      Just goes to show that lots of people and companies don't strive for excellence.

    • @atharvramesh3793
      @atharvramesh3793 10 months ago +3

      I think they are doing this in their personal capacity. It's a re-recording.

    • @conchobar0928
      @conchobar0928 4 months ago

      lmao they re-recorded it and gave a shout-out to the people who commented about the audio!

  • @mipmap256
    @mipmap256 1 year ago +25

    Don't complain that the audio is terrible. It has been diffused with Gaussian noise. You need to decode the audio first.

  • @redpeppertunaattradersday1967

    Thanks for the comprehensive introduction! It is really helpful :)

  • @saharshbarve1966
    @saharshbarve1966 1 year ago

    Fantastic presentation. Thank you to the entire team!

  • @maerlich
    @maerlich 1 year ago +7

    This is a brilliant lecture!! I've learned so much from it. Thank you, Prof. Arash, Ruiqi, and Karsten!

  • @Vikram-wx4hg
    @Vikram-wx4hg 2 years ago +13

    Dear Arash, Karsten, and Ruiqi, thanks a ton for putting this up!
    I was referring to your tutorial slides earlier, but this definitely helps much more.

  • @ksy8585
    @ksy8585 11 months ago +11

    So my question is whether any diffusion model can denoise the audio of this fantastic tutorial.

  • @karthik.mishra
    @karthik.mishra 4 months ago

    Thank you for uploading! This was very helpful!!

  • @linhanwang4937
    @linhanwang4937 1 month ago

    Very insightful and comprehensive presentation. Thank you all so much!

  • @weihuahu8179
    @weihuahu8179 6 months ago

    Amazing tutorial -- very helpful!

  • @deeplearningpartnership

    Thanks for posting.

  • @amortalbeing
    @amortalbeing 1 year ago

    Good stuff. Thanks guys

  • @danielhauagge
    @danielhauagge 1 year ago +25

    Awesome video, thanks for posting. One thing though, the audio quality is pretty bad (low volume, sounds very metallic).

  • @nikahosseini2244
    @nikahosseini2244 1 year ago

    Thank you, great lecture!

  • @salehgholamzadeh3368
    @salehgholamzadeh3368 1 month ago

    One of the best videos! Thanks @All

  • @mehdidehghani7706
    @mehdidehghani7706 1 year ago +1

    Thank you very much

  • @danielkiesewalter3097
    @danielkiesewalter3097 1 year ago +1

    Thanks for uploading this video! Great resource, which covers the topics at a good depth and pace. The only point of critique I have would be to use a better microphone next time, as it can be hard at times to understand what you are saying. Other than that, great video.

  • @ankile
    @ankile 1 year ago +9

    The content is fantastic, but the sound quality makes it materially harder to follow everything. A good microphone would lift the quality enormously!

  • @prabhavkaula9697
    @prabhavkaula9697 1 year ago +5

    Thank you for recording and uploading the tutorial. It is helpful for understanding the sudden boom in diffusion models and compares the techniques very well.
    I wanted to know whether the slides for Part 3 are slightly different from the slides on the website (e.g., slide 11)?

  • @parsarahimi71
    @parsarahimi71 1 year ago

    Nice job, Arash ...

  • @Farhad6th
    @Farhad6th 1 year ago

    The quality of the presentation was so good. Thanks.

  • @Vikram-wx4hg
    @Vikram-wx4hg 2 years ago +23

    Why do I feel that there is an audio issue (in recording) with this video for the first two speakers?

    • @windmaple
      @windmaple 1 year ago +33

      Would have been great if they ran the diffusion process on the audio

    • @90bluesun
      @90bluesun 1 year ago

      same here

  • @bibiworm
    @bibiworm 1 year ago

    3. 1:58:35 The previous slide says that variational diffusion models, unlike diffusion models with a fixed encoder, include learnable parameters in the encoder. So are the training objectives in this paper for the reverse diffusion process only, or for the entire forward and reverse diffusion process? I also do not understand the speaker's statement that "if we want to optimize the forward diffusion process in the continuous-time setting, we only need to optimize the signal-to-noise ratio at the beginning and the end of the forward process."
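
    A hedged pointer on that last statement, based on my reading of the Variational Diffusion Models paper (the notation below is illustrative, not copied from the slides): writing SNR(t) = alpha_t^2 / sigma_t^2, the continuous-time diffusion loss can be rewritten as an integral over SNR values,

    ```latex
    \mathcal{L}_\infty(\mathbf{x}_0)
      = \tfrac{1}{2}\,\mathbb{E}_{\boldsymbol{\epsilon}\sim\mathcal{N}(\mathbf{0},\mathbf{I})}
        \int_{\mathrm{SNR}(T)}^{\mathrm{SNR}(0)}
        \bigl\| \mathbf{x}_0 - \hat{\mathbf{x}}_\theta(\mathbf{z}_v, v) \bigr\|_2^2 \, dv
    ```

    so the loss value depends on the learnable noise schedule only through its endpoints SNR(0) and SNR(T); the shape of SNR(t) in between only changes the variance of the Monte Carlo estimate, which (if I understand correctly) is why only the endpoints need to be optimized when learning the forward process in continuous time.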

  • @Vikram-wx4hg
    @Vikram-wx4hg 1 year ago +1

    A request: please record this video again. I come back to it many times, but the audio recording quality makes it very difficult to comprehend.

  • @bibiworm
    @bibiworm 1 year ago

    2. 2:00:38 "Can take a pre-trained diffusion model but with more choices of sampling procedure." What does that mean? Would it be possible to find the answer in the paper listed in the footnote? Thanks.

  • @bibiworm
    @bibiworm 1 year ago

    This feels like a semester's course in 4 hours. I have so many questions. I am just going to ask, hoping someone can shed some light.
    1. 2:05:26 I don't quite understand the conclusion there: since these three assumptions hold, there is no need to specify q(x_t|x_{t-1}) as a Markovian process? What is the connection? Thanks.

  • @theantonlulz
    @theantonlulz 1 year ago +11

    Good god the audio quality is horrible...

  • @maksimkazanskii4550
    @maksimkazanskii4550 1 year ago +1

    Guys, please apply the diffusion process to the audio. Excellent material, but almost impossible to listen to due to the audio quality.

  • @awsaf49
    @awsaf49 1 year ago +2

    Thank you for the tutorial. Feedback: the audio quality is quite bad; I have a hard time understanding the words even with the YouTube transcription.

  • @howardkong8927
    @howardkong8927 1 year ago +3

    Part 3 is a bit hard to follow. A lot of formulae are shown without an explanation of what they mean.

  • @Cropinky
    @Cropinky 2 months ago

    thx

  • @zongliangwu7461
    @zongliangwu7461 2 months ago

    Do you have an open source denoiser to denoise the recording? Thanks

  • @liuauto
    @liuauto 1 year ago +4

    22:08 Is there any theory to explain why we can use a Gaussian for the approximation as long as beta is small enough?

    • @piby2
      @piby2 1 year ago +1

      Yes, look up Kolmogorov's forward and backward equations for Markov chains
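
      A hedged sketch of the usual argument (as I understand it, going back to Feller and the Sohl-Dickstein et al. paper): by Bayes' rule,

      ```latex
      q(\mathbf{x}_{t-1} \mid \mathbf{x}_t)
        \;\propto\; q(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, q(\mathbf{x}_{t-1})
      ```

      and when beta_t is small the forward kernel q(x_t | x_{t-1}) is a narrow Gaussian concentrated near x_t, so log q(x_{t-1}) is approximately quadratic over that kernel's support; combining the two factors then leaves an approximately Gaussian reverse conditional, with the approximation error vanishing as the step size goes to zero.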

  • @smjain11
    @smjain11 1 year ago

    At around 16:24 the marginal is equated to the joint. I didn't quite comprehend it. Can you please explain?

    • @smjain11
      @smjain11 1 year ago

      Is the reason that we are generating the Gaussian at time t by multiplying a diffusion kernel (which is itself a Gaussian) with the Gaussian at time t-1? So the joint of the t-1 Gaussian with the kernel, once integrated out, is the marginal at t. And the problem setup is to learn the reverse diffusion kernel at each step.

    • @dhruvbhargava5916
      @dhruvbhargava5916 1 year ago +1

      The point of the equation is to demonstrate that, since we don't have the exact PDF at all time steps, we can't just sample x(t) from q(x(t)) directly. Instead we sample x(0) from the initial distribution (i.e., sample a data point from the dataset) and then transform the data point using the diffusion kernel to obtain a sample at time step t; do this for enough data points and you can approximate the distribution q(x(t)).
      q(x(t)) is the marginal, i.e., independent of x(0).
      Now, coming to the equated part: the marginal probability is not being equated to the joint probability. Let's see why.
      x(0) is not a value in itself, it is a random variable; to avoid confusion, let's rename x(0) to i.
      > Now i is a random variable which can take any value in the given domain (the dataset); let us assume an image dataset.
      > Then q(i) describes the distribution of our dataset.
      > q(i = image(1)) describes the probability of i being image(1).
      > Now let i(t) = x(t).
      > q(i(t)) describes the approximate distribution of noisy images obtained by repeatedly sampling images from the initial dataset at time step 0 and then running the diffusion process for t time steps (t convolutions).
      > q(i(t), i) describes the joint probability distribution over all possible pairs of values of i and i(t).
      > q(i(t) = noisy_image, i = image(1)) describes the joint probability of that pair occurring, i.e., the probability that we started with i = image(1) and after t time steps ended up with i(t) = noisy_image, which equals q(i = image(1)) · q(i(t) = noisy_image | i = image(1)). Here image(1) is the first image in the dataset and noisy_image is a certain noisy image sampled from q(i(t)).
      > Now imagine we computed q(i(t) = noisy_image, i) for all possible values of i (i.e., all possible images in the dataset as the starting point) and added up all of these probabilities: what we would end up with is the probability of getting i(t) = noisy_image regardless of the value chosen for the random variable i, and this is exactly q(i(t) = noisy_image).
      > The integral ∫ q(i) · q(i(t) | i) di gives us the above quantity. One thing to note: the explanation in the video assumes the data distribution is continuous, whereas in my explanation I assumed a discrete training set of images, so the integral can be replaced by a summation over all samples.
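
      A minimal sketch of that marginalization in code, assuming the standard closed-form DDPM kernel q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I); the schedule values and names below are illustrative, not taken from the tutorial:

      ```python
      import numpy as np

      # Illustrative linear beta schedule (an assumption, not from the talk).
      T = 1000
      betas = np.linspace(1e-4, 0.02, T)
      alphas_bar = np.cumprod(1.0 - betas)   # abar_t = prod_{s<=t} (1 - beta_s)

      def sample_xt(x0, t, rng):
          """Draw x_t ~ q(x_t | x_0) using the closed-form forward kernel."""
          eps = rng.standard_normal(x0.shape)
          return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

      def approximate_marginal(dataset, t, n, rng):
          """Monte Carlo samples from q(x_t) = E_{x_0 ~ data}[ q(x_t | x_0) ]:
          pick x_0 from the data, then push it through the diffusion kernel."""
          idx = rng.integers(0, len(dataset), size=n)
          return np.stack([sample_xt(dataset[i], t, rng) for i in idx])

      # Toy usage with a fake "dataset" of 2D points.
      rng = np.random.default_rng(0)
      data = rng.standard_normal((512, 2))
      noisy_samples = approximate_marginal(data, t=500, n=64, rng=rng)
      ```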

  • @RishiYashParekh
    @RishiYashParekh 1 year ago

    What is the probability distribution q(x_0)? Because that is the original image itself. So will it be a distribution?

    • @meroberto8370
      @meroberto8370 1 year ago

      It's not known. Check 26:47. You try to approximate it by decreasing the divergence between the distribution you get when adding noise to the image (it becomes Gaussian as you do so) and the reverse process where you generate the image through the model distribution (also Gaussian). In other words, by decreasing the divergence between q(x|noise) and p(x) you approximate the data distribution without knowing it.

    • @bibiworm
      @bibiworm 1 year ago

      @meroberto8370 So by the equations on page 25, all we need is x_t (produced by the forward diffusion), some hyperparameters such as beta, the forward-diffusion epsilon, which is known, and the backward-diffusion epsilon, which is estimated by a U-Net...
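
      For what it's worth, a hedged sketch of that training recipe, assuming the standard simplified DDPM noise-prediction objective; `model` and its call signature are placeholders, not the tutorial's code:

      ```python
      import torch
      import torch.nn.functional as F

      T = 1000
      betas = torch.linspace(1e-4, 0.02, T)            # illustrative schedule
      alphas_bar = torch.cumprod(1.0 - betas, dim=0)

      def ddpm_loss(model, x0):
          """Simplified DDPM loss: E_{t, eps} || eps - eps_theta(x_t, t) ||^2."""
          b = x0.shape[0]
          t = torch.randint(0, T, (b,), device=x0.device)
          eps = torch.randn_like(x0)                      # forward-diffusion noise (known)
          ab = alphas_bar.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
          x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # closed-form forward sample
          eps_pred = model(x_t, t)                        # e.g. a U-Net predicting the noise
          return F.mse_loss(eps_pred, eps)
      ```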

  • @houzeyu1584
    @houzeyu1584 4 months ago

    Hello, I have a question about 10:36. I know that N(mu, std^2) is a normal distribution; how should I understand N(x_t; mu, std^2)?

    • @user-jp5cb8gm7y
      @user-jp5cb8gm7y 4 months ago

      It indicates that x_t is a random variable distributed according to a Gaussian distribution with mean μ and variance σ^2.
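
      To spell the notation out (this is the standard convention, not anything specific to the tutorial): the variable before the semicolon is the argument at which the density is evaluated, so

      ```latex
      \mathcal{N}(x_t;\,\mu,\sigma^2)
        \;=\; \frac{1}{\sqrt{2\pi\sigma^2}}
              \exp\!\left(-\frac{(x_t-\mu)^2}{2\sigma^2}\right)
      ```

      i.e. the Gaussian density with mean μ and variance σ^2 evaluated at x_t (for vector-valued x_t, the analogous multivariate density).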

  • @piby2
    @piby2 1 year ago +2

    Fantastic tutorial, I learnt a lot. Please buy a good microphone for the future, or start a GoFundMe for a mic; I will be happy to donate.

  • @mehmetaliozer2403
    @mehmetaliozer2403 7 months ago

    waiting for the 2023 diffusion workshop recordings 🙏

  • @steveHoweisno1
    @steveHoweisno1 1 year ago

    I am confused by a very basic point. At 22:38, he says that q(x_{t-1}|x_t) is intractable. But how can that be? It seems very simple. Since
    x_t = sqrt(1-beta)*x_{t-1} + sqrt(beta)*E, where E is N(0,I),
    therefore
    x_{t-1} = (1-beta)^{-1/2}*x_t - sqrt{beta/(1-beta)}*E
    = (1-beta)^{-1/2}*x_t + sqrt{beta/(1-beta)}*R,
    where R is N(0,I) (since the negative of a Gaussian is still Gaussian).
    Therefore
    q(x_{t-1}|x_t) = N(x_{t-1}; (1-beta)^{-1/2}*x_t, beta/(1-beta)*I).
    What gives?

    • @Megabase99
      @Megabase99 1 year ago

      A little bit late, but from what I have understood, the real problem is that q(x_{t-1}|x_t) depends on the marginal of x_t, and that marginal needs the entire dataset to be computed: you are not conditioning on a particular sample x_0, you are asking for the distribution of x_t over all possible x_0.
      However, q(x_{t-1}|x_t, x_0) is computable: given a starting sample x_0, you can describe that distribution in closed form.
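
      A hedged way to see where the derivation in the parent comment breaks down: the noise E used in the forward step is not independent of x_t, so the forward relation cannot simply be solved for x_{t-1}. By Bayes' rule,

      ```latex
      q(\mathbf{x}_{t-1}\mid\mathbf{x}_t)
        = \frac{q(\mathbf{x}_t\mid\mathbf{x}_{t-1})\,q(\mathbf{x}_{t-1})}{q(\mathbf{x}_t)},
      \qquad
      q(\mathbf{x}_t) = \int q(\mathbf{x}_t\mid\mathbf{x}_0)\,q(\mathbf{x}_0)\,d\mathbf{x}_0
      ```

      so the true reverse conditional involves the unknown data marginals q(x_{t-1}) and q(x_t), which is why it is called intractable, whereas q(x_{t-1} | x_t, x_0) is a Gaussian with a closed-form mean and variance.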

  • @coolarun3150
    @coolarun3150 1 year ago +1

    It's a nice lecture, but the DDPM explanation can't really be called a tutorial here; if you already have a basic idea of the DDPM process, then it will make sense. It is good for revising the math behind DDPM, but it is not a detailed tutorial if you are new to it.

  • @Vanadium404
    @Vanadium404 10 months ago

    Can you add captions, please? The voice quality is poor and the auto-generated captions are not accurate. Thanks for the tutorial btw, kudos.

  • @jeffreyzhuang4395
    @jeffreyzhuang4395 1 year ago +1

    The sound quality is horrible.

  • @rarefai
    @rarefai 1 year ago +1

    You may still need to re-record this tutorial to correct and improve the audio quality. Resoundingly poor.

  • @andreikravchenko8250
    @andreikravchenko8250 1 year ago +2

    audio is horrible.

  • @MilesBellas
    @MilesBellas 1 month ago

    The audio is too garbled.
    Maybe use an AI voice for clarity?

  • @anatolicvs
    @anatolicvs 1 year ago +2

    Please, purchase a better microphone...

  • @manikantabandla3923
    @manikantabandla3923 1 year ago +1

    The Part 2 audio is terrible.

  • @aashishjhaa
    @aashishjhaa 3 months ago

    Bro, please buy a mic and re-record this; the voice sounds so muddy.

  • @wminaar
    @wminaar 1 year ago +1

    poor video quality, pointless delivery

  • @Matttsight
    @Matttsight 11 months ago

    Why these people didn't have the common sense to buy a good mic, I don't know. What is the use of this video if it doesn't deliver its value? And then there is another bunch of people writing research papers as if no one in the world is supposed to understand them. The ML community will only improve if this knowledge is accessible.

  • @Bahador_R
    @Bahador_R 9 months ago

    Wonderful! I really enjoyed it!