Variational Autoencoder from scratch in PyTorch

  • Added 30 Jun 2024
  • ❤️ Support the channel ❤️
    / @aladdinpersson
    Paid Courses I recommend for learning (affiliate links, no extra cost for you):
    ⭐ Machine Learning Specialization bit.ly/3hjTBBt
    ⭐ Deep Learning Specialization bit.ly/3YcUkoI
    📘 MLOps Specialization bit.ly/3wibaWy
    📘 GAN Specialization bit.ly/3FmnZDl
    📘 NLP Specialization bit.ly/3GXoQuP
    ✨ Free Resources that are great:
    NLP: web.stanford.edu/class/cs224n/
    CV: cs231n.stanford.edu/
    Deployment: fullstackdeeplearning.com/
    FastAI: www.fast.ai/
    💻 My Deep Learning Setup and Recording Setup:
    www.amazon.com/shop/aladdinpe...
    GitHub Repository:
    github.com/aladdinpersson/Mac...
    ✅ One-Time Donations:
    Paypal: bit.ly/3buoRYH
    ▶️ You Can Connect with me on:
    Twitter - / aladdinpersson
    LinkedIn - / aladdin-persson-a95384153
    Github - github.com/aladdinpersson
    Useful article on VAEs:
    / paper-summary-variatio...
    Timestamps:
    0:00 - Introduction
    2:45 - Model architecture
    15:50 - Training loop
    31:10 - Inference example
    39:10 - Ending

Comments • 64

  • @nikitaandriievskyi3448
    @nikitaandriievskyi3448 1 year ago +22

    100% agree that if you write everything from scratch, line by line, it is much better than having it prewritten.

  • @marcocastangia4394
    @marcocastangia4394 1 year ago +23

    Great content. I've always loved your "from scratch" tutorials.

  • @Uminaty
    @Uminaty 1 year ago +9

    As usual, it's amazing content! Thank you so much for your work.

  • @chrisoman87
    @chrisoman87 1 year ago +23

    Hey, just watched your video, really good! But it's obvious this is a new area for you (which is not bad), so I thought I'd give you some pointers to improve your algorithm.
    1. In practice, VAEs are typically trained by estimating the log variance, not the std; this is for numerical stability and improves convergence. So your loss would go from
    `- torch.sum(1 + torch.log(sigma.pow(2)) - mu.pow(2) - sigma.pow(2))` ->
    `-0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())`
    (where log_var is the output of your encoder; you're also missing the factor of 0.5 for the numerically stable ELBO). Also, the ELBO is the expectation of the reconstruction loss (the mean in this case) plus the negative sum of the KL divergence (sketched below).
    2. The ELBO (the loss) is based on a variational lower bound; it's not just two losses stuck together, so arbitrarily weighting the reconstruction loss and the KL divergence will give you unstable results. That being said, your intuition was on the right path. VAEs are getting long in the tooth now, and there are heavily improved versions that focus specifically on being "explainable". If you want to understand them, I would look at the Beta-VAE paper (which weights the KL divergence), then look into disentangled VAEs (see "Structured Disentangled Representations" and "Disentangling by Factorising"). These methodologies force each "factor" into a normal Gaussian distribution rather than mixing the latent variables. For MNIST with a z dim of 10, the result would theoretically be each factor representing a variation of each digit, so sampling from each factor will give you "explainable" generations.
    3. Finally, your reconstruction loss should be coupled with your epsilon (your variational prior); typically (with some huge simplifications) MSE => epsilon ~ Gaussian distribution, BCE => epsilon ~ Bernoulli distribution.
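    A runnable sketch of points 1 and 3 under stated assumptions (names such as `vae_loss`, `reparameterize` and `x_recon` are placeholders, not the video's exact code):
    import torch
    import torch.nn.functional as F
    # Reparameterization with log-variance: sigma = exp(0.5 * log_var)
    def reparameterize(mu, log_var):
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mu + std * eps
    # ELBO-style loss: BCE reconstruction (Bernoulli assumption) plus the stable KL term
    def vae_loss(x_recon, x, mu, log_var):
        recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        return recon + kl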

  • @nathantrance7558
    @nathantrance7558 1 year ago +2

    You are truly a lifesaver, sir. Thank you for keeping everything simple instead of using programming shenanigans just to make it more complicated and unreadable.
    Love your tutorials, I learned a lot from your line of thinking, including the rants.

  • @pheel_flex
    @pheel_flex 1 year ago +8

    Great video, thank you! Please don't change to having pre-written code. Your approach is the best that can be found these days.

  • @starlite5097
    @starlite5097 1 year ago +2

    Awesome work. Please do more stuff with GANs or vision transformers.

  • @dr_rahmani_m
    @dr_rahmani_m 1 year ago

    I like the thought process. So, thanks for the 'from scratch' tutorials.

  • @lorandy_lo2283
    @lorandy_lo2283 1 year ago

    Thanks for your amazing implementation and interpretation!

  • @tode2227
    @tode2227 3 months ago

    Again an awesome from-scratch video! I have never seen programming videos in which it is so simple to follow what the person is coding, thank you.
    Currently, there are no videos about Stable Diffusion from scratch that include the training scripts.
    It would be great to see a video on this!

  • @manolisnikolakakis7292

    Thank you very much for your tutorials. They have been incredibly helpful and insightful.

  • @davidlourenco6989
    @davidlourenco6989 1 year ago

    I prefer from scratch too, for all the reasons you've mentioned. Thanks for the content.

  • @kl_moon
    @kl_moon 9 months ago

    I love "from scratch" series, plz make more videos..!! and thank you so much!!!

  • @zetadoop8910
    @zetadoop8910 1 year ago +2

    Your videos are shockingly good! Among programming channels, yours is the best one.

  • @sangrammishra699
    @sangrammishra699 1 year ago

    Very informative; love the explanation of the content and the implementation from scratch.

  • @AmeyMindex
    @AmeyMindex 1 year ago

    Awesome Dude!!!! So great!!

  • @danyahhussein1073
    @danyahhussein1073 2 months ago

    Thanks Aladdin, you helped me a lot, thanks for the unique explanation, keep up the good work!

  • @mizhou1409
    @mizhou1409 1 year ago +1

    Really helpful! you are awesome!!!

  • @prabhavkaula9697
    @prabhavkaula9697 1 year ago

    Awesome implementation tutorial❤️

  • @0liver19
    @0liver19 1 month ago

    you are awesome. thank you for this immensely valuable resource!!

  • @mitchellphiri5054
    @mitchellphiri5054 1 year ago

    Finally kicking off this series, I've been waiting for years. Curious if you'll do VQ-VAEs like in the Jukebox example from OpenAI?

    • @AladdinPersson
      @AladdinPersson  1 year ago +1

      Yeah.. Don't have a structured plan for what's next but VQ-VAEs would be cool to understand

  • @avp300
    @avp300 1 year ago

    great explanation, thanks!

  • @edgarromeroherrera2886
    @edgarromeroherrera2886 8 months ago

    Lovely video, man, thank you!

  • @LiquidMasti
    @LiquidMasti 1 year ago

    Very informative content. Also, can you make shorts that explain small topics?

    • @AladdinPersson
      @AladdinPersson  1 year ago

      Thanks Dhairya! Good idea, haven't figured out what to make yet but will think about it :)

  • @parthsoni1076
    @parthsoni1076 9 months ago

    Thanks for the tutorial, it was simple yet insightful. Can you also make a video where you combine different architectures, such as Transformers or residual blocks, in the encoder-decoder blocks of a VAE?

  • @gtg238s
    @gtg238s 1 year ago

    Dope stuff!

  • @donfeto7636
    @donfeto7636 1 year ago +1

    23:09 Since you used sigmoid, the pixels will be between 0 and 1, so BCE is okay in this case; otherwise, if you use no activation function in the last layer of the decoder, you need to switch the reconstruction loss to MSE (still plus the KL term). That's what I think (see the sketch below).
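    A minimal sketch of both cases (tensor names and shapes are illustrative, assuming MNIST pixels scaled to [0, 1]):
    import torch
    import torch.nn.functional as F
    x = torch.rand(32, 784)        # target pixels in [0, 1]
    logits = torch.randn(32, 784)  # raw decoder output
    # Sigmoid keeps the output in [0, 1], so BCE is a valid reconstruction loss
    recon_bce = F.binary_cross_entropy(torch.sigmoid(logits), x, reduction="sum")
    # With no activation on the last layer, MSE is the usual reconstruction loss
    recon_mse = F.mse_loss(logits, x, reduction="sum")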

  • @francescolee8233
    @francescolee8233 1 year ago

    So fast! Awesome!

  • @user-hq1jz5pb8w
    @user-hq1jz5pb8w 1 year ago

    Great tutorials!! Now I can understand how to work with VAEs!! ☺☺☺☺

  • @bennisyoutube
    @bennisyoutube 1 year ago

    Amazing!

  • @ensabinha
    @ensabinha 1 year ago

    I see a few videos of yours about GANs, so you might want to have a look at Adversarial Autoencoders: instead of using KLD, you can impose a prior on the latent using a discriminator (a minimal sketch follows below).
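    A minimal sketch of that idea under assumed names and sizes (`z_enc` stands in for the encoder's latent output; this is a sketch of the general trick, not any particular paper's exact recipe):
    import torch
    from torch import nn
    z_dim, batch = 10, 32
    disc = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    bce = nn.BCEWithLogitsLoss()
    z_enc = torch.randn(batch, z_dim, requires_grad=True)  # stand-in for encoded latents
    z_prior = torch.randn(batch, z_dim)                    # samples from the prior N(0, I)
    # Discriminator loss: prior samples labeled 1 ("real"), encoded latents 0 ("fake")
    d_loss = bce(disc(z_prior), torch.ones(batch, 1)) + \
             bce(disc(z_enc.detach()), torch.zeros(batch, 1))
    # Encoder loss: replaces the KL term by trying to fool the discriminator
    g_loss = bce(disc(z_enc), torch.ones(batch, 1))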

  • @TsiHang
    @TsiHang 7 months ago

    Had to learn about VAE with zero experience in coding or ML. Thank God I found this video 😅

  • @teetanrobotics5363
    @teetanrobotics5363 1 year ago +1

    ALADDIN PERSSON. YOUR CONTENT IS AMAAAZZZIIINNGGG !!! THANK YOU FOR PRACTICAL DEEP LEARNING WITH PYTORCH

    • @AladdinPersson
      @AladdinPersson  1 year ago

      Thanks & np!!

    • @teetanrobotics5363
      @teetanrobotics5363 1 year ago

      @AladdinPersson If possible, please include this in the playlist and make more tutorials. Loving it!!

  • @nikitaandriievskyi3448

    I'm speechless, the content is too good.

  • @carlaquispe9738
    @carlaquispe9738 1 year ago +1

    Maybe the "Attention Is All You Need" paper is worth going through.

  • @travelthetropics6190
    @travelthetropics6190 1 year ago

    Thanks a lot! Is there any recommendation for a TensorFlow VAE tutorial?

  • @sahhaf1234
    @sahhaf1234 7 months ago

    First of all, thank you very much...
    Secondly, in line 74, shouldn't we have epsilon = torch.randn_like(1) instead of epsilon = torch.randn_like(sigma)? Because we want an epsilon distributed as N(0, 1), and then the next line will generate z, which will be distributed as N(mu, sigma) (see the sketch below).
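    For reference, a small sketch of what `randn_like` does there (shapes are illustrative; note `torch.randn_like(1)` would raise a TypeError, since it expects a tensor, not a number):
    import torch
    mu = torch.zeros(4, 20)
    sigma = torch.ones(4, 20)
    # randn_like only uses sigma's shape/dtype/device; the samples themselves
    # are drawn from N(0, 1), so epsilon is standard normal as intended
    epsilon = torch.randn_like(sigma)
    z = mu + sigma * epsilon  # reparameterization: z ~ N(mu, sigma^2)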

  • @chyldstudios
    @chyldstudios 1 year ago

    Doing it from scratch is way better than just typing some pre-written code.

  • @silencedspec1711
    @silencedspec1711 1 year ago

    Miss it!

  • @mjmoosavizade8355
    @mjmoosavizade8355 1 year ago

    PyTorch has a loss function for KL divergence; I was wondering if it's possible to use that instead of writing it out? (See the sketch below.)

    • @AladdinPersson
      @AladdinPersson  1 year ago

      Yeah that should be possible.. haven’t tried it though.

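      A sketch of one way to do it via torch.distributions (mu and sigma here are illustrative stand-ins for the encoder outputs; note that nn.KLDivLoss expects log-probabilities, so it is not a drop-in replacement for the closed-form Gaussian KL):
      import torch
      from torch.distributions import Normal, kl_divergence
      mu = torch.zeros(4, 20)
      sigma = torch.ones(4, 20)
      # Closed-form KL(N(mu, sigma^2) || N(0, 1)) per dimension, then summed
      posterior = Normal(mu, sigma)
      prior = Normal(torch.zeros_like(mu), torch.ones_like(sigma))
      kl = kl_divergence(posterior, prior).sum()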

  • @benx1326
    @benx1326 1 year ago

    do more vids about vision transformers

  • @fizipcfx
    @fizipcfx 1 year ago +1

    Isn't
    self.activation = nn.ReLU()
    better? (See the sketch after this thread.)

    • @AladdinPersson
      @AladdinPersson  1 year ago +1

      Yeah, maybe slightly confusing if we’d be using multiple activations?

    • @fizipcfx
      @fizipcfx 1 year ago

      @AladdinPersson I guess you are right, that way it's more clear.
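      For reference, both spellings in one minimal sketch (`nn.relu` itself does not exist; the module is `nn.ReLU()` and the functional form is `F.relu`; the layer sizes are illustrative):
      import torch
      from torch import nn
      import torch.nn.functional as F
      class Net(nn.Module):
          def __init__(self):
              super().__init__()
              self.fc = nn.Linear(784, 200)
              self.relu = nn.ReLU()      # module style: a named, reusable activation
          def forward(self, x):
              h = self.relu(self.fc(x))  # module call
              return F.relu(h)           # functional style: no attribute needed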

  • @GoldenMunkee
    @GoldenMunkee 1 year ago

    I just have to say that, even as someone with a Master's in Data Science from a top university, I still use your tutorials for my work and my projects. Your stuff is incredibly helpful from a practical perspective. In school, they teach you theory with little to no instruction on how to actually build anything. Thank you so much for your hard work!!

  • @user-fb9zv9cf1s
    @user-fb9zv9cf1s 5 months ago

    Code from 15:05 so you don't need to type it all:
    import torch
    import torchvision.datasets as datasets
    from tqdm import tqdm
    from torch import nn, optim
    from model import VariationalAutoEncoder
    from torchvision import transforms
    from torchvision.utils import save_image
    from torch.utils.data import DataLoader

  • @user-cd2cu6dy6k
    @user-cd2cu6dy6k 1 year ago

    Why is machine learning easy to learn? Because a lot of amazing people are making videos explaining papers and writing code line by line.

  • @chickenp7038
    @chickenp7038 1 year ago +2

    please do not have the code prewritten

    • @AladdinPersson
      @AladdinPersson  1 year ago +3

      Agree. I get overwhelmed if someone shows the entire code. Much easier to get guided through it step by step imo, but open to the idea that there might be better ways to explain code

  • @riis08
    @riis08 1 year ago

    It's always good to write the code from scratch...

  • @LucaBovelli
    @LucaBovelli 1 month ago

    Are you the son of Notch (Markus Persson)?

  • @agsantiago22
    @agsantiago22 1 year ago

    Thank you!

  • @marcel2711
    @marcel2711 6 months ago

    MNIST dataset, lol. All samples/videos use the same dataset. So boring. Create your own dataset, implement something interesting.