Beyond Surface Statistics: LDM deep dive

  • Added 8 Sep 2024

Comments • 3

  • @wolpumba4099 1 year ago +1

    - *Goal* : To find out whether the Stable Diffusion model has an internal understanding of 3D scenes and objects.
    - *Architecture of Stable Diffusion Model* :
    - *Latent Diffusion Model (LDM)* : Works in conjunction with the variational autoencoder (VAE) within the Stable Diffusion framework.
    - *Variational Autoencoder (VAE)* : The VAE's encoder compresses the image into a latent variable, which is then transformed by the LDM through a diffusion process. The VAE's decoder then converts the denoised latent variable back to the image space.
    - *Bottleneck* : The bottleneck is the lowest-resolution layer of the LDM's denoising U-Net, with 8x8x1280 activations. It is not the VAE's latent space, which is the space the LDM operates in.
    - *Probes* :
    - *Linear Probes* : Used to identify if the layers contain depth information.
    - *Training Data* : Synthetic images, with MiDaS (a monocular depth estimation model) generating the depth maps used as ground truth.
    - *Example of MiDaS* : czcams.com/video/UjaeNNFf9sE/video.html
    - *Binary Depth Map* : A second approach that distinguishes foreground and background.
    - *Latent Modification* : Experiment to shift objects in the image.
    - *Conclusion* :
    - Insightful paper on where depth information is stored.
    - Depth is present very early in the diffusion iteration, even when the image is still very noisy.
    - Using the linear probes, less depth information can be found in the bottleneck layer (8x8x1280) compared to neighboring layers. This is likely because the bottleneck has only 8x8 spatial resolution vs 16x16 of the neighbors.
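
    The linear-probe idea from the summary above can be sketched on toy data. This is a minimal illustration, not the paper's setup: the activations are random arrays standing in for a U-Net layer's output, the depth labels are synthetic rather than MiDaS output, and the helper names (fit_linear_probe, apply_linear_probe, r_squared, make_toy_data) are hypothetical. A shuffled-label control is included to show what the score looks like when the labels carry no relation to the activations.

```python
import numpy as np


def fit_linear_probe(acts, depth):
    """Fit one shared linear map (plus bias) from C-dim activations to depth.

    acts:  (N, H, W, C) activations (toy stand-in for a network layer)
    depth: (N, H, W) depth labels at the same spatial resolution
    Returns a weight vector of length C + 1 (the last entry is the bias).
    """
    c = acts.shape[-1]
    X = acts.reshape(-1, c)
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)  # bias column
    y = depth.reshape(-1)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights


def apply_linear_probe(weights, acts):
    """Predict a depth map of shape (N, H, W) from (N, H, W, C) activations."""
    n, h, w, c = acts.shape
    pred = acts.reshape(-1, c) @ weights[:-1] + weights[-1]
    return pred.reshape(n, h, w)


def r_squared(pred, target):
    """Coefficient of determination; 1.0 means a perfect linear readout."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def make_toy_data(rng, n, h, w, c, true_w, noise=0.05):
    """Toy assumption: depth is linearly encoded in the activations."""
    acts = rng.normal(size=(n, h, w, c))
    depth = acts @ true_w + noise * rng.normal(size=(n, h, w))
    return acts, depth


rng = np.random.default_rng(0)
C = 32  # toy channel count; the layer discussed above has 1280
true_w = rng.normal(size=C)
train_acts, train_depth = make_toy_data(rng, 64, 8, 8, C, true_w)
test_acts, test_depth = make_toy_data(rng, 16, 8, 8, C, true_w)

# Probe trained on real (toy) labels, scored on held-out samples.
probe = fit_linear_probe(train_acts, train_depth)
test_r2 = r_squared(apply_linear_probe(probe, test_acts), test_depth)

# Control: the same probe trained on shuffled labels should not generalize.
shuffled = rng.permutation(train_depth.reshape(-1)).reshape(train_depth.shape)
control_probe = fit_linear_probe(train_acts, shuffled)
control_r2 = r_squared(apply_linear_probe(control_probe, test_acts), test_depth)

print(f"probe R^2: {test_r2:.3f}, shuffled-label control R^2: {control_r2:.3f}")
```

    A high held-out score alongside a near-zero control score is the pattern that suggests a layer linearly encodes depth. For the binary foreground/background variant, the least-squares fit would be swapped for a linear classifier.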

  • @wolpumba4099 1 year ago +1

    Start of rapid review: czcams.com/video/3updXylOFvY/video.html

  • @wolpumba4099 1 year ago +1

    The paper you refer to at czcams.com/video/3updXylOFvY/video.html, about the flaw in Stable Diffusion, is discussed in this video: czcams.com/video/o1eJAUe-czQ/video.html (Diffusion is flawed)