This is REAL?! Stable Diffusion 3 BEATS both DALL-E 3 & Midjourney v6.

  • Published 21 Feb 2024
  • Stable Diffusion 3 is a text-to-image model from Stability AI. It is a diffusion transformer, which is a new type of architecture similar to the one used in the OpenAI Sora model.
    ▼ Link(s) From Today’s Video:
    Stable Diffusion 3: stability.ai/news/stable-diff...
    Emad's Twitter: / emostaque
    ► MattVidPro Discord: / discord
    ► Follow Me on Twitter: / mattvidpro
    -------------------------------------------------
    ▼ Extra Links of Interest:
    ✩ AI LINKS MASTER LIST: www.futurepedia.io/
    ✩ General AI Playlist: • General MattVidPro AI ...
    ✩ AI I use to edit videos: www.descript.com/?lmref=nA4fDg
    ✩ Instagram: mattvidpro
    ✩ Tiktok: tiktok.com/@mattvidpro
    ✩ Second Channel: / @matt_pie
    -------------------------------------------------
    Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, Tutorials, and Reviews! Enjoy your stay here, and subscribe!
    All Suggestions, Thoughts And Comments Are Greatly Appreciated… Because I Actually Read Them.
    -------------------------------------------------
    ► Business Contact: MattVidProSecond@gmail.com
  • Science & Technology

Comments • 488

  • @MattVidPro
    @MattVidPro 2 months ago +156

    Apparently I will be getting access in a few days? Livestream where I take prompts suggested by you guys?

    • @GaryJr530
      @GaryJr530 2 months ago +1

      I'm here 🤓

    • @24-7gpts
      @24-7gpts 2 months ago

      So freaking awesome! Thanks Matt.

    • @HeedReactionzz
      @HeedReactionzz 2 months ago

      Yesssss

    • @Dude_Wassup
      @Dude_Wassup 2 months ago

      Hell yeah

    • @infocyde2024
      @infocyde2024 2 months ago

      I will try to catch that live stream, all these crazy AI drops lately, getting exciting again!

  • @Yipper64
    @Yipper64 2 months ago +61

    2:28 I like how it says "in the corner" but doesn't specify *which* corner, though the bottom left made the most sense.

  • @fabiankliebhan
    @fabiankliebhan 2 months ago +138

    Is this the first time open-source image creation beats all the state-of-the-art image creation tools available? I think so. HUGE

    • @shApYT
      @shApYT 2 months ago +7

      DeepFloyd beat DALL-E 3 to generating coherent text.

    • @timeTegus
      @timeTegus 2 months ago +2

      I think Stability was on top twice before that.

    • @carkawalakhatulistiwa
      @carkawalakhatulistiwa 2 months ago

      But OpenAI has had Sora making images for two months.

    • @mirek190
      @mirek190 2 months ago

      Nope ... I was using it some time ago and it's far behind ... SDXL beats DeepFloyd totally ... not to mention DALL-E 3 ... @@shApYT

    • @jarblewarble
      @jarblewarble 2 months ago

      I still haven't seen an open-source model as realistic as Sora.

  • @dolcruz6838
    @dolcruz6838 2 months ago +86

    So now we've reached the point where open source is ahead of closed models. What a time to be alive!

    • @MedicinalSquishing
      @MedicinalSquishing 2 months ago +12

      Was that comment intended to serve as a reference to TwoMinutePapers?

    • @AmandaFessler
      @AmandaFessler 2 months ago +8

      @@MedicinalSquishing My first thought was 2MP, yes. Imagined it in his voice, even.

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 2 months ago +1

      Nope, look at those hands, especially in the pictures with clowns. I thought we'd gotten past this. Regardless of how good it is at text generation and prompt coherency, these massive problems that are 100% fixed by closed-source models still seem to haunt us.

    • @dolcruz6838
      @dolcruz6838 2 months ago +3

      Maybe @@MedicinalSquishing

    • @eliteextremophile8895
      @eliteextremophile8895 2 months ago +2

      There are things that put this way ahead, but I believe Sora is one step further. However, that's most likely due to the fact that OpenAI has insane processing power in their servers. Still, practically speaking, Sora is the more powerful image generator on the market.

  • @utfan971
    @utfan971 2 months ago +71

    "There are people today that are still using SDXL"
    As an SD1.5 user, I find this surprisingly offensive xD

    • @westingtyler2
      @westingtyler2 2 months ago

      what models? the best I've seen is epicRealism natural sin rc1vae.

    • @mirek190
      @mirek190 2 months ago +2

      lol ...1.5 is a stone age ....

    • @albert2006xp
      @albert2006xp 2 months ago +22

      Those are literally the only two acceptable options. Anything else you can't run locally; you have to let someone else hold your images and filter them with whatever guardrails they want. There are also places where 1.5 is better than SDXL.

    • @westingtyler2
      @westingtyler2 2 months ago

      epicRealism is the best model I've seen in terms of accuracy and speed. Is there a better, faster one with higher quality? I use RealitiesEdge for Turbo, but it's slow with the DPM++ 2M Karras sampler. @@mirek190

    • @AmandaFessler
      @AmandaFessler 2 months ago +4

      Same here. None of my fav LoRAs have been ported to XL.

  • @b0b6O6
    @b0b6O6 2 months ago +56

    I love how fast you are with the latest AI stuff

  • @TonyBologna9
    @TonyBologna9 2 months ago +65

    The reason you see the words "South Asian" at 3:12 is because DALL-E puts hidden words in your prompts (especially when a profession or position of power is involved, like astronaut). They add words like African American, Black, South Asian, or Asian in general (not North, because of Korea), and also women/woman. They put these words in your prompts to push the AI into making human generations more diverse in ethnicity and gender, as without these keywords it generates white men and women in most cases. It's been proven, and OpenAI themselves have said they do this to combat bias.
    I get the idea, and to a degree I can understand the thought process behind it from an investment standpoint, but I don't want companies hiding their words in my prompts and in products I pay for, which is just another reason why Stable Diffusion and open source will destroy the closed model. Between putting words in our mouths and censoring what can and can't be made due to developers' ideologies and biases, both in text and image, "Open"AI will never truly lead the market once an alternative comes around each generation.
    GO STABLE DIFFUSION!
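[Editor's note] The server-side prompt rewriting described in the comment above can be illustrated with a small sketch. This is NOT OpenAI's actual implementation; the word list, the trigger words, and the `augment_prompt` function are all hypothetical, for illustration only.

```python
import random

# Hypothetical word list -- an assumption for illustration, not OpenAI's.
DIVERSITY_TERMS = ["South Asian", "East Asian", "Black", "Hispanic"]

def augment_prompt(prompt: str, rng: random.Random) -> str:
    """Append a demographic term when the prompt seems to mention a person.

    Sketch of the *kind* of rewriting described above; the trigger-word
    heuristic here is invented for the example.
    """
    person_words = {"man", "woman", "person", "astronaut", "soldier"}
    if any(word in prompt.lower().split() for word in person_words):
        return f"{prompt}, {rng.choice(DIVERSITY_TERMS)}"
    return prompt

rng = random.Random(0)
print(augment_prompt("an astronaut on the moon", rng))  # term appended
print(augment_prompt("a bowl of fruit", rng))           # unchanged: no person mentioned
```

The "sign that says" trick commenters mention works because the appended term becomes ordinary prompt text the model renders literally.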

    • @martiddy
      @martiddy 2 months ago +3

      If that's true, then OpenAI should make that clear to the public whenever they use their apps (like DALL-E or ChatGPT)

    • @TheSquizzlet
      @TheSquizzlet 2 months ago +16

      @@martiddy You can confirm it yourself. Just write ANY prompt that involves people and add "with a sign that says" to the end of your prompt. When it adds the racial diversity tag, it will integrate it into the sign part of your prompt.
      I prompted "A young man dressed like a soldier on the beach beside a sign that says" and it made a white male soldier standing by a sign that said "Hispanic", and a white male soldier by a sign that said "East Asian", because the prompt Bing got was "A young man dressed like a soldier on the beach beside a sign that says Hispanic", since they forced the diversity tag words onto my prompt, which didn't contain them.

    • @larion2336
      @larion2336 2 months ago +5

      @@TheSquizzlet That's kind of vile, lol.

    • @Thedarkbunnyrabbit
      @Thedarkbunnyrabbit 2 months ago +6

      Fascinating. I think this should be toggleable though. I may need a white person in my image, or it may be throwing off my generations by adding unnecessary words. It's good for adding more diversity in results if that's what you're looking for, but it should be optional. Like a 'realism' toggle or something.

    • @WelshDragonJS8423-bv7kg
      @WelshDragonJS8423-bv7kg 2 months ago

      I think it's wrong that they put hidden prompts into a prompt to force the AI to make something the person may not want to generate. As far as I'm concerned, it should default to whatever group makes up the majority of a nation, profession, or mythology. For example, if I put "elf" in the prompt, they should look white European, since elves are from European mythology and folklore; if I prompt a giant monster attacking Tokyo with people running away, the people should be Asian; and if I prompt African animals drinking on one side of a river with people getting water on the other side, those people should be Black. If someone wants to make it race- or ethnicity-specific, they should do it themselves.

  • @frank6048
    @frank6048 2 months ago +8

    That was lightning fast coverage of the news
    Thanks

  • @nevernope8675
    @nevernope8675 2 months ago

    Hey Matt! Love your videos! Found you a couple months ago and your videos have been so informative! I like how up to date you are, and I like the way you are able to explain things in a way that anyone can understand. I think it would be a cool idea if your subscribers were the ones telling you which prompts to use for the new Stable Diffusion; might be a fun way to show off the AI. Anyway, great videos dude!

  • @pragmata7997
    @pragmata7997 2 months ago +45

    Sora's image generator might be better, but this is open source, so yes, Stable Diffusion wins

    • @shaunralston
      @shaunralston 2 months ago +7

      And, Sora will not be available anytime soon.

    • @jtjames79
      @jtjames79 2 months ago +15

      But will Sora be able to make pictures of white people?

    • @GamingXperience
      @GamingXperience 2 months ago +2

      Yeah, that's what I thought as well when OpenAI said Sora is better at creating images than DALL-E. Which kinda makes sense: if you can generate videos that good, it's probably really easy to just generate images.
      Although, like you said, open source is another big advantage.

    • @mirek190
      @mirek190 2 months ago

      Yes, in a few months ... @@shaunralston

    • @Moyemor
      @Moyemor 2 months ago +3

      @@jtjames79 I'm Indian.
      Why did you write this type of comment?
      What happened?

  • @ChrisH0Y
    @ChrisH0Y 2 months ago +38

    Competition breeds perfection. What are Midjourney and DALL·E 3 gonna do? Will that $600 yearly subscription to Midjourney be worth it now? Better get to it!

    • @robxsiq7744
      @robxsiq7744 2 months ago +6

      Time to get those pictures moving.

    • @sooool4716
      @sooool4716 2 months ago

      Niji still blows away the competition, so yes, $600 yearly is still worth it imo

    • @robxsiq7744
      @robxsiq7744 2 months ago

      @@sooool4716 Obviously opinion. Niji is anime. Anime is trash (imo), so it wouldn't be worth a dollar a year.

    • @sooool4716
      @sooool4716 2 months ago

      @@robxsiq7744 It's facts, and Niji isn't only anime. Nothing comes close to it.

    • @robxsiq7744
      @robxsiq7744 2 months ago

      @@sooool4716 Facts that something looks better? Fact that pizza tastes better with carrots on it... it's just science. :P
      I had the year sub for Mid and used Niji like... just a few times... no... I personally thought it was trash, but I know some will see treasure. You do you. I think DALL-E and SD are better overall, but Midjourney basic does have amazing style, I'll give 'em that.

  • @helmort
    @helmort 2 months ago +9

    Who needs Twitter or a mailing list when you have Matt?
    💀☠💀☠💀

  • @LeChris89
    @LeChris89 2 months ago +2

    I can’t wait for this to come out and this is definitely a competitor to dalle 3! Love your vids man ❤

  • @emilrogengellschwaner3555
    @emilrogengellschwaner3555 2 months ago +5

    I love how we've gotten so far that the art itself really doesn't matter anymore. It's already practically perfect. What matters now is prompt adherence and consistency.

  • @hypersonicmonkeybrains3418
    @hypersonicmonkeybrains3418 2 months ago +9

    I predict that within a few months SD3 will be fully released, at which point OpenAI will have to react and release a DALL-E 4 based on Sora, so their red teams will be on a tight deadline to get their work done before full release.

  • @clarkkent6977
    @clarkkent6977 2 months ago +1

    Thanks for the news, Matt. I ran a lot of these SD3 prompts in Stable Cascade as well, and it's still hit or miss on the text. Can't wait to get my hands on this.

  • @afrosymphony8207
    @afrosymphony8207 2 months ago +10

    Bruh, I shed a tear... we are finally at DALL-E 3's level of prompt comprehension. Not quality, but when people start training, shit's going to get crazy.

    • @Octamed
      @Octamed 2 months ago

      I like/am disturbed that 'finally' in AI is only 5 months :D

    • @mh7a135
      @mh7a135 2 months ago

      @@Octamed AI images looked horrible 2 years ago; now we have THIS

  • @Yic17Gaming
    @Yic17Gaming 2 months ago +1

    Been using Stable Diffusion for a year now. Let's go!

  • @thenoblerot
    @thenoblerot 2 months ago +8

    I bet OpenAI starts using Sora (also a diffusion transformer) as their default image generator sooner than later.

  • @coloryvr
    @coloryvr 2 months ago +1

    Oh wow! I am very glad to see that Stability AI is taking over again! ...this is so cool!
    Happy colored Greetinx!

  • @MissChelle
    @MissChelle 2 months ago +1

    I’m excited! So glad Stability is still working on and perfecting image generation rather than skipping on to video! Well done Stability Ai!!🇦🇺❤️

  • @AI-Jocke
    @AI-Jocke 2 months ago +2

    The year is young Matt! 😊

  • @reifuTD
    @reifuTD 2 months ago +2

    Seeing this and seeing what OpenAI can do with Sora Dall-e 4 is going to be lit when it comes out.

  • @cyborgmetropolis7652
    @cyborgmetropolis7652 2 months ago +4

    The ability to understand the number of items in a prompt seems like a no-brainer but it’s apparently difficult. Nice to see some progress in this.

  • @scottwatschke4192
    @scottwatschke4192 2 months ago

    I like your enthusiasm. I'm excited too.

  • @RicardoAum
    @RicardoAum 2 months ago +101

    Matt's hair gets crazier every day. lol

    • @MattVidPro
      @MattVidPro 2 months ago +59

      I just got out da shower 😭

    • @tut4moon
      @tut4moon 2 months ago +10

      Just like the guy from Ancient Aliens

    • @vincentvoillot6365
      @vincentvoillot6365 2 months ago +8

      @@MattVidPro Is your hair transformer-based? Seems like when the temperature is too high, it starts to hallucinate, and this news was really hot :D

    • @Allplussomeminus
      @Allplussomeminus 2 months ago +11

      He's being SHOCKED by all the good news.

    • @mattmcdermitt6482
      @mattmcdermitt6482 2 months ago +6

      The technology has blown his mind to the point of unhinged scientist

  • @AdamIverson
    @AdamIverson 2 months ago +17

    I'm glad to see that the diffusion is similar to Sora's, which is one step closer to Sora-like quality of video generation. I really want to see a true open-source alternative to Sora; no reason for OpenAI to have a monopoly.

    • @IceMetalPunk
      @IceMetalPunk 2 months ago +7

      It's also the same architectural approach that Meta's V-JEPA uses. Seems like this architecture is the next step forward in generative AI in general: train a Transformer autoencoder on the raw data, then train a diffusion model to diffuse the latent space rather than the raw data and have the decoder convert that back. In essence, it's taking the approach from "learn how to remove noise in the image to get a matching image" to "learn how to remove noise in the concepts to get a matching concept, then recreate an image from the newly imagined concept".
      Which is pretty cool; in some ways, it's closer to how humans imagine things than ever before. And clearly, it's an effective method 😁
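[Editor's note] The "diffuse the latent space rather than the raw data" idea in the comment above boils down to running the standard noising formula on a latent vector instead of pixels. A minimal numpy illustration of the forward-noising step and its exact inversion, assuming the noise is known (a trained model would have to *predict* it); the encoder is omitted here, and `x0` simply stands in for a latent an autoencoder would produce:

```python
import numpy as np

rng = np.random.default_rng(42)
x0 = rng.normal(size=64)       # stand-in for the latent of one image
alpha_bar = 0.3                # cumulative noise-schedule term for some step t

# Forward process: x_t = sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps
eps = rng.normal(size=64)
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# A trained denoiser predicts eps from x_t; with a perfect prediction,
# the original latent is recovered exactly by inverting the formula:
x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)
print(np.allclose(x0, x0_hat))
```

In a real latent diffusion model, `x0_hat` would then go through the autoencoder's decoder to become pixels.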

    • @AdamIverson
      @AdamIverson 2 months ago

      @@IceMetalPunk I'm all for open source. I'm more excited for this than Sora, considering the possibility of a severely censored model from OpenAI that can limit our creativity. Look at Google Gemini; it even refused to generate an image of a puppy swimming in lava. That's pretty lame.

  • @dalecorne3869
    @dalecorne3869 2 months ago +1

    I'm super happy that this AI stuff is happening during my lifetime. I want to take a minute to thank you Matt, for all the videos you do. I use AI in one form or another every single day and you keep me on top of the trends....thank you !

  • @davidwoods1337
    @davidwoods1337 2 months ago +4

    Multimodal input? What I'd like to see is prompt driven inpainting, where control over changes can be given via text. e.g. "wrapped gift on a table with a christmas themed tablecloth" then "remove the tablecloth" and it would keep all other aspects of the image exactly the same

    • @IceMetalPunk
      @IceMetalPunk 2 months ago +5

      There are already similar models for that in SD, like InstructPix2Pix. No doubt SD3 can have the same sort of model trained for it as well.

  • @JussimirPasold
    @JussimirPasold 2 months ago +5

    Being open source also means that DALL-E 3, Midjourney, and other competitors can look into the code and build upon it. It's a really unbelievable arms race.

    • @mh7a135
      @mh7a135 2 months ago

      That's how DALL-E 3 became far better than DALL-E 2, and now OpenAI is gonna copy everything and slightly improve it again

  • @neomicryo
    @neomicryo 2 months ago

    can't wait to use this

  • @psyboyo
    @psyboyo 2 months ago +2

    You have a cute dog? Instant sub!
    Oh, wait, I am already subbed for the great content.

  • @Gunrun808
    @Gunrun808 2 months ago +1

    The ability to follow specific instructions is an important step to AGI

  • @TomiTom1234
    @TomiTom1234 2 months ago

    What a huge announcement; it's mind-blowing how accurate the pictures are to the prompts. But all this is still just an announcement. Let's see the reality when it gets tested by you, Matt. Can't wait.

  • @Vigilence
    @Vigilence 2 months ago +2

    I hope they teach us how to prompt images for this model, so we can properly make LoRAs, remixes, etc.

  • @erics7004
    @erics7004 2 months ago +3

    My hero doesn't wear a cape, he was born in Bangladesh and stands for open source AI.

  • @davidpurple3698
    @davidpurple3698 2 months ago

    Thanks a lot, can't wait. We need a video on how to install it for Macs.

  • @matthewoates
    @matthewoates 2 months ago +6

    It'd be nice if Midjourney understood prompts a bit better; hopefully that's coming. That said, Midjourney just reigns supreme at beautiful art and imagery; the other generators haven't come close.

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 2 months ago

      Yep, and anatomy. Stable Diffusion 3 seems to go backwards if you look at those clowns. Plus the photorealism is still nowhere near Midjourney.

  • @brodok4252
    @brodok4252 2 months ago +8

    Guess we'll have to see how much they've sold out to their VC overlords when it comes to censorship. A "safe, responsible practices" disclaimer never fails to make me uneasy.

    • @WelshDragonJS8423-bv7kg
      @WelshDragonJS8423-bv7kg 2 months ago

      It's like they are acting like those in power will never use AI themselves to create misinformation lol

  • @FusionDeveloper
    @FusionDeveloper 2 months ago

    Awesome, I look forward to finding its limits.

  • @VaibhavShewale
    @VaibhavShewale 2 months ago +1

    Key Takeaways for quick navigation:
    00:00 🚀 Introduction to Stable Diffusion 3
    - Stable Diffusion 3 surpasses DALL-E 3 in understanding and quality.
    - CEO highlights capabilities and open-source release.
    01:09 🖼 Examples of Stable Diffusion 3 Outputs
    - Diverse images demonstrate superior prompt adherence and coherence.
    - Sets a new standard in interpreting complex prompts.
    03:56 🛠 Technical Advancements and Capabilities
    - New diffusion Transformer architecture enables multi-modal inputs and competitive realism.
    - Sets a benchmark in image generation with scalability and coherence.
    06:14 🆚 Comparison with Other Models
    - Outperforms DALL-E 3 and Midjourney v6 in prompt understanding and coherence.
    - Open-source release allows for further customization and refinement.
    08:34 🌐 Democratization and Future Outlook
    - Emphasizes democratizing AI access and creativity.
    - Open-source nature enables scalability and quality customization.

  • @gonkdroid8279
    @gonkdroid8279 2 months ago +5

    Love your dog

  • @SignumEternis
    @SignumEternis 2 months ago +2

    Wow, that really is impressive. I thought open source was a ways off of catching up to Dall-E and Midjourney, glad to be wrong. Can't wait to try it for myself. The future really is going to be wild.

  • @caramell5841
    @caramell5841 2 months ago +2

    I really like how this is natural language now. Definitely gonna use this over DALLE-3

  • @MrTk3435
    @MrTk3435 2 months ago +1

    Whoo Hooo, so exclusive bravo Matt! I am so deeply thankful for SD 3, Now this is Truly for the good of humanity. Thank you very much 🔥🔥🔥🍷🔥🔥🔥

  • @ventonthorn3455
    @ventonthorn3455 2 months ago +1

    Thinking about where AI image generation was about a year and a half ago when I started messing around with Craiyon and MJ v.3 compared to what we have today is already mind blowing.
    Now this. 🤯

  • @chineseducksauce9085
    @chineseducksauce9085 2 months ago +13

    Big difference already from last year compared to now, wonder where we'll be by christmas

    • @worldino2390
      @worldino2390 2 months ago +1

      Actual text to video would be nice. The ultimate step for all these image generators is video.

    • @hipjoeroflmto4764
      @hipjoeroflmto4764 2 months ago

      No Gaussian splatting / NeRF worlds @@worldino2390

  • @MyAmazingUsername
    @MyAmazingUsername 2 months ago +2

    I expected this to require 80GB VRAM. To hear that it is just 8B parameters at max is mindblowing. 😮

    • @mirek190
      @mirek190 2 months ago

      SDXL has 2.3B parameters ... and takes less than 6 GB VRAM.
      So 24 GB VRAM (RTX 3090) should be enough to run the biggest 8B model. ;)

    • @MyAmazingUsername
      @MyAmazingUsername 2 months ago

      @@mirek190 Yeah especially after they optimize it to load stages dynamically so that it may only use like half of the memory at a time. I actually think a bigger problem may be LoRA training. It may not fit in 24 GB.
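[Editor's note] The VRAM guesses in this thread follow from simple arithmetic: parameter count times bytes per parameter. A rough sketch, assuming 16-bit (2-byte) weights and ignoring activations, text encoders, and the VAE, so real usage is higher; `weight_vram_gb` is an illustrative helper, not from any library:

```python
def weight_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

print(f"{weight_vram_gb(8.0):.1f} GB")   # 8B-param model in fp16: ~14.9 GB
print(f"{weight_vram_gb(0.8):.1f} GB")   # 800M-param variant: ~1.5 GB
```

By this estimate the largest 8B SD3 variant fits in a 24 GB card with headroom, which matches the thread's 3090 guess, while fine-tuning needs extra memory for gradients and optimizer state on top of the weights.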

  • @I-Dophler
    @I-Dophler 2 months ago +2

    Stable Diffusion 3's release marks a pivotal moment in AI image generation, emphasizing the power of open-source models in driving innovation and accessibility. Its superior prompt understanding and potential for multimodal input integration signal a future where AI tools can cater to a broader range of creative and practical applications, making 2024 a landmark year for advancements in this field.

  • @MilesBellas
    @MilesBellas 2 months ago

    Amazing !

  • @AgustinCaniglia1992
    @AgustinCaniglia1992 2 months ago

    Amazing ❤

  • @JamesStakerWin
    @JamesStakerWin 2 months ago +11

    Let's be real: even a year back we all knew Stable Diffusion was going to eventually sweep the competition off their feet completely, and now it looks like they have done it. Leapfrog next?

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 2 months ago

      Nope, look at the hands and other anatomy issues. Every one of those clown photos has hands like the OG Stable Diffusion, something closed-source models have gotten over. Not to mention the realism of Midjourney, which is something we can't simply train or fine-tune into the model.

    • @helix8847
      @helix8847 2 months ago

      @@ChrisS-oo6fl But you can, and people have trained and fine-tuned models that look better than Midjourney for people.

  • @vi6ddarkking
    @vi6ddarkking 2 months ago +17

    Last year I predicted that we'd be able to create comic strips from our AI chats by the end of 2024.
    If this is any indication.
    I was being rather pessimistic.

    • @mystic6121
      @mystic6121 2 months ago +2

      And it is always crazy to me that even the most optimistic normies are always wildly off.

    • @mystic6121
      @mystic6121 2 months ago +2

      Not talking about you specifically

    • @vi6ddarkking
      @vi6ddarkking 2 months ago +2

      @@mystic6121 Oh, I agree. If we went back in time five years and showed people our current models, they'd think we were from the 2040s.

    • @worldino2390
      @worldino2390 2 months ago +2

      @@vi6ddarkking We had movies (Back to the Future) predicting flying cars in 2015. It's nice to be outdated instead.

  • @shadowdemonaer
    @shadowdemonaer 2 months ago +1

    I really hope they can make it so someone with just under 8 GB of VRAM can use this... and that, if we can, someone makes a tutorial on how to make checkpoints and LoRAs for it right away. I've always wanted to make something like AnyLora for a model of this quality. I have high hopes for this.
    Things I wanna see on your live stream (if I manage to not miss it):
    - How much better it does on anime as it currently is
    - How much more consistent it can do hands and feet since it struggles so much. This also involves things like holding hands and clasped hands, which are just harder to do in general
    - If it can do both anime eyes consistently (they always tend to look good only from a distance, but are missing tons of details up close)
    - if multiple characters can be rendered in a scene without it messing up and blending the two together, testing to see if you can ask for specific things on each character to make sure they don't all look like the same person

  • @IceMetalPunk
    @IceMetalPunk 2 months ago +1

    Looks extremely promising; and the flow matching approach (diffusing on the latent space of a Transformer autoencoder) seems to finally be everywhere in the wild. And it's clearly an effective method, given the results of this, Sora, V-JEPA, etc.
    I can't wait for LORAs and custom checkpoints for SD3. It'll be... quite slow on my 8GB VRAM machine, I'm sure, but the results look like they'll be worth it! I've been trying to get an AI-generated album art for my Synthia Nova framework for so long now, but the title of the album is "Synthia Nova: Concert in Silicon", and every single diffusion model to date struggles hard with the text "silicon" (the "ili" part, in particular, has too many adjacent vertical lines, and they all fail at it). If the text generation is generally as good as these examples show -- if the clown images are not cherry-picked and "stable diffusion" text wasn't over-represented in the training data -- then I might finally be able to get that album art!

    • @ICE0124
      @ICE0124 2 months ago +2

      They did say there will be performance improvements, so you and I can still have hope.

  • @frazy4487
    @frazy4487 2 months ago +1

    This is massive. Why aren't more people talking about this?

  • @BluezJustice
    @BluezJustice 2 months ago

    Man i'm so excited

  • @thefpvmvp
    @thefpvmvp 2 months ago +1

    I call shenanigans! Oscar the dog was obviously added into this video via AI for added cuteness!

  • @Zonca2
    @Zonca2 2 months ago +5

    I'm extremely worried SD3 will be in some way lobotomized or won't understand human anatomy, because of all the focus on safety in their blog post. I guess it could be trained for that later, but the underlying censorship worsening the quality could be unremovable. AI of any kind performs best when freshly trained, and worsens when the "safety department" gets their hands on it. That kinda defeats the purpose of the open-source advantage over all the closed-source ones.

    • @SignumEternis
      @SignumEternis 2 months ago +3

      Yeah, I am worried about that as well. Hopefully community training and tweaking will be able to fix a lot of things, but it will definitely be disappointing if it's heavily censored. And like you said, it pretty much defeats the whole purpose of why people want models like Stable Diffusion in the first place.

  • @Glowbox3D
    @Glowbox3D 2 months ago

    That was a bold last statement, especially since it's only February. :) I'd say we'll get a new version of every diffusion model by year's end. Time will tell.

  • @petergedd9330
    @petergedd9330 2 months ago +3

    I've made some negative comments on here re AI being not so good, but I've been using ElevenLabs for text-to-speech, and it is becoming incredibly realistic with its vocal inflections and empathy with the written text I'm using. Quite mind-blowing that I am actually watching it learn and get better.

  • @VigoHornblower
    @VigoHornblower 2 months ago +2

    When you get access, could you try some of the Dall-E 3 prompts like the avocado and spoon therapist?

  • @arinco3817
    @arinco3817 2 months ago +2

    *Nothing can top that this week*

  • @conrifor
    @conrifor 2 months ago +4

    Finally! It feels like forever since SD had a major upgrade to its coherency. I've been using DALL-E 3 along with SD's inpainting, but now I can just have everything in one place.
    Have they stated the system requirements for using this model?

    • @IceMetalPunk
      @IceMetalPunk 2 months ago +1

      SDXL is 6.6B parameters, so on parameter count alone, this 8B model will likely run on similar hardware. I can get SDXL to work on my 8 GB VRAM machine, but only if I offload some of the processing to the CPU, which of course slows things down. I'd guess the 800M model could run better, entirely on an 8 GB VRAM GPU, though I'd also assume the quality is much worse with that one than with the 8B.

    • @mirek190
      @mirek190 2 months ago

      SDXL has 2.3B parameters, not 6.6B like you said ... such a model at 8B will fit in 24 GB VRAM. @@IceMetalPunk

  • @MrErick1160
    @MrErick1160 2 months ago

    I'm interested in this new application of transformers to diffusion methods they're talking about.

  • @Oxes
    @Oxes 2 months ago +4

    These pictures are getting so real it's hard to tell reality from fiction.

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 2 months ago

      Umm, no, take a better look at the clowns. Dead giveaway in less than 1 second from a mile away. Unfortunately, Midjourney is still the only model that produces images that are indistinguishable from real ones.

  • @RMCanimationOFFICAL
    @RMCanimationOFFICAL 2 months ago +5

    Hype!

  • @scottiewardle
    @scottiewardle 2 months ago

    can't wait to see SD3 vs Sora

  • @marcihuppi
    @marcihuppi 2 months ago

    HYPE!

  • @DiceDecides
    @DiceDecides 2 months ago

    man oh man, truly hard to imagine image models getting better than this!

  • @aimademerich
    @aimademerich 2 months ago

    Phenomenal

  • @George-fw9um
    @George-fw9um 1 month ago

    Hello Matt... thank you for all the info... I'm completely new to AI. I have a dream to make a series of images for a project I have in mind. I need print quality (4K), because I need to print them at 70×100 cm. I need masterpiece quality and don't care how difficult the interfaces of all these programs are. My question is: which tool should I pick? I don't want to learn something that wastes my time without the right results, so what do you suggest I start with?

  • @seraphin01
    @seraphin01 2 months ago +1

    Oh boy, once this hits the shelves and the folks at Civitai start sharing better fine-tuned models, this is gonna be mind-blowing. The day Stability manages proper hands, it's game over for MJ and Leonardo etc.
    Glad to see Stability focused on prompt understanding and such, because it's actually key to useful generation, way above aesthetics like MJ.
    Thumbs up to Stability and their commitment to open source; that's fantastic.

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 2 months ago

      Long way from that! Look at those clowns in the demos. It's sad, because the closed-source models handle anatomy perfectly now. It's absolutely insane that we still haven't taken care of this in SD, yet keep moving forward with a focus on other things.

    • @helix8847
      @helix8847 2 months ago

      @@ChrisS-oo6fl You're commenting on every positive post... bot...

  • @JimWellsIsGreat
    @JimWellsIsGreat 2 months ago +2

    Is Sora based on its own engine? I would love to see the stills it can create.

  • @Concepts_Space
    @Concepts_Space 2 months ago +5

    ayy so this was the announcement out of left field!! good shit, matt! very hyped for this!

  • @samphelps856
    @samphelps856 2 months ago

    Brilliance

  • @Ignatowskic64
    @Ignatowskic64 2 months ago

    Effin hot content! I'm glad I bet on the right horse. Open source beats the commercial ones. Now it's their move again.

  • @IcyLucario
    @IcyLucario 2 months ago

    Hell yeah, now THIS is what I'm talking about.

  • @bladechild2449
    @bladechild2449 2 months ago +2

    The thing is, they keep coming out with these new things, but they're becoming somewhat stumped without the flexibility SD 1.5 had with easily trained LoRAs and models.

    • @user-on6uf6om7s
      @user-on6uf6om7s 2 months ago +1

      Rampant censorship in modern models will keep 1.5 relevant for the foreseeable future. I suspect this model will be effectively useless for NSFW content. My hope is just that the censorship of the data set isn't as egregious as it was with 2 and 2.1, where the model forgot how to create a realistic body entirely. XL seems to be better in that regard, but they're driving home the safety message real hard in this announcement.

    • @zrakonthekrakon494
      @zrakonthekrakon494 2 months ago +1

      I'm confused: if it's open source, can't people simply remove those restrictions?

    • @user-on6uf6om7s
      @user-on6uf6om7s 2 months ago +1

      @@zrakonthekrakon494 To some degree, yes, but you can't just retrain the entire model without a ton of money and GPUs. You can use a LoRA to refine a certain concept, but a LoRA trained on 50 images of humans isn't going to magically fix a model where tens of thousands of images have been pruned from the data set for being too suggestive. SDXL was left intact enough that there has been some progress on that front, but it's still more finicky than 1.5, and 2.x was just considered a lost cause because of how censored it was and has mostly been forgotten. We'll see which way the pendulum has swung with this release, but the extreme focus on safety, and the fact that they haven't given us a single normal human in regular clothes when they know that's what people will be curious about, gives me pause.

  • @user-oz9tf9zp7k
    @user-oz9tf9zp7k 2 months ago

    So this base model looks to be ahead of even the best fine-tunes. That's super crazy exciting. I've got a 3090, so hopefully I can run and train it. And hopefully it trains well (everything about Cascade makes it seem like it's really trainable, but is Cascade already obsolete because of SD3?).

  • @rayujohnson1302
    @rayujohnson1302 2 months ago

    My computer has a graveyard of old diffusion models, and the trend is accelerating!

  • @MrErick1160
    @MrErick1160 2 months ago +1

    Dang, every freaking day there's a new AI that's 10 times better.

  • @teawa_
    @teawa_ 2 months ago

    I need this now 😭

  • @Luxcium
    @Luxcium 2 months ago

    I am interested in this one, but then what is SDXL for? (I thought it was the evolution of Stable Diffusion, but I seem to be lost…)

  • @isajoha9962
    @isajoha9962 2 months ago

    I hope that image-to-image will be very consistent when adding, e.g., a specific character to different situations, for making cartoon strips etc.

  • @xbon1
    @xbon1 2 months ago +3

    My main concern is how it'll handle anime images. DALL-E 3 is the only one to properly do anime images without needing LoRAs/extra models/etc.

    • @alexandrlukanin528
      @alexandrlukanin528 2 months ago

      It needs to be asked nicely, though.

    • @ICE0124
      @ICE0124 2 months ago +4

      It's not that hard to find good anime models for Stable Diffusion right now; there are plenty out there and they're easy to install.

  • @user-on6uf6om7s
    @user-on6uf6om7s 2 months ago +2

    Notice how the only humans they're showing are clowns in oversized puffy outfits. With how obsessed SD has become with safety and how much focus it gets in this announcement, I wouldn't be surprised if knowledge of human anatomy and the ability to generate recognizable people have been even more censored than in previous releases. People will try to add it back in with LoRAs, but that gets more difficult the less understanding the underlying model has to begin with. They clearly have the potential to release something amazing, as all the big AI companies do; the question is always how lobotomized it will be to avoid bad press.

  • @caracal4361
    @caracal4361 2 months ago

    Very exciting, finally something good.

  • @KunalSwami
    @KunalSwami 2 months ago +1

    How about the problem of generating good faces and fingers? Is it resolved in Stable Diffusion 3?

  • @Zazume_
    @Zazume_ 2 months ago

    SDXL was already a huge step forward in terms of prompt interpretation, but this is definitely on another level.

  • @RegularRegs
    @RegularRegs 2 months ago +5

    Yes!! And I finally have a computer that can handle running it.

    • @RegularRegs
      @RegularRegs 2 months ago +3

      Well, sheesh... I hope so. We'll see, I guess.

  • @johantitulaer1052
    @johantitulaer1052 2 months ago

    I really hope SD3 will be coherent enough to use image-to-image for creating actual comic or manga panels. Can you imagine how cool it would be if creative writers who can't draw got to create comics? Then you could use these comics later as a screenplay for creating actual anime with Sora or whatever will be able to handle it. Can't wait.

  • @limieon
    @limieon 2 months ago +1

    I really like how SD3 is looking at the moment, but I'm getting confused...
    It started with SD 1.5 (the first model that got popular), continued with SD 2.1, then SDXL (at that point I thought SDXL was the next major version of SD). Not long ago we got Stable Cascade (which I thought was the next major version after SDXL, or a version alongside it), and now we're getting SD3.
    Are they working on SD, SDXL, and Stable Cascade simultaneously, or is it that while they're experimenting with SDXL and Stable Cascade they improve on SD, and vice versa?

    • @mirek190
      @mirek190 2 months ago +1

      I think Stable Cascade is an experiment to test some things for building SD3.
      Something similar happened with DeepFloyd before SDXL.

    • @limieon
      @limieon 2 months ago

      @@mirek190 That would make sense; maybe we'll find out when the technical details of SD3 are released.

  • @nahiddotai
    @nahiddotai 2 months ago

    The prompt understanding here looks second to none, almost too powerful. The rate of AI advancement continues to amaze me.

  • @KimSol90
    @KimSol90 2 months ago

    Wow, just insane.

  • @pietrocuni
    @pietrocuni 2 months ago +1

    Matt, you have become the MKBHD of the AI world! You get to know and try products weeks before the announcement 😂😍

    • @MattVidPro
      @MattVidPro 2 months ago +1

      LMAO, I hope that's who I can become.

  • @goyashy
    @goyashy 2 months ago

    This is so cool! Open source for the win!

  • @mikesavad
    @mikesavad 2 months ago +1

    It's not that I doubt the abilities, but I only see one selected image. It would be nice to see a row of them, to check that they didn't just get lucky on that one image.

  • @choiceillusion
    @choiceillusion 2 months ago

    Stable Video based on 3.0 is gonna be insane. Open source FTW.

  • @hamidmohamadzade1920
    @hamidmohamadzade1920 2 months ago +1

    I love you, I love open source, I love Stable Diffusion 😍🤩🤩🤩