This is REAL?! Stable Diffusion 3 BEATS both DALL-E 3 & Midjourney v6.
- Added 21 Feb 2024
- Stable Diffusion 3 is a text-to-image model from Stability AI. It is a diffusion transformer, which is a new type of architecture similar to the one used in the OpenAI Sora model.
▼ Link(s) From Today’s Video:
Stable Diffusion 3: stability.ai/news/stable-diff...
Emad's Twitter: / emostaque
► MattVidPro Discord: / discord
► Follow Me on Twitter: / mattvidpro
-------------------------------------------------
▼ Extra Links of Interest:
✩ AI LINKS MASTER LIST: www.futurepedia.io/
✩ General AI Playlist: • General MattVidPro AI ...
✩ AI I use to edit videos: www.descript.com/?lmref=nA4fDg
✩ Instagram: mattvidpro
✩ Tiktok: tiktok.com/@mattvidpro
✩ Second Channel: / @matt_pie
-------------------------------------------------
Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, Tutorials, and Reviews! Enjoy your stay here, and subscribe!
All Suggestions, Thoughts And Comments Are Greatly Appreciated… Because I Actually Read Them.
-------------------------------------------------
► Business Contact: MattVidProSecond@gmail.com - Science & Technology
Apparently I will be getting access in a few days? Livestream where I take prompts suggested by you guys?
I'm here 🤓
So freaking awesome! Thanks Matt.
Yesssss
Hell yeah
I will try to catch that live stream, all these crazy AI drops lately, getting exciting again!
2:28 I like how it says "in the corner" but doesn't specify *what* corner, but the bottom left made the most sense.
Right!!!
Is this the first time open source image creation beats all state of the art image creation tools available? I think so. HUGE
DeepFloyd beat DALL-E 3 to generating coherent text.
I think Stability was on top twice before that.
But OpenAI has had Sora making images for two months.
Nope... I was using it some time ago and it's far behind... SDXL beats DeepFloyd totally, not to mention DALL-E 3... @@shApYT
I still haven't seen an open-source model as realistic as Sora.
so now we reached the point where open source is ahead of closed models. What a time to be alive!
Was that comment intended to serve as a reference to TwoMinutePapers?
@@MedicinalSquishing My first thought was 2MP, yes. Imagined it in his voice, even.
Nope, look at those hands, especially in the pictures with clowns. I thought we got past this. Regardless of how good it is at text generation and prompt coherency, these massive problems, which are 100% fixed by closed-source models, still seem to haunt us.
Maybe @@MedicinalSquishing
There are things that put this so far ahead, but I believe Sora is one step further. However, that's most likely due to the fact that OpenAI has insane processing power in their servers. Still, practically speaking, Sora is the more powerful image generator on the market.
"There are people today that are still using SDXL"
As an SD1.5 user, I find this surprisingly offensive xD
what models? the best I've seen is epicRealism natural sin rc1vae.
lol... 1.5 is stone age...
Those are literally the only two acceptable options. Anything else you can't run locally; you have to have someone else hold your images and filter them with whatever guardrails they want. There are also places where 1.5 is better than SDXL.
epicRealism is the best model I've seen in terms of accuracy and speed. Is there a better, faster one with higher quality? I use RealitiesEdge for Turbo, but it's slow with the DPM++ 2M Karras sampler. @@mirek190
Same here. All my favorite LoRAs haven't been ported to XL.
I love how fast you are with the latest/newest AI stuff
The reason you see the words "South Asian" at 3:12 is because DALL-E puts hidden words in your prompts (especially when a profession or position of power is involved, like astronaut). They put in words like African American, Black, South Asian, or Asian in general (not North, because of Korea), and also women/woman. They put these words in your prompts to push the AI into making human generations more diverse in ethnicity and gender, as without these keywords it generates white men and women in most cases. It's been proven, and OpenAI themselves have said they do this to combat bias.
I get the idea, and to a degree I can understand the thought process behind it from an investment standpoint. But I don't want companies hiding their words in my prompts and in products I pay for, which is just another reason why Stable Diffusion and open source will destroy the closed model. Between putting words in our mouths and censoring what can and can't be made due to developers' ideologies and biases, both in text and image, "Open"AI will never truly lead the market for long once an alternative comes around each generation.
GO STABLE DIFFUSION!
If that's true, then Open AI should make that clear to the public whenever they use their apps (like DALL-E or ChatGPT)
@@martiddy You can confirm it yourself. Just write ANY prompt that involves people and add "With a sign that says " to the end of your prompt. When it adds the racial diversity tag, it will integrate into the sign part of your prompt.
I prompted "A young man dressed like a soldier on the beach beside a sign that says" and it made a white male soldier standing by a sign that said "Hispanic," and a white male soldier by a sign that said "East Asian," because the prompt Bing got was "A young man dressed like a soldier on the beach beside a sign that says Hispanic," since they forced the diversity tag words onto my prompt, which didn't contain them.
@@TheSquizzlet That's kind of vile, lol.
Fascinating. I think this should be toggleable though. I may need a white person in my image. Or it may be throwing off my generations by adding unnecessary words. It's good for adding more diversity in results if that's what you're looking for, but it should be optional. Like a 'realism' toggle or something.
I think it's wrong that they put hidden prompts into a prompt to force the AI to make something the person may not want to generate. As far as I'm concerned, it should default to whatever group makes up the majority of a nation, profession, or mythology. For example, if I put "elf" in the prompt, they should look white European, since elves are from European mythology and folklore; if I put in "giant monster attacks Tokyo with people running away," the people should be Asian; or if I put in an image of African animals drinking on one side of a river with people getting water on the other side, those people should be black. If someone wants to make it race- or ethnicity-specific, they should do it themselves.
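The mechanism this thread describes can be illustrated with a small sketch. Everything here is an assumption for illustration: the term list, the `augment_prompt` name, and the person-word heuristic are hypothetical stand-ins, not OpenAI's actual implementation. The point is why the "sign that says" trick leaks the injected tag: the descriptor is appended as plain text, so it completes the unfinished quote in the user's prompt.

```python
import random

# Hypothetical sketch of silent prompt rewriting: the service appends a
# diversity descriptor before the prompt reaches the image model.
# The word lists and function name are illustrative assumptions only.
DIVERSITY_TERMS = ["South Asian", "East Asian", "Hispanic", "African American"]
PERSON_WORDS = {"man", "woman", "person", "soldier", "astronaut"}

def augment_prompt(user_prompt: str, rng: random.Random) -> str:
    """Append a hidden descriptor if the prompt seems to involve a person."""
    if any(word in user_prompt.lower().split() for word in PERSON_WORDS):
        return f"{user_prompt} {rng.choice(DIVERSITY_TERMS)}"
    return user_prompt

# The appended term lands right after the dangling quote, so the model
# renders it as the sign's text, exposing the injection.
rng = random.Random(0)
print(augment_prompt("A soldier on the beach beside a sign that says", rng))
```

A prompt with no person words would pass through unchanged, which matches the observation that the tags only show up for human subjects.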
That was lightning fast coverage of the news
Thanks
Hey Matt! Love your videos found you a couple months ago and your videos have been so informative! I like how up to date you are, and I like the way you are able to explain things in a way that anyone can understand. I think it would be a cool idea if your subscribers were the ones telling you which prompts to use for the new stable diffusion, might be a fun way to show off the AI. Anyway great videos dude!
Sora's image generator might be better, but this is open source, so yes, Stable Diffusion wins
And, Sora will not be available anytime soon.
But will Sora be able to make pictures of white people?
Yeah, that's what I thought as well when OpenAI said Sora is better at creating images than DALL-E. Which kinda makes sense; if you can generate videos that good, it's probably really easy to just generate images.
Although, like you said, open source is another big advantage.
Yes, a few months... @@shaunralston
@@jtjames79 I'm an Indian.
Why did you make this type of comment?
What happened?
Competition breeds perfection. What are Midjourney and DALL-E 3 gonna do? Will that $600 yearly subscription to Midjourney be worth it now? Better get to it!
Time to get those pictures moving.
Niji still blows the competition, so yes 600 yearly is still worth it imo
@@sooool4716 Obviously opinion. niji is anime. Anime is trash (imo), so wouldn't be worth a dollar a year.
@@robxsiq7744 it's facts, and niji isn't only anime. Nothing comes close to it.
@@sooool4716 facts that something looks better? Fact that pizza tastes better with carrots on it...its just science. :P
I had the year sub for Mid and used Niji like...just a few times...no...I personally thought it was trash, but I know some will see treasure. You do you, I think Dall-E and SD are better overall, but Midjourney basic does have amazing style, I'll give em that.
Who needs Twitter or a mailing list when you have Matt?
💀☠💀☠💀
I can’t wait for this to come out and this is definitely a competitor to dalle 3! Love your vids man ❤
I love how we've gotten so far that the art itself really doesn't matter anymore. It's already practically perfect. What matters now is prompt adherence and consistency.
I predict that within a few months SD3 will be fully released at which point OpenAI will have to react, and they will release DALLE-4 based on Sora which will also be released, so their red-teams will be on a tight deadline to get their work done ready for full release.
Thanks for the news, Matt. I ran a lot of these SD3 prompts in Stable Cascade too, and it's still hit or miss on the text. Can't wait to get my hands on this.
Bruh, I shed a tear... we are finally at DALL-E 3 level of prompt comprehension. Not quality, but when people start training, shit's going to get crazy.
I like/am disturbed that 'finally' in AI is only 5 months :D
@@Octamed AI images were looking horrible 2 years ago; now we have THIS
Been using Stable Diffusion for a year now. Let's go!
I bet OpenAI starts using Sora (also a diffusion transformer) as their default image generator sooner than later.
Oh wow! I am very glad to see that Stability AI is taking over again! ...this is so cool!
Happy colored Greetinx!
I’m excited! So glad Stability is still working on and perfecting image generation rather than skipping on to video! Well done Stability Ai!!🇦🇺❤️
The year is young Matt! 😊
Seeing this and seeing what OpenAI can do with Sora Dall-e 4 is going to be lit when it comes out.
The ability to understand the number of items in a prompt seems like a no-brainer but it’s apparently difficult. Nice to see some progress in this.
I like your enthusiasm. I'm excited too.
Matt's hair gets crazier every day. lol
I just got out da shower 😭
Just like the guy from Ancient Aliens
@@MattVidPro Is your hair Transformer-based? Seems like when the temperature is too high, it starts to hallucinate, and this news was really hot :D
He's being SHOCKED by all the good news.
The technology has blown his mind to the point of unhinged scientist
I'm glad to see that the diffusion architecture is similar to Sora's, which is one step closer to Sora-like quality of video generation. I really want to see a true open-source alternative to Sora; no reason for OpenAI to have the monopoly.
It's also the same architectural approach that Meta's V-JEPA uses. Seems like this architecture is the next step forward in generative AI in general: train a Transformer autoencoder on the raw data, then train a diffusion model to diffuse the latent space rather than the raw data and have the decoder convert that back. In essence, it's taking the approach from "learn how to remove noise in the image to get a matching image" to "learn how to remove noise in the concepts to get a matching concept, then recreate an image from the newly imagined concept".
Which is pretty cool; in some ways, it's closer to how humans imagine things than ever before. And clearly, it's an effective method 😁
@@IceMetalPunk I'm all for open source. I'm more excited for this than Sora, considering a possibly severely censored model from OpenAI that can limit our creativity. Look at Google Gemini; it even refused to generate an image of a puppy swimming in lava. That's pretty lame.
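The latent-diffusion idea sketched in the thread above can be shown with a toy: encode data into a latent space, then run the DDPM forward process q(x_t | x_0) = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps on the latent rather than the pixels. The "encoder" below is just a fixed random projection standing in for a trained transformer autoencoder; real models like SD3 learn that mapping, and a denoiser would be trained to invert the noising. This is a structural illustration, not any vendor's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

pixels = rng.standard_normal((64 * 64,))           # stand-in for an image
encoder = rng.standard_normal((16, 64 * 64)) / 64  # stand-in for a learned encoder
z0 = encoder @ pixels                              # 16-dim latent code

# Linear beta schedule -> cumulative alpha-bar, as in DDPM.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)

def q_sample(z0: np.ndarray, t: int, rng) -> np.ndarray:
    """Sample z_t from the closed-form forward process at step t,
    noising the latent code instead of the raw pixels."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(abar[t]) * z0 + np.sqrt(1.0 - abar[t]) * eps

# Early steps stay close to the clean latent; late steps are nearly pure noise.
z_early, z_late = q_sample(z0, 10, rng), q_sample(z0, T - 1, rng)
print(abar[10], abar[T - 1])
```

The payoff of working in the latent space is that the denoiser operates on a 16-dim vector here (or a compact token grid in real models) instead of 4,096 pixels, which is what makes transformer-based diffusion tractable at scale.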
I'm super happy that this AI stuff is happening during my lifetime. I want to take a minute to thank you Matt, for all the videos you do. I use AI in one form or another every single day and you keep me on top of the trends....thank you !
Multimodal input? What I'd like to see is prompt driven inpainting, where control over changes can be given via text. e.g. "wrapped gift on a table with a christmas themed tablecloth" then "remove the tablecloth" and it would keep all other aspects of the image exactly the same
There's already similar models for that in SD, like instruct-pix2pix. No doubt SD3 can have the same sort of model trained for it as well.
Being open source also means that Dall E 3, Midjourney and other competitors can look into the code and build upon it, it’s a really unbelievable arms race
That's how DALL-E 3 became far better than DALL-E 2, and now OpenAI is gonna copy everything and slightly improve it again
Can't wait to use this
You have a cute dog? Instant sub!
Oh, wait, I am already subbed for the great content.
The ability to follow specific instructions is an important step to AGI
What a huge announcement; mind-blowing how accurate the pictures are to the prompts. But all this is still just an announcement; let's see the reality when it gets tested by you, Matt. Can't wait.
I hope they teach us how to prompt images for this model, so we can properly make LoRAs, remixes, etc.
My hero doesn't wear a cape, he was born in Bangladesh and stands for open source AI.
Thanks a lot - can't wait. We need a video on how to install it for Macs
It'd be nice if Midjourney understood prompts a bit better, hopefully that's coming. Saying that, Midjourney just reigns supreme at beautiful art and imagery, the other generators haven't come close.
Yep, and anatomy. Stable Diffusion 3 seems to go backwards if you look at those clowns. Plus the photorealism is still nowhere near Midjourney.
Guess we'll have to see how much they've sold out to their VC overlords when it comes to censorship. A "safe, responsible practices" disclaimer never fails to make me uneasy.
It's like they are acting like those in power will never use AI themselves to create misinformation lol
Awesome, I look forward to finding its limit.
Key Takeaways for quick navigation:
00:00 🚀 Introduction to Stable Diffusion 3
- Stable Diffusion 3 surpasses DALL-E 3 in understanding and quality.
- CEO highlights capabilities and open-source release.
01:09 🖼 Examples of Stable Diffusion 3 Outputs
- Diverse images demonstrate superior prompt adherence and coherence.
- Sets a new standard in interpreting complex prompts.
03:56 🛠 Technical Advancements and Capabilities
- New diffusion Transformer architecture enables multi-modal inputs and competitive realism.
- Sets a benchmark in image generation with scalability and coherence.
06:14 🆚 Comparison with Other Models
- Outperforms DALL-E 3 and Midjourney v6 in prompt understanding and coherence.
- Open-source release allows for further customization and refinement.
08:34 🌐 Democratization and Future Outlook
- Emphasizes democratizing AI access and creativity.
- Open-source nature enables scalability and quality customization.
Love your dog
Wow, that really is impressive. I thought open source was a ways off of catching up to Dall-E and Midjourney, glad to be wrong. Can't wait to try it for myself. The future really is going to be wild.
I really like how this is natural language now. Definitely gonna use this over DALLE-3
Whoo Hooo, so exclusive bravo Matt! I am so deeply thankful for SD 3, Now this is Truly for the good of humanity. Thank you very much 🔥🔥🔥🍷🔥🔥🔥
Thinking about where AI image generation was about a year and a half ago when I started messing around with Craiyon and MJ v.3 compared to what we have today is already mind blowing.
Now this. 🤯
Big difference already from last year compared to now, wonder where we'll be by christmas
Actual text to video would be nice. The ultimate step for all these image generators is video.
No, Gaussian splatting/NeRF worlds @@worldino2390
I expected this to require 80GB VRAM. To hear that it is just 8B parameters at max is mindblowing. 😮
SDXL has 2.3B parameters... and takes less than 6 GB VRAM.
So 24 GB VRAM (RTX 3090) should be enough to run the biggest 8B model. ;)
@@mirek190 Yeah especially after they optimize it to load stages dynamically so that it may only use like half of the memory at a time. I actually think a bigger problem may be LoRA training. It may not fit in 24 GB.
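A back-of-the-envelope check of the VRAM numbers in this thread: model weights alone take (parameter count) × (bytes per parameter). Real usage adds activations, the text encoder(s), and the VAE, so these are lower bounds, not exact figures; the helper name below is just for illustration.

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (ignores activations, encoders, VAE)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# 8B parameters in fp16 (2 bytes each): ~15 GiB of weights, so a 24 GB
# RTX 3090 plausibly fits it, as the comment above suggests. In fp32 it
# would not; the smallest 800M variant is tiny either way.
print(round(weight_vram_gb(8.0, 2), 1))   # fp16, 8B
print(round(weight_vram_gb(8.0, 4), 1))   # fp32, 8B
print(round(weight_vram_gb(0.8, 2), 1))   # fp16, 800M
```

This also shows why quantization matters for training: LoRA fine-tuning adds optimizer state and activations on top of the weights, which is the concern raised above about fitting training into 24 GB.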
Stable Diffusion 3's release marks a pivotal moment in AI image generation, emphasizing the power of open-source models in driving innovation and accessibility. Its superior prompt understanding and potential for multimodal input integration signal a future where AI tools can cater to a broader range of creative and practical applications, making 2024 a landmark year for advancements in this field.
Amazing !
Amazing ❤
Let's be real, even a year back we all knew Stable Diffusion was going to eventually sweep the competition off their feet completely and now it looks like they have done it. Leap frog next?
Nope, look at the hands and other anatomy issues. Every one of those clown photos has hands like the OG Stable Diffusion, something closed-source models have gotten over. Not to mention the realism of Midjourney, which is something we can't simply train or fine-tune into the model.
@@ChrisS-oo6fl But you can, and people have trained and fine-tuned models that look better than Midjourney for people.
Last year I predicted that we'd be able to create comic strips from our AI chats by the end of 2024.
If this is any indication.
I was being rather pessimistic.
And it is always crazy to me that even the most optimistic normies are always wildly off.
Not talking about you specifically
@@mystic6121 Oh, I agree. If we went back in time five years and showed people our current models, they'd think we were from the 2040s.
@@vi6ddarkkingWe had movies (Back to the Future) predicting flying cars in 2015. It's nice to be outdated instead.
I really hope they can make it so someone with just under 8GB VRAM can use this... and that if we can, someone makes a tutorial on how to make checkpoints and LoRAs for it right away. I've always wanted to make something like AnyLora for a model of this quality. I have high hopes for this.
Two things I wanna see on your live stream (if i manage to not miss it):
- How much better it does on anime than SD as it currently stands
- How much more consistent it can do hands and feet since it struggles so much. This also involves things like holding hands and clasped hands, which are just harder to do in general
- If it can do both anime eyes consistently (they always tend to look good only from a distance, but are missing tons of details up close)
- if multiple characters can be rendered in a scene without it messing up and blending the two together, testing to see if you can ask for specific things on each character to make sure they don't all look like the same person
Looks extremely promising; and the flow matching approach (diffusing on the latent space of a Transformer autoencoder) seems to finally be everywhere in the wild. And it's clearly an effective method, given the results of this, Sora, V-JEPA, etc.
I can't wait for LORAs and custom checkpoints for SD3. It'll be... quite slow on my 8GB VRAM machine, I'm sure, but the results look like they'll be worth it! I've been trying to get an AI-generated album art for my Synthia Nova framework for so long now, but the title of the album is "Synthia Nova: Concert in Silicon", and every single diffusion model to date struggles hard with the text "silicon" (the "ili" part, in particular, has too many adjacent vertical lines, and they all fail at it). If the text generation is generally as good as these examples show -- if the clown images are not cherry-picked and "stable diffusion" text wasn't over-represented in the training data -- then I might finally be able to get that album art!
they did say there will be performance improvements so you and me still can have hope
This is massive. Why aren't more people talking about this?
Man i'm so excited
I call shenanigans! Oscar the dog was obviously added into this video via Ai for added cuteness!
I'm extremely worried SD3 will be in some way lobotomized or not understand human anatomy, because of all the focus on safety in their blog post. I guess it could be trained for that later, but the underlying censorship worsening the quality could be unremovable. AI of any kind performs best when freshly trained, and worsens when the "safety department" gets their hands on it. Kinda defeats the purpose of the open-source advantage against all the closed-source ones.
Yeah, I am worried about that as well. Hopefully community training and tweaking will be able to fix a lot of things, but it will definitely be disappointing if it's heavily censored. And like you said, it pretty much defeats the whole purpose of why people want models like Stable Diffusion in the first place.
That was a bold last statement--especially since it's only February. :) I'd say we'll get a new version of every diffusion model by year's end. Time will tell.
I've made some negative comments on here re AI being not so good, but I've been using ElevenLabs for text to speech and it is becoming incredibly realistic with its vocal inflections and empathy with the written text I'm using; quite mind-blowing that I am actually watching it learn and get better.
When you get access, could you try some of the Dall-E 3 prompts like the avocado and spoon therapist?
*Nothing can top that this week*
Finally, it feels like forever since SD had a major upgrade to its coherency. I've been using DALL-E 3 along with SD's inpainting, but now I can just have everything in one place.
Have they stated the system requirements for using this model?
SDXL is 6.6B parameters, so on parameter count alone, this 8B model will likely run on similar hardware. I can get SDXL to work on my 8GB VRAM machine, but only if I offload some of the processing to the CPU, which of course slows things down. I'd guess the 800M model could run better, entirely on an 8GB VRAM GPU, though I'd also assume the quality is much worse with that one than the 8B.
SDXL has 2.3B parameters, not 6B like you said... such a model of size 8B will fit in 24 GB VRAM. @@IceMetalPunk
I'm interested in this new application of transformers to diffusion methods they're talking about.
These pictures are getting so real it's hard to tell reality from fiction.
Umm, no. Take a better look at the clowns. Dead giveaway in less than 1 second from a mile away. Unfortunately, Midjourney is still the only model that produces images that are indistinguishable from real.
Hype!
cant wait to see the sd3 vs sora
HYPE!
man oh man, truly hard to imagine image models getting better than this!
Phenomenal
Hello Matt... thank you for all the info. I'm completely new to AI. I have a dream to make a series of images for one project I have in mind. I need 4K print quality because I need to print them at 70×100 cm. I need masterpiece quality and don't care how difficult the interfaces of all these programs are. My question is which tool to select; I don't want to learn something that wastes my time without the right results. So what do you suggest I start with?
Oh boy, once this hits the shelves and the folks at Civitai start sharing better fine-tuned models, this is gonna be mind-blowing. The day Stability manages proper hands, it's game over for MJ, Leonardo, etc.
Glad to see Stability focused on prompt understanding and such, because it's actually key to useful generation, way above aesthetics like MJ's.
Thumbs up to Stability and their commitment to open source, that's fantastic
Long way from that! Look at those clowns in the demos. It's sad, because the closed-source models handle anatomy perfectly now. It's absolutely insane we haven't gotten this taken care of in SD yet, but keep moving forward with focus on other things.
@@ChrisS-oo6fl You commenting on every positive post... bot...
Is Sora based on its own engine? I would love to see the stills it can create.
ayy so this was the announcement out of left field!! good shit, matt! very hyped for this!
Brilliance
Effin hot content! I'm glad I bet on the right horse. Open source beats the commercial ones. Now it's their move again.
Hell yeah, now THIS is what I'm talking about.
Thing is, they keep coming out with these new things, but they're becoming somewhat stumped without the flexibility SD 1.5 had with easily trained LoRAs and models.
Rampant censorship in modern models will keep 1.5 relevant for the foreseeable future. I suspect this model will be effectively useless for NSFW content. My hope is just that the censorship of the data set isn't so egregious as it was with 2 and 2.1 that it forgets how to create a realistic body entirely. XL seems to be better in that regard but they're driving home the safety message real hard in this announcement.
I’m confused, if it is open source can’t people simply remove those restrictions?
@@zrakonthekrakon494 To some degree, yes, but you can't just retrain the entire model without a ton of money and GPUs. You can use Lora to refine a certain concept but a Lora trained on 50 images of humans isn't going to magically fix a model where tens of thousands of images have been pruned from the data set for being too suggestive. SDXL was left intact enough that there has been some progress on that front but it's still more finicky than 1.5 and 2.x was just considered a lost cause because of how censored it was and has mostly been forgotten. We'll see which way the pendulum has swung with this release but the extreme focus on safety and the fact that they haven't given us a single normal human in regular clothes when they know that's what people will be curious about gives me pause.
So this base model looks to be ahead of even the best fine-tunes. That's super crazy exciting. I've got a 3090 so hopefully I can run and train it. And hopefully it trains well (everything about Cascade makes it seem like it's really trainable, but is Cascade now obsolete already because of SD3?)
My computer has a graveyard of old diffusion models, and the trend is accelerating!
Dang, every freaking day there is a new, 10 times better AI
I need this now 😭
I am interested in this one. What is SDXL for, then? (I thought it was the evolution of Stable Diffusion, but I seem to be lost...)
I hope that image-to-image will be very consistent at adding, e.g., a specific character to different situations, for example making comic strips etc.
AI generated strips/comics
My main concern is how it'll handle anime images. DALL-E 3 is the only one to properly do anime images without needing LoRAs/extra models/etc.
It needs to be asked nicely though
It's not that hard to find good anime models for Stable Diffusion right now; there are plenty out there and they are easy to install.
Notice how the only humans they're showing are clowns in oversized puffy outfits. With how obsessed SD has become with safety and how much focus it gets in this announcement, I would be surprised if knowledge of human anatomy and the ability to generate recognizable people has been even more censored than previous releases. People will try to add it back in with Lora but that gets more difficult the less understanding the underlying model has to begin with. They clearly have the potential to release something amazing as all the big AI companies do, the question is always how lobotomized will they be to avoid bad press.
very exciting , finally something good
How about the problem of generating good faces and fingers? Is it resolved in Stable Diffusion 3?
SDXL already was a huge step forward in terms of prompt interpretation, but this is definitely on another level.
yes!! and i finally have a computer that can handle running it.
Well sheesh... I hope so. We'll see, I guess.
I really hope SD3 will be coherent enough to use image to image for creating actual comic or manga panels. Can you imagine how cool it would be if creative writers who can't draw get to create comics? Then you use these comics later as a screenplay for creating actual anime with Sora or whatever will be able to handle it. Can't wait
I really like how SD3 is looking atm but I'm getting confused...
It started with SD1.5 (the first model that got popular), continued with SD2.1, then SDXL (that was the point where I thought SDXL is like the next major version of SD), not long ago we got Stable Cascade (which I thought was the next major version to SDXL / a version besides SDXL) and now we're getting SD3.
Are they working on SD, SDXL, Stable Cascade simultaneously or is it like while they're experimenting with SDXL and Stable Cascade they improve on SD and vice versa?
I think Stable Cascade is an experiment to test some things for building SD3.
Something similar happened with DeepFloyd before SDXL.
@@mirek190 Would make sense; maybe we'll find out when the technical details of SD3 are released.
The prompt understanding here looks second to none, almost too powerful. The rate of AI advancement continues to amaze me.
WOW Just insane.
Matt, you have become the MKBHD of the AI world! You know and try products weeks before the announcement 😂😍
LMAO, I hope that's who I can become
This is so cool! Open source for the win!
It's not that I doubt the abilities, but I only see one selected image. It would be nice to see a row of them, to check they didn't just get lucky on that one image.
Stable Video based on 3.0 is gonna be insane. Open source FTW
I love you , I love open source, I love Stable diffusion 😍🤩🤩🤩