The Open Source KING is BACK. Stability's NEW AI Image Generator!
- Added June 3, 2024
- Stable Cascade is an image generation model that can create new variations of an image while maintaining its style and composition. It is a text-to-image model that is fast and high quality.
▼ Link(s) From Today’s Video:
Stable Cascade Github: github.com/Stability-AI/Stabl...
Thibaud's Twitter post: / 1757370745900937441
Stable Cascade 1 Click Launcher: / 1757457604781978091
Try Stable Cascade for free: t.co/eychPLlXNS
► MattVidPro Discord: / discord
► Follow Me on Twitter: / mattvidpro
-------------------------------------------------
▼ Extra Links of Interest:
✩ AI LINKS MASTER LIST: www.futurepedia.io/
✩ General AI Playlist: • General MattVidPro AI ...
✩ AI I use to edit videos: www.descript.com/?lmref=nA4fDg
✩ Instagram: mattvidpro
✩ Tiktok: tiktok.com/@mattvidpro
✩ Second Channel: / @matt_pie
-------------------------------------------------
Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, tutorials, and reviews! Enjoy your stay here, and subscribe!
All Suggestions, Thoughts And Comments Are Greatly Appreciated… Because I Actually Read Them.
-------------------------------------------------
► Business Contact: MattVidProSecond@gmail.com - Science & Technology
Thanks for covering our work, thrilled to see how our research gets adopted this way. Also, I still find it hilarious that "Würstchen" stuck as the name of our architecture. Sorry in advance for all non-German speakers who break their tongues while trying to pronounce it.
Wow, never knew a tongue could be broken... must be a bony tongue
I'll just call it Worse Ten.
Small sausage?
@@CryptoTonight9393 Yeah, small sausage... that's the translation. It's actually hard to find an English sequence of characters that sounds remotely like "Würstchen"... the "ü" being an umlaut of "u" which isn't used in English, and the "ch" is a single phoneme as well... imagine "k" but speaking it softly... in a way "ch" would be to "k" what "f" is to "p".
The biggest problem I have is duplicating a face I've created in different poses. It's infuriating.
Great video as always, Matt. Very happy to see this new model. I got my first job using stable diffusion and video diffusion 1.1 last week. Very happy to see the new model.
Trying to wrap my head around how it can get a 1024x1024 image from 24x24 o_o
I really REALLY want to see Stability's models pull ahead of the competition soon! I hope the (supposedly) easier training times can allow Stable Cascade to reach Midjourney's level of detail somehow.
It probably can, this is only the base model, it is very general so it can probably do a lot better than SDXL when finetuned, and SDXL can achieve Midjourney level of detail in some circumstances (like in Fooocus using certain styles and settings).
Reminds one of the quants.
Just wait till they figure out how to encode the image in subpixels 😂
1024x1024 encoded to 0.2 x 0.2 pixels
@@kuromiLayfe You can actually escape the pigeonhole limit by just setting the font size to 0.
Lol
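Since the thread above is puzzling over the 1024x1024 → 24x24 figure, here is a quick back-of-envelope sanity check. The numbers assume the 24x24 Stage C latent quoted for Stable Cascade and the 8x-per-side VAE downsampling used by SD 1.5/SDXL; note the latent also carries more channels than RGB, so the detail isn't literally squeezed into 576 pixels — the prior plus the decoder stages reconstruct it.

```python
# Back-of-envelope numbers for the 1024x1024 -> 24x24 compression
# discussed above. The ~42x figure is per spatial side.

side_px, side_latent = 1024, 24
spatial_factor = side_px / side_latent      # ~42.7x per side
area_factor = spatial_factor ** 2           # ~1820x fewer spatial positions

# For comparison, SD 1.5 / SDXL VAEs downsample 8x per side (1024 -> 128).
sdxl_factor = 1024 / 128                    # 8.0

print(f"per-side compression: {spatial_factor:.1f}x (vs {sdxl_factor:.0f}x for SDXL)")
print(f"latent positions: {side_latent ** 2} vs {128 ** 2} for SDXL")
```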
"Würstchen" is german and the translation could be "small sausage" 😂
ah..
Always funny to hear ü, ä and ö in English. In Poland it's easier to make a "smaller" version of a word, like wódka is small woda (water)
I watched this whole thing mainly because of Matt saying "Würstchen" multiple times throughout this video 😁
Haha
I translated it from german, and it translated to 'hot dog'
15:20 even though no mustache, there's something about the quality that's really soothingly satisfying I think!
My boy!!!!!! What's good Matt! Just been sick recently and I've been away from YT as usual. I'm here now though; it's looking like an amazing video and I can't wait to get my popcorn and watch
Even the text kerning was basically perfect. 😯
As a German speaker that's a really funny architecture name, literally just means sausage 😅
Sausage AI™
Haha
*The trivialization of sausage, to be more precise.
I used to work at a german pub called Wurst. Closed during the pandemic.
King's back.
Emperor Pigeon is back (me)
@@SW-fh7heit's subjective
@@SW-fh7hestop boofing monster energy
@@hipjoeroflmto4764 what do you mean?
The king never left. 😅
Thanks for these videos! I learn so much from them, keep it up!
Würstchen is pronounced Vürst-yen. V as in view, ü like the u in lurk, st as in stash and yen like the currency.
Americans never give a fuck about how names and words from other languages are pronounced.
I think this is more focused on efficiency and speed, which means things like animation and video (using similar methods) is going to be much more realistic. As currently the static models are being sort of shoehorned into animation workflows.
Their video is insanely realistic. Been beta testing it for a few days already.
Anyone else get the feeling that we're hitting diminishing returns with what's possible using the current NN architectures?
Yes. But I think there is a clear movement of capital and intelligence towards advancement in other areas of AI
Other archs have been researched, code released, work is happening on them. Transformers may get left behind eventually, this ride still has a long way to go.
@@blakecasimir Right, I agree. It's just a bummer that we may see another protracted plateau before getting something genuinely revolutionary to use within a commercial context (i.e better than humans). The Transformer arch is so close and yet so far away.
@@GearForTheYear You are right in terms of image fidelity/aesthetics. It won't get any better than Midjourney v6. However, prompt understanding and following is still not optimal. DALL-E 3 shows that it can be much better still. The problem is the training data. They lack more concepts than they provide. You can't create truly creative images because, for example, there is no training example of a horse riding a human, so it can't do it at all.
It's not just a limitation of the architecture. A lot of it stems from the limitations of our language itself. We train and guide these models using natural language, but words aren't sufficient for pinpointing the exact image you're looking for. One picture is worth more than a thousand words, and using just a few sentences as a prompt will only get you a general image that could look okay, but not exactly what you want down to the nuance. Even if AI becomes smarter than humans, it still can't read your mind and has only your words to go off of. Words carry too low a bandwidth of information, and the only breakthrough I can think of is when we're able to upload our minds and thoughts directly to the AI.
Sick! Been hoping they'd come out with something to compete with Midjourney and Dall-E. I love Dall-E 3, but I get so tired of getting "prompt blocked" with prompts that have nothing offensive or copyrighted in them. Wasn't aware of Pinokio either, so I'm excited to give that a try. Thank you!
Good Job Matt!! Truly Exciting... We need more competition so, the subscription price will go Lower! ✨✨🤟✨✨
I just did a quick text test. Wow, perfect on the first one, but then not so great on the follow ups.
Just awesome.
I kinda lost interest in text-to-image for a while. It isn't reliable enough to use in commercial applications yet (imo), and it didn't feel as competitive as text gen where almost every week there was news.
Nice to see open source text-to-image making progress towards catching up to the state of the art in this field.
Open-source isn't catching up with gpt-4, gpt-4 is still costly, gpt-5 tier doesn't exist. Overall, pretty meh too.
Matt I just had or still have covid need to retest but this video made me feel good
Perhaps, but image generators use convolutional neural networks, while Transformers are for sequential data such as text. So I assume huge improvements will be realized with both types of models and whatever improvements are made to them. It may seem more subtle because they are already great, but they will be faster, more controllable, more efficient, and integrated into useful apps.
I was starting to lose hope, but here they are! And with a focus on cost efficiency too! I hope it has backwards compatibility with 1.5. I have way too many loras of it stored up.
All loras are tightly coupled with base models, nothing will be compatible with sd 1.5 ever.
i would love to make consistent 16-bit style video game character sprite sheets
This model is non-commercial but if you want to make free games...
Nah, I don't care for non-commercial; it's more of a personal project to achieve. Go have a look at the WWF Royal Rumble sprite sheets, for example: one sheet that's of one character, walking, running, jumping, punching, kicking, etc.
Awesome, glad to see SD keeping up. 1.5 is still relevant thanks to the community; I hope to see something like this treated the same way.
This came at a time we needed it most
It's an interesting concern, especially with the rapid evolution in AI. While Transformers have indeed been groundbreaking, the tech field's nature is to innovate continuously. Who knows, the next big breakthrough could be just around the corner, rendering today's limitations a thing of the past.
This was something I looked for a few days ago, since I am tired of SDXL being pretty bad compared to Dalle and Midjourney. Especially SDXL's extremely deformed hands and feet. So I checked Stability for news and saw nothing. Then your news dropped. Thanks. I just got excited about open source AI again.
Don't get your hopes up. This is not the model that will rival MJ. The next one probably will (but MJ will have already released v7 by then).
I can’t wait to see what the trained models of Cascade end up producing later. Heck I say later but someone will probably have trained model by end of week or something with the current pace of things lol.
Mage and Leonardo will probably implement this model soon as possible.
I have actually been beta testing their video generation, which is absolutely amazing compared to anybody else even Pika. I also was able to ask for extra credits and they gave them to me because of the project that I’m doing with their video so I’m super excited.
Updated Forge UI is out too!!!
Well, idk what that is, so yes, Matt should make a video
I thought you meant a new update, with the ControlNet fixes but it's the one that's been out a few days. 😞
Which one is forge? Hard to keep up. Not sure i have used it.
@@abandonedmuse Search for SD Webui Forge.
Matt, I absolutely adore all your videos, but 42 is not orders and orders of magnitude greater than 8, it is barely half an order of magnitude!
That's more than two orders of magnitude in binary though.
This comment is half an order of magnitude more accurate than the subject matter!
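For anyone wanting to check the thread's arithmetic, here is the order-of-magnitude comparison of the per-side compression figures mentioned above (42x for Stable Cascade vs 8x for the SDXL VAE):

```python
import math

# 42x vs 8x per-side compression, expressed in orders of magnitude.
ratio = 42 / 8                        # 5.25x

decimal_orders = math.log10(ratio)    # ~0.72 -> under one order in base 10
binary_orders = math.log2(ratio)      # ~2.39 -> just over two "orders" in base 2

print(f"base-10: {decimal_orders:.2f} orders of magnitude")
print(f"base-2:  {binary_orders:.2f} orders of magnitude")
```

So the difference is about 0.72 of an order of magnitude in base 10 (a bit more than the "barely half" above), and the base-2 reply checks out at just over two.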
11:50 it's easier to finetune this way than starting from a model biased towards photorealism
Does this mean it will require less VRAM to use? My 3070 struggles with SDXL without setting up various parameters and such to make it work and then it takes a pretty long time to generate an image.
I've read something on reddit about needing more instead
I think it's pretty much the same amount. The concept is similar to running a workflow in Comfy that generates an image at 256x256, then does image-to-image with an upscale to 1024x1024, and then runs once more to detail the final sampler output.
Are the images commercial free to use?
Thanks for sharing!
With the same prompting, you can get better images (not definitive testing, just a couple of tests) than SDXL (NightVision XL), the images have a HDR midjourney look to them.
It will be better than Midjourney. 16x training performance + open source = magic
Wow, and in Pinokio already??? Love that!
Wow. Never heard of this before.
@@jeffwads I think he made a video about it... pinokio allows you to run AI tools on your PC without the hassle of installing complicated stuff, it's truly gamechanging. But you'll need a good GPU with a lot of vram (I went "cheap" by buying a used 1080ti, and 11gb of vram seems to be enough for what I do... for now).
Nice
Got it running on Windows (command line). It has to be possible to make it run in Comfy, but it would take some work.
Elon needs to take over !
"Robin Rombach, Andreas Blattmann, and Dominik Lorenz essentially created Stable Diffusion while at a German university. Stability AI got involved after the publication of their research and offered them the company’s computing resources. According to Forbes, all three have now left Stability AI which is also experiencing cash flow problems."
- Petapixel
What specs should a PC have to run an SD model relatively fast? Is it all about the graphics card?
Happy Valentines day 💓
bro this is crazy, looks like it'll blow midjourney out of the water once it gets in the hands of opensource trainers for a few more months down the line.
Exciting news!!
I'm sorry to say, but with the endless possibilities now available with Midjourney's --sref feature, I think they ran away with the crown. What's possible now is absolutely mindblowing.
Can it handle compound nouns yet? How about magnet fishing, for example?
Looks a lot better.
Hey, Matt. Do you know any A.I. that makes Cinemagraphs?
I think "Imagen 2" can do that.
I am curious, why did it take so long to implement the Würstchen tech? The actual people behind Würstchen showed this last year.
There's a way easier way to do this. You just loop a clip the length of each notes phase. You do this and extend the loop out till it merges back in and you do this for all of the notes then you ctr+j to consolidate it.
From my testing, SDXL Turbo is utter garbage 💩 🤮.
I'm looking forward to Cascade
I didn't like it either, although I really tried.
Garbage how? It just needs tweaking to reach its potential.
@@aouyiu The quality of the images is like that of Midjourney 2 based on my testing... utter garbage
Hi Matt, you can test the LLaVA 1.6 34B demo, an LLM vision assistant.
Not sure if I'm just spoiled by community-finetuned SDXL models and Fooocus, but I'm not terribly impressed by what I've seen so far. But then again I was initially underwhelmed by SDXL as well.
What keeps me interested is the possibility of much more efficient finetuning compared to SDXL, but it might take a while for tooling and fine-tuned models to become available/usable.
Of course when I just uninstalled Pinokio to make room for more checkpoint models! lol Hope someone ports it to Comfy in the next few days!
Interestingly, at 11:07 when the picture of Barack Obama comes together, at times it looks a bit like Alfred E. Neuman from the Mad magazine.
I think you don't realize: this means open source totally won today. Just need to do this with language models too.
You haven’t seen anything yet :)
Meta might get us that, maybe sooner than you think now that Gemini is officially competing with ChatGPT.
Miqu is getting there... It's not gpt4 level but it's definitely better than 3.5 all around, nearly as good as Gemini Ultra... And it's 70B 😂 It's coming!
I really hope that playground AI picks this up.
I also feel mppy inside lol
The Stable Zero123 model still has, and Stable Video Diffusion had, the same limited licence during its experimental phase.
So nothing new here.
Still being vigilant is always the way to go.
Do we have any idea based on past experience how long that licence will be limited? Are we talking weeks? Months? Over a year? 😮
@@starblaiz1986 Once Version 1.0 releases usually it bounces to the new fully open source licence.
This video makes me happy for the future.
I am hyped!
Can this model be used in Automatic1111?
How many free prompts in a day do you get in the free plan of stable cascade?
Always appreciate your being on the cutting edge of OS reporting, Matt.
Yoo, this is so exciting, I love open source :D
unfortunately it takes like 30 minutes to generate a photo locally on my 3060 with pinokio
xD
Updated pinokio now it takes like 15 minutes
Why does it take Stable Cascade several minutes to generate an image with my RTX 3060 12GB? No problems with Stable Diffusion etc.
Tried it, but idk, DALL-E 3 gives me a lot more specific and better results
My honest reaction was: "Oh no..." 🤣
I'm really trying to catch up with everything, but oh boy, it's hard
Soon, in SD5: "For my kids, remake this folder of movies to take out all the non-wholesome parts."
For example, in Bambi the mother doesn't die, no one is in mortal danger, and they all meet happily in the end. In The Lion King, Mufasa and Scar are good friends and Simba is raised with his dad. Ariel doesn't lose her voice. Remove the nightmare fuel from Pinocchio and Dumbo, etc. etc.
Generate new wholesome scenes, keep characters and style as the originals, voice with 11Labs.
We will actually be able to give nice content to our kids, without passing any horror from the hydra studios.
wow, just wow
You mention Krea, and Krea uses SDXL under the hood, so I wonder if you have found a way to get Krea or Magnific results but for free using comfy or a1111? I actually wonder how come no one is even trying to do it……anyways, great video!
Where can we use this?
The question I have, is, as always, how does it handle censorship? What happens if you give it a prompt that many AIs will label as NSFW, and will not render?
It seems to just ignore those parts of the prompt. I couldn’t even get two mechs to shoot at each other.
God: "walter white eating a big mac inside of mcdonalds, there are blue crystals in the big mac burger, walter white is dressed in a yellow hazmat suit"
Dall-E: "Even though I am just a tool and don't have a soul; I will pretend I have one. Therefore, I cannot do what my master commanded me to create, even though I'm fully capable of doing the job."
God: "Kicks Dall-E from the heavens; Downloads Stable Cascade!"
OH my god...
Talk about seeing something unexpected when opening YouTube
Curious why they didn't show benchmarks against MJ
Just tried it. Not a full test, but generating text seems OK
try photo taken on Fujifilm XT3
Honestly, I have a really interesting Question @mattvidpro. What is the relation between You and Lemon?
stability ai are the best!
Würstchen? Um… little sausage? Hot dog?
It's a bit slow: one minute 40 seconds on a 3060 Ti. But as you said, it's FREE.
Can this run in Forge WebUI?
2:39 the images have been (image: UD, LR (UD 3, LR - { } 2, 5),
Close, but not even 1 order of magnitude, 1 if we round up.
People: 1980: we will have flying ca-
*literally 2024:*
niccee, going to check right now!
Stability AI is cool....
Open Source FTW.
Open Source means everyone is a winner.
Is it available to use right now ?
Yes!
Nightshade is coming.
I'm running it locally and it's far slower than sdxl for some reason, the web demo works better. Also the results are clearly inferior to dall e 3 so there must be some setting I'm missing. I'd say one can skip it until it's in the hands of someone that can run it to satisfactory levels
sadly not as good as dall-e 3 but... it's a huge improvement. prompting is so manual compared to DALL-E 3 lol
It's over for Midjourney and OpenAI.
It's crazy though, OpenAI just released Sora yesterday, way ahead of anyone else on AI video
Requires 20gigs of VRAM though. That will eliminate most people.
As a german, I have to admit, they did y'all dirty by calling an international used software (or at least part of it) "Würstchen" 😂😂😂 ... It means small sausage if someone is wondering.
Creating something from nothing with spells... is it Harry Potter in real life? It's magic!
I tried Stable Video Diffusion and it blew chunks... I went back to using Pika.
And Pika is really... not... great.
All of AI video is still in the early stages, like ChatGPT 1 stages. It will be where images are now, in a few years. Maybe sooner.
What are the hardware requirements for running it locally?
coo 💀