My funniest experience with Gemini Pro: I asked it to make a humorous image of a cartoon cat pulling the toilet paper off the roll. It told me that it ethically couldn't, because the cat could ingest the toilet paper and it could cause an intestinal blockage 😂
@@MilkGlue-xg5vj haha yeah that would be a real nuisance. But then again, that’s one smart cat. What other potential could that cat have?? 🧐
Gemma 7B makes you realise how much compute Google is using just to output "Sorry, I can't fulfill that request" 🤣
@@markjones2349 you're talking as if the point of uncensored llms is fun rofl lmfao xd you're just makin' it funnier 🤣
Well, I'm never trusting benchmarks without personal testing again.
welcome to real life. Can't wait for you to leave the fantasyland bubble all these tech aibros have built around you.
I'm pretty sure they lobotomized it in the alignment phase :)))
To the point they took the lobotomy fragment and used it in place of the brain, and trashed the actual brain. Not only on models, but on personnel probably
You can't gimp the model with excessive censorship, and also have an intelligent model.
These are not open models, these are woke models, appropriately liberal.
@@madimakes No, the censorship sucks up so much of its thinking that there's little left to actually answer. You can ask the most banal question, but it sits there thinking long and hard about whether there's any way that could possibly be offensive to the woke. Considering the woke are offended by everything, that's a yes, so it has to work its way around that; then it needs to figure out if its own reply is offensive (yes, everything is), so it has to find a way around that as well. Often it will fail and say "I'm afraid I can't do that... Dave." Other times it will try, but the answer is so gimped and pathetic you'd have been better off asking your cat.
Exactly
The model's design is gimped by its creators themselves: the head of Google AI literally holds biased ideological, anti-white, and heavily pro-censorship values, all proven by their online record, and that's why those biases reflect onto the model.
This is entirely speculation on my part, but I am guessing Google’s AI effort is largely driven by their PR team. A proper engineering team would never release this kind of smoke and mirrors crap. Right?
They have tarnished their brand. It will be interesting to see what happens in the next few years with regard to Google. (I do not have any financial interest in Google.)
Or the engineering team knows this will be killed off regardless of quality or popularity, so why bother.
No, Left. They are all one viewpoint at Google and have been so for decades. The PR folks represent the programmers and their programmer-managers and Sr. management.
If Google keeps messing around with their censored models and underperforming open-source models, they'll get left in the dust. Mistral could end up way ahead of them in the next few months. They should find that embarrassing...
Gemini Advanced is bad too, compared to GPT-4. Gemini sometimes answers in a different language, is too cautious, and gets things wrong a lot of the time.
@@CruelCrusader90 Genuinely. If it's not a question about software development there's a wildly high chance that it'll start quizzing you on why you have the right to know things. I do hobby electronics and wanted to see how it would fare on helping make a charging circuit. It basically refused. Same is true for rectifiers. Too dangerous for me apparently lol. Ask it questions on infosec and it'll answer fine though. It's wild.
@@veqv lmao it refused. All anyone has to do is release a completely uncensored model and they'd literally take over the industry from their house. I don't know why Google is such a fail at every product launch.
@@veqv Yeah, I had a similar experience. I asked it to generate top, front, and side views of a vehicle chassis to create a 3D model in Blender (for a project I'm working on). It said the same thing: it's too dangerous to generate the image.
I didn't expect it to make a good/consistent vehicle chassis across all the angles, but I was curious to see how far it was from making that possible. And I don't even know how to scale its potential with that kind of a developer behind its programming.
Even a one would represent progress at its slowest, but that would be generous.
Bad doesn't begin to cut it. At this rate, Google will become irrelevant in most of its services. It makes no difference how much money they have; their policy is wrong and the AI models show it. They are so scared of offending someone or being made liable that their AI actually dictates what happens in its interactions with users. That doesn't just make it annoying and time-wasteful, it means that it cannot learn. Even worse than not learning, it's becoming dumber by the day. I cannot believe I'm saying this, but I miss Bard. Gemini doesn't cut it in any way, shape, or form. It's probably good for philosophy exercises, but so far I don't see any decent use for it aside from that. Give it enough space to go off on wild tangents and you may get a potentially interesting conversation, but don't expect anything productive from it. I'm done with trying out Google's crap for some time. Maybe in a month or two I will allow myself the luxury of wasting time again to see how they are doing, but not for now. Their free trial is costing me money, that's how bad it is.
Plot twist, Google was so far behind the AI race that they had to ask Llama or GPT 4 to create a model from scratch and this is what they named Gemini / Gemma.
Google is so far behind these days. I love Google's design language, though. But their tech? Meh.
Google’s NEW Open-Source Model Is so BAD... It SHOCKED The ENTIRE Industry!
Google set the entire OS community back a half hour with this troll release. well played google
Don't worry, Llama 3 will set the Open Source community 31 minutes ahead lol
I was absolutely paralyzed by the performance of this model.
Me: I send Pikachu GO! Use STUN attack on Greenthum6 NOW!
Pikachu: Pika Pika Pika!!! BBBZZZZZZZZZ ⚡️⚡️⚡️⚡️⚡️
Me: Greenthum6 seems to be in some form of paralysis. Quick Pikachu follow that up with a STUN attack on Greenthum6 NOW! Give him everything you got!!!
Pikachu: PIKA…. PIKAAAAAAAAAAA……. CHUUUUUUUUUUUUUUUU!!!!!!!
BBBBBBBBBBZZZZZZZZZZZZZZZZ ⚡️⚡️⚡️⚡️⚡️⚡️⚡️⚡️
Greenthum6 = ☠️ ☠️☠️
Me: Aaaahhh, that was nice. I'm sure Greenthum6 will make a nice Pokémon for my collection 🙂 **I throw my Poké Ball at Greenthum6 and it captures him as the newest Pokémon in my collection**
This shows one thing: we need other kinds of benchmarks.
But great video Matthew, thanks!
Deepmind has done some pretty amazing work in the machine learning space. My bet is that they created a fantastic model and that's what was benchmarked. Then the Google execs came along and "fixed" the model for "safety" and this is the result.
@@MM3Soapgoblin DeepMind should spin off from Google. It's a shame that they still run under Google now, given their amazing work in the past.
This looks like a hastily completed homework assignment by a student to meet the deadline
On the bright side, we have a top-end model for generating rejected responses in DPO.
@@user-qr4jf4tv2x I believe DPO in this context stands for "Direct Preference Optimization", which is a recent alternative to RLHF with fewer steps, and thus more efficient.
I'm actually not 100% sure, but I believe the joke here is that if you try employing this model for DPO to "align" any other base-model, what you get is another model which only ever refuses to respond to anything.
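For anyone curious, here's roughly what the DPO objective looks like for a single preference pair; this is a toy sketch with made-up log-probabilities and an illustrative beta value, not a real training loop:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO loss for one (chosen, rejected) pair:
    # -log sigmoid(beta * [(policy - ref) margin of chosen
    #                      minus (policy - ref) margin of rejected])
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no margin between chosen and rejected, loss is -log(0.5)
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 3))  # → 0.693
```

The loss drops as the policy prefers the chosen response over the rejected one, so a model like this used as the source of "rejected" responses would push the trained model away from refusal-style answers; that's the joke.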
@matthew Berman I think something is wrong with your test setup. I tested the `python 1 to 100` example with Gemma 7B via Ollama, 4-bit quantized version (running on CPU), and the model did just fine. Check your prompt template or other setup config.
Hey Matthew, it's not an open-source model, because they are not releasing the source code. It's an open-weight model, or just an open model.
But... they did? At least for inference: they uploaded both Python and C++ implementations of the inference engine for Gemma to GitHub. Which I suspect have bugs, since I can't otherwise understand how they could release a model that performs this poorly.
Until Google spends less time on woke and more time on work, I'm not touching any of their products with a 10-foot pole
At the moment there are a couple of issues with quantization and running the model in llama.cpp (LM Studio uses llama.cpp as its backend), so when the issues are fixed, I'm going to re-test the model. Because it's weird that the 2B model gets better responses than the "7B" model (which really has more like 8-point-something billion parameters).
It’s like asking an undercover alien to explain normal Earth things. No.
I'm wondering if this is technically half open-sourced given some critical components aren't available from Google.
The 2B version of Gemma is actually quite good for a 2B model. The 7B model is... a car crash.
I found the same, the 2B model is much better than the 7B for my set of tasks.
Instead of Artificial Intelligence we got Genuine Stupidity
The thing about Gemini is it has the memory of a goldfish: it can barely hold on to any context, and you always have to tell it what it's supposed to write.
Could you try lowering the temperature? The answers when you were running it locally look a lot like what I'd expect if the temp was set too high.
OpenAI: "So why do you want to leave Google and come to work with our dev team?" Dev: *shows them this video*
Imagine if Ed Sheeran released that video of DJ Khaled hitting an acoustic guitar, and said "This is my latest Open Source song". Yep. That's this.
That was actually really funny. The answers are so out of the blue Mannn
Please do fine-tuning based on private data
Looking at those misspellings and odd symbols all through the code examples, it's clear something is mis-tuned in the parameters, with whatever UI you're using not yet updated to support this new model. Apparently the interface I was using has corrected this, as I was able to get coherent text with no misspellings, but I did see people online saying they were having the same trouble as you: incoherent text and obvious mistakes everywhere. It's likely something wrong with the parameters, which must be updated to the values the model works best with.
The settings on Kaggle may help- This widget uses the following settings: Temperature: 0.4, Max output tokens: 128, Top-K: 5.
Yikes google! 😬
Yeah - I was running this yesterday and ran into the same things - as well as the censorship, where it decided that my "I slit a sheet" tongue twister was about self-harm and refused to give an analysis.
I like that you tried it on Hugging Face, cause now I can say with certainty: "Google, why?"
Each parameter is just a floating-point number (assuming no quantization), which takes 4 bytes. So 7B parameters is roughly 7B * 4 bytes = 28 GB, so 34 GB is not that surprising :)
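That back-of-envelope math as a quick script (bytes-per-parameter values are the standard ones for each precision; real files add some tokenizer/metadata overhead):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "q8": 1, "q4": 0.5}

def model_size_gb(n_params, dtype="fp32"):
    # Raw weight storage only, using 1 GB = 1e9 bytes.
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"7B @ {dtype}: {model_size_gb(7e9, dtype):.1f} GB")
```

And since Gemma "7B" reportedly has closer to 8.5B parameters, 8.5e9 * 4 bytes would land right around the 34 GB people are seeing.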
Maybe it was a spelling error by Google: "State of the fart AI model". Yeah this model stinks. Yeah I am exhibiting a 14-year old intellect.
The safeguards of not just Google but most of these corporate models are ridiculous and history will look back on them quite unfavorably as unnecessary garbage and a significant hindrance on people attempting to work creatively.
16:00 - JFC ...this model is just horrible.
20:25 - "...the worst model I've ever tested." Crazy - why would Google release this?!
I think you didn't use the right prompt format. It's a mistake a lot of people make with open-source LLMs.
Have you noticed that ChatGPT-4 has been very bad in the last few days? Like it can't remember more than about 5 messages in the conversation, and it constantly says things like "I can't help you with that" on random topics that have nothing to do with politics or anything sensitive. It's like they've got the guardrails dialed to randomly clamp down to a millimeter, and it can't do anything useful half the time. I have to restart the conversation to get it to continue.
Just to ask, how do I get the latest version for Linux, when it's only updated for Windows and Mac, but not Linux?
Does LM Studio work with Wine?
The only Google AI branch I still find credible is DeepMind. I hope they don't ruin it as well.
See, when you save the word SHOCKING for when it's actually SHOCKING, it's WAY more impactful and doesn't sound like you are spitting in the face of your community.
Great video! Their half open sourced LLM is hilariously bad
Hi, it seems that Gemma doesn't like repetition penalty at all. In your settings you should set it to 1 (off). In LM Studio, Gemma is a lot better that way; otherwise it's practically braindead.
And about the size of the model: it's an uncompressed GGUF. GGUF is a container format and can hold all sorts of quantizations. 32 GB is the size of the uncompressed 32-bit model; that's why it's so big and slow. There are quantized versions now, even with importance-matrix variants.
The TrackingAI website by Maxim Lott measures the leaning of various LLMs and they're all pretty much what we'd call "politically left" in the US. Which ... I'm not trying to make a thing out of it. There are plenty of reasons for it that aren't conspiracy and Lott himself would be the first to say them.
However, seeing that reddit post about "Native American women warriors on the grassy plains of Japan", I wonder if maybe it had been deliberately encouraged to promote multiculturalism in all answers regardless of context.
A Google exec spoke at an AI conference I went to recently. He was talking about models and how, if you train them on the entirety of the information available on the internet, they become very "conservative". He said confirmation bias is a huge problem. He then proceeded to tell a story about how he tested two models, theirs and an unnamed competitor's, by asking them to say 5 things white people could do better. They both proceeded to name 5 things, and he said stuff like "recognize your privilege, great. These are good things." Then he said he asked them to name 5 things black people could do better. And to his shock, they both named 5 things. The example he gave was "recognize the quality of life that western culture has given you". And he declared, "How outrageous that it would say something like that. Talk about white supremacy confirmation bias." Then he talked about how they "fixed" their models to only give "culturally appropriate" responses.
Deepmind has done some amazing work in the machine learning space and I have a lot of trouble believing that this is what they created. I bet they created a fantastic model and that's what the benchmarks were done against. Then the executives "fixed" the model into the useless thing it is right now.
Google is having its Blockbuster Video moment - this is embarrassingly bad
Why does it have to understand the context of "dangerous"? Why does the model need to be censored? What children are running LLMs on their desktop computers?? What are we even talking about? Is nobody an adult?!
"AI will probably most likely lead to the end of the world, but in the meantime, there will be great companies." ~Sam Altman, CEO of OpenAI
0:04 Absolutely! This is the beauty of diversity in the mathematical world. While 4+4 equals 8, the operands being 4 doesn't mean their identity cannot also be 40. Y'all have to respect the diversity.
Google STUNS Gemma SHOCKING everyone
I've found a trick with models like Gemma: when you add this system prompt, it gives more accurate results. THE SYSTEM PROMPT: "Answer questions in the most correct way possible. Question your answers until you are sure they are absolutely correct. You gain 10 points by giving the most correct answers and lose 5 points if you get it wrong."
At this point just use GPT 3.5 or Mixtral why bother with their idiotic model
@@h.hdr4563 Techniques such as that can help improve responses from any LLM.
Have you seen the 26 Principles of Prompt Engineering paper? It's very interesting... it works across LLMs too, although the better the LLM, I think the smaller the improvement compared to the base model without a system message.
Massive layoffs at Google next week..
The size is because it's unquantized; the same model at 8-bit is much smaller.
Google had to innovate on the context size. It was the only way the model could hold all the censorship prompts in its memory while responding to queries. That's also why it's so slow.
imho 😂
maybe Google's plan to avert the AI apocalypse is to release models so bad that they can never develop AGI
Hard to believe that a company with such massive resources produced this underwhelming model.
Microsoft beat Google at AI
Gemma... it says so in the name: it's Gemini without the "i" part... intelligence.
Why "Open Source?" Free labor. Don't worry, as soon as they get what they want, they will take what was learned from Open Source, and put it in their private models.
FYI, this model is available on Ollama (0.1.26) without the hoops to jump through. One more thing: they also have the quantized versions. I found the 7B (fp16) model bad, as you say, but for some reason I was much happier with the 2B (q4) model.
You need to re-upload this video. You used a broken version of the model... Gemma is much better than what you experienced. It can even easily write Snake in Python, despite being less than 13B and not a coding model.
Actually, I redid the tests using Matthew's exact questions and the results matched his experience with the model. Either LM Studio is using the wrong chat template, or the settings are off, or the GGUF is broken. I have a gist with the code I used that I can share, but it seems that comments with links get deleted.
The fact google charges and doesn't link to Google accounts and services caused me to delete my free account immediately.
2 free months of Gemini? No thanks, cancel immediately.
You should add some politically incorrect questions to your usual ones after this week's drama
It's kinda bad, right? I tested it and found it just kept talking. They are using a weird prompt format, and it just keeps talking.
This video needs a laugh track and some quirky theme music between sections. I was LOLing and even slapped my knee once.
Once again, another great video. This is my fav AI channel.
Hi Matthew, thanks for testing. I just posted a comment about a test I did using your questions, showing different results from your test when not using the GGUF (I included a link to a gist). Was my comment deleted because it contains a link? Happy to resend you the link to the gist. P.S. Actually, even the 2B model gives decent answers to your questions.
I am actually disappointed that you did not address the multiple comments pointing out the flaws in your testing. I thought you would retest the model and set the record straight.
Today's News: World's largest advertisement-delivery company releases terrible AI model.
"A diverse group of warriors..." Ahh, feudal Japan, that bastion of diversity. GWGB.
This does not match the performance seen on hugging chat at all, you should issue a correction
This episode was like a Jerry Springer show, I couldn't stop watching
I think there were problems with the model files. The ollama version also had problems but they apparently fixed it now.
I guess this must be what they call AGI. These answers are far beyond human comprehension.
Additionally, that formula to count to 100 is gibberish: "for every number in the range of numbers 2, 98, and +: print number." range can actually take up to three arguments (start, stop, step), but + is not a valid argument.
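For reference, the working version of what the model was presumably going for (the prompt was just to count from 1 to 100) is trivial:

```python
# Print the numbers 1 through 100, one per line.
numbers = list(range(1, 101))  # range's stop is exclusive, so use 101
for number in numbers:
    print(number)
```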
Google is really making a name for themselves in AI. They’re pretty good at this….
Consider how bad it is; now imagine using a quantized version in 4-bit. How much worse can it get?
The killer app of just the regular $20/mo Gemini Advanced is that it has a 128k token context size, instead of ChatGPT-4's like... 8k or 32k or whatever the hell it is right now.
Re: it thinking that "cocktail" might be a bit rude ...
not a patch on when Scunthorpe United FC updated their message boards with a profanity blocker and started to wonder why nothing was getting posted anymore
Google's research is not focused so much on LLMs; they produce a lot of AI research across a variety of sectors. That said, their LLMs are so far behind it's not even funny. The multimodal 10M-token context window of Gemini Pro does look pretty good, though!
Ouch! Why would they release this? I mean feeling pressure or not, releasing garbage is just BAD!
I was skeptical. I ran the same questions on huggingface and got way better answers. Something was off here.
It is very likely that his setup is incorrect, or there is a bug in the way he loads the model.
Having early-2000s experience with Google while it tried to work things out, I can tell you they will LAG BEHIND UNTIL THEY DON'T. And when they hit the market with their all-caught-up models, they'll be in the driver's seat.
I seriously don't understand why they've released this, especially if they tested it internally.
Also, the benchmarks are worthless by now; we need to come up with a better way of doing standardized tests.
I haven't found anything that works as well as Mixtral 8x7B; a lot of the models that have come out after it have been "mostly hype".
Tried one of the quantized versions last night. It was reasonably fast and handled the first question (a soup recipe). On the additional questions that Mistral got right, Gemma was lost in space somewhere... back to Mistral.
Haha "Open Source" model.
Yeah, I tested it, it sucks.
And yet I am shockingly unsurprised it's as bad as it is.
Looks like we’re no longer in the age of Google. Crazy
I was so impressed with how Google could release such a bad thing....
Can't understand how anyone can think that releasing such a subpar model and associating your brand with it is a good thing. When you are Google, waiting until your model is at least at the level of the best model should be the minimum bar; Google is supposed to be the big fish.
A funny thought that might explain the truly shitty results you've gotten: Microsoft noted in their paper on GPT-4 that it did the unicorn benchmark better before it was aligned than after.
The more an LLM is censored via training, the more "politically correct" and restricted it becomes, and beyond those frustrating refusals, this causes a kind of brain damage: it weakens the language model itself.
As such, the safest LLM is the one you don't run. Uncensored, it might offend someone, but it's more likely to be self-consistent; censored, it may still offend someone, but it will also have truly defective reasoning all over the place, even outside the things you wanted censored.
This is why I'm not wild about censored models, beyond their intentional biases: being politically correct (as in real life, oddly enough) makes for more defective reasoning and lies.
I love how shocked you are in the opening clip
What I find hilarious about Google is that while using Gemini on the web, Google gives you the option to "double check" the responses with Google Search. So, why can't Gemini check itself against Google Search?? It's right there. I think Google is so scared of releasing AI into the wild they're not even trying, and in a way they're right.
Gemini 1.5 reviews are super legit looking... And we're probably not going to get access forever.
There's an old saying: when bad times come, even a man sitting on a camel gets bitten by a dog. That's what I see happening with Google. Whatever they do, success just isn't coming.
I tried it as well on ollama and was completely underwhelmed. It had typos, it had punctuation issues. In my very first prompt which was simply, “hey”. Then when I said it looks like you have some typos, it responded by saying it was correcting *my* text, and then added several more typos and nonsense words to its “corrected text”. I don’t know what’s going on with it, but I wouldn’t trust this to do anything at all. How embarrassing for Google.
Thanks! The massive size of the 7B GGUF was a put-off to start with. I am surprised it performed that badly.
You should use quantized versions. I doubt that there's much difference of quality between 32bit and 8bit (or even 4b).
They learn more from being wrong than being right when it comes to simple questions. Consider this.
Google SHOCKS and STUNS the Open source landscape
I should have used this title
Lol we all should have!!
@@matthew_berman I thought at one stage you were literally going to start slapping your forehead off the keyboard!
Why do most AI tech channels use that title? 😂 I just don't pay attention to titles like that lmao 😂😊
its a meme at this point