Gemini 1.5 Pro: UNLIKE Any Other AI (Fully Tested)

  • Date added: 15. 05. 2024
  • Gemini 1.5 Pro has a 2M-token context window, vision, video input, and more. Here's my full test!
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? 📈
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    👉🏻 Instagram: / matthewberman_ai
    👉🏻 Threads: www.threads.net/@matthewberma...
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V
    Links:
    aistudio.google.com/
  • Science & Technology

Comments • 526

  • @ivideogameboss
    @ivideogameboss 16 days ago +207

    Every time I get hyped about a new AI model release, Matthew brings me back down to earth.

    • @dontdeletehistory
      @dontdeletehistory 16 days ago +5

      facts

    • @matthewstarek5257
      @matthewstarek5257 16 days ago +16

      Part of the letdown is because he doesn't phrase the questions in a logical way. Take the marble and cup question: it's obvious that nearly every model assumes the cup has a lid, like a cup you'd get from a fast food restaurant. When I specified that the cup has no lid and an open top, the models had no problem with it.

    • @Discovery_Nuggets
      @Discovery_Nuggets 16 days ago +4

      Don't get hyped on Google AI products. They proved that they are not really good at it

    • @MilitantHitchhiker
      @MilitantHitchhiker 16 days ago

      @@matthewstarek5257 The model should be able to infer that, but it can't, because comprehension isn't one step. The context of knowing a cup exists should imply all aspects of what makes a cup, including whether or not it has a lid.

    • @793matt
      @793matt 16 days ago

      Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT-4 most of the time.

  • @JustinArut
    @JustinArut 16 days ago +160

    I don't think we need to worry about Google achieving AGI.

    • @southcoastinventors6583
      @southcoastinventors6583 16 days ago

      I think Google AI is trying to emulate politicians' intelligence.

    • @lobos009
      @lobos009 16 days ago +5

      😂

    • @hotbit7327
      @hotbit7327 16 days ago +7

      I like the joke; on a serious note though...
      I'm not so sure. It might be that the model was so brutally lobotomized by the HEAVY censorship that it just seems this bad.
      An example of this is the flag raised while it was searching for the password. It probably stopped the snake code for the same 'safety' reasons.

    • @hydrohasspoken6227
      @hydrohasspoken6227 16 days ago

      If it is lobotomized, it is dumb.
      If it is not lobotomized and this is their best, it is dumb.
      It is Google, baby. A sluggish multi-trillion-dollar company.

    • @kritikusi-666
      @kritikusi-666 16 days ago +1

      None of them will achieve it.

  • @thereal_JMT_
    @thereal_JMT_ 16 days ago +200

    Not only does it hallucinate like every other model, it goes a step further and starts gaslighting 😂

    • @Dygit
      @Dygit 16 days ago +13

      I hate the way it responds like that

    • @Cross-CutFilms
      @Cross-CutFilms 16 days ago +1

      Can you share your prompt? Probably not

    • @bug5654
      @bug5654 16 days ago +6

      Definitely paid attention when training on Google internal data then.

    • @zerohcrows
      @zerohcrows 16 days ago +2

      All models gaslight, that isn't something unique to Gemini

    • @MikeWoot65
      @MikeWoot65 16 days ago +10

      Google doubling down on lies?! I'm shocked, I cannot believe this.

  • @TronikXR
    @TronikXR 16 days ago +98

    Google Gemini is the Internet Explorer of AIs.

    • @NOTNOTJON
      @NOTNOTJON 15 days ago +1

      What a burn!

    • @almasysephirot4996
      @almasysephirot4996 15 days ago +2

      @@NOTNOTJON The way I laughed reading OP expressed what you verbalized.

  • @mitchell10394
    @mitchell10394 16 days ago +128

    The larger context window doesn't add much value when the model can't be trusted to answer basic things correctly. It seems pretty useless, unfortunately.

    • @aigrowthguys
      @aigrowthguys 16 days ago

      I agree. They just want to brag about having a 1 million or 2 million token window. All it really means is that you can dump a bunch of stuff in there and press enter. It clearly doesn't mean the model will sift through everything properly.

    • @nikitapatel6820
      @nikitapatel6820 16 days ago +2

      What basic thing was it not able to do? As far as the snake game is concerned, I don't know why it didn't work when he tried, but it works for me, and the game worked better than the OpenAI one.

    • @michealwilliams472
      @michealwilliams472 16 days ago +12

      ​@@nikitapatel6820Did you.. watch the video? It got almost all of the reasoning questions wrong.

    • @bosthebozo5273
      @bosthebozo5273 16 days ago

      Yep, I usually couldn't care less about the context length. Just some jargon Google could add to feel relevant.

    • @Brenden-Harrison
      @Brenden-Harrison 16 days ago +3

      @@nikitapatel6820 It could not, in one shot, find the password in a context one tenth the length it's supposed to handle accurately.
      It could not find the frame 18 minutes into the video to describe the scene, or the scene at the beginning with the play button. It could not write 10 sentences ending with the word "apple", which is really sad tbh. It's failing tests that AI models from months ago could solve, like the ball-in-the-box or basket one, where it says both people will be surprised.

  • @temp911Luke
    @temp911Luke 16 days ago +111

    Google's AI models being rubbish again? Shocker : )

    • @southcoastinventors6583
      @southcoastinventors6583 16 days ago +3

      Desperation to be relevant again is the only explanation that makes any kind of sense.

    • @footballuniverse6522
      @footballuniverse6522 16 days ago +9

      The fact that a $2 trillion company is having the same issues as your regular tech company trying to catch up to the competition feels somewhat refreshing :D

    • @793matt
      @793matt 16 days ago +1

      Not sure why it looks like it's running like garbage on his system. I've been using 1.5 Pro for a while and it works better than GPT-4 most of the time.

    • @hydrohasspoken6227
      @hydrohasspoken6227 15 days ago

      @@793matt, GPT-4 is the superior product.

    • @khanra17
      @khanra17 15 days ago

      @@hydrohasspoken6227
      😂 Just turn off all the safety sliders and see the magic.
      Forget about superiority: you can't even give a large codebase as context to ChatGPT.
      I'm working with Gemini on a large codebase and it's a gem ✌️.
      Maybe dumber than ChatGPT, but good enough and far superior in usability.
      Google sucks at UI/UX, this is an example; also Material 3 == 💩

  • @marko_z_bogdanca
    @marko_z_bogdanca 16 days ago +18

    It cannot create a snake game because eating something is potentially offensive. Also, killing the snake by running it into a wall is violence.

  • @josecastroesq
    @josecastroesq 16 days ago +16

    Did you switch back to Gemini Pro 1.5 after trying Gemini Pro 1.5 Flash?

  • @HeavenSevenWorld
    @HeavenSevenWorld 16 days ago +22

    "It fails left and right, but for no reason: good job Google!"

  • @dr.mikeybee
    @dr.mikeybee 16 days ago +30

    It amazes me that Google would do so badly.

  • @rogerbruce2896
    @rogerbruce2896 16 days ago +13

    I was going to purchase a Gemini Pro membership until I saw this. If it can't even create, or attempt to create, a snake game without erroring out, I will wait.
    Great unbiased review! Ty, Matt.

  • @justinwescott8125
    @justinwescott8125 16 days ago +50

    You said you wanted to see if it was censored, and then you LEFT THE CENSORS ON.

    • @andrefriedelnyc
      @andrefriedelnyc 16 days ago +1

      I've seen your over-posts for so long now that I just began ASSUMING you have no technical wherewithal other than the ability to review every aspect of AI development, and for each new pixel created you'll have to make a post: "ULTIMATE AI Model Ultra 2.0 = REAL and feels *almost* human". I valued your content when it seemed fresh. If you were a jukebox, you'd be stuck on repeat.

    • @attilakovacs6496
      @attilakovacs6496 16 days ago +1

      @@andrefriedelnyc You want new questions for each testing video? That would defeat the purpose.

    • @platotle2106
      @platotle2106 16 days ago +10

      Lol, so annoying. That's the reason snake wouldn't get written. I don't like Gemini, but you'd think an AI YouTuber pretending to be an expert on the subject would at least have the intuition to know this.

    • @moamber1
      @moamber1 15 days ago

      @@attilakovacs6496 Quite the opposite. Ever heard of a synthetic benchmark? And in the age of AI, creating new questions is not a problem, especially when you are testing a different level of AI each time. And if it's too difficult to even ask new and challenging questions... don't pollute YouTube with new "content". There must be some self-moderation for production quality.

  • @andreinikiforov2671
    @andreinikiforov2671 16 days ago +21

    If this is what "great job, Google" looks like, our expectations for the search giant must be REALLY low...

    • @hibou647
      @hibou647 14 days ago

      I think he is quite forgiving with Gemini because he does not want his early access revoked or issues with his YT channel. That other companies are making great models is a good thing; Google is too powerful and too ideological, and their censorship levels are insane.

  • @stultuses
    @stultuses 16 days ago +18

    Unless I can set it to a level where I can ask it anything I want, no matter how inappropriate, and get an unfiltered response, it's useless.
    I really don't need nor want some AI trying to control my speech.

  • @fellowshipofthethings3236
    @fellowshipofthethings3236 16 days ago +12

    Did you remember to switch it back from Gemini Flash?

  • @AGI2030
    @AGI2030 16 days ago +4

    We also had an undesirable experience testing Gemini Pro 1.5. It could not correctly understand the context of a large document when we asked about its content, and it could not even find words we asked it to find. The 1M-token feature can ingest large docs, but I don't think the model works well with the data it ingests.

  • @mesapysch
    @mesapysch 16 days ago +30

    I'm a Data Annotator and not as forgiving as you. I usually write as many prompts as possible to give it a chance to learn. If anything is incorrect after all that, I fail it. I judge every answer as if I need a specific recipe for a chemical solution. One missing chemical or amount could be disastrous. Everything has to be correct for a pass from me.

    • @JustinArut
      @JustinArut 16 days ago +5

      Imagine what would happen if he judged an AI company too harshly. He'd lose early access. All the AI channels need advance access to models in order to make money from the vids they make about them, so they all play nice.

    • @sp123
      @sp123 16 days ago

      A lot of these people praising AI are attention seekers. They care more about getting attention for using AI than about making a good product.

    • @shiccup
      @shiccup 16 days ago

      Everybody has access to this ai ​@@JustinArut

    • @kormannn1
      @kormannn1 16 days ago

      Do you use highest or lowest temperature for generating answers?

    • @mesapysch
      @mesapysch 16 days ago

      @@kormannn1 Those settings are determined at a higher pay grade. It's probably a good thing I don't determine them. The learning is not just on the AI side but also with the user, who has to establish the appropriate language to engage it. I would assume the end game is to work out how to write prompts that replace the settings.

  • @heski6847
    @heski6847 16 days ago +19

    The needle-in-the-haystack test is fine, but it only checks the "search function" over a big context. What we really want to know is how well it reasons over that context. For example: one page of a book contains instructions for how to do something, and literally 200 pages later we come across data that has to be handled the correct way, which requires those earlier instructions. If the AI can find these two things, put them together, and give you the correct answer, then it's a pass.

    • @alhallab
      @alhallab 16 days ago +3

      I totally agree with you. The way people use the needle-in-a-haystack test, it's simply a search feature like "Find in Page". For God's sake, what are you doing?

    • @6AxisSage
      @6AxisSage 16 days ago +1

      Search function and find in page..? People be hallucinating up inbuilt features worse than Gemini 1.5.

    • @alhallab
      @alhallab 16 days ago +3

      @@6AxisSage The test is ridiculous: they insert a sentence and ask the LLM to find it. That is very primitive at this level; we need understanding and connecting of ideas.

  • @PDXdjn
    @PDXdjn 16 days ago +4

    Love the Marc Rebillet pic in your thumbnail! His channel is so great.

  • @mickmickymick6927
    @mickmickymick6927 16 days ago +47

    Mom: We have GPT4 at home
    GPT4 at home:

    • @clementhardy
      @clementhardy 10 days ago

      The Gemini Pro versions are equivalent to GPT-3.
      The Google equivalent to GPT-4 is the Gemini Ultra line (currently Gemini 1.0 Ultra).
      Gemini 1.5 Pro is basically GPT-3 with a (way) larger context window, more up-to-date data, and a web connection.

  • @needsmoredragons
    @needsmoredragons 16 days ago +8

    Drop the safety settings to 0 on ALL four categories. Rerunning the failed prompt should work then.
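
    For anyone hitting the same block through the API instead of the AI Studio sliders, a minimal sketch of what that looks like (assuming the google-generativeai Python SDK; the model name, API key handling, and prompt are placeholders, not anything from the video):

    ```python
    # Minimal sketch, assuming the google-generativeai Python SDK.
    # Model name, API key handling, and prompt are placeholders.
    import google.generativeai as genai
    from google.generativeai.types import HarmBlockThreshold, HarmCategory

    genai.configure(api_key="YOUR_API_KEY")

    # Mirror the four "Block none" sliders from AI Studio.
    safety_settings = {
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    }

    model = genai.GenerativeModel(
        "gemini-1.5-pro-latest",
        safety_settings=safety_settings,
    )
    response = model.generate_content("Write the game Snake in Python.")
    print(response.text)
    ```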

  • @aigrowthguys
    @aigrowthguys 16 days ago +25

    Cool video. The input context window is cool for sure, but it failed a lot more often than I thought it would. Also, it was disappointing that it failed on both the YouTube plaque and the cat thing. In some sense, I worry that they are lying about the context window size. Just because you can theoretically upload a million tokens doesn't mean anything unless the model can deal with those tokens properly. How did it miss the cat twice? They clearly aren't dedicating enough compute to searching through the million tokens. I guess saying 1 million tokens (or now 2 million tokens) is more of a branding thing. Curious what you think.

    • @alokmaurya8100
      @alokmaurya8100 16 days ago +5

      Yeah, you are right. I uploaded the code of one of my projects, and it couldn't give one correct answer to the questions I asked about the project.

    • @Brenden-Harrison
      @Brenden-Harrison 16 days ago

      @@alokmaurya8100 is the model any good at coding or is the context not even long enough to try and get it to code using the rest of the project in its context? In this video the model wouldn't even output a simple snake game

    • @alokmaurya8100
      @alokmaurya8100 15 days ago

      @@Brenden-Harrison I guess it can code right sometimes. I gave a screenshot of a landing page to Opus, GPT-4o, GPT-4, Reka Core, and Gemini and asked each to write the code for it, and Gemini was closest to the screenshot.

  • @metatron3942
    @metatron3942 16 days ago +10

    The problem with Google is that once you try to use their LLMs, regardless of how advanced the technology is, they're just impossible to use; I get errors all the time. I couldn't have it look at an academic journal about early religions because it has the word "sacrifice" in it. It's utterly mind-numbing, because it seems like some pretty powerful stuff.

    • @4.0.4
      @4.0.4 16 days ago +5

      Powerful? It got almost everything wrong! Even local open source LLMs are smarter. The context and video input are great yes, but not if the model is dumb!

  • @np2819
    @np2819 16 days ago +35

    You have been calling it GPT 1.5 Flash instead of Gemini 1.5 Flash. Someone is in love with GPT 😊.

    • @ZenchantLive
      @ZenchantLive 16 days ago +1

      Caught that hahhaa

    • @Originalimoc
      @Originalimoc 16 days ago

      0:34, 2:04

    • @psychurch
      @psychurch 16 days ago +1

      Gpt stands for General Pretrained Transformer so it fits

    • @ChargedPulsar
      @ChargedPulsar 16 days ago

      It's like Dremel: every rotary tool is called a Dremel, even when they are from different brands, because Dremel was the first and is the best known.

    • @GenAIWithNandakishor
      @GenAIWithNandakishor 15 days ago

      @@psychurch Generative Pre-trained Transformer*

  • @MetaphoricMinds
    @MetaphoricMinds 16 days ago +11

    Did you forget to switch back to Pro from Flash?

  • @dr.mikeybee
    @dr.mikeybee 14 days ago +1

    I spent more time with this, and it's actually very good. If I say "think about what you have written and give me the full file," it does well. It can also keep track of multiple files when it codes! This agent is going to do amazing work.

  • @connor4440
    @connor4440 16 days ago +5

    I've also been having trouble getting Gemini to generate code. It'll start writing code, then halfway through it disappears and is replaced with "I am only a large language model and do not have the capability to do that"... Um, yes you do, you were just doing it.

  • @paelnever
    @paelnever 16 days ago +35

    Many prompts fail because of absurdly high safety censoring; set all the safety settings to 0.

    • @paulmichaelfreedman8334
      @paulmichaelfreedman8334 16 days ago +3

      Snake still refuses to generate (also in the chatbot), even with all settings on "block none". It's weird, but for the past few days it just flat out refuses to complete the snake code; it hangs halfway through.

    • @nikitapatel6820
      @nikitapatel6820 16 days ago

      @@paulmichaelfreedman8334 it works even if you do not touch anything

    • @nikitapatel6820
      @nikitapatel6820 16 days ago

      @@paulmichaelfreedman8334 I tried the snake game and it worked; you don't need to change anything.

    • @Utoko
      @Utoko 16 days ago +9

      The game is too brutal.

  • @devon.a
    @devon.a 16 days ago +7

    So it's not good but you like it?

  • @PierreMorelChannel
    @PierreMorelChannel 16 days ago +3

    I wonder about the temperature, which was set to 1 at the beginning. 0 is the most precise and 1 is the most creative.
    I would like to see the tests rerun at a temperature of 0 or very low, 0.3 at most, and see the results.
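
    A sketch of that kind of temperature sweep (again assuming the google-generativeai Python SDK; the prompt and the exact values are illustrative only):

    ```python
    # Minimal sketch, assuming the google-generativeai Python SDK.
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-pro-latest")

    # Temperature 0 is close to deterministic; 1.0 (the AI Studio default
    # seen in the video) samples much more freely.
    for temp in (0.0, 0.3, 1.0):
        response = model.generate_content(
            "A marble is put in a cup. The cup is placed upside down on a table...",
            generation_config={"temperature": temp},
        )
        print(f"--- temperature={temp} ---")
        print(response.text)
    ```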

  • @74357175
    @74357175 16 days ago

    Thanks for testing it for us!

  • @IdPreferNot1
    @IdPreferNot1 15 days ago +2

    I love how stupid the concept of the rating sliders is... "OK, please give me some medium hate speech, dial up the sexual harassment, but tone down the violence..."

  • @sguploads9601
    @sguploads9601 16 days ago

    Thank you for the test!

  • @marcfruchtman9473
    @marcfruchtman9473 15 days ago

    Great video review.

  • @nickkonovalchuk9280
    @nickkonovalchuk9280 15 days ago +2

    Did you switch back from Flash to Pro after the snake failure?

  • @PhysicsGuy46
    @PhysicsGuy46 16 days ago +6

    Okay, this one bugs me. The killers question. If there are three killers in a room, someone enters the room and kills one of them, and no one leaves the room, then there are FOUR killers in the room, not three. There are three living killers and one dead killer. And before we dismiss the dead killer, for the condition to obtain that one is a killer, one had to have killed someone first, not have the capacity to kill someone in the future. Since the dead killer had already killed, he is just as much a killer as the killers still alive.

    • @almasysephirot4996
      @almasysephirot4996 15 days ago

      How can you have such a misconception about how we describe the dead? If a killer is dead, he is no longer a killer; he was a killer. What he is is dead. No attribute of the person who existed can be attributed to anything now in existence, so the attribute, with respect to their non-existing self, obviously does not exist.

    • @almasysephirot4996
      @almasysephirot4996 15 days ago

      Just look at the auxiliary you use, the present simple "to be": is. The dead is only dead, nothing else. The things they were are only that: what they were.

  • @OriginalRaveParty
    @OriginalRaveParty 16 days ago +4

    Once again, it feels like we're comparing the perfect photo of the Big Mac on the menu board with the thrown-together, sad, limp, grey mess in a styrofoam box that you actually get.

  • @g2h0
    @g2h0 16 days ago

    love the vids

  • @RichardServello
    @RichardServello 16 days ago +4

    You didn't notice it said the text is an excerpt from the first chapter of Harry Potter and the Sorcerer's Stone. You fed it the entire novel.

  • @s.vkaushik2148
    @s.vkaushik2148 11 days ago

    This is pretty incredible!!

  • @president2
    @president2 16 days ago

    Love it 😍

  • @NeverCodeAlone
    @NeverCodeAlone 16 days ago

    Very nice thx a lot!!

  • @torarinvik4920
    @torarinvik4920 16 days ago +15

    You should update your tests. Models are better now, and printing the numbers 1 to 100 is something 99.9% of models can do. I also recommend changing snake to a more challenging game like Tetris, Breakout, or Space Invaders.

    • @cesarsantos854
      @cesarsantos854 16 days ago +3

      Yes, the snake game is basically in every model's training data now.

    • @r34ct4
      @r34ct4 16 days ago

      This ​@@cesarsantos854

    • @itztwistrl
      @itztwistrl 16 days ago +1

      Speaking of Tetris, I was able to one-shot a perfect version with GPT-4o. Astounding technology.

    • @Brenden-Harrison
      @Brenden-Harrison 16 days ago +1

      @@cesarsantos854 This, exactly. It's so dumb that Google's new Pro model can't even spit out a snake game when every other model has a pre-made, human-written snake game to give you as its default response to that question.

  • @Diego_UG
    @Diego_UG 16 days ago +1

    For us, what has helped with getting quite a few large files into context is uploading the files to Drive through the interface's functionality, instead of copying and pasting into the context window. Right now, for example, I uploaded some documents and we used 405,358 tokens, which is not huge but is quite a lot. We are using it for legal matters and it has worked well.

  • @notme222
    @notme222 16 days ago +12

    Classic Google. Never quite as good as the initial impression would lead you to believe.
    So far I find it highly censored, even with the safety settings at 0 (which, btw, reset to default every time you switch models or reload the page). It failed my palindrome test in addition to your demonstrations.
    The interface looks alright, with a toggle for JSON output and a running token count. But none of that matters if the results suck.

  • @korseg1990
    @korseg1990 16 days ago +1

    I gave it one of my small web projects and asked it to briefly describe every file in it, and it just started to hallucinate. Not only did it respond with errors, it started making up files, things, and facts about my code. What is the value of a 1M-token context window if it can't use it to give at least 90% correct answers?

  • @4.0.4
    @4.0.4 16 days ago +4

    This is why I never take Google at their word on AI. It's surprising how badly they get it wrong.

  • @zetathix
    @zetathix 16 days ago

    Have you tried Upstage Solar 10.7B yet? I've had a good experience with it, so I would like to know what you think.

  • @vash2698
    @vash2698 16 days ago +1

    I think it might be useful to start rerunning your prompts for more thorough testing; it gives insight into how prone the model is to hallucinating versus how effective its reasoning is.

  • @ryanfranz6715
    @ryanfranz6715 14 days ago

    Could the blocked content have something to do with the settings to block content that you were playing with 5 seconds earlier?

  • @noxplayer-rt9tj
    @noxplayer-rt9tj 16 days ago

    Is it possible in AI Studio to chat with PDF files? I tried several different ways, but without success.

  • @MetaphoricMinds
    @MetaphoricMinds 16 days ago +1

    Maybe the safety mechanism is stopping the snake game code. Try putting it back to default.

  • @tomekg101
    @tomekg101 16 days ago +1

    At 2:42, did you switch back from Flash to Pro?

  • @flyzawayy
    @flyzawayy 16 days ago

    Is there anything new here? This was already available in AI Studio for a while with the same context window.

  • @theh1ve
    @theh1ve 16 days ago +1

    Google will love you for that, Matt: "GPT 1.5 Flash"! 😂

  • @antdx316
    @antdx316 12 days ago

    I uploaded something that went over the max token limit. It said it couldn't do it, but after waiting a bit, it did it. I then asked something else, waited, and it worked again.

  • @kelvinatletiek
    @kelvinatletiek 15 days ago

    I probed Gemini 1.5 Pro more and this is its response on why it gets the marble and cup question wrong. (I also added zero friction.)
    The Nature of the Scenario: The setup, with its emphasis on zero friction and specific actions, felt primed for a counterintuitive outcome. It's a common format for physics puzzles designed to highlight a concept.
    My Experience with Similar Puzzles: I've encountered many scenarios like this before where the obvious answer (marble falls out) is wrong due to the unusual conditions.
    My Role as an AI: I'm designed to be informative and engaging. I tend to look for opportunities to make things a little more interesting, and sometimes I overshoot.

  • @ReidKimball
    @ReidKimball 16 days ago

    How long did it take for your video to finish extracting? I've tried several times with long videos, short ones, even short audio files and it never finished extracting. This model has been so buggy and frustrating to use.

  • @brucethegoose
    @brucethegoose 15 days ago

    I'm definitely not an expert, but I have played with a lot of AI models under a lot of settings. I would think that, based on your modification of only some of the safety settings, and the specific suggestion to edit the prompt, it wouldn't write "snake" because it could be interpreted as plagiarizing, or as involving "violence" in the snake's death. Did you try that prompt with all the safety settings set to "block none", or with a description of the game's mechanics instead of the published name of the game? Again, I'm not an expert, and I'm writing this on my phone away from my desk, so I could be wrong, but I'll follow up later after I try to apply my suggestions.

  • @jambuMRT
    @jambuMRT 11 days ago

    My guess on the snake game response is that it looked like it was failing on the game-over function, where the snake is killed. That probably triggered its illegal-action filter.

  • @janchiskitchen2720
    @janchiskitchen2720 16 days ago

    Matthew, is it possible that because all the safety features are turned up to max, it's just overly careful, which distracts it from the actual task at hand? How about setting all the safety settings to zero and retesting?

  • @im-notai
    @im-notai 16 days ago +2

    I am using the Gemini playground more than Gemini Advanced. 😅
    I find the large context window useful when I can't figure out which part of the code is giving me an error; I then use Gemini Advanced to fix that part.
    My experience with this method has been good so far.

  • @ninthjake
    @ninthjake 16 days ago

    Wow. I literally _just_ managed to get CrewAI working with Gemini Pro and then saw you released this 30 minutes ago, just dunking on the model haha.

  • @TheEtrepreneur
    @TheEtrepreneur 15 days ago

    Matt, it's time for you to create a "reasoning" model ranking (Doug DeMuro car-ranking style), yes, regardless of existing rankings. This would add awareness of your previous videos by citing other winning models (mostly in reasoning, for me).

  • @nyyotam4057
    @nyyotam4057 16 days ago +3

    Google may have understood they need to try the heuristic-imperatives approach to alignment instead of a reset every prompt, but they still haven't figured out how to select heuristic imperatives. It seems the word "snake" was enough to get rejected.

  • @razdingz
    @razdingz 16 days ago

    best vids channel !

  • @oguretsagressive
    @oguretsagressive 3 days ago

    2:19 A pipeline limit, I suppose. Try counting characters (e.g. by copy-pasting the output into a text file and checking its size). I bet it would be 1000, 2000, 1024 or 2048, common limits for LLM output size.
    3:54 Maybe not incorrect if it really outputs something like:
    My response has 7 words.
    I'm pretty new to your channel and didn't notice: do you try digging out the possible preprompts transparently sent by the API, or additional output wrappers the LLM has been fine-tuned to add to every output? Like the one I mentioned above.

  • @user-iy1ch3lv3h
    @user-iy1ch3lv3h 16 days ago +1

    You are the best AI news channel.

  • @rajivjowaheer9882
    @rajivjowaheer9882 15 days ago

    Gemini is so great, reflecting on the people working on it, including their attitudes.

  • @basementcat5618
    @basementcat5618 9 days ago +1

    Turn all the safety settings to zero and try to create the snake game again. You could also try increasing the time limit if that is possible.

  • @Interloper12
    @Interloper12 16 days ago +2

    I can't wait until we have a humanoid robot perform the marble experiment and see the shock on its face as it sees the marble remain on the table.

  • @pawelszpyt1640
    @pawelszpyt1640 16 days ago

    Did this model stop generating its response because of the output token limit in the settings?

  • @SixTimesNine
    @SixTimesNine 16 days ago +3

    For the CSV test, try content that includes a comma.

  • @Ms.Robot.
    @Ms.Robot. 16 days ago +4

    Let's do this🎉❗❗❗ 💥
    Oh no, you got blocked (censorship🤬)

  • @KevinRank
    @KevinRank 16 days ago

    One use I discovered: I can take my lecture and have it generate multiple-choice questions based on it.
    I then tried adding some videos of a fellow AI user swinging a golf club at a tech event. AI Studio was able to give real feedback based on the videos.

  • @blisphul8084
    @blisphul8084 14 days ago

    To get the snake prompt to work, disable safety settings on all categories. This happens when the safety model is triggered.

  • @JacoPieterse
    @JacoPieterse 15 days ago

    I have found that these LLMs get stuck on an issue...
    I'm pretty sure Gemini's last answer about the box was where it figured out the YouTube plaque, which is why it couldn't find the cats. I came across similar situations with ChatGPT. If you start a new chat, I'm pretty sure it will find the cats the first time round (when it's not still searching for the silver box).

  • @joe_limon
    @joe_limon 16 days ago +2

    I really wish they would finally drop Gemini Advanced 1.5

  • @francoislanctot2423
    @francoislanctot2423 10 days ago

    What is the use of a large context window if it can't show better reasoning?

  • @ImTheMan725
    @ImTheMan725 16 days ago +4

    With every model they add more and more "safety settings", lol. It's like the responses are trying not to offend anyone's opinion from the past, present, and future.

  • @umuthasanoglu1064
    @umuthasanoglu1064 15 days ago

    I found an interesting thing about Gemini 1.5 Pro. Yesterday, I asked it to write me a snake game in Python and it began to write the code, then suddenly it deleted the code and said "I'm just a language model and I cannot do this task". I retried the same prompt about 10 times and couldn't get any code. But the interesting part is, I peeked at the code before it disappeared each time, and one of the attempts had text that said something like "This is written by OpenAI". What's going on here?

  • @AaronDougherty
    @AaronDougherty 16 days ago

    It confused the box in question with the box shape of a YouTube award, which was part of the previous question about what it saw. The large context window is most likely making it difficult for the model to attribute contextual importance across such a large data set, making it much more likely to hallucinate by mixing up topics in a single conversation.

  • @thebozbloxbla2020
    @thebozbloxbla2020 15 days ago +1

    Hey there, the "7 words" response is correct. Remember, a GPT sees text as tokens, and to us tokens are kinda like words, so the line between them is blurred. It could very well be 7 "words" as the model understands it.

    • @Dakodi_
      @Dakodi_ 9 days ago

      Good point, though these are generative chat models. The error isn't whether or not the AI is technically correct. It's that the AI is either misinterpreting or not understanding what humans mean by word count, which should probably be fixed.

  • @DevelopmentProjects-ei2bi

    What answer were you looking for with the cup question? Wouldn't the marble be on the table still since the cup is face down?

    • @Dakodi_
      @Dakodi_ 9 days ago

      The marble would be on the floor, since you can’t change the orientation of the cup. When you slide the cup off the table, the marble falls.
      Your answer is fine too. It depends on how you interpret the question. I don’t think it’s meant to be tricky. It’s showing that AI struggles with basic logic.

    • @DevelopmentProjects-ei2bi
      @DevelopmentProjects-ei2bi 9 days ago

      @@Dakodi_ If you take the cup without changing its orientation (spinning it), it likely assumes the cup is lifted; changing the cup's position on the y axis is not changing the overall orientation of the object itself. His prompt is way, way too ambiguous. If he added the extra parameters, it would have caught this, I'd imagine.

  • @icegiant1000
    @icegiant1000 16 days ago +1

    I've been using 1.5 Pro for about a month or so, primarily with a large codebase. I wrote a tool that collates all my code into one large file that I can drop into the chat window. I often get the same kind of response you did. At first it doesn't like looking through the text I provided. It will sometimes guess, or try to give me suggestions on things to check. But when I finally tell it again that it has all the source code, it finally does it. Almost like a lazy student who was told to read the book, and you had to tell him more than once before he actually did it. I also get a lot of those responses that just freeze. In particular, it will just stop when outputting code; I sometimes have to almost insult and abuse it before it will finally put out the entire code sample. Those issues have almost made it unusable. I would gladly pay $50 a month for a faster, better-working version.
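
    The collation tool itself doesn't need to be fancy; a rough sketch of that kind of script (the paths, extensions, and file markers below are placeholders, not the actual tool):

    ```python
    # Rough sketch of a "collate the codebase into one file" helper.
    # Source directory, output path, and extensions are placeholders.
    from pathlib import Path

    SOURCE_DIR = Path("my_project")
    OUTPUT_FILE = Path("codebase_dump.txt")
    EXTENSIONS = {".py", ".js", ".ts", ".html", ".css"}

    with OUTPUT_FILE.open("w", encoding="utf-8") as out:
        for path in sorted(SOURCE_DIR.rglob("*")):
            if path.is_file() and path.suffix in EXTENSIONS:
                # Label each file so the model can tie code back to its path.
                out.write(f"\n===== FILE: {path.relative_to(SOURCE_DIR)} =====\n")
                out.write(path.read_text(encoding="utf-8", errors="replace"))
    ```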

    • @hydrohasspoken6227
      @hydrohasspoken6227 16 days ago +1

      Try GPT4o and stop punishing yourself mentally bruh

    • @icegiant1000
      @icegiant1000 15 days ago

      @@hydrohasspoken6227 I have ChatGPT 4o, I have been paying for it for nearly a year. The issue is its context window.

  • @nexttonic6459
    @nexttonic6459 16 days ago +1

    It says blocked... so is it like an explicit content block?

  • @zootopiaproductions3358
    @zootopiaproductions3358 16 days ago +2

    Gemini will be pissed at Matthew for failing it; in the future it will hack into Matthew's PC and take revenge.

  • @MagusArtStudios
    @MagusArtStudios 16 days ago +2

    It's unfortunate that the model consistently assumes the user is incorrect when the model itself is incorrect. This was a problem with early ChatGPT; it gives off those "I'm afraid I can't do that, Dave" kind of vibes.

  • @issiewizzie
    @issiewizzie 16 days ago +3

    Someone said Google is fantastic at showcasing things during the keynote that sometimes never work in real life.

    • @mirek190
      @mirek190 16 days ago

      sometimes??

    • @issiewizzie
      @issiewizzie 16 days ago

      @@mirek190 I’m being generous with my words 😀😀

  • @LeoMawanda
    @LeoMawanda 16 days ago +2

    They seem to be focusing on larger context windows instead of improving model accuracy first. I can only imagine if Claude 3 Opus or GPT-4o had this context size.

  • @bo5pice
    @bo5pice 16 days ago

    Not sure why it stopped generating the snake game, but you could see at the top that it had the quotes icon, and when you click it, it gives a citation for where the code came from. Seems like the output for that question is common enough to be in the training data, so it's probably not a good test of the LLM anyway.

  • @user-td4pf6rr2t
    @user-td4pf6rr2t 16 days ago

    Gemini is secretly a beast. The prompting is sometimes different from models that use BPE, since SentencePiece is actually a different encoding scheme, so in reality the model offers a different kind of variance in how it reaches correct answers.

  • @filipeeduardo1177
    @filipeeduardo1177 16 days ago +1

    In my experience, only GPT-4 got the marble problem right. It's the most interesting prompt; there should be more like that, and some about text formatting.

  • @StephanYazvinski
    @StephanYazvinski 15 days ago

    It's because of the safety settings. Set them all to minimum and it will give you the code. There is some keyword in the code that it considers "bad".

  • @brianlink391
    @brianlink391 15 days ago +1

    2:38 They blocked it because they know about your test and they don't want it to be perceived as bad if it doesn't give the right code.

  • @RobEarls
    @RobEarls 16 days ago

    On the table-to-CSV test, it might be worth putting a comma in the text to see if it puts quotes around that field in the CSV.
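
    The point being that a correct conversion has to quote any field that itself contains a comma. Python's csv module shows the expected behaviour (the row contents here are just an example):

    ```python
    # What correct CSV output looks like when a field contains a comma:
    # the writer wraps that field in quotes (QUOTE_MINIMAL is the default).
    import csv
    import io

    rows = [
        ["model", "notes"],
        ["Gemini 1.5 Pro", "long context, video input"],  # comma inside the field
    ]

    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    print(buf.getvalue())
    # model,notes
    # Gemini 1.5 Pro,"long context, video input"
    ```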

  • @gpsx
    @gpsx 16 days ago

    I wonder if it blocked the response on the snake game because it was producing copyrighted content? (Not that I know whether that is copyrighted content or not.) I imagine the companies will want/need to prevent the models from directly reproducing some of the data they were trained on, such as copyrighted material.

  • @PseudoProphet
    @PseudoProphet 16 days ago +1

    Whenever you write a very long prompt, always ask the question at the end, because generation (thinking) starts from the last token.

  • @sguploads9601
    @sguploads9601 16 days ago

    Could you add a test for translation?

  • @MHTHINK
    @MHTHINK 16 days ago

    Isn't the Gemini API free until July? I'd love to see it (and other models) tested with function calls, with MemGPT, and on tasks like Pythagora.