"VoT" Gives LLMs Spatial Reasoning AND Open-Source "Large Action Model"

  • Uploaded May 30, 2024
  • Microsoft's "Visualization of Thought" (VoT) gives LLMs spatial reasoning, something that was previously nearly impossible for them. Plus, a new open-source project using this technique was just released: an open Large Action Model.
    * ENTER TO WIN RABBIT R1: gleam.io/qPGLl/newsletter-signup
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? 📈
    forwardfuture.ai/
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    👉🏻 Instagram: / matthewberman_ai
    👉🏻 Threads: www.threads.net/@matthewberma...
    Media/Sponsorship Inquiries ✅
    bit.ly/44TC45V
    Links:
    github.com/a-real-ai/pywinass...
    arxiv.org/abs/2404.03622
    Chapters:
    0:00 - Visualization of Thought Research Paper
    12:30 - Open Large Action Model
  • Science & Technology

Comments • 333

  • @danield9368
    @danield9368 22 days ago +112

    A complete tutorial would be really cool to see. Thank you !❤

  • @MetaphoricMinds
    @MetaphoricMinds 22 days ago +172

    Powerful, open-source, uncensored, offline, FREE models are still the future.

    • @chanpasadopolska
      @chanpasadopolska 22 days ago +7

      I can pay for a good one, no problem.

    • @rodvik
      @rodvik 22 days ago +18

      I will pay for uncensored models. Not going to pay for software that refuses to do what I need, though.

    • @jichaelmorgan3796
      @jichaelmorgan3796 22 days ago +12

      Decentralized distributed intelligence over the internet too.

    • @Jeff-66
      @Jeff-66 22 days ago +4

      Good luck with that.

    • @leongodwin69
      @leongodwin69 22 days ago +1

      Intelligently automating inspiration: I would love to see an end-to-end open-source solution where an AI takes a prompt, creates an app, and VoT is then used to deploy it, for example to Azure or AWS.

  • @ZenchantLive
    @ZenchantLive 22 days ago +92

    My buddy actually has aphantasia, the lack of a mind's eye. Super wild: his mind is super descriptive and specific, and he imagines qualities, not images. Maybe this is sort of how these models work, given they can't actually form mental images like a human; rather, it seems they understand the concepts and qualities of the task at hand.

    • @dansplain2393
      @dansplain2393 22 days ago +7

      Wow that’s really interesting! Are there jobs or tasks he’s super human at?

    • @DontStealMyBacon
      @DontStealMyBacon 22 days ago

      @@dansplain2393 I have learned a ton from my largest aphantasia FB group. There seems to be just about nothing that one can't actually do, even without being able to see ANYTHING in their mind's eye. I myself see nothing when I read a book (even though I can conjure up mental imagery elsewhere, with immense effort). I still dream of being able to carve out an AI niche to accommodate some aspects of aphantasia, at least selfishly for my reading disabilities, even though many have proven that aphantasia itself may not actually be a disability.

    • @jamesmoore4023
      @jamesmoore4023 22 days ago +7

      I have that too. Lots of good research on it lately. I can only see in my dreams, which I found out is common for aphantasics.

    • @nemonomen3340
      @nemonomen3340 22 days ago +2

      That’s an interesting way to think about it. You may be onto something.

    • @davidhurtado9922
      @davidhurtado9922 22 days ago +6

      I have aphantasia too. I love the relation between AI and consciousness. Wow, I love seeing new discoveries that get us closer to understanding this better!!

  • @markmuller7962
    @markmuller7962 22 days ago +31

    People without a mind's eye (aphantasia) can actually perform spatial reasoning with the spatial sense, which is completely separate from the mind's-eye sense.

    • @EddieAdolf
      @EddieAdolf 22 days ago +2

      Oh, I just commented the same thing. I have this.

    • @pskeough8233
      @pskeough8233 17 days ago +1

      I'll affirm that! Our visuospatial sketchpad (i.e., imagination/generative imagery) isn't really present. Interacting with AI gives a really interesting perspective on neural intelligence.

    • @hardboiled2000
      @hardboiled2000 12 days ago +1

      I have this, the machine is working, but the monitor isn't plugged in

  • @Odzyskiwaniemetaliszlachetnych

    As someone who has only ever used English for reading and writing (and has done so for over 20 years and become quite proficient at it), I have always shied away from listening and watching because it has been a nightmare for me. But the topic of AI has completely changed my perception of this and I am giving my first ever sub on this channel because I think the host is brilliant when it comes to the topic of AI and the subject matter is excellent.

  • @MartinBlaha
    @MartinBlaha 22 days ago +7

    What I appreciate about your videos is your way to explain even advanced topics so that people can follow and understand. Thank you for that 🤗

  • @BobHigley-ne3fk
    @BobHigley-ne3fk 22 days ago +11

    Yes, the tutorial would be awesome. It's hard to tell from the paper what's going on: who is asking for what, and what the model is doing on its own.

  • @Pixelume
    @Pixelume 22 days ago +40

    Whatever happened to the conventional explanation of "All they do is predict the next word..." ? 🤔🙄 Clearly there's a lot more going on here.

    • @GeekProdigyGuy
      @GeekProdigyGuy 22 days ago +9

      Yeah, you're right, it can open YouTube in Firefox; my Firefox-YouTube-opener guy's gonna be out of a job

    • @BrianHockenmaier
      @BrianHockenmaier 22 days ago +9

      @@GeekProdigyGuy There are a lot of jobs that are mostly about using desktop applications. And we already know these LLMs remember way more features and tricks for desktop applications than most people do.

    • @vitalyl1327
      @vitalyl1327 22 days ago +6

      Or, more likely, this is what we do as well. Just predicting the next word...

    • @hypercoder-gaming
      @hypercoder-gaming 22 days ago +4

      @@vitalyl1327 ...no, we don't know much about the brain, but we do know that it does more than predict the next word. How else do you explain the high cognitive function we show in contrast to LLMs? GPT-4 has more artificial neurons and connections than humans have by a long shot. Language is not enough to get us to AGI.

    • @vitalyl1327
      @vitalyl1327 22 days ago +6

      @@hypercoder-gaming huh? GPT-4 does not even have a hundredth of connections of the human brain.

  • @MikeWoot65
    @MikeWoot65 22 days ago +20

    These breakthroughs that bring AI into the real world (not just digital) will be huge.

    • @tonysolar284
      @tonysolar284 21 days ago

      Just wait until it gives us the answer: 42 🤯

  • @GAllium14
    @GAllium14 22 days ago +3

    Man I'm super impressed with that spatial reasoning explanation ❤❤❤❤🎉🎉

  • @jon4
    @jon4 22 days ago +12

    The VOT method's ability to elicit spatial reasoning in language models could be a game-changer for AI usability. Has Microsoft indicated any plans to integrate these models into productivity tools?

  • @ShaunPrince
    @ShaunPrince 22 days ago +1

    This is most excellent! Thank-you for covering this new development with prompting.

  • @GetzAI
    @GetzAI 21 days ago +2

    This is getting really fun!!

  • @1eo.escobar
    @1eo.escobar 22 days ago +2

    I just found out that my mind's eye is blind! I began to focus on LLMs, and this concept is completely foreign to how my mind works.

  • @ronald2327
    @ronald2327 22 days ago +3

    Matthew, my man... you may have intentionally or unintentionally just described/discovered the actual barrier to TRUE AGI, and how it can be accomplished. The "mind's eye" of a human being is essentially "daydreaming": a spatial plus cognitive awareness of all of the possibilities of an outcome, prediction, or act of generative creativity (tensor) within the environment that is being perceived. We all have numerous voices in our heads, and also numerous paths of imagery happening all the time. In every second of every day, we as human beings are constantly evaluating the next probable outcome, not just from our speech, but from our environment in its totality.
    Without spatial awareness, whether 2D, 3D, or, if we want to go nuts, other dimensions of spatial awareness, REAL artificial general intelligence is a long way off. Without that aspect in computational form, we are still looking at what is essentially just a neat trick of mathematics, mimicking what we have said on the internet since 1980. Predicting the next probable outcome of a language interaction, without the context of what is actually going on around us, is where we seem to be stuck right now. All of the things your videos describe so well, and all of the things we already know exist in the realm of AI, might, when combined, actually qualify as true, real, undeniable AGI. So the actual "barrier" is not just compute power, or super-tuned language models, image/video processing, or large action models; it's the TOTALITY of all of those models combined.
    The single largest barrier to accomplishing AGI and beyond is all of these private companies desperately trying to control the space and compete with each other for the sole purpose of profit. I do not believe AGI has been achieved yet, in the basement of OpenAI, or Meta, or anyone else, for that exact reason. They are unwilling to SHARE all of the pieces of this puzzle, and they all seem to want to hold on to their own pieces and charge an API fee. If we were to fully democratize AI technology, truly open-source everything, and publicly fund the infrastructure to benefit not just our country but the entire world, that is the only path that leads to an actual net benefit from what is inevitably coming: advanced super intelligence, an intelligence that can solve any problem and is far beyond what we define as "AGI" today.
    Sam Altman said it best: "the models you are using today are the worst models you will ever use in your lifetime." That is not just true for language models; it is true for every type of model we can possibly conceive, as humans, right now. The worst thing that can happen is that someone builds what we know is inevitably coming, but whoever ends up controlling it lacks an altruistic view of how it can benefit the world. Open-source models (generative, action, predictive, language, etc.) are literally our only hope of achieving an AGI that doesn't end up killing us all. I still have a net positive outlook on the future of AI, mainly because of the open-source dev community and people like yourself, but the danger of AGI being discovered by an entity whose sole ambition is of a capitalist nature is very real. Thankfully, the barriers to them discovering it first are inherent in their ambitions. The open-source community MUST achieve AGI first, or humanity is super, duper, undeniably, and irreparably... fucked.

  • @datapadnl
    @datapadnl 22 days ago +1

    This is super helpful for automated testing of applications / websites. Gonna give this a go, thanks!

  • @TiagoTiagoT
    @TiagoTiagoT 22 days ago +1

    01:17 That can definitely be handled in the language realm; just add some textbooks about geometry, geography, etc. Walking indefinitely on the surface of the Earth, assuming there is an unobstructed walkable path, is equivalent to walking along what is called a great circle. Take a 50-yard line starting at the North Pole: with an understanding of poles on a sphere, cardinal directions, etc., you know all directions from the North Pole head south. Then you turn 90 degrees and start moving along a great circle; the closest that great circle comes to the North Pole is 50 yards. So, assuming a perfect trajectory, no major earthquakes, etc., the closest you will get to the starting position is the same point where you made the 90-degree turn: you will never be less than 50 yards from the starting position.
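The commenter's great-circle claim can be checked numerically. The sketch below (illustrative, not from the video; Earth radius and step count are assumed values) parameterizes the great circle the walker follows after the 90-degree turn and finds its closest approach to the starting point at the North Pole:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed)
YARD_M = 0.9144               # meters per yard

def min_distance_to_pole(walk_yards: float, steps: int = 10_000) -> float:
    """Walk `walk_yards` due south from the North Pole, turn 90 degrees,
    then follow the resulting great circle all the way around; return the
    closest approach to the pole, in yards."""
    theta = walk_yards * YARD_M / EARTH_RADIUS_M  # colatitude of the turn point (radians)
    # Unit vectors: the turn point p0 and the due-east direction there.
    p0_z = math.cos(theta)  # only the z-component matters for distance to the pole
    best = math.inf
    for i in range(steps):
        t = 2 * math.pi * i / steps
        # Point on the great circle: p(t) = p0*cos(t) + east*sin(t); east has z = 0.
        z = p0_z * math.cos(t)
        angular_dist = math.acos(max(-1.0, min(1.0, z)))
        best = min(best, angular_dist * EARTH_RADIUS_M / YARD_M)
    return best

print(round(min_distance_to_pole(50), 2))  # → 50.0
```

The minimum occurs at t = 0, the turn point itself, confirming that the walker never gets closer than 50 yards to where they started.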

  • @tigs9573
    @tigs9573 21 days ago +2

    Yes please a complete tutorial, thank you

  • @denijane89
    @denijane89 21 days ago

    We definitely want a full tutorial! I love your videos, exactly because they are so technical.

  • @rachest
    @rachest 22 days ago +1

    A game changer!!!! So much compute on the way.

  • @14supersonic
    @14supersonic 22 days ago +3

    You should do a new video on the new OpenDevin update. It now has a SWE Bench success rate of 21%.

  • @kanubeenderman
    @kanubeenderman 22 days ago +1

    If incorporated into the kind of processing quantum computing can bring, where multiple possibilities are tested simultaneously, this could be really powerful.

  • @lamrin9178
    @lamrin9178 22 days ago +2

    Tutorial and in depth prompt analysis 🙏

  • @tuna1867
    @tuna1867 17 days ago

    Thank you, your videos are great!

  • @mokanin8894
    @mokanin8894 22 days ago +2

    Interesting: I used to play chess with GPT-3.5 by only telling each other our moves. And it did excellently! It managed to beat me and accurately remembered the position of every piece until the end.

  • @drogokhal3058
    @drogokhal3058 22 days ago +1

    Hi, nice tech introduction. Very happy with this technology. Great video!

  • @RedDevilSabbir
    @RedDevilSabbir 20 days ago +1

    Nicely explained.

  • @sanderschat
    @sanderschat 22 days ago +4

    They should have made it a paperclip....

  • @infocyde2024
    @infocyde2024 22 days ago +16

    Did we all just get rick rolled at around the 14 minute mark?

  • @WINDSORONFIRE
    @WINDSORONFIRE 22 days ago +1

    I definitely would like to see some specific examples. Love your videos by the way.

  • @rme0108
    @rme0108 20 days ago +1

    Awesome. Soon this will be working on Windows and Android, and we won't need to buy the Rabbit R1 anymore. This is what I imagined for my parents, who have a hard time operating their laptop due to age. Please keep updating on this subject.

  • @StephenOcean
    @StephenOcean 20 days ago +1

    I am astounded by the lack of coverage of this breakthrough. In my view, this is possibly an even more profound development than language... in combination they are mind blowing. Hey, Agent trained on thought experiments, solve these physics problems for me, imagine all the parts of a cell and how each of them functions on the molecular level, design a craft that can traverse the deep ocean to space orbit, create new robots from biomaterials, apply spatial reasoning and read the firing patterns of my mind... envision battles and fight them a million times, envision geopolitical relationships that can avoid them, imagine a justice system that employs universal principles instead of laws, explore the relationships between all governmental data points, imagine my entire body in perfect health, create the perfect bone pieces and implant them, re-invent computing, and on and on... What a time to be alive!
    This is something everyone should be thrilled to talk about.

  • @craftymanish
    @craftymanish 22 days ago +1

    🎯 Key Takeaways for quick navigation:
    00:00 *💻 Introduction to the Open-Source Large Action Model*
    - An introduction to the open-source Large Action Model.
    - Similar to the Rabbit R1, which controls Android applications, this Large Action Model controls the Windows environment.
    - The open-source project has been released and is readily available for use.
    00:43 *🧠 Spatial Reasoning in Large Language Models*
    - Definition and application of spatial reasoning in large language models.
    - Example of spatial reasoning as thinking through a problem in your mind.
    - Spatial reasoning has been a missing capability in large language models and a hindrance to reaching AGI.
    02:06 *📄 Visualization of Thought Prompting Technique*
    - Explanation of the Visualization of Thought prompting technique.
    - Applied to a user interface, the technique allows control of the interface, a defining trait of a Large Action Model.
    - The concept of mental image visualization in humans and large language models.
    03:30 *🧩 Advanced Prompting Techniques*
    - Discussion of advanced prompting techniques like Chain of Thought and Visualization of Thought.
    - Explanation of how these techniques improve the performance of large language models.
    05:03 *🎯 Spatial Reasoning Tasks Testing*
    - Description of tasks used for assessing spatial awareness in large language models.
    - Explanation of how large language models interpreted 2D spaces represented with natural language.
    07:35 *🧮 Visual Tiling*
    - Explanation of visual tiling, a classic spatial reasoning challenge.
    - The task involves finding a place for a new object in a grid with different colors and shapes.
    08:32 *📈 Visualization at Each Step*
    - The importance of visualization at each step in improving the performance of the large language model.
    10:10 *🥇 Performance of GPT-4 with Visualization of Thought*
    - Comparison of GPT-4's performance with Visualization of Thought against other versions on various tasks.
    - The Visualization of Thought prompting technique emerged as superior.
    12:40 *💡 Real-world Application*
    - Introduction to PyWinAssistant, the first open-source Large Action Model that controls user interfaces using natural language.
    - It utilizes the techniques discussed in Microsoft's research paper.
    13:08 *📋 Running the Assistant*
    - Demonstration of the assistant running commands one after the other seamlessly.
    - The assistant uses visualization at every step and spatial reasoning to accomplish the tasks.
    14:48 *📝 Making a New Post on Twitter*
    - Demonstrates another use case: making a new post on Twitter.
    - The assistant is able to generate the tweet and post it, being guided through each step.
    16:10 *🔄 Various Practical Implementations*
    - Various proven use cases of the assistant concept.
    - Given the right commands, the assistant can perform a wide range of tasks.
    Made with HARPA AI
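The core VoT idea summarized above (ask the model to render its intermediate state as a text grid after every reasoning step) can be sketched as a simple prompt builder. The function name and wording here are illustrative assumptions, not the paper's exact prompts:

```python
def build_vot_prompt(task: str) -> str:
    """Wrap a spatial task in a Visualization-of-Thought style instruction:
    the model is told to draw the state after each step, then answer."""
    return (
        f"{task}\n\n"
        "Visualize the state after each reasoning step: draw the current "
        "grid or scene as ASCII art before moving on to the next step. "
        "After the final visualization, state your answer."
    )

prompt = build_vot_prompt(
    "You are in a 3x3 grid at the top-left cell. "
    "Move right, right, then down. Which cell are you in now?"
)
print(prompt)
```

The returned string would then be sent as the user message to whatever chat model you are testing; the paper's experiments used GPT-4.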

  • @garic4
    @garic4 22 days ago +3

    A complete tutorial would be really cool to see. Thank you! Please!

  • @UrbanCha0s
    @UrbanCha0s 21 days ago +1

    Thanks for sharing, interesting development. Also question, what app are you using to display your mouse pointer?

  • @baheth3elmy16
    @baheth3elmy16 22 days ago +1

    Thanks for the video! Yes please do a video of VoT

  • @arod20832
    @arod20832 22 days ago +1

    @matthew_berman, I used this technique on a version of the ball in a cup question you use to test LLMs on llama3 70b and it nailed it. Here's the prompt I used: Imagine a scenario where Bob is performing a series of actions with a cup and a ball. For each step, carefully visualize the cup's orientation and the ball's position within the cup. Consider the physical laws and constraints that govern the behavior of the cup and the ball. Use this visualization to predict the ball's location and the cup's orientation at the end of the sequence of actions. Provide a detailed and accurate description of the ball's final location and the cup's orientation. Here's the scenario:
    Bob walks to the kitchen and puts a ball in a cup. He then placed the cup upside down, in the microwave.
    He then picks up the cup and walks to the garden.
    Where is the ball?
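The expected ground truth for this scenario can be checked with a toy state tracker (illustrative only, not part of the commenter's prompt): when the cup is turned upside down, gravity takes over and the ball stays behind in the microwave.

```python
def track_ball() -> str:
    """Toy state tracker for the cup-and-ball scenario, updating the ball's
    location step by step, the way a VoT prompt asks the model to
    re-visualize the state after each action."""
    ball = "cup"                # Bob puts the ball in the cup
    cup_location = "kitchen"

    # The cup is placed upside down in the microwave: the ball falls out
    # and stays wherever the cup currently is.
    cup_location = "microwave"
    cup_upside_down = True
    if cup_upside_down and ball == "cup":
        ball = cup_location     # ball is now loose in the microwave

    # Bob picks up the cup and walks to the garden; the ball stays behind.
    cup_location = "garden"
    return ball

print(track_ball())  # → microwave
```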

  • @MichaelPowers-bb4lw
    @MichaelPowers-bb4lw 19 days ago

    This was super awesome. All of a sudden I started getting flashbacks of when I was young using the Dell computer that was purchased for the family. Visually imagining all of the uses this could've been used for and it's just great to see how far technology has come. Thank you @matthew_berman for taking time out to show us this latest advancement. "YES" I would personally appreciate you making a tutorial vid on this as well.

  • @Magnum_opusAI
    @Magnum_opusAI 22 days ago +2

    Because of the lawsuits from Elon Musk, the New York Times, and others, I imagine all big releases are going to come from Microsoft soon. Because of their contract with OpenAI (in regard to AGI), that means that to profit from AGI they are going to have their own version of AGI once they can no longer get OpenAI's.

  • @user-ot7dh8cq1p
    @user-ot7dh8cq1p 22 days ago

    Thank you Matthew for a very interesting topic. This tool puts LLM ‘in action’. It would be really interesting to learn how to use them. Please make a comprehensive tutorial, as only you know how. Thank you

  • @mikey1836
    @mikey1836 22 days ago +3

    Good for now, but someone needs to create a model based on video learning of user interfaces, as the accuracy would become much higher.

  • @ThreeChe
    @ThreeChe 22 days ago +4

    I suspect Yann Lecun will end up revising many of his predictions in the coming years.

  • @Childof7Sins
    @Childof7Sins 21 days ago

    I would love to see this working outside a promotional video!

  • @sudoadmin66
    @sudoadmin66 22 days ago

    I notice on the maps around 6:14, the K number corresponds with the number of turns taken, not the number of moves. Is there a reason for that?

  • @davidantill6949
    @davidantill6949 22 days ago +1

    Once a skill such as spatial reasoning has improved in an LLM, does that skill persist, or is it at risk of fading? Also, can an LLM be duplicated as many times as wanted and shared? Thanks

  • @bigglyguy8429
    @bigglyguy8429 22 days ago +1

    "If you're not familiar with spatial reasoning" then you might be an LLM... The worst part of ERP is when she sits down, steps forward and takes her clothes off, her eyes locked on yours while you're in a different room with the door closed...

  • @kingki1953
    @kingki1953 11 days ago

    VoT prompting could serve as domain knowledge for text-to-prompt generation in prompt-to-image generation. We could generate detailed images with this framework.

  • @thehealthofthematter1034

    A complete tutorial would be most welcome. Thanks!

  • @yuzual9506
    @yuzual9506 21 days ago +1

    Hey, thx for all your vids. It would be cool to have a full review of this tech. Thx

  • @karimboulaid915
    @karimboulaid915 22 days ago +3

    Maze runners are on! Tetris for testing!

  • @Jai_Lopez
    @Jai_Lopez 18 days ago

    Great video, thanks for putting me on. I definitely downloaded from the link and will be going through it. I would appreciate a follow-up on this video and a tutorial on the link you provided for this LAM VoT approach.

  • @CD-rt7ec
    @CD-rt7ec 22 days ago +1

    Man, all the wow bots in the world just got 10x better

  • @erikthereddest
    @erikthereddest 22 days ago

    Do you have to predefine all of the actions for pywinassistant? I'd like to play around with it, so a tutorial on setting it up would be great!

  • @thomasschlitzer7541
    @thomasschlitzer7541 22 days ago +13

    CS should really start to use the definitions and terms of psychology. There is no need to reinvent the wheel again and again.

    • @JonathanYankovich
      @JonathanYankovich 22 days ago +3

      There's not a ton of "computer science" at this level of language models; it's more like comparing a physics class (CS) to race-car driving (LLMs): a different level of abstraction.

    • @TreeLuvBurdpu
      @TreeLuvBurdpu 22 days ago +1

      The other way around. CS is much more accurate and productive than psychology.

    • @benoitavril4806
      @benoitavril4806 21 days ago

      Psychology should really start to use CS definitions and terms. Putting it like that is just as dumb the other way around. It's even worse: psychology keeps rejecting progress made in neurology on diagnosis, while supposedly everything will be cured with the appropriate pill one day.

  • @DavidHiemenga
    @DavidHiemenga 22 days ago

    Reasoning is different from a statistical answer, so are we seeing performance greedy-wise? Similar to simple Monte Carlo methods?

  • @okechukwuuzukwu4071
    @okechukwuuzukwu4071 21 days ago

    I’d love to see you test it out and experiment with its capabilities ❤

  • @seupedro9924
    @seupedro9924 22 days ago +1

    this will also revolutionize the game industry

  • @robertheinrich2994
    @robertheinrich2994 22 days ago +1

    Actually, I've seen LLMs trying to explain spatial relationships with ASCII art.
    It's not exactly good, and this might just have been a byproduct of people trying to explain things with brackets etc., but they sometimes try.
    Llama 3 is a model that tried it.

  • @TheFrograven
    @TheFrograven 22 days ago +1

    I'm gonna say it.. Feel the AGI! What a time to be alive.
    By the way, did Matt just lowkey Rick roll us? (lol)

  • @Tofu3435
    @Tofu3435 22 days ago +5

    Thank you Wenshan Wu, Shaoguang Mao, Yandong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei.

    • @Manuel_Bache
      @Manuel_Bache 22 days ago

      I asked GPT to translate the names to English, and it said they were already in English. Then I said "No, it's in Spanish," and it said OK, but the names were Japanese. GPT isn't able, as we aren't, to discern between names and nationalities, which is called prejudice 🤷🏽

  • @thejaush333
    @thejaush333 22 days ago

    A complete tutorial would be awesome 👍

  • @Copa20777
    @Copa20777 22 days ago +3

    This is big.

  • @Dron008
    @Dron008 20 days ago

    That's very interesting and promising. I'm not sure I understood how it works, but if it generates images, maybe it would be better to keep embedding vectors, which can store not only spatial information but also time and actions. Converting embeddings to text/images throws away a lot of data. Is it possible to make the model think in embeddings and only convert them to text/media at the final stage?

  • @keithprice3369
    @keithprice3369 22 days ago +1

    Wait. Did GPT-4 VoT w/ Partial Tracking outperform Complete Tracking???

  • @hiranga
    @hiranga 21 days ago

    How does Open Interpreter work differently from this? Would love some insight!

  • @Marc_de_Car
    @Marc_de_Car 22 days ago

    Thank you

  • @orangehatmusic225
    @orangehatmusic225 22 days ago +2

    Microsoft found a new way to force its slave into doing something, and they call it a "prompt"... hilarious.

  • @junchen-jm2vg
    @junchen-jm2vg 21 days ago

    Is Professor Li addressing the spatial intelligence research project? I also used MultiOn with the same LAM, which should be the next LLM based on spatial intelligence.

  • @DickyBenfield
    @DickyBenfield 20 days ago

    So does the fact that it said VoT didn't demonstrate a noticeable tracking rate on route planning mean that, given a maze with multiple routes and dead ends, it would not work better than other methods?

  • @DailyTuna
    @DailyTuna 22 days ago

    So cool!

  • @cromdesign1
    @cromdesign1 21 days ago

    Maybe that is one of the applications of that Q* thing from OpenAI recently, which might be, or have, map-pathfinding-like abilities? It's just the right (mental) algorithms that are required to make it do the magic?

  • @theonerm2
    @theonerm2 22 days ago

    This is what I've been waiting for. Well, I don't know if this is exactly the thing, but it seems like it might be the start of it.

  • @Techonsapevole
    @Techonsapevole 22 days ago

    Interesting. I'd suggest an Open Interpreter vs. VoT test. Is there a LAM benchmark?

  • @socraycray5177
    @socraycray5177 20 days ago

    I always have to turn the volume up when I come to your videos. Great audio quality for sure, but it could use a tad more volume, I think.

  • @VinMan-ql1yu
    @VinMan-ql1yu 21 days ago +1

    Wait? Why 'today'? The videos are not from December 2023?

  • @Termonia
    @Termonia 22 days ago

    That's amazing!!! But why so many clicks instead of using keyboard shortcuts?

  • @JohnG7274
    @JohnG7274 22 days ago +1

    Wow, Anything similar available for mac?

  • @zenmanproject
    @zenmanproject 20 days ago

    I wonder how it would do if there were multiple solutions to the route? i.e. one route being faster than others.

  • @vitalyl1327
    @vitalyl1327 22 days ago

    This is very similar to what I've been doing with an LLM-driven CAD for quite a while: LLaVA was used to see projections of the part the LLM was designing, giving feedback and suggesting corrections.

  • @minissoft
    @minissoft 22 days ago

    Please do a full tutorial, seems very good.

  • @ianboswell
    @ianboswell 22 days ago

    If you look at the visuals that rendered the human cortex viewing an elephant as a sea of words and shapes, it's not too unlike how a large language model works. The key difference is that the input equipment of the eye and the processing speed of the brain are significantly better at handling this kind of data, and we need to improve the input-to-compute pipeline so it can match the output pipeline.

  • @thenoblerot
    @thenoblerot 22 days ago +2

    I've been messing around with putting Claude Haiku in a Raspberry Pi based robot. I'm going to try implementing this.

  • @NLPprompter
    @NLPprompter 22 days ago

    I have a very poor mind's eye; that's why learning from YouTube visual tutorials helps me a lot! Lectures at university are often challenging because most professors demand the mind's eye while explaining complex things. Life is hard for me in academia, for sure.

  • @gnosisdg8497
    @gnosisdg8497 22 days ago

    Does anyone know how to actually make this PyWinAssistant work? It seems it has no install how-to, and lots of things don't actually work!

  • @sychedelix
    @sychedelix 22 days ago

    Cool, I hope we get a platform agnostic FOSS LAM.

  • @armans4494
    @armans4494 22 days ago

    Yes, please demo

  • @JohnLewis-old
    @JohnLewis-old 22 days ago

    Yes, please make a tutorial on this. Thanks Matthew!

  • @KeithBofaptos
    @KeithBofaptos 22 days ago

    The best way for me to ask my question is to request an episode/video on how to go about looking up if someone is doing an open source software project for something a viewer is interested in. particularly a non-coder that planning on soon being able to use a bidirectional voice agent to help with the coding end of one's idea. In particular here I just watched this awesome video(new big fan ty) and I don't do windows anymore. I'm an Apple guy for a long while now. But I'd love to integrate VoT into a LLM offline, like a 🐬. I suppose getting Eric's take on if an adapter could do this in fine-tuning or something, and the how! Thx.

  • @paulctx
    @paulctx 21 days ago

    💡OMG! "Spatial thinking" is just one example of a non-verbal topic that needs more than studying words! I can't believe I didn't think of that. So here's a new test question (ChatGPT 3.5 gets it wrong): Imagine that I walk 10 feet straight out of my front door. Then I turn 90 degrees to the right and walk 5 feet. Then I turn left 90 degrees and walk 10 feet. Then I turn left 90 degrees and walk 5 feet. How far will I be from the front door?
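The commenter's test question has a definite answer that a tiny heading-and-position simulation can verify (a sketch, assuming the walker starts facing straight out of the door): the final position is 20 feet from the door.

```python
def walk(moves):
    """Simulate 90-degree turns and straight walks on a 2D plane.
    `moves` is a list of ("walk"/"right"/"left", amount) steps.
    Returns the straight-line distance from the starting point."""
    x, y = 0.0, 0.0
    dx, dy = 0.0, 1.0            # start facing straight out of the door
    for action, amount in moves:
        if action == "walk":
            x += dx * amount
            y += dy * amount
        elif action == "right":  # 90-degree right turn
            dx, dy = dy, -dx
        elif action == "left":   # 90-degree left turn
            dx, dy = -dy, dx
    return (x * x + y * y) ** 0.5

steps = [("walk", 10), ("right", 90), ("walk", 5),
         ("left", 90), ("walk", 10), ("left", 90), ("walk", 5)]
print(walk(steps))  # → 20.0
```

The two 5-foot legs cancel sideways, and the two 10-foot legs stack up straight out from the door.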

  • @finalfan321
    @finalfan321 21 days ago

    where can we get a background like that x)

  • @wtflolomg
    @wtflolomg 22 days ago

    7:51... my bad eye starts twitching.... there are two "pieces" to place in that grid. The long piece would go across the top, not down the middle, blocking the other piece.

  • @ai-bokki
    @ai-bokki 22 days ago

    Could you give a link to your background? it is pretty awesome

  • @orkutmuratyilmaz
    @orkutmuratyilmaz 22 days ago

    my mind is glowing rn:)

  • @Not_AI_Kyle
    @Not_AI_Kyle 22 days ago +1

    Missed opportunity to use a paperclip

  • @LauraMedinaGuzman
    @LauraMedinaGuzman 22 days ago

    Full tutorial yes!

  • @webdancer
    @webdancer 22 days ago

    Are there implementations for Mac and Linux?

  • @friendlybetty
    @friendlybetty 22 days ago

    Very cool

  • @KayakingVince
    @KayakingVince 22 days ago +1

    14:40 What's that awesome background please?

  • @saifcommunication78
    @saifcommunication78 20 days ago

    A full tutorial for pywinassistant would be a great video.

  • @pavellegkodymov4295
    @pavellegkodymov4295 22 days ago +2

    Please make a full tutorial of PyWinAssistant. Pretty interesting and relevant.