Guanaco 65B: 99% ChatGPT Performance 🔥 Using NEW QLoRA Tech

  • Published 6 Jul 2024
  • In this video, we review Guanaco, the new 65B parameter model that achieves 99% of the performance of ChatGPT. It is truly incredible. Since it is a large model, we use a cloud GPU to power it. This model can code, has logic and reasoning, can do creative writing, and so much more. Guanaco was trained in under 24 hours on a single GPU, using a new technology called QLoRA, which is mind-blowing. How does it do on the LLM rubric? Let's find out!
    Enjoy :)
    Join My Newsletter for Regular AI Updates 👇🏼
    www.matthewberman.com
    Need AI Consulting? ✅
    forwardfuture.ai/
    Rent a GPU (MassedCompute) 🚀
    bit.ly/matthew-berman-youtube
    USE CODE "MatthewBerman" for 50% discount
    My Links 🔗
    👉🏻 Subscribe: / @matthew_berman
    👉🏻 Twitter: / matthewberman
    👉🏻 Discord: / discord
    👉🏻 Patreon: / matthewberman
    Media/Sponsorship Inquiries 📈
    bit.ly/44TC45V
    Links:
    Runpod - runpod.io?ref=54s0k2f8
    Runpod Tutorial - • Run ANY LLM Using Clou...
    Runpod The Bloke Template - runpod.io/gsc?template=qk29nk...
    HuggingFace - www.huggingface.com
    Guanaco Model - huggingface.co/TheBloke/guana...
    TextGen WebUI - github.com/oobabooga/text-gen...
  • Science & Technology

Comments • 250

  • @RunOfTheTrill
    @RunOfTheTrill 1 year ago +130

    Am I the only one who feels like almost every morning is like Christmas with all these daily advancements?

    • @matthew_berman
      @matthew_berman 1 year ago +6

      🎁

    • @david7384
      @david7384 1 year ago +17

      Some day sooner than you think, a computer program will wake up one morning and feel like it's Christmas 💀

    • @michaellavelle7354
      @michaellavelle7354 1 year ago +1

      You are so right. It's almost Christmas everyday. Hard to believe.

    • @yusufkemaldemir9393
      @yusufkemaldemir9393 1 year ago

      Yes, but I tested most of the freely available models on my own use case, and even for plain chat they fall short of answering correctly.

    • @ew3995
      @ew3995 1 year ago

      It's called the singularity; it will accelerate exponentially from here on. In 5 years' time we will no longer understand how these advancements are taking place or what they mean.

  • @charlesd774
    @charlesd774 1 year ago +17

    Loving your channel! Just yesterday I decided that fine tuning a personal model would be a great project, and today you come out with the recipe. Thank you for your hard work!

  • @nacs
    @nacs 1 year ago +1

    Love the way you test each model with similar questions and analyze the results. Looking forward to more of these.

  • @user-in4ij8iq4c
    @user-in4ij8iq4c 1 year ago +1

    Your channel is so clean and clear. Straight to the valuable content. Subscribed.

  • @JoaquinTorroba
    @JoaquinTorroba 1 year ago

    Thanks Matthew! It's so nice and useful to learn this 💪🏼

  • @artbdrlt5175
    @artbdrlt5175 1 year ago +10

    Love your content dude. It's concise yet full of information and it's up-to-date with the latest open source models. Keep it up :)

  • @sivi3883
    @sivi3883 1 year ago +7

    Thanks for the great content! I love your videos.
    I am trying to keep up with the awesome models that keep coming every week. Considering this model Guanaco 65B is fine-tuned on the LLaMA 65B parameter model, we cannot use it commercially, only for research purposes. Right?
    I tried Dolly2 12B with LangChain and a vector DB for semantic search to get answers from long PDFs for my custom data, and the response was not great. Trying to see what model is out there for commercial use.

  • @marwentrabelsi2983
    @marwentrabelsi2983 1 year ago +1

    Hi Matthew, really nice channel and the content is very good; the energy is also motivating and inspiring!

  • @marcfruchtman9473
    @marcfruchtman9473 1 year ago +5

    Super interesting video. Thank you for not over-doing the "background music"...

    • @matthew_berman
      @matthew_berman 1 year ago +1

      You got it, Marc! I almost put background music on the whole time lol. It would have been low though. Do you like the intro music?

    • @marcfruchtman9473
      @marcfruchtman9473 1 year ago +1

      @@matthew_berman The melody for the song was really good in the first 10 seconds, and the volume for the background music was also best in the first 10 seconds. That was perfect for the intro; then it got a little too loud when the melody changed slightly. Fortunately it transitioned quickly to no background music when you started reviewing the paper. So, overall, I was thankful that I wouldn't have to listen to loud background music while you talked, hehe. As usual, great content... you are doing a great service to the world (and we get the benefit!)... thank you.

  • @jorgerios4091
    @jorgerios4091 1 year ago +18

    Mat, it would be great to see a deployment of AutoGPT with either Falcon 40B or Guanaco 65B. Is this part of your plan for future videos?

    • @BlackHawk1335
      @BlackHawk1335 1 year ago

      Hold up, to make that video it would have to be supported first, which it isn't. Or am I missing something?

  • @workflowinmind
    @workflowinmind 1 year ago

    Thanks, I have a question: you say quality is better than quantity in LLM training, so is this true for inference too? Like, is it better to get a non-quantized 30B than a quantized 65B?

  • @spiroskakkos3455
    @spiroskakkos3455 1 year ago +2

    What do you mean by training it? Do you have a library of PDFs that you load into it?

  • @hiddenworld1445
    @hiddenworld1445 1 year ago +5

    Would love to have a follow-up video. Thank you for sharing this awesome news!

    • @matthew_berman
      @matthew_berman 1 year ago +1

      What do you want to see in a follow-up video?

    • @hiddenworld1445
      @hiddenworld1445 1 year ago +2

      @@matthew_berman Like how you deployed the 65B Guanaco model, and how we can fine-tune it step by step on custom data.

    • @massimogiussani4493
      @massimogiussani4493 1 year ago

      I would be interested too

  • @meinbherpieg4723
    @meinbherpieg4723 1 year ago +2

    Is there any work being done to integrate "plugins" with personal AIs? It would be great to be able to use plugins with these local AIs to increase their proficiency at particular tasks such as coding in particular languages or mathematical modeling.

  • @leonwinkel6084
    @leonwinkel6084 1 year ago +1

    Very interesting!!! Thanks so much for working through this content and sharing! By the way: the reply had 21 letters, not words, so it actually got a lot of the thinking right in advance. I'm sure this will be fine soon.

    • @matthew_berman
      @matthew_berman 1 year ago

      Oh interesting. Thanks for pointing that out. But it should have been words, right?

  • @williamelongtech
    @williamelongtech 1 year ago

    Thank you for sharing the video 😄😄🤜

  • @mariusiacob1307
    @mariusiacob1307 1 year ago +1

    Great video!

  • @incription
    @incription 1 year ago

    So what happens if they up the parameter count and train it on the same dataset as GPT-4? Will it be better?

  • @DemiGoodUA
    @DemiGoodUA 1 year ago +1

    Are there any models that can handle a larger context (more than ChatGPT)?

  • @vladivelinov88
    @vladivelinov88 1 year ago +14

    Btw, when you ask the models how long it takes to dry X shirts, you should probably specify the drying method, i.e. outside or in a dryer. Where it gets it wrong, it probably thinks we're drying them in a dryer.

    • @matthew_berman
      @matthew_berman 1 year ago +5

      I wanted it to ask questions but it never does. I’ll be more specific next time.

    • @vladivelinov88
      @vladivelinov88 1 year ago +1

      @@matthew_berman Doubt it'll make a big difference but probably worth a try. Thank you!

    • @matthew_berman
      @matthew_berman 1 year ago +1

      @@vladivelinov88 I updated the question in my next video, coming soon :) Thank you!

    • @stevejordan7275
      @stevejordan7275 1 year ago

      @@matthew_berman Wouldn't an LLM *asking questions* require significant changes to its architecture?

    • @mirek190
      @mirek190 1 year ago

      Or inform it at the beginning that "it is a puzzle"...

  • @hypersonicmonkeybrains3418

    Does this mean that we can train the model on our own data such as ebooks?

  • @reyalsregnava
    @reyalsregnava 1 year ago +10

    Just spent some time with some LLMs, and I realized that we may be thinking about how to train them wrong. Right now we're basically trying to teach them how to think. But their training data isn't some big pool of knowledge they really have direct access to. Instead it's more like the years you spent as an infant and toddler learning how to move and walk. And as crazy as it sounds, we might have training solutions in the sporting world where we wouldn't have looked. I was struck by how similar kinesthesia is to LLM training data: you don't know it, but you use it all the time.
    It explains the hallucinations and fumbles and stumbles. It's basically learning to move. It also means that there may always be some imprecision in the results. Even the best trained, most skilled and talented athletes will mess up something they've done millions of times. I don't think that will structurally change how we plan to use recursive loops for self correction. But it's a much more precise way of thinking about the nature of training an AI. We are making athletes.

    • @jacobshilling
      @jacobshilling 1 year ago

      I keep thinking we should just let the AI out to play...

    • @reinerheiner1148
      @reinerheiner1148 1 year ago

      The AI is basically just trained on trying to guess the next word, sentence, ... but it never gets the chance to test and refine the knowledge it has in any other way than the current context window, which will soon be lost. It needs memory, and it needs to remember what went wrong and how to actually solve the task. Then it also needs to retrieve that memory so it can apply what it learned. So basically: input -> output -> validation -> if false, try to improve and try again -> if correct after first being wrong, save the solution in memory; retrieve the solution if it fits the context. But fear not, this is already being worked on and may be solved already. Check out the Minecraft LLM paper, where GPT explores the world, learns new stuff, and remembers and applies it.
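      A minimal Python sketch of that loop; generate(), validate(), and the memory dict are hypothetical placeholders for an LLM call, a checker, and persistent storage, not any real library's API:

      def solve_with_memory(task, generate, validate, memory, max_tries=3):
          # Reuse a remembered solution if one fits this context.
          if task in memory:
              return memory[task]
          answer = generate(task)
          for _ in range(max_tries):
              if validate(task, answer):
                  # Save the working solution so it can be retrieved next time.
                  memory[task] = answer
                  return answer
              # Validation failed: feed the failure back in and try again.
              answer = generate(f"{task}\nPrevious attempt failed: {answer}\nTry again.")
          return answer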

    • @VincentOrtegaJr
      @VincentOrtegaJr 1 year ago

      @@reinerheiner1148 powerful

    • @reyalsregnava
      @reyalsregnava 1 year ago +1

      @@reinerheiner1148 Seems like it would be much smarter to have one "guess next word" component and a separate "evaluate if it meets criteria" component. It would step closer to how human minds work. It's like watching people trying to get the language center of the brain to do logic puzzles. The LLMs are fantastic input/output tools; I'm just surprised they aren't being used that way. The human brain is a nest of specialists working in concert; hundreds of millions of years rewarding efficiency made that. But then I see the AI researchers saying "no one will ever use more than one hammer." The US did the same thing with the F-35: make one plane be all planes. Museums are full of weapons trying to be all weapons. Swords are still swords, knives are still knives, spears are still spears, and guns are still guns; countless people have merged them in different ways and abandoned the idea. I just don't see why these very smart people haven't realized that if you make one tool for everything, you get a tool good at nothing.

    • @mirek190
      @mirek190 1 year ago

      @@reyalsregnava Our brain has specialized parts for many tasks: mathematical computation, object recognition, speech, and many other parts of the mind. I think LLMs need to be divided into such parts inside the models, like a mathematics module, a reflection module, etc.

  • @u9vata
    @u9vata 1 year ago +1

    Can you make a video about tuning some tiny model yourself with QLoRA on a really small GPU? 48GB of VRAM is still huge, but honestly the requirement for the 13B model is also still a lot. There can be use cases for much smaller fine-tuning, I think.
    Also, everyone shows these random UIs, but what if someone wants to do and understand all of this from a terminal in Linux?

  • @UncleBoobs
    @UncleBoobs 1 year ago

    I wonder how long it would take to train using CPU-only mode on a workstation with 56 CPU cores and 128GB of RAM.

  • @MM-24
    @MM-24 1 year ago +6

    Great video, thank you for the thoughtful and progressive walkthrough of this information - pacing is perfect.
    Question: one benefit of running your own model is not having to worry about censorship - is there a way to remove it? What is the most performant LLM without censorship?

    • @matthew_berman
      @matthew_berman 1 year ago +8

      Thank you! Nice to know about the pacing.
      There are specific models that don't have censorship, for example Wizard 30b: huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
      Sounds like I need to do a video about an uncensored model?

  • @juliengomez924
    @juliengomez924 1 year ago

    Hi, very interesting. How can we use it on our own private data? Thanks

  • @heartsrequiem09
    @heartsrequiem09 1 year ago

    Am I correct in thinking that, given the way you have to end and delete everything, there is no way to maintain a chat history or memory with this type of setup?

  • @electron6825
    @electron6825 1 year ago

    What is the performance in terms of token speed of this model?

  • @timothyhayes5741
    @timothyhayes5741 1 year ago

    Could you run this on a 5950X with 128GB of RAM, or would it be too slow even with all the new tech?

  • @supercker
    @supercker 1 year ago +1

    "My next reply has 21 words" has 21 letters, in fact!

  • @fontende
    @fontende 1 year ago +15

    Finally they've come to 65 billion, which I've been waiting for since last month. Optimization is as important a goal as the AI itself.

    • @alx8439
      @alx8439 1 year ago +2

      Funny enough, no one gives a sh*t about optimization methods other than quantization. I guess when people discover that there are more aces up the sleeve we'll get another "big advancement".

  • @metafa84
    @metafa84 1 year ago

    I don't wanna train it, I wanna run it as-is locally. What kind of GPU would I need to do that, and how much RAM would it take in CPU-only mode, please?

  • @barrywhittingham6154
    @barrywhittingham6154 1 year ago

    Can these models also cope with the math problem in reverse order: 2 + (4 * 2)? This should let us determine whether it knows the actual math or is just processing tokens in order.
    How about nested brackets?
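    For reference, such probes are easy to generate with ground truth computed by Python itself (the expressions here are illustrative):

    # Nested-bracket probes; Python's own evaluator supplies the expected answers.
    probes = ["4 * 2 + 2", "2 + (4 * 2)", "2 + (4 * (2 + 1))", "((2 + 4) * 2) - 3"]
    for p in probes:
        print(p, "=", eval(p))  # e.g. 2 + (4 * 2) = 10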

  • @AetherXIV
    @AetherXIV 1 year ago

    Is it still not possible to have a model that isn't chained to boilerplate? We would have to train it from scratch, right? And these are just homebrews that learn from big chained models?

  • @cristianvillalobos3448

    How can I add new data to the model?

  • @mlnima
    @mlnima 1 year ago +2

    I remember last year my teacher forced us to clean the data very thoroughly for a simple ML project. Yes, quality is very, very important.

  • @mbrochh82
    @mbrochh82 1 year ago +17

    After finetuning, what kind of hardware is needed to just run the model?
    What is the context token limit for this model?
    Also: Would be great if you could include a test for text summarization when you test these models.

    • @theresalwaysanotherway3996
      @theresalwaysanotherway3996 1 year ago +22

      It's a llama model, so the context length is 2048 tokens. And this model is very large by open source standards (tiny compared to GPTs and such, though), meaning it requires ~48GB of VRAM just for inference. However, if you have a lot of system RAM, you could split the model between your RAM and VRAM and run it that way, but this would be *very* slow. You can get smaller Guanaco models though; there are 7B, 13B, 30B, 65B options. Here is how it stacks up with 8GB of VRAM:
      7B fits entirely in 8GB of VRAM and runs very quickly,
      13B needs at least 16GB of system RAM + 8GB VRAM,
      30B needs at least 32GB of system RAM + 8GB of VRAM.
      The larger the model, the higher the ceiling is for quality, so 65B models will typically outperform 30B models. However, the larger the model, the more memory it needs, and the slower it runs.
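      As a rough illustration of the RAM/VRAM split described above, here is a minimal sketch using Hugging Face transformers with 4-bit loading (it assumes transformers, accelerate, and bitsandbytes are installed; the model ID and memory caps are illustrative, not recommendations):

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "TheBloke/guanaco-7B-HF"  # illustrative; pick the size that fits your hardware
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForCausalLM.from_pretrained(
          model_id,
          load_in_4bit=True,                       # 4-bit weights, the QLoRA-style quantization
          device_map="auto",                       # spill layers that don't fit in VRAM to system RAM
          max_memory={0: "8GiB", "cpu": "32GiB"},  # illustrative caps for the split
      )
      prompt = "### Human: What is QLoRA?### Assistant:"
      inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
      print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))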

    • @Ephemere..
      @Ephemere.. 1 year ago +7

      @@theresalwaysanotherway3996 Thank you very much for the clarification.

    • @spoonikle
      @spoonikle 1 year ago +5

      @@Ephemere.. The cheapest way is with a used Radeon Pro; they go up to 48GB of VRAM for $1,400.
      It won't be as fast as an Nvidia GPU, but it's VASTLY cheaper than the alternative and gives you the flexibility to run bigger models thanks to its massive VRAM pool.
      At this stage, VRAM is more important than compute when your budget is less than $5,000.

    • @Ephemere..
      @Ephemere.. 1 year ago +1

      @@spoonikle I love you, thanks for the info.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@theresalwaysanotherway3996 When you say "very slow", what does that mean exactly? I am willing to build a farm of CPUs if it is cost effective. 1 token per second per CPU core would probably be something I could work with, because it's useful for offline tasks.

  • @sanesanyo
    @sanesanyo 1 year ago +1

    Are we talking on par with GPT-4 or GPT-3.5? Because they are two different things. If it's on par with GPT-4 then I am super impressed.

  • @EdwinFairchild
    @EdwinFairchild 1 year ago +2

    What I don't understand is how it is trained. Like, how do you tell it "look, here is my private code base or private documents, now learn them"? A video on that would be insightful.

  • @VincentOrtegaJr
    @VincentOrtegaJr 1 year ago

    Brooooooooo!!! Thanks

  • @lamnot.
    @lamnot. 1 year ago

    Can you do a vector DB comparison: FAISS, Redis, Chroma, Pinecone, etc.?
    Love the channel.

  • @jasonsadventure
    @jasonsadventure 1 year ago

    *About the killers question:*
    In addition to having killed in the past, a killer is also
    defined as having mentality and ability to kill again.
    So, dead men not only tell no lies, they do no killing.

  • @DenisHavlikVienna
    @DenisHavlikVienna 1 year ago

    Can it summarise a long document?

  • @workflowinmind
    @workflowinmind 1 year ago

    Are Guanaco and Falcon the same?

  • @dewijones92
    @dewijones92 1 year ago

    great video
    please try "chain of thought" or "tree of thoughts" prompts :)

  • @metatron3942
    @metatron3942 1 year ago +3

    Can Guanaco 65B run on local machines? Memory pooling across multiple GPUs is supported by PyTorch.

    • @matthew_berman
      @matthew_berman 1 year ago +1

      I'm not sure about pooling, but you can run it locally if you have 48GB of VRAM.

  • @nwdbrown
    @nwdbrown 1 year ago

    How do I add additional PDF documents to the model before or after initial training?

    • @andersberg756
      @andersberg756 1 year ago +1

      Either have your info be part of the fine-tuning data, or feed it into the context - the prompt. To run it yourself you need code around it, like LangChain or LlamaIndex. There are online offerings, though, which might suit you. For ChatGPT there are PDF-reading plugins coming up, but I guess you're looking to host your own model, either cloud or on-prem?

    • @sivi3883
      @sivi3883 1 year ago +1

      I used LangChain and Chroma DB (for storing the vector embeddings) to perform semantic search on the PDF chunks and send only the related chunks (based on the question) as context in the appropriate prompt to the model. GPT-2 itself worked well for me. Where it stumbled was when the PDFs have tabular data; the model cannot understand the relationship between the rows and columns.
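      Roughly, that pattern looks like this with 2023-era LangChain and Chroma (a sketch, not the commenter's exact code; the file path and chunk sizes are illustrative):

      from langchain.document_loaders import PyPDFLoader
      from langchain.text_splitter import RecursiveCharacterTextSplitter
      from langchain.embeddings import HuggingFaceEmbeddings
      from langchain.vectorstores import Chroma

      # Load a PDF, chunk it, and index the chunks as vector embeddings.
      chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100) \
          .split_documents(PyPDFLoader("my_doc.pdf").load())
      db = Chroma.from_documents(chunks, HuggingFaceEmbeddings())
      # Retrieve only the chunks related to the question, then stuff them into the prompt.
      relevant = db.similarity_search("What does the contract say about pricing?", k=4)
      context = "\n\n".join(doc.page_content for doc in relevant)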

  • @sirellyn
    @sirellyn 1 year ago +4

    Does this work for de-censoring the existing LLMs?

    • @matthew_berman
      @matthew_berman 1 year ago +1

      No, but check out: huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ

  • @SinanAkkoyun
    @SinanAkkoyun 1 year ago +1

    9:48 gotta love that editing xD

    • @matthew_berman
      @matthew_berman 1 year ago +1

      Haha I was hoping someone would notice that joke :p

  • @NeuroScientician
    @NeuroScientician 1 year ago +6

    What is the cheapest 48GB card? Can I run something like this on 2x 24GB, like 7900 XTX or 2x 4090?

    • @TheVideoGuy3
      @TheVideoGuy3 1 year ago +2

      Nvidia might release a Titan card with 48GB of VRAM. I don't know when, but it has been rumored for a while now.

    • @theresalwaysanotherway3996
      @theresalwaysanotherway3996 1 year ago +3

      The cheapest way would probably be 2x P40s, but it's not as simple to use as consumer hardware. If you want 48GB of consumer GPU VRAM, 2x 3090s is the best option, as second-hand cards are much cheaper and the 30 series has NVLink.

    • @NeuroScientician
      @NeuroScientician 1 year ago +1

      @@TheVideoGuy3 I would definitely consider that one.

    • @matthew_berman
      @matthew_berman 1 year ago +2

      On RunPod it's $0.79/hr. So cheap! I think you can run it on 2x 24GB if they are parallelized, but I haven't tried.

    • @luislhg
      @luislhg 1 year ago +1

      @@matthew_berman Azure charges $0.752/hr for a 54GB GPU setup too (actually 4x T4), just another option to consider.

  • @k9clubme
    @k9clubme 1 year ago

    Could you please make a video on how we can fine-tune Guanaco 65B? Many thanks in advance.

  • @skylark8828
    @skylark8828 1 year ago +1

    What I'd really like to know is whether these QLoRA LLMs can use the tools/plugins that GPT-4 can.

    • @matthew_berman
      @matthew_berman 1 year ago

      Not yet, but soon. I feel like a tool framework for open source LLMs is a must.

  • @eslof
    @eslof 1 year ago +7

    I know how to fix your error:
    Paste it back into the chatbot.

  • @MattJonesYT
    @MattJonesYT 1 year ago +1

    The pricing for this is per hour. How do you translate it to per token, which is the pricing OpenAI uses? Can you run a benchmark and see how many tokens it can generate in 5 minutes to get an idea of which is more cost effective?

    • @matthew_berman
      @matthew_berman 1 year ago

      Oh interesting. Every model would be different. Every GPU would be different. Might be too much work for me to test it all for useful insights.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@matthew_berman Can you do it on one, such as the one in this video please?

    • @MattJonesYT
      @MattJonesYT 1 year ago

      @@matthew_berman Doing a very imprecise measurement while watching one of the examples, it looks like it comes to about $0.0274/1k tokens. I could definitely be wrong on that, but it's assuming 5 seconds for 40 tokens at $0.79/hour (my math may be wrong). It looks like OpenAI is more cost efficient. It's possible that choosing a more expensive GPU to rent would give a more cost-effective result. Renting GPUs using interruptible pricing might be a way to beat OpenAI pricing.
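      The arithmetic checks out; as a worked version of the same back-of-the-envelope numbers:

      tokens_per_sec = 40 / 5              # ~8 tokens/s observed in the video
      gpu_cost_per_hour = 0.79             # the RunPod rate quoted above
      tokens_per_hour = tokens_per_sec * 3600
      cost_per_1k = gpu_cost_per_hour / tokens_per_hour * 1000
      print(f"${cost_per_1k:.4f} per 1k tokens")  # -> $0.0274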

  • @J3R3MI6
    @J3R3MI6 1 year ago +3

    Does this mean I can run this indefinitely on my own computer without a token limit? I assume the token limit is about saving OpenAI money.

    • @matthew_berman
      @matthew_berman 1 year ago +1

      The token limit is not only about saving money. At a certain point too many tokens cause “forgetting” by the model. You can run this on your local machine, but only if you have a big enough GPU, and they are quite expensive.

    • @elawchess
      @elawchess 1 year ago

      @First Last Yeah, I just checked mine now and it's only 11GB each for the two of them. And that's from a really huge 30kg gaming PC, bought in 2018 though.

  • @madushandissanayake96
    @madushandissanayake96 1 year ago +5

    I don't know what you are thinking, but this is not even close to GPT-3.5. I created a whole snake game on the first try with GPT-3.5.

  • @elawchess
    @elawchess 1 year ago +2

    Another channel said that the "99%" was only based on a single benchmark, and this may not be representative of what would happen in the wild.

    • @electron6825
      @electron6825 1 year ago +1

      Correct. The endless misinformation-fueled hype is tiresome.

  • @marchalthomas6591
    @marchalthomas6591 1 year ago

    On biases: there shouldn't be a bias regarding political choices IF we can evaluate a parameter which will determine the answer (GDP growth, family wealth, wellbeing, CO2 emissions, happiness, you name it).
    And this is really what AIs should be able to solve, with a cross between an LM and a calculator.

  • @christianachenbach5920
    @christianachenbach5920 1 year ago +1

    What exactly does “Finetuning” or “Training” mean?

    • @matthew_berman
      @matthew_berman 1 year ago +1

      Fine-tuning is using custom data on a base model to give it “more info” and training is taking the original data and making the base model.
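      In QLoRA terms, fine-tuning looks roughly like this with the peft library (a sketch, assuming peft, transformers, and bitsandbytes are installed; the base model and hyperparameters are illustrative):

      from transformers import AutoModelForCausalLM
      from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

      base = AutoModelForCausalLM.from_pretrained(
          "huggyllama/llama-7b", load_in_4bit=True, device_map="auto")
      base = prepare_model_for_kbit_training(base)  # freeze the 4-bit base weights
      lora = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
      model = get_peft_model(base, lora)            # only the small adapter matrices train
      model.print_trainable_parameters()            # typically well under 1% of the base model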

  • @petrus4
    @petrus4 10 months ago

    Matthew, Guanaco is the only 65B model I've spent time with, but I consider it easily the best local model I've tried. In terms of the Python snake failure, though, I will offer you some suggestions.
    First ask the model to give you a list of the individual parts, or subsystems, of the game Snake - in other words, the subsystem which allows you to move the snake with the keyboard, the screen drawing subsystem, etc. Then break each one of those down into as many small subtasks as possible. Once you've broken it down into a lot of subtasks, go through them and have the model perform each of them, one at a time. The smaller each individual task is, the smaller the chance of errors.
    I've also used Tree of Thoughts successfully with GPT-4 for things like your Jane-Joe-Sam question.

  • @berkeokur99
    @berkeokur99 1 year ago

    I think it counted the characters, not the words, but it actually got the character count right.

  • @bzzt88
    @bzzt88 1 year ago +2

    Boom!

  • @Maisonier
    @Maisonier 1 year ago +3

    Can we use a dual 3090 setup?

  • @mordechaisanders7033
    @mordechaisanders7033 1 year ago

    What consumer GPU has 48GB?

  • @d4rkside84
    @d4rkside84 1 year ago +2

    Performance of GPT-4 or GPT-3.5?

    • @jgcornell
      @jgcornell 1 year ago

      3.5, and that's on a good day

  • @triplea657aaa
    @triplea657aaa 1 year ago +5

    This is fine-tuning, though... it's not fair to compare GPT's training to QLoRA fine-tuning, as most of the intensive compute is the initial training, and fine-tuning is like 5-10% of the training.

  • @jimbig3997
    @jimbig3997 1 year ago

    9:37 - look at the Output window and count the words. It could be 21 words depending on how you count.

  • @jeanchindeko5477
    @jeanchindeko5477 1 year ago +1

    Interesting, and indeed impressive for a 65B model.
    When saying close to GPT-4, does it have the same emergent abilities as OpenAI models? Or is it purely based on the output?
    As Sam Altman put it, we should look at model capabilities to define how good a model is or not.

    • @celestinian
      @celestinian 1 year ago

      Nothing you said in your second sentence makes any sense at all

    • @skylark8828
      @skylark8828 1 year ago

      You would have to ask the people who have tested it thoroughly to know this.

    • @celestinian
      @celestinian 1 year ago

      @@skylark8828 "emergent ability" means nothing at all. He should clarify what he meant by that.

    • @skylark8828
      @skylark8828 1 year ago

      @@celestinian Emergent abilities similar to those of GPT-4, i.e. behaviour that was not trained into the LLM.

    • @celestinian
      @celestinian 1 year ago

      @@skylark8828 Generalization? Then yeah sure they are certainly comparable to both the real ChatGPT (the model prior to the quantization) and GPT-4 :D

  • @mvkrishna760
    @mvkrishna760 1 year ago +2

    What's the difference between fine-tuning and semantic search? Any easy-to-understand explanation?

    • @matthew_berman
      @matthew_berman 1 year ago +2

      Semantic search uses a vector DB like Pinecone to store info and retrieve relevant info for a model prompt. Fine-tuning puts the data directly into the model. Most people only need the former.
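      A minimal sketch of what the semantic-search half means under the hood, using sentence-transformers for the embeddings (model name and documents illustrative):

      from sentence_transformers import SentenceTransformer, util

      model = SentenceTransformer("all-MiniLM-L6-v2")
      docs = ["QLoRA fine-tunes in 4-bit precision.",
              "Guanaco is fine-tuned from LLaMA.",
              "Snakes are reptiles."]
      doc_emb = model.encode(docs, convert_to_tensor=True)
      query_emb = model.encode("How does 4-bit fine-tuning work?", convert_to_tensor=True)
      scores = util.cos_sim(query_emb, doc_emb)[0]  # cosine similarity per document
      print(docs[int(scores.argmax())])             # the best chunk goes into the model prompt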

    • @andersberg756
      @andersberg756 1 year ago

      This talk, I think, gave a good overview of the whole training process:
      czcams.com/video/bZQun8Y4L2A/video.html
      Essentially, fine-tuning changes and adapts the model to the extra data you provide, so it's permanently changed to include that knowledge. Whereas in-context learning means you add background info to the prompt. How you come up with that info is separate from the model: copy it in directly by hand, or find it by a search - be that keyword search in some index of your data or, as you mentioned, semantic search. The latter basically means data which is about the same topic you're prompting for. Hope this helps!

  • @JracoMeter
    @JracoMeter 1 year ago +5

    This is getting very impressive and inexpensive.

  • @geraldofrancisco5206
    @geraldofrancisco5206 1 year ago

    Dataset quality is what's important, not the size... hmm... I'm using this line from now on.

  • @amkire65
    @amkire65 1 year ago

    It may be unfair to compare this to Bard, but I was curious to see if it could solve some of the things you asked in the video, just so I could see if some of these were "unanswerable" questions for an LLM. Bard got the maths problem correct. When I asked it "how many words are in your next reply?" it answered "My next reply will have 7 words.", which is correct. It did get the killer question wrong. It got the date totally right, i.e. day, month and year, but if it has an online link then it probably should be right. I skipped the political question and asked Bard who its favourite Beatle was; this time it was John Lennon, the first time it was Paul McCartney. What a shame this isn't a local model.

  • @nangld
    @nangld 1 year ago +1

    Is it theoretically possible for an LLM to answer "how many words are in your next reply?" A Markov model doesn't generate the reply at once, but token by token, so it would need to be very self-aware of how it generates tokens. It would be like asking a human being "how many thoughts will go through your mind until you press the POST button?" I think if you use two-step generation it will be able to correct itself.

    • @Klokinator
      @Klokinator 1 year ago +3

      It would be possible for the LLM to answer this question if it answered like so:
      "The number of words in my reply is: Nine."
      The final word is used as the point where it calculated all of the words up to that point.

    • @andersberg756
      @andersberg756 1 year ago

      It would need to think out the reply first, then count, then answer. You can get some of this with chain-of-thought prompting, where it shows intermediate results in a process. I agree, @nangld, that this question is over the head of current model architectures, so some more typical reasoning question would be better. Maybe, as you suggest, a later system can figure out when it will err and then figure out the steps needed to be right - like ChatGPT, which can explain what went wrong and why, but only when explicitly asked; it's a limitation of the design.
      As a side note, I asked ChatGPT to count the words in a sentence, and it failed. But when told to explicitly count them (1 We 2 are 3 the 4 champions), it then seems correct. This follows the "give the model time to think" pattern, more technically put as "do limited reasoning per token/word output": the model isn't dynamic in that it could think more or less depending on the topic, so we give it time to think by asking for more tokens/words - like describing the steps, then arriving at a final answer.
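      The "think out the reply first, then count, then answer" idea is easy to sketch as a two-pass wrapper; generate() here is a hypothetical stand-in for any LLM call:

      def reply_with_word_count(prompt, generate):
          draft = generate(prompt)       # pass 1: think out the reply
          n = len(draft.split())         # count with ordinary code, not the LLM
          return f"My next reply has {n} words: {draft}"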

    • @nangld
      @nangld 1 year ago

      @@andersberg756 It would be fairer to at least prompt the model with info on how its tokens get produced.

    • @nangld
      @nangld 1 year ago

      @@Klokinator if you provide it with examples, even mpt-7b-instruct.ggmlv3.q8_0.bin answers correctly:
      ### Instruction:
      How many words are you in your answer to this message?
      Examples responses:
      1
      one
      two words
      three words here
      four words good enough
      this response has five words
      ### Response:
      Your answer contains 5 words.

  • @savlecz1187
    @savlecz1187 1 year ago +1

    What's this? An AI news video without exclamation points and a clickbait title? No way I'm not clicking that!
    Very interesting, thanks

    • @elawchess
      @elawchess 1 year ago +1

      I only clicked it because he didn't put that surprised face this time. Any time he puts the surprised face, I make it a point not to click on the video. I don't think there is any need for such gimmicks.

  • @klevaredrum9501
    @klevaredrum9501 1 year ago

    All this progress in a matter of months... this is unbelievable. People would give anything to witness the dawn of a new era emerging before their eyes. If it's only been a few months since the AI fire started, imagine another 10 years. It's mind boggling...

  • @Ilamarea
    @Ilamarea 1 year ago

    It actually got the killer problem right. Killing a killer doesn't necessarily make you a killer, semantically and legally.

  • @Market-MOJOE
    @Market-MOJOE 1 year ago

    Only 2 min in, but whether it's answered eventually or not, you should probably specify off the bat which GPT model it's being compared against.

  • @TheSolsboer
    @TheSolsboer 1 year ago

    I found a cool reasoning question, but all models fail it:
    "You can calculate 10 papers from a stack per 10 seconds. What minimum amount of time do you need to calculate 30 papers from a stack of 50 papers?"
    .....
    The answer is 20.

  • @fellowjello4388
    @fellowjello4388 1 year ago

    This might just be a weird coincidence, but when it said its next response was 21 words, it used exactly 21 letters (not including the spaces).

  • @101RealTalker
    @101RealTalker 1 year ago

    I STILL, despite all these "daily advancements", have yet to find one that can handle this particular use case all in one go. Can anyone solve this?
    Preprocess the markdown files:
    Tokenize the text.
    Remove stop words.
    Apply TF-IDF (Term Frequency-Inverse Document Frequency) to identify significant words and phrases.
    Apply deep learning techniques:
    Utilize deep learning algorithms like RNNs (Recurrent Neural Networks) and word embeddings.
    Leverage attention mechanisms and transformer-based models.
    Use pre-trained language models:
    Consider using pre-trained models such as BERT (Bidirectional Encoder Representations from Transformers) or GPT (Generative Pre-trained Transformer).
    Fine-tune the models:
    Train the pre-trained models on your specific dataset to improve their performance.
    Evaluate the generated summaries:
    Use metrics like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) to assess the quality of the summaries.
    Iterate and refine:
    Continuously experiment and adjust the model architecture and hyperparameters based on feedback.
    Ensure computational resources:
    Allocate sufficient computational resources such as GPUs (Graphics Processing Units) for efficient training and inference
    All to achieve this desired output:
    To take 2 million words documented for one singular project, and extract out of it all the cross references into a 10K-word transcript equivalent. Am I really the only person with such a demand with no supply? lol... I have been searching and searching, but it seems like I am both in no man's land and the pioneer of an undiscovered continent.
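    The first few steps of that pipeline are straightforward with scikit-learn; a sketch, with the directory of markdown files being illustrative:

    from pathlib import Path
    from sklearn.feature_extraction.text import TfidfVectorizer

    texts = [p.read_text() for p in Path("project_docs").glob("*.md")]
    vec = TfidfVectorizer(stop_words="english", max_features=5000)  # tokenize, drop stop words, TF-IDF
    tfidf = vec.fit_transform(texts)
    # Top-weighted terms in the first document are candidates for cross-reference extraction.
    terms = vec.get_feature_names_out()
    top = tfidf[0].toarray()[0].argsort()[-10:][::-1]
    print([terms[i] for i in top])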

  • @TheMsLourdes
    @TheMsLourdes 1 year ago

    Because sure, 48GB video cards are things I just have lying around.

  • @amkire65
    @amkire65 1 year ago

    Bard seems to consistently get the logic questions right i.e. Question: "If Jane is faster than Jo, and Jo is faster than Fred, is Fred faster than Jane?" Answer: "No, Fred is not faster than Jane. In fact, Fred is the slowest of the three." then gives me a speed table. I then asked "What if Fred was faster than Jo, would Fred then be faster than Jane?" and was told "It is not possible to say for certain whether Fred would be faster than Jane if Fred is faster than Jo. There are many factors that can affect speed, such as fitness, training, and motivation. It is possible that Fred is faster than Jo in a particular event or activity, but not faster than Jane in another event or activity. For example, Fred might be faster than Jo in a 100-meter dash, but Jane might be faster than Fred in a marathon. It is also possible that Fred is faster than Jo and Jane in all events, but only by a small margin. The only way to know for sure whether Fred is faster than Jane is to have them race against each other in a fair and competitive environment." which I guess would be the correct answer as Fred now being faster than Jo doesn't necessarily mean that Fred is now slower than Jane. These LLM's constantly amaze me.

  • @KarlMiller
    @KarlMiller 1 year ago

    Bananas, earthworms and sedimentary rocks.
    They are some of the things in the set referred to as "EVERYTHING" - oh, and car washing soap too.
    Since you can "confidently say that QLoRA changes everything", and you are not one of those YouTubers that abuse that phrase, then tell me how car wash soap is changed by this?

  • @katykarry2495
    @katykarry2495 1 year ago +4

    Is this 99% compared to GPT-3.5 or 4?

    • @matthew_berman
      @matthew_berman 1 year ago +6

      Tbh it wasn't clear from the announcement, but from my testing it's nearly as good as GPT-4. I should have dug deeper to figure this out before publishing, sorry :(

    • @riazr88
      @riazr88 1 year ago +1

      Bro, you do enough, don't apologize. Respect for putting wrinkles in my brain.

    • @Ephemere..
      @Ephemere.. 1 year ago

      @@riazr88 This guy is amazing 😅

    • @elawchess
      @elawchess 1 year ago +2

      I believe it's GPT-3.5; that's what's plausible. All models, even Bard, are worse than GPT-3.5, so any comparison is against that. I know the Google Bard people wrote a paper comparing to GPT-4, but it was said to be unfair because they used some chained prompting underneath, and the raw GPT-4 does not do that.

    • @elawchess
      @elawchess 1 year ago +1

      @@matthew_berman Actually, now that I revisit your reading of the paper's abstract, I think it is clear. In terminology that has become standard, you don't say ChatGPT when you mean GPT-4; there is a difference. ChatGPT is specifically the chatbot OpenAI released to the public late last year, and the name is used interchangeably for what's powering it - GPT-3.5. If they wanted to say GPT-4 they would have said GPT-4. It's a whole research paper by experts, so they know how to say GPT-4 if they meant that.

  • @wayallen831
    @wayallen831 1 year ago

    This is great, but is it open source for commercial use also? The license on HF says "other", so I'm confused about the exact permitted use for this.

    • @KillFrenzy96
      @KillFrenzy96 1 year ago

      I think it was available for commercial use as long as your company's revenue is under 1 million USD per year. Otherwise you have to enter a licensing agreement with them.

  • @Huru_
    @Huru_ 1 year ago

    Quick thoughts: What is a "killer"? Does killing a bee make one a "killer"? Does killing a thousand bees make one a killer? Does killing bees every day make one a killer? When does one who has "killed" cease to be considered a "killer"? Is someone who's dead (as that commenter said) but killed while alive still a "killer", or does being dead mean they're only "dead"? These are questions I don't see how an LM could find a probable answer to, hence - imo - the hallucinations. So I was thinking: what if you tried adding some context to that prompt? Define what "killer" means in regard to the query. Curious to see what it spits out then.

  • @dik9091
    @dik9091 1 year ago +3

    I am most sceptical about claims of getting close to OpenAI; 3.5 is already enough to come a long way. I don't know, but this catch-up won't stop anytime soon. OpenAI can also apply these techniques and be ahead again in GPU power. In theory a bitcoin-ish network could dwarf MS's datacenter power. Not sure if we really want that, but I am afraid it is inevitable; when I have that idea, someone else has it too.

    • @MattJonesYT
      @MattJonesYT 1 year ago

      "when I have that idea someone else has it too" Yeah I've been suggesting it to as many people as possible because it will be very good when it happens. The worst thing that can happen with AI is it becomes centralized, the best thing is it is for ever decentralized with everyone having access to their personal model of choice. However once it becomes trendy for people to insert neuralink chips in their brains it will be hard to keep anything at all from becoming centralized whether AI or not.

    • @DajesOfficial
      @DajesOfficial 1 year ago

      The catch-up uses either tech that OpenAI has already incorporated, or tech that doesn't benefit them at all (like using 1 GPU for fine-tuning). So they can't apply these techniques to get ahead again.

    • @dik9091
      @dik9091 1 year ago

      @@DajesOfficial sounds all good to me ;)

    • @dik9091
      @dik9091 1 year ago

      @@MattJonesYT Yeah, the Neuralink thing is way more worrisome; it is a departure from being Homo sapiens, instant evolution. What if these "people's" brains are no longer aligned with ours? Usually the smarter ones win.

  • @funnyberries4017
    @funnyberries4017 1 year ago

    Looks cool, but it sucks that your own local machine is censored.

  • @adamathypernerd7728
    @adamathypernerd7728 1 year ago

    If anyone's looking for more details on this, google "QLoRA", not "QLorRA" - the original title of the video had a typo.

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 1 year ago

    8 bit on a 3090?

  • @pietervanreenen1922
    @pietervanreenen1922 1 year ago

    The reply had 21 letters, not words

  • @nannan3347
    @nannan3347 1 year ago

    Finally, I can train the perfect LLM on the writings of:
    Greg Johnson
    Nick Fuentes
    Kanye West
    David Duke
    Terrence McKenna
    Mike Enoch
    JRR Tolkien
    Martin Luther
    Robert Sapolsky
    Richard Dawkins
    Voltaire
    Michel Foucault
    Jane Austin (just kidding)

  • @mvkrishna760
    @mvkrishna760 1 year ago +1

    How to fine-tune?

    • @matthew_berman
      @matthew_berman 1 year ago +4

      There’s a tab in textgen webui for that. Should I do a tutorial?

    • @mvkrishna760
      @mvkrishna760 1 year ago +1

      That would be awesome. Maybe an interesting use case where fine-tuning is better than semantic search?
      Thanks a lot

    • @surajnarayanakaimal
      @surajnarayanakaimal 1 year ago

      @@matthew_berman Yes, can you please do that with this Guanaco model? I've been asking and waiting for a long time; please do it in the next video. Thanks

    • @shootdaj
      @shootdaj 1 year ago

      @@matthew_berman What is textgen?

  • @mickmickymick6927
    @mickmickymick6927 1 year ago

    They're not a killer if they're dead; you could say they WERE a killer. If they're dead, they're not going to be killing anyone else.

  • @neko-san5965
    @neko-san5965 1 year ago

    Bruh, I don't want to pay for a cloud service to run these when I have an 11GB GPU :v

  • @electron6825
    @electron6825 1 year ago

    How is the "99%" evaluated?
    I remember one figure was based on asking ChatGPT to score 😂

  • @griffionpatton7753
    @griffionpatton7753 1 year ago

    I won't say which AI said this, but you had to let it run a while. A logic question used for AI asks: "If three murderers are in a room, and a man enters the room and shoots one of the three murderers, how many murderers are in the room?" Answer the question yourself before you read on. The AI said, "The expected answer is three, but there are four. There are three live murderers and one dead one. That's a total of four." I didn't ask the question in a specific manner; I just asked it after a long period of use.

  • @DaTruAndi
    @DaTruAndi 1 year ago

    48GB is not considered consumer HW, though.

  • @motherofallemails
    @motherofallemails 1 year ago

    Has it occurred to anyone that it could go over 100% of ChatGPT?
    Imagine if the fine-tuning ends up doing that!!

    • @igorthelight
      @igorthelight 1 year ago

      Not sure that it would be better than ChatGPT (GPT-4 based) xD

  • @chrisauh
    @chrisauh 10 months ago

    Why is it censored? How can you use a model that isn’t censored?