Insanely Fast LLAMA-3 on Groq Playground and API for FREE

  • Published Jun 1, 2024
  • Learn how to get started with LLAMA-3 on the Groq API, currently the fastest inference speed available on any API on the market. Learn how to use the Groq API in your own applications.
    🦾 Discord: / discord
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    🔴 Patreon: / promptengineering
    💼Consulting: calendly.com/engineerprompt/c...
    📧 Business Contact: engineerprompt@gmail.com
    Become Member: tinyurl.com/y5h28s6h
    💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
    Signup for Advanced RAG:
    tally.so/r/3y9bb0
    LINKS:
    Notebook: tinyurl.com/57yhf26h
    Groq API: groq.com/
    TIMESTAMPS:
    [00:00] Getting Started with Llama 3 on Groq Cloud
    [01:49] LLAMA-3 on Playground
    [03:03] Integrating Llama 3 into Your Applications with the Groq API
    [05:40] Advanced API Features: System Messages and Streaming
    All Interesting Videos:
    Everything LangChain: • LangChain
    Everything LLM: • Large Language Models
    Everything Midjourney: • MidJourney Tutorials
    AI Image Generation: • AI Image Generation Tu...
  • Science & Technology
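As a quick reference for the API usage covered in the video, here is a minimal sketch of a Groq chat-completion call. The model id `llama3-70b-8192` and the `GROQ_API_KEY` environment variable follow Groq's conventions at the time of the video; treat both, and the example prompt, as assumptions.

```python
import os

def build_chat_request(user_prompt, system_prompt=None,
                       model="llama3-70b-8192", max_tokens=1024):
    """Assemble the payload shape the Groq chat-completions endpoint expects."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

# Only hit the network when an API key is actually configured.
if os.environ.get("GROQ_API_KEY"):
    from groq import Groq  # pip install groq
    client = Groq()  # picks up GROQ_API_KEY from the environment
    response = client.chat.completions.create(
        **build_chat_request("Explain LPUs in one sentence."))
    print(response.choices[0].message.content)
```

The 1024-token cap mirrors the limit used in the video's notebook; raise `max_tokens` for longer completions.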

Comments • 43

  • @engineerprompt • 2 days ago

    If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag

  • @3choff • 1 month ago +2

    Oh my! And it even has function calling too. Looking forward to Whisper integration.

    • @engineerprompt • 1 month ago +1

      Yeah, Whisper will be awesome. Need to try their function calling with Llama 3.

  • @hanlopi • 1 month ago +1

    Very nicely explained.

  • @starblaiz1986 • 1 month ago +1

    Bro this is WILD! Just imagine combining this with agent frameworks like Crew AI! 😮

  • @NLPprompter • 1 month ago +5

    I watched this at 2x playback speed; its generation speed becomes like a dream come true.

  • @TzaraDuchamp • 1 month ago +4

    Wow, that’s fast. Are you going to test this with function calling in an agentic workflow?

    • @TheReferrer72 • 1 month ago

      I get 50 tokens per second for the 8B on a 3090 at home. It's a nice model.

  • @nac341 • 1 month ago +18

    "we don't care about the responses, we only care about the speed". I can give you an even faster API that just returns random words :)

    • @Tofu3435 • 1 month ago

      Cool, a fast passphrase generator 😂

    • @unclecode • 1 month ago

      Actually, the author has a point. Picture a scenario with multiple agents working together on a super complex task. You might not even care about their responses or understand their complicated talk; all you really want is 800 tokens per second so the task is handled in just a few seconds. At that point, the final response is all that matters. Although I wish that random word generator API or the "Infinite monkey theorem" were enough to solve the world's complex problems 😅

    • @RickySupriyadi • 1 month ago

      @@unclecode Actually, humans do that. There was a time when our team had to work together ridiculously fast, and we all came up with stupid yet efficient ways to communicate...

    • @syeshwanth6790 • 1 month ago +2

      He means he is not going to test the accuracy of the model in this video.
      He is demonstrating how fast the API is.
      There are other videos and articles where the performance of these models has been evaluated.

    • @unclecode • 1 month ago

      @@RickySupriyadi Haha, you're right, and it's not surprising that we're inclined to use our own human collaboration methods to design multi-agent systems. There's a desire to make AI resemble us.

  • @huyvo9105 • 1 month ago +1

    Sometimes it gets rate-limited; how do you handle that?

  • @NicolasEmbleton • 1 month ago +2

    Do we know how aggressively they quantize? I heard the quantization was pretty aggressive, and as a result the models aren't as good as the originals. If true, it's a reasonable tradeoff, but we just need to know for sure so we can make informed decisions.

    • @Cingku • 1 month ago +3

      Yes, I tested it with one of my complex calculation prompts, and the one on Groq (Llama 70 billion) is really bad and always answers wrongly... but if I use the one on HuggingChat, it gives a perfect answer every time! So quantization really decreases performance drastically, and it doesn't matter how fast it is when it gives the wrong answer.

    • @NicolasEmbleton • 1 month ago +2

      @@Cingku I had fairly similar outcomes in my tests and stopped using Mistral / Mixtral back then. Maybe the free version's target audience is just people testing, and that would make sense. But it did not convince me to use the service. I'll give it another paid attempt to see if it's any better.

  • @mirek190 • 1 month ago

    Why did you set it to only 1024 tokens?

  • @MrN00N3_ • 1 month ago

    Can you run Groq locally?

  • @zhonwarmon • 1 month ago

    Can't wait for local models

    • @TheReferrer72 • 1 month ago +3

      They have been around since Thursday.

    • @looseman • 1 month ago +3

      The 70B is fine for a local run.

  • @snehitvaddi • 1 month ago

    Llama 3 can generate images as well, right? Can I use this API to generate images?
    If so, could you please make a tutorial on that, or at least a Short? (BTW, subscribed to see an update on that)

    • @engineerprompt • 1 month ago +1

      There is another model on meta.ai which can generate images. It's not part of Llama 3. I am not sure if it's available via API. Will check it out and update on the channel.

    • @snehitvaddi • 1 month ago

      @@engineerprompt Also, if you don't mind, please leave an update as a reply to this if you find anything.

  • @noxplayer-rt9tj • 8 days ago

    How do you use Google Colab & Hugging Face to make a Groq + Whisper converter from an audio file to text, with a UI?

  • @CharlesOkwuagwu • 1 month ago

    Please can you show us end-to-end fine-tuning of Llama 3 on a custom dataset?

    • @engineerprompt • 1 month ago +1

      Check the previous video on the channel. Will be making more on fine-tuning.

  • @unclecode • 1 month ago +1

    Do you agree Groq feels way better when you set "stream=False"? :)) That's when you understand "stream" was a way to hide a weakness.

    • @engineerprompt • 1 month ago +1

      I totally agree. Streaming makes it worse for Groq, but others used it to appear faster than they actually are :)
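For context on the "stream" flag being debated here, a minimal sketch of what it changes on the client side (the prompt is made up; the chunk/delta shapes follow the OpenAI-compatible interface Groq exposes): with `stream=True` the reply arrives as content deltas that the client stitches back together, which makes slow backends feel faster but adds nothing when the full reply is already near-instant.

```python
import os

def accumulate_deltas(deltas):
    """Join streamed content deltas into the full reply.

    The final chunk's delta content is typically None, so skip falsy pieces.
    """
    return "".join(d for d in deltas if d)

# Only hit the network when an API key is actually configured.
if os.environ.get("GROQ_API_KEY"):
    from groq import Groq  # pip install groq
    client = Groq()
    stream = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": "Write a haiku about speed."}],
        stream=True,  # chunks arrive as they are generated
    )
    pieces = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        pieces.append(delta)
        print(delta or "", end="", flush=True)  # render tokens as they land
    full_reply = accumulate_deltas(pieces)
```

With `stream=False` the same call simply blocks and returns the finished message in one response object.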

  • @Warung-AI-Channel • 1 month ago

    We just built Llama3 #RAG powered by groq and it's extremely fast 😮

  • @greendsnow • 1 month ago +2

    Wait a second, that's extremely cheap

  • @nexuslux • 1 month ago

    Notebook link doesn’t work

    • @engineerprompt • 1 month ago

      Can you check again? It seems to be working on my end.

  • @abdelhameedhamdy • 1 month ago

    I did not understand the difference between the system and user roles!

    • @engineerprompt • 1 month ago +1

      The system role defines the behavior of the model. Think of it as a global instruction that controls the model's behavior. The "user" role is the actual input from the user. Hope that helps.
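To make that distinction concrete, a small sketch (the prompts are made-up examples): the same user question, steered two different ways purely by the system message.

```python
def make_messages(system_prompt, user_prompt):
    """Build a chat messages list: the system message sets global behavior,
    the user message carries the actual input."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

question = "What is an LPU?"
# Identical user input; only the system instruction differs:
pirate = make_messages("You always answer like a pirate.", question)
concise = make_messages("Answer in exactly one sentence.", question)
```

Passing `pirate` versus `concise` to the same model changes the style of every reply without the user ever seeing the instruction.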

  • @namecUI • 1 month ago +1

    You said for free?! How is this possible?

    • @wwkk4964 • 1 month ago +2

      Groq has so many LPUs, that's why.

    • @InsightCrypto • 1 month ago

      @@wwkk4964 It's not free; Groq has clear pricing for its models.

  • @InsightCrypto • 1 month ago +2

    So fucked up that you wrote "free"