Multi-Head vs Grouped Query Attention. Are Claude, Llama-3, and Gemma choosing speed over quality?
- added 26. 07. 2024
Frontier model providers such as Anthropic (Claude 3.5 Sonnet), Google (Gemini / Gemma 2B), and Meta (Llama-3) are trending towards grouped query attention (GQA) over traditional multi-head attention (MHA) as the attention mechanism in their transformer models. Interestingly, OpenAI with GPT-4o doesn't seem to be making this trade-off.
Although this choice speeds up inference, it does impact output quality for tasks such as summarization. In this video Chris shows that you get more coherent output from models such as Llama-2 or Claude 3 Opus than from newer models such as Llama-3, Gemini, or Gemma. In the end, for certain scenarios such as summarization or generative content, GPT-4o still beats Sonnet.
repo
github.com/chrishayuk/mha_gqa...
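To make the MHA vs GQA distinction concrete, here is a minimal NumPy sketch (illustrative only — not the actual implementation from any of the models above, nor from the linked repo): in GQA, several query heads share a single key/value head, which shrinks the KV cache and speeds up inference; MHA is just the special case where the number of KV heads equals the number of query heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    """Toy GQA: q is (n_q_heads, seq, d); k and v are (n_kv_heads, seq, d).

    Each group of (n_q_heads // n_kv_heads) query heads shares one KV head.
    With n_kv_heads == n_q_heads this reduces to standard multi-head attention.
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads      # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                  # map this query head to its KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        out[h] = softmax(scores) @ v[kv]
    return out
```

With 8 query heads and 2 KV heads, the K/V tensors (and thus the KV cache) are 4x smaller than in MHA — that is the speed/memory win; the quality question in the video is what is lost when heads stop having their own keys and values.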
I just attended a detailed anatomy-of-an-LLM session, and it's just wow! Nobody else is covering these details. Thanks very much Chris ❤
Glad it was useful. I skipped a lot of details, as I wanted to keep the focus on MHA vs GQA. I'll probably do some other videos on the other details.
Interesting!
Claude 3.5 Sonnet is definitely great for code, much better than GPT-4o, and has really helped me solve things that are well beyond my brain capacity over the last few days.
totally agree, much better for code than gpt-4o
This was very interesting
Glad you enjoyed, definitely a fun rabbit hole
Great video! I don't understand it fully, had to watch it again, but I'm getting an idea of what is happening! Thank you!
It was quite a tough one to record, as I'm trying to avoid explaining the entire transformer architecture and attention in full (I'll do that in another video), while still doing enough to show how this architectural change affects model output. It was a weird balance, and apologies if I didn't explain it enough.
Excellent content! Thanks!
Glad you liked it!
Intel agencies are having their fill first. It's obviously being slowed down so three-letter agencies can get ahead of this.
lol, I'm sure three-letter agencies are having their say, but I suspect it's not on MHA vs GQA. Would love to hear that conversation if they were.
I believe 4o's judges only 90%
interesting, where did you get that info from?