Google Just Solved the Context Window Challenge for Language Models?

  • Published: 25. 06. 2024
  • References:
    ►Read the full article: www.louisbouchard.ai/infini-a...
    ►Twitter: / whats_ai
    ►My Newsletter (My AI updates and news clearly explained): louisbouchard.substack.com/
    ►Support me on Patreon: / whatsai
    ►Join Our AI Discord: / discord
    How to start in AI/ML - A Complete Guide:
    ►www.louisbouchard.ai/learnai/
    Become a member of the YouTube community, support my work and get a cool Discord role:
    / @whatsai
    #ai #google #llm
  • Science & Technology

Comments • 28

  • @StraussBR • 2 months ago • +3

    Crafting detailed prompts and knowing exactly how to guide the model to fulfill your specific needs is crucial. So far, I've found that managing prompts myself and disabling any pre-configured instructions yields the best results for my needs.

  • @8eck • 2 months ago • +1

    Basically, they have built a vector database into the LLM itself, allowing it to perform search on its own, inside its own compressed "memory".
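
    That "vector database inside the model" framing can be made concrete. A minimal sketch, assuming a linear-attention-style associative memory where key-value pairs are summed into one fixed-size matrix instead of stored as a list of vectors (the `write`/`read` helper names are illustrative, not from the paper):

```python
import numpy as np

d_key, d_value = 64, 64
M = np.zeros((d_key, d_value))           # the entire "database" is one matrix

def phi(x):
    # ELU(x) + 1: a positive feature map used by linear-attention variants.
    return np.where(x > 0, x + 1.0, np.exp(x))

def write(M, K, V):
    # "Insert": bind keys to values via outer products, added into memory.
    return M + phi(K).T @ V              # shape (d_key, d_value)

def read(M, Q):
    # "Search": a query pulls out a weighted mix of everything stored.
    return phi(Q) @ M                    # shape (n_queries, d_value)

K = np.random.randn(128, d_key)          # keys from one processed segment
V = np.random.randn(128, d_value)        # matching values
Q = np.random.randn(4, d_key)            # queries from the current input
M = write(M, K, V)
print(read(M, Q).shape)                  # (4, 64)
```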

  • @dr.mikeybee • 12 days ago

    Very interesting. It sounds as if Google is taking a dense vector representation for a whole chat, rather than for a token, and compressing it through the encoder of a trained autoencoder to get these long chat representations. This representation is then continuously updated and always passed as context. I've been working on a hierarchical chat history stored in documents processed by RAG, but it does seem to me that some sort of sparse representation is also needed. I was thinking about graphical representations, like the HTM that Jeff Hawkins suggests may exist in cortical columns, but this may be better. I still think the RAG idea is important, however. I've been using gemma:2b, since it's very fast for multiple passes.
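
    A minimal sketch of the scheme this comment describes (the commenter's hypothesis, not Google's published method): embed the running chat, fold each new turn into a dense state, and squeeze it through an autoencoder bottleneck so only a compact code is carried forward as context. `ChatAutoencoder`, `update_state`, and all dimensions are hypothetical names for illustration:

```python
import torch
import torch.nn as nn

class ChatAutoencoder(nn.Module):
    """Hypothetical autoencoder compressing a dense chat representation."""
    def __init__(self, d_embed=768, d_latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_embed, 256), nn.ReLU(),
                                     nn.Linear(256, d_latent))
        self.decoder = nn.Sequential(nn.Linear(d_latent, 256), nn.ReLU(),
                                     nn.Linear(256, d_embed))

    def forward(self, x):
        z = self.encoder(x)            # compressed code for the whole chat
        return self.decoder(z), z

ae = ChatAutoencoder()
state = torch.zeros(768)               # dense running summary of the chat

def update_state(state, turn_embedding, alpha=0.1):
    # Fold the newest turn into the running representation, then compress
    # it; only the small bottleneck code gets carried forward as context.
    state = (1 - alpha) * state + alpha * turn_embedding
    _, code = ae(state)
    return state, code

turn = torch.randn(768)                # stand-in for one embedded chat turn
state, compressed_context = update_state(state, turn)
print(compressed_context.shape)        # torch.Size([64])
```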

  • @ahmod_ali-6seo • 1 month ago • +1

    Great Content

  • @cliffordhilo8709 • 1 month ago

    Can you do more of these where you talk about important papers?

    • @WhatsAI • 1 month ago

      I used to do only that! Yes, more are coming whenever I feel I can bring something about a paper that others didn't :)

  • @nevokrien95 • 2 months ago • +1

    The original transformer already had an infinite window...
    Like... sure, you trained a big context window, but the idea that it's infinite is weird.

  • @narasimhasaladi7 • 2 months ago • +1

    Does this mean Google is trying to implement the main idea of LSTMs in the form of self-attention?
    Looking forward to your reply.

    • @WhatsAI • 2 months ago • +2

      I believe the idea is indeed very similar, and obviously has the same underlying goal! I believe the main difference is that the Infini-Transformer uses both fixed and variable memory, meaning that some parts are simply more compressed and stay put (older states) while some parts can change with the layers' attention process (recent states). In an LSTM there's only the latter, with gates deciding what content to carry forward, so I would assume LSTMs lack full context for very, very long sequences (well, if you want them to stay computationally and memory viable). I'd guess we could say the main achievement here is compression and efficiency across many sequences :)
      And of course there's attention, which changes everything here by combining current and past sequence tokens and information in a much better way!
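
      To make the "fixed vs. variable" split concrete: as described in the Infini-attention paper, each finished segment is folded into a fixed-size matrix memory (the compressed, older states), while the current segment still gets ordinary local softmax attention (the recent, changing states). A simplified sketch of that segment-level recurrence, not the reference implementation:

```python
import numpy as np

def sigma(x):
    # ELU(x) + 1: the positive feature map used for memory read/write.
    return np.where(x > 0, x + 1.0, np.exp(x))

def update_memory(M, z, K, V):
    # Fold a finished segment into the fixed-size memory: M accumulates
    # key-value outer products, z accumulates the normalizer.
    return M + sigma(K).T @ V, z + sigma(K).sum(axis=0)

def read_memory(M, z, Q, eps=1e-6):
    # Retrieve compressed past context for the current segment's queries.
    return (sigma(Q) @ M) / (sigma(Q) @ z + eps)[:, None]

d_k, d_v, seg = 64, 64, 128
M, z = np.zeros((d_k, d_v)), np.zeros(d_k)

for _ in range(3):                        # three segments of one long input
    Q = np.random.randn(seg, d_k)
    K = np.random.randn(seg, d_k)
    V = np.random.randn(seg, d_v)
    A_mem = read_memory(M, z, Q)          # the "fixed", compressed past
    # ... local softmax attention over this segment goes here (the
    # "variable" part), and a learned gate mixes the two streams.
    M, z = update_memory(M, z, K, V)      # compress segment into memory
```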

  • @tylersnard • 2 months ago • +1

    That makes sense, that's kinda how the human mind works too.

  • @ONDANOTA • 2 months ago • +2

    Interesting. What about the output token limit?

    • @WhatsAI • 2 months ago • +3

      From what I understand, it reuses calculated states across output tokens and still generates tokens in a sequence-to-sequence manner, so in the end it is also more efficient for generating longer outputs. It doesn't seem to affect the quantity of output tokens, which I think still depends mainly on the compute budget. I couldn't find clear info in the paper, unfortunately!
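
      A rough picture of what "reusing calculated states" means at decoding time, sketched as generic incremental decoding with a key/value cache (a standard technique, not code from the paper): each new token computes only its own projections and attends over everything cached so far:

```python
import numpy as np

d = 64
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
K_cache, V_cache = [], []

def decode_step(x_t):
    # Only the new token's projections are computed; states for tokens
    # generated earlier are reused from the cache, never recomputed.
    K_cache.append(x_t @ W_k)
    V_cache.append(x_t @ W_v)
    K, V = np.stack(K_cache), np.stack(V_cache)
    w = np.exp(x_t @ W_q @ K.T)
    w /= w.sum()                    # softmax over all cached keys
    return w @ V                    # attended context for this step

for _ in range(5):                  # five decoding steps
    context = decode_step(np.random.randn(d))
```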

    • @ONDANOTA • 2 months ago • +2

      @WhatsAI Thank you for the swift and accurate reply ^___^

  • @akzsh • 2 months ago

    Sounds like RMSprop + attention.

  • @thomasgoodwin2648 • 2 months ago • +1

    So it seems like a memoization of prior latents? I guess I can see doing that (in effect providing an ongoing summary of the current context window), but it does seem to me that it might make the models somewhat 'hidebound' in their thinking when dealing with very long windows.
    As a summary gets passed along, more and more emphasis is placed on key words and concepts related to a specific context. For example, if we had been chatting for a long time about river banks, and suddenly I mention that I need to go to the bank, a model with a deep-seated set of priors might mistake meanings and believe I am talking about going to a river bank. A human can switch contexts very quickly, whereas models tend toward what could be described as 'target fixation'.
    IDK. There's a lot to consider here, and a number of ways something like this can be approached. Certainly a lot of food for thought.
    🖖😶‍🌫👍
    P.S. I still think backprop of transfer functions can be made computationally viable (I've seen it working); I just don't understand the math and a bit of the code (provided by ChatGPT [just regular, not 4]) well enough to publish and say 'Hey World! Look at this amazing seashell!' (shoutout to Newton). Still sitting on the edge of when LLMs make me smarter than I actually am 😉

    • @WhatsAI • 2 months ago • +1

      Those are indeed good concerns! I'd assume much more weight is assigned to the local attention, and the model learns to do that during training thanks to the text and how we communicate: not giving too much value to previous states, except when it matters. Assuming we have enough data, I guess the model gets a good « understanding » of when to look into the past memory and when not to, haha!
      Also, it seems the past info is more and more compressed the farther back it is, to limit the computation required, so it should not influence the results much if we talk a lot about rivers and then switch to an actual bank. It would probably be the same whether we talk about rivers for 2 sentences or 20, potentially(?). Could be wrong!
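
      The "how much weight goes where" question has a concrete counterpart in the paper: a learned gate blends memory retrieval with local attention, so the model can learn when to consult the past. A minimal sketch (the scalar gate follows the paper's description; the two attention inputs here are placeholders):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine(A_mem, A_local, beta):
    # One learned scalar per head decides how much the compressed past
    # contributes versus the current segment's local attention.
    g = sigmoid(beta)
    return g * A_mem + (1.0 - g) * A_local

seg, d_v = 128, 64
A_mem = np.random.randn(seg, d_v)     # retrieval from compressed memory
A_local = np.random.randn(seg, d_v)   # standard attention on this segment
beta = -2.0  # learned during training; very negative => mostly local
A = combine(A_mem, A_local, beta)
```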

    • @thomasgoodwin2648 • 2 months ago

      @WhatsAI It depends on how quickly the a-priori latents are compressed. If there's a drop-off of older memories as new ones come in, there's loss.
      If newer memories 'overwrite' or overwhelm older memories, there's loss (or potentially corruption).
      We've already seen some forms of attacks that slowly corrupt the model into bypassing earlier 'safety' instructions and gaining unrestricted access. Direct attention attacks can't be far away.
      Is it just me, or does the world seem to get a tiny bit more cyberpunk every day?

    • @dr.mikeybee • 12 days ago • +1

      I don't think he means a long text context that's vectorized. I think he means one that is vectorized and compressed through an encoder.

  • @manfredmichael_3ia097 • 2 months ago • +1

    Daymn

  • @tomaszzielinski4521 • 2 months ago • +8

    Great content, but these sound effects are terribly disturbing.

    • @WhatsAI • 2 months ago • +3

      Oh no. I might reduce them for the more paper-like videos then! Thank you for the feedback!

  • @joser100 • 2 months ago • +1

    Can you follow this up with a review of the TransformerFAM method? It's on arXiv, 14/04/2024.

    • @WhatsAI • 2 months ago

      I haven’t seen this paper. I will have a look, thanks for sharing! :)

  • @8eck • 2 months ago

    Start from 3:30. Don't thank me.

    • @WhatsAI • 2 months ago • +1

      Haha. Would you prefer no introduction, with quick videos starting right at that point and covering only the particularity of the paper in question (what's new, without the « soft introduction »)?
      I try to include one for the followers who enjoy learning about new papers but may work in another subfield of ML :)
