Which transformer architecture is best? Encoder-only vs Encoder-decoder vs Decoder-only models

  • Published Jul 1, 2024
  • The battle of transformer architectures: Encoder-only vs Encoder-decoder vs Decoder-only models. Discover the architecture and strengths of each model type to make informed decisions for your NLP projects.
    0:00 - Introduction
    0:50 - Encoder-only transformers
    2:40 - Encoder-decoder (seq2seq) transformers
    4:40 - Decoder-only transformers

Comments • 28

  • @sp5394 • a day ago

    Thank you very much. Great video! Clear, concise and yet covers most of the necessary details.

  • @sumukhas5418 • 9 months ago +1

    Great video, learned a lot about how these models work.
    Looking forward to more videos like these 😊

  • @chrisogonas • 9 months ago

    Well illustrated. Thanks

  • @chitranair1105 • 6 months ago +1

    Good explanation. Thanks!

  • @groundingtiming • 9 months ago +2

    Great video! Can you make one with more detail, focusing on the why?

  • @Monoglossia • a year ago +1

    Very clear, thank you!

  • @ZivShemesh • 9 months ago

    Thank you very much, very helpful!

  • @kevon217 • a year ago

    Great overview!

  • @WhatsAI • 11 months ago

    Great video Bai! :)

  • @nudelsuppenzauberer3367 • 4 months ago

    I think you saved my exams, thank you man!

  • @MannyBernabe • 4 months ago

    thx

  • @prabhakarnimmagadda6599 • 11 months ago +2

    Good bro

  • @xflory26x • a year ago +5

    It's still not clear what the difference between the three is. How are they different in terms of the way they process the text? How is the encoder-decoder different from the decoder-only, if both of them are autoregressive?

    • @EfficientNLP • a year ago +1

      Indeed, they have a lot in common, and both encoder-decoder and decoder-only models do autoregressive decoding. The main difference is that encoder-decoder models make an architectural distinction between the input and output: the decoder of an encoder-decoder model typically has a cross-attention mechanism, which is not present in decoder-only models.
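
      One way to see this concretely, as a minimal sketch in Python (assuming the HuggingFace transformers library is installed and you're okay with downloading t5-small and gpt2):

          from transformers import T5ForConditionalGeneration, GPT2LMHeadModel

          t5 = T5ForConditionalGeneration.from_pretrained("t5-small")
          gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

          # Each T5 decoder block stacks self-attention, cross-attention,
          # and a feed-forward layer:
          print([type(l).__name__ for l in t5.decoder.block[0].layer])
          # ['T5LayerSelfAttention', 'T5LayerCrossAttention', 'T5LayerFF']

          # A GPT-2 block has only self-attention and an MLP; no
          # cross-attention module exists anywhere in the model:
          print([name for name, _ in gpt2.transformer.h[0].named_children()])
          # ['ln_1', 'attn', 'ln_2', 'mlp']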

  • @arabindabhattacharjee9774 • 7 months ago +2

    One thing I still did not understand: how does a decoder-only model work when the encoder is not there? What part ensures that the sequence of inputs is preserved and does not get jumbled up, so that the output is correct?

    • @EfficientNLP • 7 months ago

      In the decoder-only model, the input is provided as a prompt or prefix, which the model uses to generate subsequent tokens. As for how they don't get jumbled up - they use positional encodings to convey information about word order. I have some videos about how positional encodings work if you're interested.
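
      As a rough illustration, here is a minimal NumPy sketch of the original sinusoidal positional encoding (just one of several schemes; learned and rotary embeddings are also common):

          import numpy as np

          def sinusoidal_positions(seq_len, d_model):
              # Each position gets a distinct pattern of sines and cosines,
              # so attention (which is otherwise order-agnostic) can tell
              # positions apart.
              pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
              i = np.arange(d_model // 2)[None, :]      # (1, d_model // 2)
              angles = pos / 10000 ** (2 * i / d_model)
              enc = np.zeros((seq_len, d_model))
              enc[:, 0::2] = np.sin(angles)             # even dimensions
              enc[:, 1::2] = np.cos(angles)             # odd dimensions
              return enc

          print(sinusoidal_positions(4, 8).shape)       # (4, 8)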

    • @desrucca • 6 months ago +1

      @EfficientNLP I've tried prompting a conversational chatbot with the transformers library in Python, but I found that a decoder-only (causal) model is many times slower than a (seq2seq) encoder-decoder model. Why is that?

  • @saramoeini4286 • a month ago

    Hi, thanks for your video!
    If my encoder produces a series of tags for each word in the input sentence, and I want to use those tags to generate text that is correct based on the input and the encoder's generated tags, how can I use a decoder for this?

    • @EfficientNLP • a month ago

      I don't know of any model specifically designed for this, but one approach is to use a decoder model, where you can feed the text and tags in as a prompt (you may experiment with different ways of encoding this and see what works best).
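
      For instance, one hypothetical way to serialize the words and tags into a single prompt for a causal LM (the word/tag format here is made up; the right one depends on your tagset and model):

          # Toy example: pair each input word with its encoder-predicted tag,
          # then let a decoder-only model continue from the prompt.
          words = ["she", "go", "to", "school"]
          tags = ["PRON", "VERB-ERR", "ADP", "NOUN"]

          tagged = " ".join(f"{w}/{t}" for w, t in zip(words, tags))
          prompt = f"Input: {tagged}\nCorrected sentence:"
          print(prompt)
          # Input: she/PRON go/VERB-ERR to/ADP school/NOUN
          # Corrected sentence: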

    • @saramoeini4286 • a month ago

      @EfficientNLP Thank you.

  • @Sessrikant • 5 months ago

    Thanks, but it's not clear. Do you think encoder-only or encoder-decoder models are a thing of the past, now that ChatGPT takes speech as input, meaning it's able to process speech-to-text?

    • @EfficientNLP • 5 months ago

      Speech-to-text models generally use encoder-decoder architectures and cannot be handled by a decoder-only model. ChatGPT, I believe, uses a separate speech model to transcribe the audio before passing it to the main text-based model.
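
      Whisper is a typical example of an encoder-decoder speech model. A minimal sketch using the transformers pipeline (assuming the library is installed and a local audio file speech.wav exists):

          from transformers import pipeline

          # The audio encoder's output feeds the text decoder via cross-attention.
          asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
          print(asr("speech.wav")["text"])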

    • @Sessrikant • 5 months ago

      @EfficientNLP On decoder-only architecture for speech-to-text and large language model integration
      Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
      Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion. arXiv:2307.03917

  • @kaustuvray5066 • 6 months ago

    At 3:08, why does the encoder take 4 timesteps? Isn't the encoder supposed to be parallel?

    • @EfficientNLP • 6 months ago

      You’re right, transformer encoders process all the input in parallel. However, encoders are not always transformers, and in this case the figure shows an example of the older RNN/LSTM type of encoder.
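
      As a toy PyTorch sketch of the difference (untrained weights, shapes only):

          import torch
          import torch.nn as nn

          x = torch.randn(1, 4, 16)  # 1 sentence, 4 timesteps, 16 features

          # An RNN/LSTM encoder consumes the 4 timesteps one after another...
          lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
          lstm_out, _ = lstm(x)

          # ...while a transformer encoder layer attends over all 4 positions at once.
          enc = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
          enc_out = enc(x)

          print(lstm_out.shape, enc_out.shape)  # (1, 4, 32) and (1, 4, 16)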

  • @MrFromminsk • 7 months ago

    If decoder-only models can be used for summarization, translation, etc., why do we even need encoders?

    • @EfficientNLP • 7 months ago +1

      For many tasks like summarization, both decoder-only and encoder-decoder architectures are viable. However, encoder-decoder architectures are preferred for certain tasks that are naturally sequence-to-sequence, like machine translation. Furthermore, for tasks involving different modalities, such as speech-to-text, only encoder-decoder models will work; you cannot use a decoder-only model.
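
      As an illustration, both setups can be driven through the transformers pipeline API; the model choices below are just common examples, not the only options:

          from transformers import pipeline

          article = ("The transformer architecture has three main variants: "
                     "encoder-only, encoder-decoder, and decoder-only models.")

          # Encoder-decoder: a seq2seq model fine-tuned for summarization.
          summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
          print(summarizer(article, max_length=20)[0]["summary_text"])

          # Decoder-only: the same task posed as a prompt to a causal LM.
          generator = pipeline("text-generation", model="gpt2")
          print(generator(article + "\nTL;DR:", max_new_tokens=30)[0]["generated_text"])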