Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Efficient NLP

14,462 views

Added 29 June 2024
Four techniques to optimize the speed of your model's inference process:
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
References:
LLM Inference Optimization blog post: lilianweng.github.io/posts/20...
How to deploy your deep learning project on a budget: luckytoilet.wordpress.com/202...
Efficient deep learning survey paper: arxiv.org/abs/2106.08962
SparseDNN: arxiv.org/abs/2101.07948
Science & Technology

Comments • 24

@thomasschmitt9669 3 months ago ⁺⁶
This was one of the best explanation videos I have ever seen! Well structured and right complexity grade to follow without getting a headache. 👌
@bonob0123 17 days ago
that was really nicely done. as a non-expert, I feel like I can now have a great general idea of what a quantized model is. thank you
@DurgaNagababuMolleti 5 days ago
Superb
@lucaskeller656 2 months ago ⁺¹
Great format, succinctness, and diagrams. Thank you!
@muhannadobeidat 3 months ago ⁺¹
Excellent video. Well spoken. Nice visualizations.
@vineetkumarmishra2989 3 months ago ⁺¹
wonderfully explained !!
Thanks for the video.
@420_gunna 4 months ago
This felt very nicely taught -- I loved that you pulled back a summary/review at the end of the video - great practice. Please continue, thank you!
@unclecode 4 months ago ⁺¹
Great content, well done. Please make a video for ONNX, and another one for Flash Attention. Appreciate.
@huiwencheng4585 5 months ago ⁺¹
Fantastic introduction and explanation !
@jeremyuzan1169 2 months ago ⁺¹
Great video
@jokmenen_ 4 months ago ⁺¹
Awesome video!
@heteromodal 5 months ago ⁺¹
What a great video! Thank you!
@user-bd7eq6vx1t 1 year ago ⁺⁵
Your teaching is excellent. We expect many more videos from you to help us understand the fundamentals of NLP.
@kevon217 10 months ago
^
@user-qo7vr3ml4c 1 month ago
Great summary, thank you.
@kevon217 10 months ago ⁺¹
Thanks for this!
@hrsight 2 months ago ⁺¹
nice video
@MuhammadAli-dw7mv 1 month ago
nicely done
@yunlu4657 5 months ago ⁺¹
Excellent video, learnt a lot! However, the definition of zero-point quantization is off. What you're showing in the video is the abs-max quantization instead.
@EfficientNLP 5 months ago
The example I showed is zero-point quantization because 0 in the original domain is mapped to 0 in the quantized domain (before transforming to unsigned). In abs-max (not covered in this video), the maximum in the original domain would be mapped to 127, and the minimum would be mapped to -128.
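The two mappings discussed in this thread can be sketched roughly as follows, assuming 8-bit signed integers and statistics taken over a single list of values. The function names (`quantize_symmetric`, `quantize_affine`, `dequantize`) are hypothetical and deliberately descriptive, because sources differ on which scheme they call "zero-point" and which "abs-max". This is an illustration only and is not taken from the video.

```python
def quantize_symmetric(values, num_bits=8):
    """Scale-only mapping: real 0.0 always lands on integer 0."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_affine(values, num_bits=8):
    """Min/max mapping: the smallest value lands on -128, the largest on 127."""
    qmin = -(2 ** (num_bits - 1))               # -128 for 8 bits
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    offset = round(qmin - lo / scale)           # integer that real 0.0 maps to
    q = [max(qmin, min(qmax, round(v / scale) + offset)) for v in values]
    return q, scale, offset


def dequantize(q, scale, offset=0):
    """Approximate inverse of either mapping (offset=0 for the symmetric one)."""
    return [(x - offset) * scale for x in q]


weights = [-1.0, 0.0, 0.5, 3.0]                 # made-up example values

q_sym, scale_sym = quantize_symmetric(weights)
print(q_sym)                                    # [-42, 0, 21, 127]

q_aff, scale_aff, offset = quantize_affine(weights)
print(q_aff)                                    # [-128, -64, -32, 127]
```

In the symmetric mapping only part of the negative range is used when the data is skewed (here nothing falls below -42), while the affine mapping spreads the values over the full range from -128 to 127 at the cost of storing an extra integer offset.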
@ricardokullock2535 1 month ago
And if one was to quantize a distilled model? Is the outcome any good?
@EfficientNLP 1 month ago ⁺¹
Yes, these two techniques are often used together to improve efficiency.
@andrea-mj9ce 3 months ago
The explanation for distillation remains at the surface, it is not enough to understand it
@EfficientNLP 3 months ago
If you have any specific questions I’ll try to answer them!
