Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

  • Published 29 June 2024
  • Four techniques to optimize the speed of your model's inference process:
    0:38 - Quantization
    5:59 - Pruning
    9:48 - Knowledge Distillation
    13:00 - Engineering Optimizations
    References:
    LLM Inference Optimization blog post: lilianweng.github.io/posts/20...
    How to deploy your deep learning project on a budget: luckytoilet.wordpress.com/202...
    Efficient deep learning survey paper: arxiv.org/abs/2106.08962
    SparseDNN: arxiv.org/abs/2101.07948
  • Science & Technology

Comments • 24

  • @thomasschmitt9669
    @thomasschmitt9669 3 months ago +6

    This was one of the best explanation videos I have ever seen! Well structured and right complexity grade to follow without getting a headache. 👌

  • @bonob0123
    @bonob0123 17 days ago

    that was really nicely done. as a non-expert, I feel like I can now have a great general idea of what a quantized model is. thank you

  • @DurgaNagababuMolleti
    @DurgaNagababuMolleti 5 days ago

    Superb

  • @lucaskeller656
    @lucaskeller656 2 months ago +1

    Great format, succinctness, and diagrams. Thank you!

  • @muhannadobeidat
    @muhannadobeidat 3 months ago +1

    Excellent video. Well spoken. Nice visualizations.

  • @vineetkumarmishra2989
    @vineetkumarmishra2989 3 months ago +1

    Wonderfully explained!
    Thanks for the video.

  • @420_gunna
    @420_gunna 4 months ago

    This felt very nicely taught -- I loved that you pulled back a summary/review at the end of the video - great practice. Please continue, thank you!

  • @unclecode
    @unclecode 4 months ago +1

    Great content, well done. Please make a video on ONNX, and another on Flash Attention. Appreciated!

  • @huiwencheng4585
    @huiwencheng4585 5 months ago +1

    Fantastic introduction and explanation !

  • @jeremyuzan1169
    @jeremyuzan1169 2 months ago +1

    Great video

  • @jokmenen_
    @jokmenen_ 4 months ago +1

    Awesome video!

  • @heteromodal
    @heteromodal 5 months ago +1

    What a great video! Thank you!

  • @user-bd7eq6vx1t
    @user-bd7eq6vx1t a year ago +5

    Your teaching is excellent. We would welcome many more videos from you to help understand the fundamentals of NLP.

  • @user-qo7vr3ml4c
    @user-qo7vr3ml4c a month ago

    Great summary, thank you.

  • @kevon217
    @kevon217 10 months ago +1

    Thanks for this!

  • @hrsight
    @hrsight 2 months ago +1

    nice video

  • @MuhammadAli-dw7mv
    @MuhammadAli-dw7mv a month ago

    nicely done

  • @yunlu4657
    @yunlu4657 5 months ago +1

    Excellent video, learnt a lot! However, the definition of zero-point quantization is off. What you're showing in the video is abs-max quantization instead.

    • @EfficientNLP
      @EfficientNLP  5 months ago

      The example I showed is zero-point quantization because 0 in the original domain is mapped to 0 in the quantized domain (before transforming to unsigned). In abs-max (not covered in this video), the maximum in the original domain would be mapped to 127, and the minimum would be mapped to -128.
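      To make the distinction concrete, here is a minimal pure-Python sketch of the two int8 schemes under one common textbook convention (the function names and the exact rounding/clamping choices are illustrative, and may differ from the convention used in the video): abs-max keeps real 0 at quantized 0 and maps the largest magnitude to ±127, while zero-point stretches [min, max] across the full int8 range via an offset.

```python
def absmax_quantize(xs):
    # Symmetric (abs-max): the largest |x| maps to +/-127,
    # and real 0 always maps to quantized 0.
    scale = 127.0 / max(abs(v) for v in xs)
    return [round(v * scale) for v in xs], scale

def zeropoint_quantize(xs):
    # Asymmetric (zero-point): [min, max] spans the full int8 range;
    # real 0 maps to the zero_point offset, not necessarily to 0.
    lo, hi = min(xs), max(xs)
    scale = 255.0 / (hi - lo)
    zero_point = round(-lo * scale) - 128
    q = [max(-128, min(127, round(v * scale) + zero_point)) for v in xs]
    return q, scale, zero_point

# Example: min maps to -128 and max to 127 under zero-point,
# while abs-max centers the grid on zero.
x = [-0.5, 0.0, 1.0]
```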

  • @ricardokullock2535
    @ricardokullock2535 a month ago

    And if one was to quantize a distilled model? Is the outcome any good?

    • @EfficientNLP
      @EfficientNLP  a month ago +1

      Yes, these two techniques are often used together to improve efficiency.
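      A toy end-to-end sketch of that combination (everything here is hypothetical and chosen only to make the pipeline runnable: the "teacher" is a fixed function and the student has a single weight): distill first, then quantize the student's weight.

```python
def teacher(x):
    # Stand-in for a large model: here just a fixed linear function.
    return 2.0 * x

def distill_student(data, lr=0.1, steps=200):
    # Fit student(x) = w * x to the teacher's outputs (its "soft targets")
    # by plain gradient descent on squared error.
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - teacher(x)) * x for x in data) / len(data)
        w -= lr * grad
    return w

def absmax_quantize_weight(w):
    # Abs-max int8 quantization of the distilled weight.
    scale = 127.0 / abs(w)
    return round(w * scale), scale

w = distill_student([0.5, 1.0, 1.5, 2.0])   # converges toward the teacher's 2.0
q, scale = absmax_quantize_weight(w)        # int8 weight; dequantize as q / scale
```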

  • @andrea-mj9ce
    @andrea-mj9ce 3 months ago

    The explanation of distillation stays at the surface; it is not enough to really understand it.

    • @EfficientNLP
      @EfficientNLP  3 months ago

      If you have any specific questions I’ll try to answer them!