Quantization - Dmytro Dzhulgakov

  • Added 28 Jun 2024
  • It’s important to make efficient use of both server-side and on-device compute resources when developing ML applications. To support more efficient deployment on servers and edge devices, PyTorch 1.3 now supports 8-bit model quantization using the familiar eager mode Python API.
  • Science & Technology
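
The 8-bit quantization described above maps float tensors onto int8 with an affine scale and zero point. As a rough, self-contained illustration of that mapping (plain Python, not the actual PyTorch implementation):

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Approximate reconstruction: x ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_values]

# Pick scale/zero_point so the observed float range maps onto the int8 range.
xs = [-1.0, 0.0, 0.5, 1.0]
lo, hi = min(xs), max(xs)
scale = (hi - lo) / 255.0
zero_point = round(-128 - lo / scale)

qs = quantize(xs, scale, zero_point)
approx = dequantize(qs, scale, zero_point)
```

The reconstruction error is bounded by the scale, which is the trade-off quantization makes for 4x smaller weights and faster int8 kernels.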

Comments • 7

  • @digitaldreamer8637 · 3 years ago

    Excellent work. Very clear 👍🏼. I think Tesla needs help with Int8 Quantization. 😉

  • @xXMockapapellaXx · 4 years ago

    Thank you for the talk. It's good to see a focused video on the quantization efforts for PyTorch.
    While I know this video is somewhat old, I've been looking for a way to quantize GPT-2 XL for use on a GPU server (not mobile, mainly due to its size and compute requirements). I explain it in much more detail in this GitHub issue on Hugging Face's transformers repo: github.com/huggingface/transformers/issues/2466, but in short: when I try to save the models for later use, the file size gets bigger and performance gets worse (text repeats a lot when it shouldn't, across a variety of prompts).

    • @PyTorch · 4 years ago

      Hello. For help, please join and post in the PyTorch Forums: discuss.pytorch.org
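
For server-side transformer workloads like the GPT-2 case above, the usual eager-mode starting point is dynamic quantization of the Linear layers. This is only a generic sketch (a toy model, assuming a recent PyTorch build), not a diagnosis of the linked issue; note that dynamic quantization targets CPU inference, not GPU:

```python
import io

import torch
import torch.nn as nn

# A toy float model standing in for the Linear-heavy parts of a transformer.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).eval()

# Dynamic quantization: weights stored as int8, activations quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Round-trip via state_dict so the packed int8 weights are what gets saved,
# rather than re-saving the original float weights.
buffer = io.BytesIO()
torch.save(quantized.state_dict(), buffer)

out = quantized(torch.randn(2, 64))
```

Saving the quantized model's state_dict (instead of the original float model's) is one common reason reported file sizes fail to shrink.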

  • @ameynaik2743 · 2 years ago

    czcams.com/video/IPQmGzYuxmc/video.html - What does this mean? Folding the batch norm computation into the convolution?
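
    On the batch-norm question: "folding" means merging BN's affine transform into the preceding convolution's weights and bias before quantization, so only a single fused op needs to be quantized. A scalar sketch of the arithmetic (illustrative values, not PyTorch code):

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    into an equivalent single affine op y = w_folded * x + b_folded."""
    inv_std = 1.0 / math.sqrt(var + eps)
    w_folded = w * gamma * inv_std
    b_folded = (b - mean) * gamma * inv_std + beta
    return w_folded, b_folded

# One output channel with a scalar weight, for illustration.
w, b = 2.0, 0.5
gamma, beta, mean, var = 1.5, 0.1, 0.4, 4.0
wf, bf = fold_bn(w, b, gamma, beta, mean, var)

x = 3.0
conv_then_bn = gamma * (w * x + b - mean) / math.sqrt(var + 1e-5) + beta
folded = wf * x + bf  # identical result from the single folded op
```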

  • @MrDeyzel · 3 years ago

    Fusing the ResNet50 models like that doesn't work.

    • @dzhulgakov · 3 years ago · +4

      What is the exact problem you encounter? You can try asking on the PyTorch forums (discuss.pytorch.org/) or create a GitHub issue.
      Some minor API details may have changed since the talk was given, but generally it should still work. Specifically, you can refer to the following:
      - quantization tutorial (talks about MobileNetV2 instead of ResNet, but the idea is the same): pytorch.org/tutorials/advanced/static_quantization_tutorial.html
      - specifically for ResNet, there are already quantized models in TorchVision: pytorch.org/blog/introduction-to-quantization-on-pytorch/#integration-in-torchvision
      - ResNet50 specifically: github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py#L151
      - tutorial for using them: pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html
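
      The static-quantization workflow those tutorials cover follows a fuse → prepare → calibrate → convert pattern. A minimal eager-mode sketch on a toy module (the `conv`/`bn`/`relu` attribute names are just this example's; assumes PyTorch with the `fbgemm` backend available):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float boundary

    def forward(self, x):
        return self.dequant(self.relu(self.bn(self.conv(self.quant(x)))))

model = TinyNet().eval()
# Fuse Conv+BN+ReLU into one module so they quantize as a single op;
# the string names must match the attribute names above.
model = torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]])
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)     # insert observers
prepared(torch.randn(1, 3, 16, 16))              # calibration pass on sample data
quantized = torch.quantization.convert(prepared) # swap in int8 kernels

out = quantized(torch.randn(1, 3, 16, 16))
```

      Fusion must run on a model in eval mode, since folding BN into the convolution assumes frozen statistics.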

  • @Khan0156 · 1 year ago · +1

    Why can't most data scientists in talks like this speak English properly?