Quantization - Dmytro Dzhulgakov
- Date added: 28. 06. 2024
- It’s important to make efficient use of both server-side and on-device compute resources when developing ML applications. To support more efficient deployment on servers and edge devices, PyTorch 1.3 now supports 8-bit model quantization using the familiar eager mode Python API.
- Science & Technology
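
To make the description above concrete, here is a minimal sketch of the eager-mode post-training static quantization flow it refers to. The `SmallNet` module is made up for illustration; the API is the PyTorch 1.3-era `torch.quantization` one, and minor names may differ in later releases.

```python
# Minimal sketch of eager-mode post-training static quantization
# (PyTorch 1.3-era API; names may have shifted in newer releases).
import torch
import torch.nn as nn
import torch.quantization as tq

class SmallNet(nn.Module):
    """Toy float model; QuantStub/DeQuantStub mark the quantized region."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)        # fp32 -> int8 on the way in
        x = self.relu(self.conv(x))
        return self.dequant(x)   # int8 -> fp32 on the way out

model = SmallNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")  # server-side (x86) backend
tq.prepare(model, inplace=True)                   # insert observers

# Calibration: run a few representative batches so observers collect ranges.
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(1, 3, 32, 32))

tq.convert(model, inplace=True)                   # swap in int8 kernels
```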
Excellent work. Very clear 👍🏼. I think Tesla needs help with Int8 Quantization. 😉
Thank you for the talk. It's good to see a focused video on the quantization efforts for PyTorch.
While I know this video is kind of old, I've been looking for a way to quantize GPT-2 XL for use on a GPU server (not mobile, mainly due to its size and computation requirements). I explain it in much better detail in this GitHub issue on huggingface's transformers repo: github.com/huggingface/transformers/issues/2466, but basically, when I try to save the models for later use, the file size gets bigger and performance gets worse (the text repeats a lot when it shouldn't, across a variety of different prompts).
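
(Not claiming this resolves the issue in the linked thread, but for context, here is a minimal sketch of how dynamic quantization plus saving and reloading is usually written. The file path is illustrative, the int8 kernels target CPU backends, and what actually gets converted depends on the module types passed to quantize_dynamic.)

```python
# Hedged sketch: dynamic quantization of a Hugging Face GPT-2 model,
# saving the state_dict, and rebuilding the quantized structure before loading.
import torch
from transformers import GPT2LMHeadModel  # assumes huggingface transformers is installed

model = GPT2LMHeadModel.from_pretrained("gpt2-xl").eval()

# Only modules of the listed types are replaced with int8 dynamic kernels;
# GPT-2 uses the transformers Conv1D layer for most projections, so this
# spec is purely illustrative of the API rather than an optimal recipe.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(quantized.state_dict(), "gpt2-xl-int8.pt")

# Later: apply the same quantization to a fresh model, then load the weights.
fresh = torch.quantization.quantize_dynamic(
    GPT2LMHeadModel.from_pretrained("gpt2-xl").eval(),
    {torch.nn.Linear}, dtype=torch.qint8
)
fresh.load_state_dict(torch.load("gpt2-xl-int8.pt"))
```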
Hello. For help, please join and post in the PyTorch Forums: discuss.pytorch.org
czcams.com/video/IPQmGzYuxmc/video.html - What does this mean? Folding batch norm computation into convolution?
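
For context: "folding batch norm into convolution" means absorbing the BN scale and shift into the preceding conv's weights and bias, so the two ops become a single conv at inference time. A hedged sketch of how this is typically invoked in eager mode (the toy Sequential below is just illustrative):

```python
# Folding BN into the preceding conv, which is what
# torch.quantization.fuse_modules does before quantization.
# For a conv output y = W*x + b followed by BN(gamma, beta, mean, var):
#   W' = W * gamma / sqrt(var + eps)
#   b' = (b - mean) * gamma / sqrt(var + eps) + beta
# so only a single conv op remains at inference time.
import torch
import torch.nn as nn

block = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU()).eval()

# In eager-mode quantization the modules to fuse are named explicitly;
# ["0", "1", "2"] are the submodule names inside the Sequential above.
fused = torch.quantization.fuse_modules(block, [["0", "1", "2"]])
```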
Fusing the ResNet50 model like that doesn't work.
What is the exact problem you're encountering? You can ask on the PyTorch forums (discuss.pytorch.org/) or create a GitHub issue.
Some minor details of the APIs may have changed since the talk was given, but generally it should work. Specifically, you can refer to the following:
- quantization tutorial (talks about MobileNetV2 instead of ResNet, but the idea is the same): pytorch.org/tutorials/advanced/static_quantization_tutorial.html
- specifically for ResNet, there are already quantized models in TorchVision: pytorch.org/blog/introduction-to-quantization-on-pytorch/#integration-in-torchvision
- ResNet50 specifically: github.com/pytorch/vision/blob/master/torchvision/models/quantization/resnet.py#L151
- tutorial for using them: pytorch.org/tutorials/intermediate/quantized_transfer_learning_tutorial.html
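
As a rough illustration of the last two links, loading the pre-quantized ResNet50 from TorchVision typically looks like this (argument names reflect the releases around the time of this talk and may have changed in newer versions):

```python
# Hedged sketch: using the pre-quantized ResNet50 shipped with torchvision.
import torch
from torchvision.models.quantization import resnet50

model = resnet50(pretrained=True, quantize=True).eval()  # int8 weights, CPU backend
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000])
```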
Why can't most data scientists in talks like this speak English properly?