GPU optimization workshop (hosted by Chip Huyen)

  • Published May 22, 2024
  • 00:30 Workshop overview
    03:51 Crash course to GPU optimization (Mark Saroufim, Meta)
    39:18 High performance LLM serving on NVIDIA GPUs (Sharan Chetlur, NVIDIA)
    1:19:18 Block-based GPU Programming with Triton (Philippe Tillet, OpenAI)
    1:59:00 Scaling data processing from CPU to distributed GPUs (William Malpica, Voltron Data)
    Join the discussion on Discord: / discord
    Shared note (during the event): docs.google.com/document/d/1T...
    GitHub repo with schedule: github.com/mlops-discord/gpu-...
    Philippe Tillet, who leads the Triton team at OpenAI. Previously, he was at pretty much all major chip makers, including NVIDIA, AMD, Intel, and Nervana.
    Sharan Chetlur, principal engineer working on TensorRT-LLM at NVIDIA. He has been working on CUDA since 2012, optimizing the performance of deep learning models from single GPU to full data center scale. Previously, he was Director of Engineering on the Kernels team at Cerebras.
    William Malpica, co-founder of Voltron Data and creator of BlazingSQL. He helped scale their GPU-native query engine to handle 100TB queries!
    Mark Saroufim, PyTorch core developer and co-founder of CUDA MODE. He also ran the really fun NeurIPS LLM Efficiency Challenge last year. Previously, he was at Graphcore and Microsoft.

Comments • 7

  • @KSK986 · 4 days ago

    Great workshop!!! Very insightful. Thanks to the organizers and all the speakers.

  • @VipulVaibhaw · 27 days ago · +5

    This was fantastic and very helpful!

  • @SomeshChatterjee · 12 days ago

    Thank you so much for this amazing content!!

  • @sankeerth1729 · 19 days ago

    Thanks for organizing this, Chip!

  • @kevthedestroyer1044 · 24 days ago

    The Discord link is not working; would love it if someone could share a new one!