Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Jared Casper

  • Added 29. 08. 2024
  • In this talk we present how we trained a 530B parameter language model on a DGX SuperPOD with over 3,000 A100 GPUs and a high-speed InfiniBand interconnect, and how we can scale to even larger models. We explore three types of parallelism: data, tensor, and pipeline, and how these different types can be composed to achieve maximum efficiency. Our approach allows us to perform training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs (per-GPU throughput of 52% of theoretical peak; see the sketch after this description). We discuss challenges that we faced when training the 530B Megatron-Turing NLG model and give practical advice on how to successfully train very large language models.
  • Science & Technology
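
As a quick sanity check on the numbers in the description, the short Python sketch below shows how the three parallelism degrees multiply to the total GPU count and how the quoted 52% figure follows from the aggregate throughput. This is not Megatron-LM code: the tensor-, pipeline-, and data-parallel sizes are illustrative assumptions, and only the GPU count, the 502 petaFLOP/s figure, and NVIDIA's published A100 FP16/BF16 tensor-core peak (312 TFLOP/s) are taken as given.

    # A minimal sketch, not Megatron-LM's actual API: the parallel sizes below are
    # illustrative assumptions; only the GPU count, the aggregate throughput, and
    # the A100 peak come from the description and NVIDIA's published A100 specs.

    A100_PEAK_FLOPS = 312e12      # A100 FP16/BF16 tensor-core peak, FLOP/s
    total_gpus = 3072             # cluster size quoted for the 1T-parameter run
    aggregate_flops = 502e15      # achieved end-to-end throughput, FLOP/s

    # The three parallelism degrees multiply to the total number of GPUs:
    #   world_size = tensor_parallel * pipeline_parallel * data_parallel
    tensor_parallel = 8           # assumed: kept within one DGX A100 node (NVLink)
    pipeline_parallel = 64        # assumed: spans nodes over InfiniBand
    data_parallel = total_gpus // (tensor_parallel * pipeline_parallel)   # = 6
    assert tensor_parallel * pipeline_parallel * data_parallel == total_gpus

    per_gpu_flops = aggregate_flops / total_gpus   # ~163 TFLOP/s per GPU
    efficiency = per_gpu_flops / A100_PEAK_FLOPS   # ~0.52 of theoretical peak
    print(f"per-GPU throughput: {per_gpu_flops / 1e12:.0f} TFLOP/s "
          f"({efficiency:.0%} of peak)")

The illustrative sizes follow the pattern described in the Megatron-LM work: tensor parallelism is typically kept within a single node to exploit NVLink bandwidth, while pipeline and data parallelism span nodes over the InfiniBand fabric.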

Comments • 3

  • @voncolborn9437 • 7 months ago +1

    Being an old-timer on computer ops (from back in the 80s), I find this whole new world of computer operations totally fascinating. It really is hard for me to wrap my head around the size and performance of these systems. My hat is off to you guys. I'm watching and learning a little, too.

  • @prajyot2021 • 3 months ago

    Need more such detailed content, Jared. Appreciate your work. Thanks, mate.

  • @kazimejbaulislam9185 • 8 months ago

    Amazing explanation! Thanks.