Serving 100s of LLMs on 1 GPU with LoRAX - Travis Addair | Stanford MLSys #84

  • Published Nov 5, 2023
  • Episode 84 of the Stanford MLSys Seminar Series!
    Serving 100s of Fine-Tuned LLMs on 1 GPU with LoRAX
    Speaker: Travis Addair
    Abstract:
    Smaller, specialized language models such as LLaMA-2-7b can outperform larger general-purpose models like GPT-4 when fine-tuned on proprietary data to perform a single task. But serving many fine-tuned LLMs in production can quickly add up to tens of thousands of dollars per month in cloud costs when each model requires its own dedicated GPU resources.
    LoRA Exchange (LoRAX) is an LLM inference system built for serving numerous fine-tuned LLMs using a shared set of GPU resources. With LoRAX, users can pack over 100 task-specific models into a single GPU, reducing the cost of serving fine-tuned models by orders of magnitude compared with dedicated deployments.
    In this seminar, we'll explore the challenges of serving fine-tuned LLMs in production and the motivation behind building a system like LoRAX. We'll introduce parameter-efficient fine-tuning adapters like Low Rank Adaptation (LoRA) and show how LoRAX dynamically loads and exchanges different adapters at runtime, leveraging a tiered weight cache to speed up this exchange process. Additionally, we'll show how LoRAX achieves high throughput with continuous multi-adapter batching, allowing requests from different fine-tuned adapters to batch together within a single decoding step (a toy sketch of this idea follows the description below).
    Bio:
    Travis Addair is co-founder and CTO of Predibase, the AI platform for engineers. Within the Linux Foundation, he serves as lead maintainer for the Horovod distributed deep learning framework and is a co-maintainer of the Ludwig automated deep learning framework. In the past, he led Uber's deep learning training team as part of the Michelangelo machine learning platform.
    --
    Stanford MLSys Seminar hosts: Simran Arora, Dan Fu
    Twitter:
    / simran_s_arora
    / realdanfu
    --
    Check out our website for the schedule: mlsys.stanford.edu
    Join our mailing list to get weekly updates: groups.google.com/forum/#!for...
    #machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford
  • Science & Technology
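  • To make the continuous multi-adapter batching idea from the abstract concrete, here is a minimal PyTorch-style sketch. It is illustrative only, not the LoRAX source; the names LoRAAdapter and batched_lora_forward, and the per-adapter row gathering, are assumptions for this example. The core trick it shows: the shared base weights are applied to the whole batch once, and each request's low-rank LoRA update is added only to the rows owned by that adapter, so requests for different fine-tuned models share a single decoding step.

    # Minimal sketch (illustrative, not the LoRAX implementation): one linear
    # layer of a decoding step where batch rows belong to different adapters.
    import torch

    class LoRAAdapter:
        """Low-rank update for one fine-tuned model: W_eff = W + scale * (A @ B)."""
        def __init__(self, A: torch.Tensor, B: torch.Tensor, scale: float = 1.0):
            self.A = A          # (d_in, r)  down-projection, r << d_in
            self.B = B          # (r, d_out) up-projection
            self.scale = scale

    def batched_lora_forward(W, x, adapters, adapter_ids):
        """Apply one linear layer to a batch whose rows use different adapters.

        W           : (d_in, d_out) base weights shared by every request
        x           : (batch, d_in) activations for the current decoding step
        adapter_ids : adapter_ids[i] names the adapter that owns batch row i
        """
        y = x @ W  # base model computed once for the whole heterogeneous batch
        for aid in set(adapter_ids):
            rows = [i for i, a in enumerate(adapter_ids) if a == aid]
            adp = adapters[aid]
            # Only the rows routed to this adapter pay for its low-rank update.
            y[rows] += adp.scale * (x[rows] @ adp.A @ adp.B)
        return y

    # Toy usage: three requests, two adapters, one shared base weight matrix.
    d_in, d_out, r = 16, 16, 4
    W = torch.randn(d_in, d_out)
    adapters = {
        "sentiment": LoRAAdapter(torch.randn(d_in, r), torch.randn(r, d_out)),
        "summarize": LoRAAdapter(torch.randn(d_in, r), torch.randn(r, d_out)),
    }
    x = torch.randn(3, d_in)
    y = batched_lora_forward(W, x, adapters, ["sentiment", "summarize", "sentiment"])

    In the real system the abstract also describes a tiered weight cache for swapping adapters in and out at runtime; this sketch only covers the batching half.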

Comments • 8

  • @voncolborn9437 • 7 months ago • +2

    Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.

  • @conan_der_barbar • 8 months ago • +1

    great talk! still waiting for the open source release 👀

  • @suleimanshehu5839 • 7 months ago

    Please create a video on fine-tuning MoE LLMs with LoRA adapters, such as Mixtral 8x7B, within your framework

  • @fastcardlastname3353 • 8 months ago

    This could change the landscape of multi-agent systems if it delivers on its promise.

  • @Gerald-iz7mv • 4 months ago

    Hi, do you have any links to benchmarks for measuring latency and throughput across different models and frameworks?

  • @mohamedfouad1309 • 7 months ago

    GitHub link? 😅

  • @nithinrao7191 • 8 months ago

    Second

  • @absbi0000 • 8 months ago

    First