Napkin Math For Fine Tuning w/Johno Whitaker
- Published Jul 9, 2024
- We will show you how to build intuition around training performance with a focus on GPU-poor fine tuning.
This is a talk from Mastering LLMs: A survey course on applied topics for Large Language Models.
More resources available here:
parlance-labs.com/education/f...
00:00 Introduction
Johno introduces the topic "Napkin Math for Fine Tuning," aiming to answer common questions related to model training, especially for beginners in fine-tuning large existing models.
01:23 About Johno and AnswerAI
Johno shares his background and his work at AnswerAI, an applied R&D lab focusing on the societal benefits of AI.
03:18 Plan for the Talk
Johno outlines the structure of the talk, including objectives, running experiments, and live napkin math to estimate memory use.
04:40 Training and Fine-Tuning Loop
Description of the training loop: feeding data through a model, measuring accuracy, updating the model, and repeating the process.
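The loop described above can be sketched in a few lines. This is a toy illustration (a one-parameter model trained with plain gradient descent, not from the talk itself); real fine-tuning swaps in an LLM, a tokenized dataset, and an optimizer like AdamW, but the shape of the loop is the same.

```python
# Toy sketch of the train/fine-tune loop: feed data through a model,
# measure error, update the model, repeat. All names are illustrative.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (x, target) pairs; true w = 3

w = 0.0          # the model's single parameter: pred = w * x
lr = 0.05        # learning rate

for epoch in range(100):
    for x, target in data:
        pred = w * x           # 1) forward pass: feed data through the model
        err = pred - target    # 2) measure how wrong the prediction is
        grad = err * x         # 3) gradient of 0.5 * err**2 w.r.t. w
        w -= lr * grad         # 4) update the model
                               # 5) repeat

print(round(w, 3))  # converges toward 3.0
```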
09:05 Hardware Considerations
Discussion on the different hardware components (CPU, GPU, RAM) and how they affect training performance.
12:28 Tricks for Efficient Training
Overview of various techniques to optimize training efficiency, including LoRA, quantization, and CPU offloading.
13:12 Full Fine-Tuning
Describes the parameters and memory required for full fine-tuning.
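The napkin math for full fine-tuning can be sketched as a one-line calculation. The helper name and defaults below are illustrative assumptions: fp16/bf16 weights and gradients (2 bytes each per parameter) plus Adam's two fp32 optimizer states (8 bytes per parameter), ignoring activations entirely.

```python
def full_finetune_gib(n_params, weight_bytes=2, grad_bytes=2, optim_bytes=8):
    """Rough memory floor for full fine-tuning, ignoring activations.

    Defaults assume fp16/bf16 weights + grads and Adam's two fp32
    states per parameter (hypothetical helper, illustrative defaults).
    """
    total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
    return total_bytes / 2**30  # bytes -> GiB

# A 7B-parameter model needs roughly 78 GiB before activations --
# far beyond a single 24 GB consumer GPU.
print(f"{full_finetune_gib(7e9):.1f} GiB")
```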
18:14 LoRA
Detailed explanation of full fine-tuning versus parameter-efficient fine-tuning techniques like LoRA.
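The parameter savings from LoRA follow directly from its low-rank factorization: instead of training a full d_in x d_out weight update, you train two small factors A (d_in x r) and B (r x d_out). A quick illustrative calculation (the 4096 hidden size is a Llama-7B-style assumption, not a figure from the talk):

```python
def lora_params(d_in, d_out, rank):
    # A LoRA adapter replaces a full d_in x d_out weight update with
    # two low-rank factors: A (d_in x rank) and B (rank x d_out).
    return rank * (d_in + d_out)

d = 4096                          # assumed hidden size (Llama-7B-style layer)
full = d * d                      # params in one full square weight matrix
lora = lora_params(d, d, rank=8)  # trainable params for the same layer
print(full, lora, full / lora)    # rank 8 gives 256x fewer trainable params
```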
21:04 Quantization and Memory Savings
Discussion on quantization methods to reduce memory usage and enable training of larger models.
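Quantization's memory savings are simple arithmetic: weight storage scales linearly with bits per parameter. A minimal sketch for the weights of a 7B model (illustrative helper, ignoring quantization metadata like block-wise scales):

```python
def weight_gib(n_params, bits):
    # bytes = params * bits / 8; GiB = bytes / 2**30
    return n_params * bits / 8 / 2**30

n = 7e9  # a 7B-parameter model
for bits in (16, 8, 4):
    # 16-bit: ~13 GiB, 8-bit: ~6.5 GiB, 4-bit: ~3.3 GiB
    print(f"{bits:>2}-bit weights: {weight_gib(n, bits):.1f} GiB")
```

Halving the bits halves the weight footprint, which is what lets a 7B model's weights fit comfortably on a consumer GPU at 4 bits.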
23:10 Combining Techniques
Combining different techniques like quantization and LoRA to maximize training efficiency.
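Combining the two techniques multiplies the savings, QLoRA-style: the base weights are quantized to 4 bits and frozen, so gradients and optimizer states only exist for the small LoRA adapters. A napkin estimate under assumed numbers (7B base model, ~40M total adapter parameters at a low rank; both figures are illustrative, not from the talk):

```python
GIB = 2**30

base_params = 7e9          # frozen base model
adapter_params = 40e6      # assumed total LoRA adapter params

base = base_params * 0.5 / GIB       # 4-bit frozen weights (0.5 bytes/param)
adapters = adapter_params * 2 / GIB  # fp16 adapter weights
grads = adapter_params * 2 / GIB     # gradients only for the adapters
optim = adapter_params * 8 / GIB     # Adam fp32 states only for the adapters

total = base + adapters + grads + optim
print(f"~{total:.1f} GiB before activations")  # vs ~78 GiB for full fine-tuning
```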
22:55 Running Experiments
Importance of running controlled experiments to understand the impact of various training parameters.
25:46 CPU Offloading
How CPU offloading works and the tradeoffs.
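The core tradeoff is bandwidth: anything offloaded to CPU RAM must cross the PCIe bus to be used, and transfer time is bounded by bus throughput. A back-of-envelope sketch with illustrative numbers (real bandwidth varies by PCIe generation and system):

```python
# Napkin math on the offloading tradeoff (illustrative numbers).
weights_gb = 14        # fp16 weights of a 7B model
pcie_gb_per_s = 32     # rough PCIe 4.0 x16 throughput assumption

transfer_s = weights_gb / pcie_gb_per_s
print(f"~{transfer_s:.2f} s per full copy of the weights over the bus")
```

Fractions of a second per transfer sounds small, but repeated every step it can dominate step time, which is why offloading trades speed for the ability to fit larger models at all.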
28:31 Real-World Example
Demo of memory optimization and problem-solving during model training, with code. This also includes pragmatic ways to profile your code.
45:44 Case Study: QLoRA + FSDP
Discussion of QLoRA combined with FSDP, along with the tradeoffs involved.
54:25 Recap / Conclusion
Johno summarizes the key points of his talk.