Napkin Math For Fine Tuning w/Johno Whitaker

  • Published 9 Jul 2024
  • We will show you how to build intuition around training performance with a focus on GPU-poor fine-tuning.
    This is a talk from Mastering LLMs: A survey course on applied topics for Large Language Models.
    More resources available here:
    parlance-labs.com/education/f...
    00:00 Introduction
    Johno introduces the topic "Napkin Math for Fine Tuning," aiming to answer common questions related to model training, especially for beginners in fine-tuning large existing models.
    01:23 About Johno and AnswerAI
    Johno shares his background and his work at AnswerAI, an applied R&D lab focusing on the societal benefits of AI.
    03:18 Plan for the Talk
    Johno outlines the structure of the talk, including objectives, running experiments, and live napkin math to estimate memory use.
    04:40 Training and Fine-Tuning Loop
    Description of the training loop: feeding data through a model, measuring accuracy, updating the model, and repeating the process.
    09:05 Hardware Considerations
    Discussion on the different hardware components (CPU, GPU, RAM) and how they affect training performance.
    12:28 Tricks for Efficient Training
    Overview of various techniques to optimize training efficiency, including LoRA, quantization, and CPU offloading.
    13:12 Full Fine-Tuning
    Describes the parameters and memory involved in full fine-tuning.
    18:14 LoRA
    Detailed explanation of full fine-tuning versus parameter-efficient fine-tuning techniques like LoRA.
    21:04 Quantization and Memory Savings
    Discussion on quantization methods to reduce memory usage and enable training of larger models.
    23:10 Combining Techniques
    Combining different techniques like quantization and LoRA to maximize training efficiency.
    22:55 Running Experiments
    Importance of running controlled experiments to understand the impact of various training parameters.
    25:46 CPU Offloading
    How CPU offloading works and the tradeoffs.
    28:31 Real-World Example
    Demo of memory optimization and problem-solving during model training, with code. This also includes pragmatic ways to profile your code.
    45:44 Case Study: QLoRA + FSDP
    Discussion of QLoRA combined with FSDP, and the tradeoffs involved.
    54:25 Recap / Conclusion
    Johno summarizes the key points of his talk.
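    As a rough illustration of the kind of napkin math the talk covers, here is a minimal sketch of estimating full fine-tuning memory. The per-parameter byte counts are assumptions (bf16 weights and gradients, Adam with two fp32 optimizer states), and activations are ignored, so this is a lower bound, not a definitive formula:

    ```python
    def full_finetune_memory_gb(n_params: float,
                                weight_bytes: int = 2,   # bf16 weights
                                grad_bytes: int = 2,     # bf16 gradients
                                optim_bytes: int = 8):   # Adam: two fp32 states per param
        """Rough lower bound on training memory, ignoring activations."""
        total_bytes = n_params * (weight_bytes + grad_bytes + optim_bytes)
        return total_bytes / 1e9

    # A 7B-parameter model: 7e9 params * 12 bytes = 84 GB before activations,
    # far beyond a single 24 GB consumer GPU -- hence LoRA, quantization, etc.
    print(full_finetune_memory_gb(7e9))  # 84.0
    ```

    Changing `weight_bytes` to 0.5 approximates 4-bit quantized weights, which is one way to see on paper why QLoRA fits where full fine-tuning does not.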
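    In the same spirit, a sketch of counting LoRA's trainable parameters. The shapes below (square d×d projection matrices, four adapted matrices per layer, Llama-7B-like dimensions) are illustrative assumptions, not figures from the talk:

    ```python
    def lora_params(d_model: int, rank: int, n_layers: int,
                    mats_per_layer: int = 4) -> int:
        """Trainable parameters for LoRA: each adapted d x d matrix is
        replaced by two low-rank factors A (d x r) and B (r x d)."""
        return n_layers * mats_per_layer * 2 * d_model * rank

    # Hypothetical 7B-ish shapes: d_model=4096, 32 layers, rank 8
    # -> ~8.4M trainable params vs ~7B for full fine-tuning (~0.12%).
    print(lora_params(4096, 8, 32))  # 8388608
    ```

    Because only these adapter parameters need gradients and optimizer states, the 12-bytes-per-parameter training overhead applies to millions of parameters instead of billions.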
  • How-to & Style

Comments • 2

  • @dahiruibrahimdahiru2690

    Johno is someone you always want to listen to, there's so much in that brain you would want to pick.

  • @anne-marieroy8812
    7 days ago

    Thank you so much for this presentation and the ways to tweak models on GPUs, very instructive.