How to Build an LLM from Scratch | An Overview

  • Published May 30, 2024
  • 👉 Need help with AI? Reach out: shawhintalebi.com/
    This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review key aspects of developing a foundation LLM based on the development of models such as GPT-3, Llama, Falcon, and beyond.
    More Resources:
    👉 Series Playlist: czcams.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html
    📰 Read more: towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3
    [1] BloombergGPT: arxiv.org/pdf/2303.17564.pdf
    [2] Llama 2: ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
    [3] LLM Energy Costs: www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
    [4] arXiv:2005.14165 [cs.CL]
    [5] Falcon 180b Blog: huggingface.co/blog/falcon-180b
    [6] arXiv:2101.00027 [cs.CL]
    [7] Alpaca Repo: github.com/gururise/AlpacaDataCleaned
    [8] arXiv:2303.18223 [cs.CL]
    [9] arXiv:2112.11446 [cs.CL]
    [10] arXiv:1508.07909 [cs.CL]
    [11] SentencePiece: github.com/google/sentencepiece/tree/master
    [12] Tokenizers Doc: huggingface.co/docs/tokenizers/quicktour
    [13] arXiv:1706.03762 [cs.CL]
    [14] Andrej Karpathy Lecture: czcams.com/video/kCc8FmEb1nY/video.html
    [15] Hugging Face NLP Course: huggingface.co/learn/nlp-course/chapter1/7?fw=pt
    [16] arXiv:1810.04805 [cs.CL]
    [17] arXiv:1910.13461 [cs.CL]
    [18] arXiv:1603.05027 [cs.CV]
    [19] arXiv:1607.06450 [stat.ML]
    [20] arXiv:1803.02155 [cs.CL]
    [21] arXiv:2203.15556 [cs.CL]
    [22] Train With Mixed Precision (NVIDIA): docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
    [23] DeepSpeed Doc: www.deepspeed.ai/training/
    [24] Weight Decay: paperswithcode.com/method/weight-decay
    [25] Gradient Clipping: towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
    [26] arXiv:2001.08361 [cs.LG]
    [27] arXiv:1803.05457 [cs.AI]
    [28] arXiv:1905.07830 [cs.CL]
    [29] arXiv:2009.03300 [cs.CY]
    [30] arXiv:2109.07958 [cs.CL]
    [31] Evaluating MMLU Leaderboard: huggingface.co/blog/evaluating-mmlu-leaderboard
    [32] Dropout Paper: www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
    --
    Book a call: calendly.com/shawhintalebi
    Socials
    / shawhin
    / shawhintalebi
    / shawhint
    / shawhintalebi
    The Data Entrepreneurs
    🎥 YouTube: / @thedataentrepreneurs
    👉 Discord: / discord
    📰 Medium: / the-data
    📅 Events: lu.ma/tde
    🗞️ Newsletter: the-data-entrepreneurs.ck.pag...
    Support ❤️
    www.buymeacoffee.com/shawhint
    Intro - 0:00
    How much does it cost? - 1:30
    4 Key Steps - 3:55
    Step 1: Data Curation - 4:19
    1.1: Data Sources - 5:31
    1.2: Data Diversity - 7:45
    1.3: Data Preparation - 9:06
    Step 2: Model Architecture (Transformers) - 13:17
    2.1: 3 Types of Transformers - 15:13
    2.2: Other Design Choices - 18:27
    2.3: How big do I make it? - 22:45
    Step 3: Training at Scale - 24:20
    3.1: Training Stability - 26:52
    3.2: Hyperparameters - 28:06
    Step 4: Evaluation - 29:14
    4.1: Multiple-choice Tasks - 30:22
    4.2: Open-ended Tasks - 32:59
    What's next? - 34:31

Comments • 195

  • @ShawhinTalebi
    @ShawhinTalebi  7 months ago +12

    [Correction at 15:00]: the words on the vertical axis are backward. It should read "I hit ball with baseball bat" from top to bottom, not bottom to top.
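    For anyone who wants to see the corrected mask concretely, here is a minimal NumPy sketch (mine, not from the video) of a causal decoder mask for that sentence:

      import numpy as np

      tokens = ["I", "hit", "ball", "with", "baseball", "bat"]
      n = len(tokens)

      # Causal (decoder-style) attention mask: position i may attend only to
      # positions <= i, so the matrix is lower-triangular.
      mask = np.tril(np.ones((n, n), dtype=int))

      # Row labels run top to bottom: "I" first, "bat" last.
      for label, row in zip(tokens, mask):
          print(f"{label:>8} {row}")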
    🔧Fine-tuning: czcams.com/video/eC6Hd1hFvos/video.html
    🤖Build a Custom AI Assistant: czcams.com/video/4RAvJt3fWoI/video.html
    👉Series playlist: czcams.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html
    📰 Read more: towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

    • @amortalbeing
      @amortalbeing  7 months ago

      Thanks a lot for the refs, Shahin jan ❤
      Keep up the great work 👍

  • @LudovicCarceles
    @LudovicCarceles  1 month ago +4

    "Garbage in, garbage out" also applies to our brains. Your videos are certainly high-quality inputs.

  • @seanwilner
    @seanwilner  4 months ago +43

    This is about as perfect a coverage of this topic as I could imagine. I'm a researcher with a PhD in NLP who trains LLMs from scratch for a living, and I often find myself needing to communicate the process in a way that's digestible to a broad audience without back-and-forth question answering, so I'm thrilled to have found your piece!
    As an aside, I think the token order on the y-axis of the attention mask for decoders on slide 10 is reversed

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago +3

      Thanks Sean! It's always a challenge to convey technical information in a way that both the researcher and the general audience can get value from, so your approval means a lot :)
      Thanks for pointing that out. The blog article has a corrected version: medium.com/towards-data-science/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

    • @AritraDutta-tz4je
      @AritraDutta-tz4je  21 days ago

      Sir, can you tell me how you are training your LLMs?

    • @xxcusme
      @xxcusme  4 days ago

      Most people watching this video found it through some prompt about how to build an LLM, and by your logic those people are the remaining 10%: the makers & inventors

  • @barclayiversen376
    @barclayiversen376  1 month ago +6

    Pretty rare that I actually sit through an entire 30+ minute video on YouTube. Well done.

  • @tehreemsyed8621
    @tehreemsyed8621  16 days ago +1

    This is such a fantastic video on building LLMs from scratch. I'll watch it repeatedly to implement it for a time-series use case. Thank you so much!!

  • @dauntlessRx
    @dauntlessRx  2 months ago +7

    This is literally the perfect explanation for this topic. Thank you so much.

  • @shilpyjain6147
    @shilpyjain6147  1 month ago +1

    Hey Shaw, thank you for putting together this extensive video on building an LLM from scratch. It certainly gives a fair idea of how some of the existing LLMs were created!

  • @racunars
    @racunars  7 months ago +9

    The whole series on using large language models (LLMs) is really helpful. This 6th video really helped me understand the transformer architecture in a nutshell. Thank you. 👏

  • @asha328
    @asha328  2 months ago +1

    One of the best videos explaining the process and cost of building an LLM 🎉.

  • @ethanchong1026
    @ethanchong1026  5 months ago

    Thanks for putting together this short video. I enjoy learning this subject from you.

  • @theunconventionalenglishman

    This is excellent - thanks for putting this together and taking the time to explain things so clearly!

  • @GBangalore
    @GBangalore  4 months ago +3

    Thank you so much for putting these videos together, and this one in particular. This is such a broad and complex topic, and you have managed to make it as thorough as possible in a 30ish-minute 😮 timeframe, which I thought was almost impossible.

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      My pleasure, glad it was informative yet concise :)

  • @bradstudio
    @bradstudio  4 months ago +2

    This was a very thorough introduction to LLMs and answered many questions I had. Thank you.

  • @Hello_kitty_34892
    @Hello_kitty_34892  4 months ago +6

    Your voice is relaxing... I love that you don't speak super fast like most tech bros, and you seem relaxed about the content rather than having this "in a rush" energy. Def would watch you explain most things LLM and AI! Thanks for the content.

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      Thanks for the feedback. More AI/LLM content to come!

  • @DigsWigs2022
    @DigsWigs2022  28 days ago

    Great explanation. I will have to watch it a few times to have a basic understanding 😂

  • @aldotanca9430
    @aldotanca9430  6 months ago +1

    Thoroughly researched and referenced, clear explanations inclusive of examples. I will watch it again to take notes. Thanks so much!

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +1

      Great to hear! Feel free to reach out with any questions or suggestions for future content :)

    • @aldotanca9430
      @aldotanca9430  6 months ago

      Thanks! I would have plenty of questions actually, but they are probably a bit too specific to make for a generally relevant video. I am exploring options for a few non-profit projects related to music education and research. They need to integrate large bodies of text and produce precise references to what comes from where, so I was naively toying with the idea of producing a base model partially trained on the actual text in question. Which, I understood from the video, is a non-starter. So I will look into fine-tuning, RAG, and prompt engineering. I suspect I will spend quite a lot of time watching your content, given you covered quite a lot. I also learned quite a bit more from this specific video. Right now I am studying the basics, including a bit of the math involved, and it is slow going, so I am quite grateful :)

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +1

      @@aldotanca9430 That sounds like a really cool use case (I've been a musician for over 14 years)!
      If you want to chat about more specific questions feel free to set up some office hours: calendly.com/shawhintalebi/office-hours

    • @aldotanca9430
      @aldotanca9430  6 months ago

      @@ShawhinTalebi That's very generous of you! I will book a slot; I would love to chat. I think it would help me immensely to rule out blind alleys and at least get a well-informed idea of what is feasible to attempt. I did notice the congas, piano, and Hanon lurking in the background, so I suspected the topic would be interesting to you. It is about historical research, but it is also very applicable and creative for improvisation. Perhaps I can compile a very short list of interesting resources, in case you want to check it out at some point for musical reasons :)

  • @goldholder8131
    @goldholder8131  3 months ago +2

    This is the most comprehensive and well-rounded presentation I've ever seen in my life, topic aside. xD Bravo, good Sir.

  • @chrstfer2452
    @chrstfer2452  6 months ago +3

    That was simply incredible; how the heck does it have under 5k views? Literal in-script citations, not even cards but vocal mentions!! Holy shit, I'm gonna share this channel with all my LLM-enamored buddies

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +2

      Thanks, I'm glad it was helpful. Your referrals are greatly appreciated 😁

  • @joedigiovanni8758
    @joedigiovanni8758  3 months ago +1

    Great job demystifying what is happening under the hood of these LLMs

  • @EigenA
    @EigenA  2 months ago

    Great channel, 3rd video in. You earned a sub. Thank you!

  • @sinan325
    @sinan325  6 months ago +8

    I am not a programmer and don't know anything about programming or LLMs, but I find this topic fascinating. Thank you for your videos and for sharing your knowledge.

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago

      Happy to help! I hope they were valuable.

  • @qicao7769
    @qicao7769  4 months ago +1

    The best and most efficient video about the basics of LLMs!!!! I think I have saved 10h of reading. Thanks!

  • @robwarner1858
    @robwarner1858  4 months ago +1

    Amazing video. It lost me through a fair bit, but I came away understanding more than I ever have on the subject. Thank you.

  • @lihanou
    @lihanou  2 months ago

    Clicked with low expectations, but wow, what a gem. Great clarity, with just the right amount of depth for beginners and intermediate learners.

  • @aalaptube
    @aalaptube  4 months ago

    Very good information! Answered a lot of core questions I had.

  • @ifycadeau
    @ifycadeau  7 months ago

    Love these videos! Keep it up Shaw!

  • @mujeebrahman5282
    @mujeebrahman5282  4 months ago +3

    I am typing this after watching half of the video, as I am already amazed by the clarity of the explanation. Exceptional.

    • @ShawhinTalebi
      @ShawhinTalebi  3 months ago

      Thanks, hope the 2nd half didn't disappoint!

  • @ares106
    @ares106  7 months ago +3

    Thank you, this is infinitely more enjoyable for me than reading a paper.

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago +1

      😂😂 I’m glad you liked it!

    • @fab_spaceinvaders
      @fab_spaceinvaders  7 months ago +1

      Second this; keep the good work flowing all around 🎉 🙏

  • @shih-shengchang19
    @shih-shengchang19  3 months ago

    Thanks for your video; it's awesome. You explain everything very clearly and with good examples.

    • @ShawhinTalebi
      @ShawhinTalebi  3 months ago

      Thanks for the feedback, glad it was clear :)

  • @rezNezami
    @rezNezami  5 months ago

    Excellent job, Shawhin. Merci.

  • @randomforest_dev
    @randomforest_dev  3 months ago

    Awesome Video! Thanks.

  • @funnymono
    @funnymono  3 months ago

    Exceptional material

  • @romantolstykh7488
    @romantolstykh7488  3 months ago

    Great video!

  • @LezzGoPlaces
    @LezzGoPlaces  2 months ago

    Brilliant!

  • @Kiririn
    @Kiririn  3 months ago

    thank you for making this

  • @user-cu7vl1jj9v
    @user-cu7vl1jj9v  3 months ago

    Very very good!!

  • @MegaBenschannel
    @MegaBenschannel  5 months ago

    Thanks for the great and packed exposé. 😀

  • @SamChughtai
    @SamChughtai  4 months ago +2

    Thanks, Shaw!! Great video and excellent data, would love to be your mentee, sir!!

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago +1

      Thank you for your generosity! I don't currently do any formal mentorship, but I try to give away all my secrets on YouTube and Medium :)
      Feel free to share any suggestions for future content.

  • @YohannesAssefa-wk5oo
    @YohannesAssefa-wk5oo  5 months ago

    Thank you bro for your help

  • @PorterHarris
    @PorterHarris  3 months ago

    Great content Shaw!
    The next step I'm having trouble figuring out: is there a way to run an existing GPT locally and do prompt engineering or model fine-tuning on it with my own training data?

    • @ShawhinTalebi
      @ShawhinTalebi  3 months ago

      Thanks! While this depends on your local machine specs, the short answer is yes! My next video will actually walk through how to do this using an approach called QLoRA.

  • @echofloripa
    @echofloripa  7 months ago

    Wow, what great content, thanks for that!! In LLM fine-tuning, is there also a suggested table relating the number of trainable parameters and the tokens used (dataset size)?

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago +1

      That's a great question. While I haven't come across such a table, a good rule of thumb is 1k-10k examples, depending on the use case.

    • @echofloripa
      @echofloripa  7 months ago

      @@ShawhinTalebi Thanks for the quick reply! What about the number of trainable parameters, should we worry about that? And what if my number of examples is smaller than that, let's say 100 to 200?

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago

      ​@@echofloripa IMO you've got to work with what you've got. I've heard some people get sufficient performance from just 100-200 examples, but it ultimately comes down to what is acceptable for that particular use case. It might be worth a try.
      Hope that helps!

  • @akramsystems
    @akramsystems  3 months ago

    This is Gold

  • @arpadbrooks5317
    @arpadbrooks5317  5 months ago

    Very informative, thx

  • @wilfredomartel7781
    @wilfredomartel7781  5 months ago +1

    🎉❤❤❤ Amazing video

  • @gRosh08
    @gRosh08  1 month ago +1

    Cool.

  • @mitdasondi2171
    @mitdasondi2171  2 months ago

    watching this right before my interview.

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      Good luck!

    • @mitdasondi2171
      @mitdasondi2171  2 months ago

      @@ShawhinTalebi Cleared the 1st round; the next one is on Thursday. I hope your luck brings me my dream job ❤️

  • @ronakbhatt4880
    @ronakbhatt4880  6 months ago +1

    @17:08 Aren't the decoder weights wrong, if 0 is the weight from a token to the tokens that come after it?

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago

      Sorry I didn't understand your question. Could you rephrase?

  • @philtoa334
    @philtoa334  7 months ago

    Nice.

  • @vsudbdk5363
    @vsudbdk5363  6 months ago +1

    Any resources on enriching prompt templates? In my case I find it difficult to understand and implement, as an LLM returns a response based on how we define the template while avoiding unnecessary context...

    • @vsudbdk5363
      @vsudbdk5363  6 months ago

      Recently began exploring generative AI and need proper guidance on where to learn and do the coding part. I know it will be a long journey: understanding the math behind it, learning concepts and code, staying up all night checkpointing metrics, performance, and all... thank you

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +1

      Great question. The video on prompt engineering might be helpful: czcams.com/video/0cf7vzM_dZ0/video.html

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +1

      That's a good mindset to have. AI is an ocean, with endless things one can learn.
      This playlist could be a good starting place: czcams.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html

    • @vsudbdk5363
      @vsudbdk5363  6 months ago

      @@ShawhinTalebi thank you very much

  • @techdiyer5290
    @techdiyer5290  5 months ago +3

    What if you could make a small language model that maybe only understands English, can understand code, and is easy to run?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      That is a compelling notion. If we can get there, then it would make this technology even more accessible and impactful.

    • @shrinik1969
      @shrinik1969  5 months ago

      Size = accuracy... small may not give you what you want

    • @Decapodd
      @Decapodd  14 days ago

      NanoChatGPT

  • @rajez.s7157
    @rajez.s7157  5 months ago +1

    Can Ray clusters be used here for multi-GPU training of LLMs?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      I haven't used Ray clusters before, but skimming their website it seems like it was specifically made for ML workloads.

  • @vijayakashallenki7275
    @vijayakashallenki7275  1 month ago +1

    Waiting for the complete AI/ML playlist! Sir, please

  • @jackflash6377
    @jackflash6377  6 months ago +3

    Just now I was asking GPT-4 to help me with training text. It is not allowed to assist in training any LLMs and would not give me anything.

    • @ShawhinTalebi
      @ShawhinTalebi  6 months ago +1

      I believe it's now against OpenAI's policy to use their models to train other models. You may need to look to open-source solutions, e.g. Llama2 or Mistral.

    • @petevenuti7355
      @petevenuti7355  5 months ago

      @@ShawhinTalebi How could it possibly stop it, if the model being trained fed it the prompt and used the response for reinforcement and alignment?

  • @Nobody2310
    @Nobody2310  1 month ago

    What is the most basic technical artifact that is used/required to build any LLM? Is it an existing LLM such as Llama 2?

    • @ShawhinTalebi
      @ShawhinTalebi  25 days ago

      I am not quite sure of the meaning of "most basic technical artifact," but here's one perspective. There are two ways to build an LLM: from scratch and fine-tuning. When training from scratch, the essential piece is the training data used to develop the model. When fine-tuning, the essential piece is the pre-trained model you start from (e.g., Llama2).
      Hope that helps!

  • @user-oo2co6xb8u
    @user-oo2co6xb8u  5 months ago

    Hi, I have domain-specific PDF files. How do I train using transfer learning? Please advise.

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      Depends on what you mean by transfer learning. If you simply want to extract knowledge from a PDF I'd recommend exploring RAG or using off-the-shelf solutions like OpenAI Assistants interface.
      Happy to clarify, if I misinterpreted the question.

  • @nobafan7515
    @nobafan7515  3 months ago

    Thank you for the video! I was wondering if you can help me. Let's say I ask GPT whether Romeo and Juliet was a comedy or a tragedy, and the only data it has was put in by people who didn't have time to fact-check it, and I want my own GPT (let's say one of the tiny ones that can easily run on my laptop) to explain the history of it and the facts of it.
    Do I need to dive into the LLM and find that specific data to correct it? Can I fine-tune it to improve it (let's say I have a GPU big enough to train this LLM)? Or is the model fine, but I need a different GPT?

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago +1

      If I understood correctly, the question is how to ensure the LLM gives accurate responses.
      While there are several ways one can do this, the most effective way to give a model specialized and accurate information is via a RAG system. This consists of providing the model with specific information from a knowledge base, depending on the user prompt.
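      To make that concrete, here is a toy sketch of the RAG idea; the knowledge_base and the keyword retriever are hypothetical stand-ins (real systems typically use embedding-based similarity search):

        # Toy RAG loop: retrieve relevant text, then prepend it to the prompt.
        knowledge_base = [
            "Romeo and Juliet is a tragedy by William Shakespeare.",
            "The play was first performed in the 1590s.",
        ]

        def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
            # Rank documents by naive keyword overlap with the query.
            def overlap(doc: str) -> int:
                return len(set(query.lower().split()) & set(doc.lower().split()))
            return sorted(docs, key=overlap, reverse=True)[:k]

        def build_prompt(query: str) -> str:
            context = "\n".join(retrieve(query, knowledge_base))
            return f"Use the context to answer.\nContext:\n{context}\n\nQuestion: {query}"

        print(build_prompt("Was Romeo and Juliet a comedy or a tragedy?"))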

  • @amparoconsuelo9451
    @amparoconsuelo9451  6 months ago

    Can a fine-tuned LLM be repurposed and re-fine-tuned for more than one task?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      Yes, it can! In fact, that is what OpenAI did with their RLHF technique to create their InstructGPT models.

  • @abhishekfnu7455
    @abhishekfnu7455  2 months ago

    Is there a way to use a data dictionary to train an LLM to generate SQL queries later on?

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      Yes, but you will likely need to transform the data a bit before it can be used for fine-tuning. I give a concrete example of this here: czcams.com/video/4RAvJt3fWoI/video.html

  • @hashifvs519
    @hashifvs519  5 months ago

    Can you post a video on continual pretraining of LLMs like Llama?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago +1

      Thanks for the great suggestion. I’ll be doing more content on fine-tuning so that will be a good topic to cover there.

  • @Nursultan_karazhigit
    @Nursultan_karazhigit  3 months ago

    Hello, thanks. Do you know if it's possible to create your own LLM for your own startup?

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      Of course this is possible. However, it is rarely necessary. I'd suggest seeking simpler (and cheaper) solutions before jumping to training an LLM from scratch.

    • @Nursultan_karazhigit
      @Nursultan_karazhigit  2 months ago

      @@ShawhinTalebi Thanks

  • @siyufan1084
    @siyufan1084  7 months ago

    Arrived right on time! The quality of the video is consistently excellent, as always

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago

      Great to hear! I'm glad they are helpful :)

  • @Sunnyangusyoung
    @Sunnyangusyoung  2 months ago

    What if I don't want to build my own model, but to work for someone who is building one?

  • @varadacharya2802
    @varadacharya2802  1 day ago

    Can you make a series on data science and artificial intelligence topics?

  • @dohua_ai
    @dohua_ai  7 months ago

    So my dreams of my own LLM are broken (( So, as I understood, the only way to build a personal LLM is fine-tuning? At least until cheap ways of training appear...

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago +1

      I wouldn't give up on it! My (optimistic) conjecture is that as we better understand how these models actually work, we will be able to develop ones that are much more computationally efficient.

  • @issair-man2449
    @issair-man2449  5 months ago

    Hi, hoping that my comment will be seen and responded to... I FAIL to understand:
    If a simple model learns/predicts, couldn't we prompt it to delete the trash data and train itself by itself autonomously until the model becomes super intelligent?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      LLMs alone only do token prediction, as discussed in the first video of this series: czcams.com/video/tFHeUSJAYbE/video.html
      While an AI system could in principle train itself, it would require much more than just an LLM to pull that off.

  • @hari_madh
    @hari_madh  4 months ago

    Bro, I want to build an LLM.. does this video help me teach myself and build an LLM myself? Is it possible? (I haven't watched it yet)

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      While this video may be a helpful first step, more resources will be necessary. Here are a few additional resources I recommend.
      - czcams.com/video/kCc8FmEb1nY/video.html&ab_channel=AndrejKarpathy
      - huggingface.co/learn/nlp-course/chapter1/1?fw=pt

  • @lyonspeterson1094
    @lyonspeterson1094  2 months ago

    Good content. But when I watch the video, there are so many ads that I'm confused about what I am supposed to watch.

  • @abcoflife6420
    @abcoflife6420  5 months ago +1

    Thank you so much for the rich information. My goal is to DIY one from scratch... 😢 For sure it won't be billions of tokens. I want to make it practical, for example for home management or a school reporting system... instead of static reports, enabling it to create and run its own SQL queries... 😅

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago

      Happy to help! To make something practical, I'd recommend using an existing model fine-tuned to generate SQL queries, e.g. huggingface.co/defog/sqlcoder
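      As a rough sketch of how you might try it (assuming the standard Hugging Face transformers API; the checkpoint is large, so a GPU or quantization is realistic, and the schema/question prompt below is hypothetical — see the model card for the exact format):

        from transformers import AutoModelForCausalLM, AutoTokenizer

        # Load the suggested text-to-SQL model from the Hugging Face Hub.
        model_name = "defog/sqlcoder"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)

        # Hypothetical schema/question prompt.
        prompt = ("-- Given a table reports(school, term, total_sales),\n"
                  "-- return total sales per school for the current term.\nSELECT")
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=64)
        print(tokenizer.decode(output[0], skip_special_tokens=True))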

  • @TheIronMason
    @TheIronMason  2 months ago

    When it comes to transformers, are you saying they're more than meets the eye?

  • @catulopsae
    @catulopsae  1 month ago

    What does the number of parameters mean???

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago +1

      Good question. A model is something that takes an input (say a sequence of words) and produces an output (e.g. the next most likely word). Parameters are numbers which define how the model takes inputs and translates them into outputs.
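      A toy illustration (my example, not from the video): a "model" with just two parameters.

        # A "model" with two parameters, w and b. A 175-billion-parameter
        # LLM is the same idea with 175 billion such numbers.
        w, b = 2.0, 1.0

        def model(x: float) -> float:
            # The parameter values determine how inputs map to outputs.
            return w * x + b

        print(model(3.0))  # prints 7.0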

    • @catulopsae
      @catulopsae  1 month ago

      @@ShawhinTalebi thank you

  • @hypercoder-gaming
    @hypercoder-gaming  5 months ago +1

    When you were calculating the cost, you estimated that a 10B model would take 100K GPU hours, but Llama 2 took 180K GPU hours and that was 7B. These estimates are way off. How is it that 100B costs less than 70B?

    • @ShawhinTalebi
      @ShawhinTalebi  5 months ago +1

      The numbers from Llama 2 were only meant to give an idea of scale. More precise estimates will depend on the details of the use case.

  • @guerbyduval4104
    @guerbyduval4104  1 month ago

    Do you have a course on how to do it as a programmer, instead of *like a ChatGPT talker*?

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      I don't have a from-scratch coding tutorial yet, but I am a fan of the one from Andrej Karpathy: czcams.com/video/kCc8FmEb1nY/video.html

  • @MaxQ10001
    @MaxQ10001  4 months ago +2

    Is it just me, or does the math on the "how much does it cost" part not make sense? 7B uses 180,000 hours, yet 10B uses 100,000 🤔 hours

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago +1

      This is only meant to give a sense of the cost's scale, so I rounded to the nearest order of magnitude :)

  • @crosstalk125
    @crosstalk125  4 months ago

    Hi, I like your content. But I want to point out that what you are calling tokenization is vectorization. Tokenization breaks documents/sentences/words into subparts, and vectorization converts tokens into numbers. Thanks

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      Thanks for raising that point. Here I'm lumping the two together, but they are indeed 2 separate steps.
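      For anyone following along, here is a sketch of the two steps using the Hugging Face tokenizers referenced in [12] (bert-base-uncased is just an example checkpoint):

        from transformers import AutoTokenizer

        # Any tokenizer from the Hub works similarly.
        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        text = "I hit the ball with a baseball bat"

        # Step 1, tokenization: split the text into subword tokens.
        tokens = tokenizer.tokenize(text)

        # Step 2, numericalization ("vectorization"): map tokens to integer IDs.
        ids = tokenizer.convert_tokens_to_ids(tokens)

        print(tokens)
        print(ids)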

  • @Joooooooooooosh
    @Joooooooooooosh  3 months ago +1

    Wait, how did we get from $180K for a 7B model to $100K for a 10B model...

    • @ShawhinTalebi
      @ShawhinTalebi  3 months ago

      This is what we physicists call an "order-of-magnitude estimate"

  • @jamesmurdza
    @jamesmurdza  4 months ago

    The matrices at 16:40 don't look right to me. I think the words labelling the rows should go from top to bottom, not bottom to top.

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago

      Good catch! Yes, the word labels are inverted on the Y axis. A corrected visualization is provided in the blog: medium.com/towards-data-science/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

  • @MrSpikegee
    @MrSpikegee  2 months ago

    The cost estimation does not seem correct; you have to take the training time into account when estimating the hardware-purchase option. Here you estimate as if we wanted to train the model in one hour. If you accept, e.g., ~4 days (approx. 100 h), you only need ten A100s, which would be 100k€, so less than the renting option.

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      I may be missing something here. Based on the numbers in the Llama paper, training 10B and 100B parameter models with 10 A100s would take about 10,000 hr (1.1 years) and 100,000 hr (11 years), respectively.
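      The arithmetic behind that reply, as a sketch (the GPU-hour totals are the video's order-of-magnitude estimates, not measurements):

        # Wall-clock time on a fixed GPU budget, from total GPU-hours.
        estimates = {"10B": 100_000, "100B": 1_000_000}  # assumed GPU-hour totals
        n_gpus = 10  # ten A100s, as in the comment above

        for size, gpu_hours in estimates.items():
            wall_hours = gpu_hours / n_gpus
            print(f"{size}: {wall_hours:,.0f} hr = {wall_hours / 8766:.1f} years")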

  • @imaspacecreature
    @imaspacecreature  3 months ago

    Some of you need to look into your own "internal libraries"; this video is someone attempting to teach you where to get the fish. Some of you are so hungry for the fish, but won't even understand the water it resides in.

  • @Vermino
    @Vermino  7 months ago

    Just my prediction on data curation & copyright: LLM companies will do what Google did back in the day with scraping websites. Then once legislation passes, they will say "if you don't want your data crawled, opt out with robots.txt". Right now it's a grey area, and companies are building their datasets as quickly as possible to get in before the regulation.
    "Better to ask for forgiveness than permission."

    • @ShawhinTalebi
      @ShawhinTalebi  7 months ago +1

      I can see that. It seems to be a hot-button topic these days.

  • @PabloPernambuco
    @PabloPernambuco  2 months ago

    Now, I am discovering my low IQ... 0.001% of learning... 😂

  • @gamea9g349
    @gamea9g349  1 month ago

    Why can't they use algorithms instead of these huge datasets?

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      This is the big paradigm shift with machine learning (ML). Traditionally programmers implement logic into computers line by line. However, there are some tasks that are difficult (if not impossible) to explicitly program computers to do (e.g. driving, responding to open-ended prompts, etc). For these tasks ML allows us to program computers with data.
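      A toy sketch of the contrast (my example, not from the video):

        # Paradigm 1: explicit rules, programmed line by line.
        def is_spam_rules(text: str) -> bool:
            return "free money" in text.lower()

        # Paradigm 2: "program with data" -- derive the rule from labeled examples.
        examples = [("free money now", True), ("meeting at noon", False),
                    ("win free money", True), ("lunch tomorrow?", False)]

        spam_words: set[str] = set()
        for text, label in examples:
            if label:
                spam_words |= set(text.split())   # words seen in spam
        for text, label in examples:
            if not label:
                spam_words -= set(text.split())   # minus words seen in non-spam

        def is_spam_learned(text: str) -> bool:
            return bool(spam_words & set(text.split()))

        print(is_spam_learned("claim your free money"))  # True, learned from data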

  • @DYLOGaming
    @DYLOGaming  1 month ago

    Well, this was a video that explains things conceptually but does zero building from scratch

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      Sorry for the lack of code in this one, maybe other videos in the series will be more helpful. An end-to-end technical guide on building an LLM isn't something I can fit in one video, but I hope to share more videos toward this end in the future.

  • @guerbyduval4104
    @guerbyduval4104  1 month ago

    3:23 What is the point of talking about GPU hours and money spent on the cloud? I have an NVIDIA RTX 3070 with 5,120 CUDA cores. I don't care about spending money to train; I can just train on my laptop.

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      That was to give context on training an LLM from scratch. This is negligible for smaller models.

  • @bittertruthnavin
    @bittertruthnavin  4 months ago +43

    90% of the people out there are just trying to learn how to drive a car, not how to build one. All these youtubers are showing is how to build a car, which further complicates new AI adoption and demotivates people. I just saw a video by this lady somewhere, and she showed she trained a Mistral 7B LLM in less than 1 hour for $1. Folks, keep looking for better and simpler information; remember, we don't need to know how to build a car in order to drive one.

    • @saisriramakarthikeyakollur3989
      @saisriramakarthikeyakollur3989  4 months ago +4

      I'm curious to know more about Mistral 7B. Any chance you could share your source? All I found was the trained model on Hugging Face

    • @ShawhinTalebi
      @ShawhinTalebi  4 months ago +14

      Excellent point. Using these models to solve problems is much more impactful than knowing how they work.
      Other videos in this series are more hands-on than this one. Suggestions for future content are also appreciated :)
      czcams.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html

    • @Drdozzer
      @Drdozzer  4 months ago +29

      What if what I want to do is build a car 😅

    • @scotthjackson5651
      @scotthjackson5651  3 months ago +31

      I literally want to know about how to build an LLM.

    • @imaspacecreature
      @imaspacecreature  3 months ago +12

      This comment is extremely concerning. Do what you will, but a LOT of humans are PROGRAMMED CONSUMERS. Aka, you are like a tool: you only do what you are given, like a task. Don't uplift or look down on others for wanting to know how things work.
      And also, some people drive cars and don't even know how to drive, and they kill people every minute on a roadway. You should rethink your own logic on this before you even entertain an "artificial intelligence"...

  • @genkidama7385
    @genkidama7385  1 month ago

    The internet is full of trash, to be honest; training LLMs on books would be a great leap for mankind. Code datasets are full of "oh, it's too complex" and only produce comments instead of code: a machine speaking as a human and complaining about things being too complicated to do. That's how lame the datasets are. A computer is a computing device, and it refuses to work because the dataset says things are too complex. Jesus. Unbelievable levels of trash.

    • @ShawhinTalebi
      @ShawhinTalebi  1 month ago

      The greatest challenge is that there aren't enough (high-quality) books to keep scaling these models. How we solve this problem is going to define the next wave of LLMs. (Maybe we'll switch to lectures and other videos.)

  • @handleking1
    @handleking1  2 months ago

    Bro doesn't even know how to spell NVIDIA correctly.

  • @nick066hu
    @nick066hu  2 months ago

    Thank you for putting together this video; it helped me a lot to understand LLM training.
    One question: with the advent of trillion-token models and beyond, I wonder where we will get all that training input data from. I guess we have already consumed what all of humanity has produced in the last 5000 years, and by adding another 10M digitized cat videos the models will not get smarter.

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      Good question! I suspect there is still much content out there that hasn't been touched by LLMs, e.g. non-digital text and proprietary data. Nevertheless, this content is still finite, and the "just make a bigger model" approach will eventually hit a limit.

  • @ButchCassidyAndSundanceKid
    @ButchCassidyAndSundanceKid  2 months ago

    A bit disappointed with this video; there is no coding!

    • @ShawhinTalebi
      @ShawhinTalebi  2 months ago

      Thanks for the feedback. Yes this one stays relatively high-level since a coding video would be (much) longer.
      For a hands-on guide I'd recommend Andrej Karpathy's video: czcams.com/video/kCc8FmEb1nY/video.html

  • @thechoosen4240
    @thechoosen4240  4 months ago

    Good job bro, JESUS IS COMING BACK VERY SOON; WATCH AND PREPARE

  • @gameplaychulangado494
    @gameplaychulangado494  4 months ago

    I'm from Brazil and my English is not so good, but I understood you fairly well. I think the biggest challenge will be the math, 3D matrices, and more. 🥲