Deep-dive into the AI Hardware of ChatGPT

  • Added 17. 05. 2024
  • With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnor...
    Or use code highyieldnordpass at checkout.
    Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass: www.nordpass.com/highyieldbus... with code highyieldbusiness in the form.
    ....
    What hardware was used to train ChatGPT and what does it take to keep it running? In this video we will take a look at the AI hardware behind ChatGPT and figure out how Microsoft & OpenAI use machine learning and Nvidia GPUs to create advanced neural networks.
    Support me on Patreon: www.patreon.com/user?u=46978634
    Follow me on Twitter: / highyieldyt
    Links
    The Google research paper that changed everything: arxiv.org/abs/1706.03762
    The OpenAI research paper confirming Nvidia V100 GPUs: arxiv.org/abs/2005.14165
    0:00 Intro
    0:28 AI Training & Inference
    2:22 Microsoft & OpenAI Supercomputer
    4:08 NordPass
    5:57 Nvidia Volta & GPT-3
    9:05 Nvidia Ampere & ChatGPT
    13:23 GPT-3 & ChatGPT Training Hardware
    14:41 Cost of running ChatGPT / Inference Hardware
    16:06 Nvidia Hopper / Next-gen AI Hardware
    17:58 How Hardware dictates the Future of AI
  • Science & Technology

Comments • 401

  • @HighYield
    @HighYield  1 year ago +25

    With our special offer you can get 2 years of NordPass with 1 month free for a personal account: www.nordpass.com/highyieldnordpass
    Or, use code highyieldnordpass at the checkout.
    Business accounts (must register with a biz domain) can get a free 3-month trial of NordPass at www.nordpass.com/highyieldbusiness with code highyieldbusiness in the form.

    • @fitybux4664
      @fitybux4664 1 year ago

      Are we all speaking to the same ChatGPT? What if OpenAI trained sub-models with fewer parameters for users who don't ask complicated questions, or for only a certain subject? Then they could run inference more cheaply by using a smaller model. Maybe this could be detected after the first question or two that ChatGPT is asked?
      If I had a bill of a million dollars a day, these sorts of optimizations would definitely make sense!

    • @GermanMythbuster
      @GermanMythbuster 1 year ago

      Who the F*** listens to 2 min. of ads?!
      Skipped that crap.
      Anything more than 20 sec. and nobody cares anymore.
      I don't know if Nord had this stupid idea to make the ad so long or you, but whoever it was, it is a f***ing stupid idea!

    • @akselrasmussen3386
      @akselrasmussen3386 1 year ago

      I prefer Avira PWM

    • @memerified
      @memerified 1 year ago

      Ju

  • @wiredmind
    @wiredmind 1 year ago +518

    I think in a few years, AI accelerator cards will be the next video cards. A race to the top for the most powerful accelerator to be able to train and run AI locally on our own PCs, bypassing the need to pay for filtered models from large companies. Once people can run this kind of thing independently, that's when things will start getting _really_ exciting.

    • @mishanya1162
      @mishanya1162 1 year ago +6

      It's not about running AI locally. Today's GPUs (40 series) mostly gain FPS from AI upscaling, frame generation and so on. So whoever makes the better AI and software will win.

    • @mrmaniac9905
      @mrmaniac9905 1 year ago +21

      Training a model is a very involved process. I could see consumer cards that can run pre-trained models, but the average consumer will not be doing training on their own.

    • @Amipotsophspond
      @Amipotsophspond 1 year ago

      A breakthrough is coming that will enable black boxes to be part of the networks. This will make it possible to build networks in small parts that go into big networks, rather than training it all at once and discarding it for the next unrelated network. It will not be a service forever; that's just to trick WEF/CCP-China money into giving them start-up money.

    • @ralphclark
      @ralphclark 1 year ago +9

      This will require someone to create an open source ontology that you can use to preload your AI as a starting point. Training one completely from bare metal with only your own input will be beyond everyone except large corporations with deep pockets.

    • @Tomas81623
      @Tomas81623 1 year ago +6

      I don't think people at home will really have a need to run large models, as they would still be very expensive and, other than privacy, offer little advantage. On the other hand, I can definitely see businesses having a need for them, possibly multiple.

  • @matthewhayes7671
    @matthewhayes7671 8 months ago +5

    I'm a newer subscriber, working my way back through your recent videos. I just want to tell you that I think this is the best tech channel on YouTube right now, hands down. You are a wonderful teacher, you take frequent breaks to summarize key points, you provide ample context and visual aids, and when you do make personal guesses or offer your own opinions, it's always done in a transparent and logical manner. Thank you so much, and keep up the amazing work. I'll be here for every upload going forward.

  • @guyharris8007
    @guyharris8007 1 year ago +3

    Dev here... gotta say I love it. Thoroughly enjoyable thank you for your time!

  • @klaudialustig3259
    @klaudialustig3259 1 year ago +19

    Great video!
    I'd like to add one thing: in the segment starting at 16:06 where you talk about the Nvidia Hopper H100, in the context of Neural Networks the most important number to compare to the previous A100 should be the memory. As far as I know, as long as there is *some* kind of matrix multiplication acceleration, it doesn't matter much how fast it is. Memory bandwidth becomes the major bottleneck again.
    I looked it up and found the number of 3TB/s, which would be 50% higher than the A100 80GB-version. I wonder where the number of 4.9TB/s shown in the video at 18:50 comes from. It seems unrealistically high to me.
    Nvidia's marketing does not like to admit this. They like to instead compare other numbers, where they can claim some 10x or 20x or 30x improvement.

    • @klaudialustig3259
      @klaudialustig3259 1 year ago +2

      They call that 4.9TB/s "total external bandwidth" and I think they get it by adding the 3TB/s HBM3 memory bandwidth, plus 0.9TB/s NVLink bandwidth, plus something else?
      Also, I have seen Nvidia claim that the H100 has 2x higher memory bandwidth than the A100. Note that this is only when comparing it to the A100 40GB version, not the 80GB version.
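      A rough sanity check of that reading, as a sketch: the HBM3 and NVLink figures below are public H100 SXM specs, but treating PCIe Gen 5 as the remaining component is an assumption, and the sum still falls short of 4.9 TB/s.

```python
# Rough check of the "total external bandwidth" interpretation above.
hbm3 = 3.35        # TB/s, H100 SXM5 HBM3 memory bandwidth (Nvidia datasheet)
nvlink = 0.9       # TB/s, fourth-generation NVLink
pcie5_x16 = 0.128  # TB/s, PCIe Gen 5 x16, both directions combined (assumption)
total = hbm3 + nvlink + pcie5_x16
print(f"~{total:.2f} TB/s")  # ~4.38 TB/s, still short of the quoted 4.9 TB/s
```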

    • @JohnDlugosz
      @JohnDlugosz 1 year ago

      I recall the Intel roadmap showing the next big thing is being able to put or use memory resources in different places. The PCIe will be so fast that you don't have to put all the RAM on the accelerator card. You'll be able to use system RAM or special RAM cards, and thus easily expand the RAM as needed.

  • @kaystephan2610
    @kaystephan2610 1 year ago +4

    I find the increase in compute performance incredible. As shown at 7:38, the GV100 from the beginning of 2018 had 125 TFLOPS of FP16 Tensor core compute performance.
    The current generation of enterprise AI accelerators from NVIDIA is the H100, which provides up to 1979 TFLOPS of FP16 Tensor core compute performance.
    And the FP32 and FP64 Tensor core performance has also obviously increased massively.
    Within ~5 years the raw compute performance of Tensor cores has increased by around 16x. What previously required 10,000 GPUs could now be done with ~632.
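    Spelling that comparison out with the figures quoted above (note that 1979 TFLOPS is Nvidia's with-sparsity number, roughly 2x the dense figure):

```python
# V100 vs. H100 FP16 Tensor throughput, as compared in the comment above.
v100_fp16_tflops = 125    # Tesla V100 FP16 Tensor core performance
h100_fp16_tflops = 1979   # H100 SXM FP16 Tensor, with sparsity
speedup = h100_fp16_tflops / v100_fp16_tflops
equivalent_gpus = 10_000 / speedup
print(f"~{speedup:.1f}x per GPU -> ~{equivalent_gpus:.0f} H100s to match 10,000 V100s")
# ~15.8x -> ~632
```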

  • @garyb7193
    @garyb7193 1 year ago +73

    Great video! Hopefully it will put things into perspective: Nvidia, Intel, and AMD's world does not revolve around graphics card sales and squeezing the most performance out of Cyberpunk. Hundreds of millions of dollars are at stake in areas much more lucrative than $500 CPUs or $800 video cards. They must meet the demands of all their various customers, as well as investors and stockholders too. Thanks!

    • @marka5968
      @marka5968 1 year ago +5

      This is the single whale > millions of peons attitude that is in video games now. Video games are not designed for fun but for grind and feeding the whales. Apparently video cards are going to be designed that way as well. I don't know how tens of millions of customers are a lower priority than some big whale for a tech that hasn't made a red cent yet and costs millions per day to run. Certainly the data center makes Nvidia more money than gamers buying cards, but I don't know how that all works out. Nvidia is no. 8 among the most valuable companies in the world, and I guess selling GPUs to gamers once every 3-4 years isn't going to generate the revenue to be no. 8. These numbers don't seem to make sense in my mind.

    • @garyb7193
      @garyb7193 1 year ago

      @@marka5968 Okay?!

    • @jimatperfromix2759
      @jimatperfromix2759 1 year ago +2

      In its last quarterly results, although AMD is improving market share in its consumer divisions (CPUs and gamer GPUs), it took a slight loss on consumer products. Partly that's the recession coming in for an ugly landing. The good thing for consumers, though, is that AMD is using its massive profits in servers and AI (plus some new profits in the embedded area via its recent purchase of Xilinx) to "support its addiction" to making good hardware for the computer/gamer retail market. By the way, one of its next-gen laptop APU models not only contains an integrated GPU that rivals the low end of the discrete GPU market, but also contains a built-in AI engine (thanks to the Xilinx people). So you can get its highest-end laptop CPU/APU chip (meant for the big boys like Lenovo/HP/Dell/Asus/Acer et al. to integrate into a gamer laptop along with a discrete GPU from AMD or Nvidia (or even Intel)), or its 2nd-from-the-top series of laptop CPU/APU chip (the one described above that already has a pretty darn good integrated GPU plus an AI engine (think: compete with Apple M1/M2)), or one of a number of slower series of CPU/APU (that are meant for more economy laptops, and are mostly just faster versions of older chips that have been redone on faster silicon to fill that market segment at a cheaper cost). Think of the top two tiers of laptops built on the new AMD laptop chips as each being about 1/10,000th of the machine they trained ChatGPT on - sort of. By the way, did I mention you can do AI and machine learning on your laptop, starting about *next month*?

    • @snailone6358
      @snailone6358 1 year ago +1

      $800 video cards? We've been past that point for a while now.

    • @emkey6823
      @emkey6823 1 year ago +3

      ...and we were told BTC was consuming too much of that CO2 power, and because of it the cards got so expensive. I think the bad people in charge made those models using AI for their profits, which worked out pretty well for them. Let's keep our eye on that and give the AI they created some good vibes individually.

  • @marka5968
    @marka5968 1 year ago +7

    Great and very informative video, sir. I remember watching your very early videos and thought they were a bit meh. But this is absolutely world-class stuff, and I'm happy to listen to such a sharp and insightful mind.

    • @HighYield
      @HighYield  1 year ago +2

      Thanks, comments like this really mean a lot to me!

  • @og_jakey
    @og_jakey 1 year ago +5

    Fantastic presentation. Appreciate your pragmatic and reasonable research, impressive work. Thank you!

  • @Fractal_32
    @Fractal_32 1 year ago

    I just saw your post on the community page. I wish YouTube would have notified me when the video was posted instead of pushing the post without the video. I cannot wait to see what is talked about in the video!
    Edit: This was great, I'm definitely sharing it with some friends, keep up the great work!

  • @SteveAbrahall
    @SteveAbrahall 1 year ago

    Thanks for the tech background on what it's running on from a hardware angle. The interesting thing, I think, will be when someone comes up with a chunk of code that saves billions of hours of computational power; that kind of disruptive thing from the software angle. It is an amazing time to live. Thanks for all your hard work, and an interesting vid!

  • @KiraSlith
    @KiraSlith 1 year ago +3

    Been beating my head against the Cost vs Scale issue of building my own AI compute rig for training models at home, and this gave me a better idea of what kind of hardware I'll need long-term by looking at what the current bleeding edge looks like. Thanks for doing all the research work on this one!

    • @Transcend_Naija
      @Transcend_Naija 8 months ago +1

      Hello, how did it go?

    • @KiraSlith
      @KiraSlith 8 months ago

      @@Transcend_Naija I ended up just starting cheap-ish for my personal rig, a T7820 with a pair of 2080 Tis on an NVLink bridge. I couldn't justify the V100s just yet, and the P40s I was using at work lacked the grunt for particularly large LLMs (they work fine for large-scale object recognition though).

  • @newmonengineering
    @newmonengineering 1 year ago +7

    I have been an OpenAI beta member for 3 years now. It has only become better over the years. I wonder what it will look like in 5 years.

  • @vmooreg
    @vmooreg 1 year ago

    Thank you for this! I’ve been looking around for this content. Great work!!👍🏼

  • @kanubeenderman
    @kanubeenderman 1 year ago +1

    MS will for sure use its Azure-based cloud system for hosting ChatGPT, so that they can load-balance the demand, scale out to more VMs and instances if needed to meet demand, and increase resources on any individual instance if needed. That would be the best use of that setup and provide the best user experience. So basically, the hardware specifics would be whatever servers are running in the 'farms'. I doubt they will have separate, specific hardware set aside just for ChatGPT, as it would run like any other service out there.

  • @frizzel4
    @frizzel4 1 year ago

    Congrats on the sponsorship!! Been watching since you had 1k subs

  • @backToFreedom
    @backToFreedom 1 year ago +1

    Thank you very much for bringing this kind of information.
    Even ChatGPT is unaware of the hardware it is running on!

  • @EmaManfred
    @EmaManfred 1 year ago +1

    Good job here, sir! Would you mind doing a quick breakdown of a language model like Bluewillow that also utilizes diffusion?

  • @novadea1643
    @novadea1643 1 year ago +4

    Logically the inference costs should scale pretty linearly with the number of users, since it's pretty much a fixed amount of computation and data transfer per query. Can you elaborate on why the requirements would scale exponentially, as you state at 15:40?

    • @HighYield
      @HighYield  1 year ago +2

      The most commented question :D
      I meant that if the number of users increases exponentially, so does the amount of inference computation. I now realize it wasn't very clear. Plus, scaling is a big problem for AI, but that's not what I meant.
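      A toy model of the point being made in this thread, with purely made-up numbers: per-user demand is linear, so exponential user growth means exponential hardware growth.

```python
# Inference demand scales roughly linearly with usage, so exponential user
# growth drives exponential hardware growth. All numbers are illustrative.
QUERIES_PER_USER_PER_DAY = 10
GPU_SECONDS_PER_QUERY = 0.5        # assumed cost of generating one response
GPU_SECONDS_PER_GPU_PER_DAY = 86_400

def gpus_needed(daily_users: int) -> float:
    demand = daily_users * QUERIES_PER_USER_PER_DAY * GPU_SECONDS_PER_QUERY
    return demand / GPU_SECONDS_PER_GPU_PER_DAY

for users in (1_000_000, 10_000_000, 100_000_000):   # 10x jumps in users...
    print(f"{users:>11,} users -> ~{gpus_needed(users):,.0f} GPUs")
# ...require 10x jumps in GPUs: per-user cost stays linear, total tracks user growth.
```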

  • @Speak_Out_and_Remove_All_Doubt

    A super interesting video, really well explained too, thanks for all your hard work. It's always impressive how Nvidia seems to be playing the long game with its hardware development, and as you mention, I can't wait to see what Jim Keller comes up with at Tenstorrent, because I can't think of a job he's had where he hasn't changed the face of computing with what he helped develop. I just wish Intel had backed him more and done whatever was needed to keep him a little longer; we would maybe be in a very different Intel landscape right now.

    • @AjitMD
      @AjitMD 1 year ago

      Jim Keller does not stay at a company for very long. Once he creates a new product, he moves on. Hopefully he gets paid well for all his contributions.

    • @Speak_Out_and_Remove_All_Doubt
      @Speak_Out_and_Remove_All_Doubt 1 year ago +1

      @@AjitMD I think, more accurately, what he does is stay until he's achieved what he set out to achieve and then wants a fresh challenge. He didn't get to do that at Intel; he was essentially forced out, or put into a position he was not comfortable with, so he chose to leave. But he still had huge amounts of unfinished work left to do at Intel, plus becoming the CEO, or at least head of the CPU division, would have been that fresh new challenge for him.

  • @tee6942
    @tee6942 1 year ago +1

    What valuable information 👌🏻 Thank you for sharing, and keep up the good work.

  • @theminer49erz
    @theminer49erz 1 year ago +2

    Fantastic!! First of all, I could be wrong, but I don't remember you having an in-video sponsor before. Either way, that is awesome!! I'm glad you are getting the recognition you deserve!!
    You must have done a lot of work to get these numbers and configurations. Very interesting stuff! I am looking forward to AI splitting off from GPUs too, especially with the demand for them going up as investment in AI grows. I, as I'm sure many others are as well, am kinda sick of having to pay, or consider paying, a lot more for a gaming GPU because the higher demand is in non-gaming sectors that are saturated with capital to spend on them. Plus I'm sure dedicated chips will do a much better job. The design, at least in regards to Nvidia, is quite annoying because of it too. Tensor cores, for example, were mainly put there for AI and mining use; the marketing of them for upscaling, and the cost added for a gamer to use them, is kinda ridiculous. If you have a lower-end card with them, where you would benefit from the upscaling, you could probably buy a card without them that wouldn't need to upscale. It seems to me that their existence is almost the cause of their need in that use case. I don't know how much of the cost of the card is just for them, but I imagine it's probably around 20-30% maybe?? IDK, just thinking "aloud".
    Anyway, thanks again for the hard work and please let us know when you get a Patreon account!! I would be proud to sponsor you as well!! Cheers!!

    • @brodriguez11000
      @brodriguez11000 1 year ago

      " I, as I'm sure many others are as well, am kinda sick of having to pay or consider paying a lot more for a gaming GPU because the higher demand is in non gaming sectors that are saturated with capital to spend on them." Blame cryptocurrency for that. Otherwise those non-gaming sectors are what's keeping the lights on and driving the R&D that gamers enjoy the fruits of.

  • @alexcrisara4902
    @alexcrisara4902 1 year ago

    Great video! Curious what you use to generate graphics / screenshot animations for your videos?

  • @hendrikw4104
    @hendrikw4104 1 year ago

    There are interesting approaches like LLaMA, which focus on inference efficiency over training efficiency. These could also help to bring down inference costs to a reasonable level.

  • @josephalley
    @josephalley 1 year ago

    Great to see this video as well. I loved your M2 chip video breakdowns ages ago.

  • @OEFarredondo
    @OEFarredondo 1 year ago

    Mad love bro. Thanks for the vid

  • @senju2024
    @senju2024 1 year ago

    Very, very good video. Subscribed. Reason: you did not talk about hype; you explain the tech concepts behind AI. I knew about the Nvidia hardware running ChatGPT, but not the details. Thank you.

  • @AgentSmith911
    @AgentSmith911 1 year ago +1

    10:14 is so funny because "what hardware are you running on?" is one of the first questions I asked that bot 😀

  • @zerodefcts
    @zerodefcts 1 year ago +2

    I remember when I was growing up, I thought to myself... geez... it would have been great to live in the past, as there were so many undiscovered things that I could have figured out. Now grown up, I have been working in AI for the past 7 years, and looking at this very point in time I can't help but reflect on that moment... geez... there is just so much opportunity for new discovery.

    • @HighYield
      @HighYield  1 year ago +2

      I'm honestly excited to see what's coming next. If we use it to improve our lives, it will be amazing.

  • @Noobificado
    @Noobificado 1 year ago +1

    Some time around 1994, the era of search engines started.
    And now the era of free access to general-purpose artificial intelligence is becoming a reality in front of our eyes.
    What a time to be alive.

  • @alb.1911
    @alb.1911 1 year ago

    Do you have any idea why they went back to Intel CPUs for the NVIDIA DGX H100 hardware?

  • @markvietti
    @markvietti 1 year ago

    Could you do a video on memory cooling? It seems most of the video card manufacturers don't cool the memory; some do. Why is that?

  • @Embassy_of_Jupiter
    @Embassy_of_Jupiter 1 year ago +1

    Kind of mind blowing that we can already run something very similar on a MacBook. The progress in AI is insane and it hasn't even started to self-improve, it's just humans that are that fast.

  • @petevenuti7355
    @petevenuti7355 1 year ago

    Out of curiosity, and considering my current hardware: how many orders of magnitude slower would a neural network of this magnitude run on a simple CPU and virtual memory?

  • @hxt21
    @hxt21 1 year ago +1

    I want to say thank you very much for a really good video with good information.

  • @legion1791
    @legion1791 1 year ago

    Cool, that's exactly what I was wanting to know!

  • @Tomjones12345
    @Tomjones12345 1 year ago +1

    I was wondering how far off running inference is on a local machine. Or could a more focused model (one language, specific subjects/sites) run on today's hardware?

  • @BGTech1
    @BGTech1 1 year ago

    Great video I was wondering about this

  • @memejeff
    @memejeff 1 year ago +3

    I asked GPT-3 half a year ago what it was running on. I kept asking more leading questions and was able to get to a point where it said that it used specific mid-range FPGA accelerators that retail between $4,000 and $6,000. The high-end FPGAs are connected by PCIe and the lower-end ones use high-speed UART. The servers used a lot of K-series GPUs too.

    • @jordanrodrigues1279
      @jordanrodrigues1279 1 year ago +4

      The specs aren't in the training dataset, there's no way for it to have that information; it's like asking it to give you my passwords.
      Or in other words you just told yourself what you wanted to hear with extra steps.

  • @zahir2942
    @zahir2942 1 year ago +1

    Was cool working on these servers

  • @olafschermann1592
    @olafschermann1592 1 year ago

    Great research and presentation

  • @StrumVogel
    @StrumVogel 1 year ago

    We have 8 of those at the Apple data center I worked at. Nvidia cheaped out on the CMOS bracket, and it'll always crack. You'll have to send the whole board in for warranty work to fix it.

  • @virajsheth8417
    @virajsheth8417 1 year ago

    Really insightful video. Really appreciate it.

  • @user-zk4xq3mn2q
    @user-zk4xq3mn2q 5 months ago

    Really good video, and lots of effort! Thanks man!

  • @garydunken7934
    @garydunken7934 1 year ago

    Nice one. Well presented.

  • @genstian
    @genstian 6 months ago

    We do run into lots of problems where general AI models aren't good. The future is to make new submodels that can solve specific tasks, or just add weights to general models, but such a version of ChatGPT would probably require 1000x better hardware.

  • @elonwong
    @elonwong 1 year ago +1

    From what I understood from ChatGPT, it's a stripped-down version of GPT-3, where the hardware requirements and model size are massively cut down.
    It's a lot lighter to run compared to GPT-3. It even said ChatGPT could be run on a high-end PC.

    • @HighYield
      @HighYield  1 year ago

      That's what I gathered too. IMHO it's also using a lot fewer parameters, but since there is nothing official I'm rather careful with my claims.

  • @theminer49erz
    @theminer49erz 1 year ago

    Yay!! Happy day! Been looking forward to this! Thanks!

  • @MemesnShet
    @MemesnShet 1 year ago

    And to think that now anyone can run a GPT-3.5-Turbo-like AI on their local computer without the need for crazy specs is just incredible.
    Stanford Alpaca and GPT4All are some models that achieve it.
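    For readers curious what "running locally" looks like in practice, here is a minimal sketch using the Hugging Face transformers library; distilgpt2 is only a tiny stand-in model so the example runs on a laptop CPU, not an Alpaca/GPT4All-class chatbot.

```python
# Minimal local text-generation sketch. Swap the stand-in model for a locally
# downloaded Alpaca/GPT4All-style checkpoint for a more ChatGPT-like experience.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("What hardware does ChatGPT run on?", max_new_tokens=40)
print(result[0]["generated_text"])
```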

  • @johannes523
    @johannes523 1 year ago +2

    Very interesting! I was wondering about Tom Scott's statement on the curve, and I think your take on it is very accurate 👍🏻

    • @HighYield
      @HighYield  1 year ago +1

      I really feel like the moment you don't look at what AI does, but how it's "created", you get a much clearer picture. Ofc I might be completely wrong in my assumption :p

  • @prodromosregalides3402
    @prodromosregalides3402 1 year ago +2

    15:19 That's not a problem at all. Even if only 10 million out of 100 million are using OpenAI's servers at any single time, that's 29000/10000000 GPUs, or 0.0029 GPUs per user. Probably less.
    So instead of them running the model, the end-users could, easily, on their own machines.
    Bloody hell, even small communities of a few thousand people could train their own AIs on their machines, soon to be a much smaller number.
    A few major problems with that.
    They lose much control over their product. They haven't figured out yet the details of monetizing these models, so they are restricted to running them on their own servers instead.
    Another major problem, for Nvidia: it would be forced to return to gaming GPUs their rightful capabilities, from which they were stripped back in 2008. This would mean no lucrative sales of the same hardware (with some tweaks) to corporations, relying instead on massive sales of cheaper units.
    And last but not least, an end-user, gamer or not, would be able to acquire much more compute power with their 1000-3000 dollar purchases. Because a PC may be sporting the same CPUs and GPUs, but the difference is the GPUs would be unlocked to their full computing potential. We are talking about many tens to hundreds of teraflops of performance available for the end-user to do useful work. And how would the mobile sector compare to this? Due to the fact that it runs on lower power budgets, there is no way it could compete with fully-fledged PCs. Many would stop buying new smartphones, especially the flagship ones; in fact the very thought of spending to buy something that is an order of magnitude less compute-capable would be hugely annoying.
    Now that I am thinking of it, losing control worries them much more than anything else. And it would not only be control lost on a corporate level, but on a much, much higher level.
    Right now, top heads at computer companies and possibly state planners must have shit their pants, because of what is seemingly an unavoidable surrendering of power from power centers to the citizens. To paraphrase Putin: "Whoever becomes the leader in this sphere will not only forget about ruling this world, but lose the power he/she already has."
    All top leaders got this whole thing wrong. This is not necessarily a bad thing.
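    The division in the comment above, made explicit (neither the 29,000-GPU estimate nor the 10-million concurrency figure is official; the TFLOPS conversion is just for intuition):

```python
# GPUs per concurrent user, using the comment's own numbers.
gpus = 29_000
concurrent_users = 10_000_000
gpus_per_user = gpus / concurrent_users   # 0.0029
a100_fp16_tflops = 312                    # A100 dense FP16 Tensor throughput
print(f"{gpus_per_user:.4f} GPUs (~{gpus_per_user * a100_fp16_tflops:.1f} TFLOPS) per concurrent user")
```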

  • @davocc2405
    @davocc2405 1 year ago

    I can see a rise in private clouds, particularly within government, at least on a national level. The utilisation of the system itself may give away sensitive information to other nations or even corporations that may have competing self-interests, so a few of these systems may pop up in the UK, Australia, several in the US and probably Canada to start with (presuming each European nation may have one or two as well). Whoever develops a homogenised and consistent build for such a system will suddenly be in demand with competing urgency.

  • @dorinxtg
    @dorinxtg 1 year ago

    Thanks for the video.
    I was looking at the images you created with the GPU specs, and I'm not sure if your numbers are correct. Just for comparison, I checked the numbers in the TechPowerUp GPU DB.
    So if we look at the GH100, for example, you mention 1000 TFLOPS (FP16) and 50 TFLOPS (FP32).
    On the TechPowerUp GPU DB, for an H100 (I checked both the SXM5 and PCIe versions) the numbers are totally different: 267 TFLOPS (FP16) and ~67 TFLOPS (FP32).

    • @HighYield
      @HighYield  1 year ago +1

      TechPowerUp doesn’t show the Tensor core FLOPS. If you look up the H100 specs at Nvidia you can see the full performance.

    • @dorinxtg
      @dorinxtg 1 year ago

      @@HighYield I see. Ok, thanks ;)

  • @nannesoar
    @nannesoar 1 year ago

    This is the type of video I'm thankful to be watching.

    • @HighYield
      @HighYield  1 year ago

      I’m thankful you are watching :)

  • @MultiNeurons
    @MultiNeurons 1 year ago

    Yes, it's very interesting, thank you.

  • @THE-X-Force
    @THE-X-Force 1 year ago +1

    Excellent excellent video!
    (edit to ask: at 19:25 .. "In-Network Compute" of ... *_INFINITY_* ... ? Can anyone explain that to me, please?)

  • @shrapnel95
    @shrapnel95 1 year ago +1

    I find this video funny in that I've asked ChatGPT what kind of hardware it runs on; I never got a straightforward answer and it kept running me around in loops lol

    • @HighYield
      @HighYield  1 year ago +2

      I've noticed exactly the same thing, that's where I got the idea for this video from!

  • @karlkastor
    @karlkastor 1 year ago

    15:50 Now with GPT-3.5 Turbo they have decreased the cost 10 times, but likely not with new hardware, but with an improved, discretized and/or pruned model.

    • @HighYield
      @HighYield  1 year ago

      That sounds super interesting! Do you have any further links for me to read up on this?

  • @TheGabe92
    @TheGabe92 1 year ago

    Interesting conclusion, great video!

    • @HighYield
      @HighYield  1 year ago +1

      I'm usually quite resistant to hype, but AI really has the potential to fundamentally change how we work. It's gonna be an interesting ride for sure!

  • @ZweiBein
    @ZweiBein 1 year ago

    Good and informative video, thanks a lot!

  • @sa1t938
    @sa1t938 1 year ago +1

    Something important to note is that OpenAI works with Microsoft and likely doesn't pay all that much for the GPUs. Microsoft OWNS the hardware, so their cost per day is just electricity, employees, and rent. The servers were already paid in full when the datacenter was built. It's up to Microsoft how much they want to charge OpenAI (who they are working with, and who just supplied Microsoft with Bing Chat, which made Bing really popular), so I'm guessing Microsoft gives them a huge discount or provides it for free.

    • @knurlgnar24
      @knurlgnar24 1 year ago

      If you own a shirt is the shirt free because you already own it? If you own a car and let your friend drive it is it free because you already owned a car? This hardware is extremely expensive, it depreciates, requires maintenance, floor space, etc. Economics 101. Ain't nothin' free.

    • @sa1t938
      @sa1t938 1 year ago

      @@knurlgnar24 Did you read my comment? I literally mentioned all of those costs, and I said they are the only thing Microsoft is ACTUALLY paying. Microsoft chooses how much they want to charge, so they could charge a business partner like OpenAI almost nothing, or just have them foot the bill for maintenance costs instead. If they did either of those, they would already have made their money back with how popular Bing is because of Bing Chat.

    • @sa1t938
      @sa1t938 1 year ago

      @@knurlgnar24 And I guess, to your analogy: is a shirt free if you already own it? And the answer is yes. You can use the shirt at no cost, minus the maintenance of washing it. You can also give that shirt to a friend for free. The shirt wasn't free, but you paid for it up front and now it's free every time you use it.
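      A toy amortization example for the trade-off being argued in this thread; every number below is made up for illustration, not an actual Microsoft or OpenAI figure.

```python
# "Already paid for" hardware still has a real daily cost once the purchase
# price is amortized over its useful life. Illustrative numbers only.
node_price_usd = 200_000            # assumed price of one 8x A100 HGX node
useful_life_days = 4 * 365
node_power_kw = 6.5                 # assumed draw under load
electricity_usd_per_kwh = 0.10
capex_per_day = node_price_usd / useful_life_days              # ~$137/day
power_per_day = node_power_kw * 24 * electricity_usd_per_kwh   # ~$16/day
print(f"~${capex_per_day:.0f}/day amortized hardware + ~${power_per_day:.0f}/day electricity per node")
```

      Whether Microsoft passes that amortized cost on to OpenAI or absorbs it internally is exactly the accounting question being debated here.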

  • @boronat1
    @boronat1 1 year ago

    Wondering if we could run software that uses your GPU to give power to an AI network, like we do with crypto mining?

  • @1marcelfilms
    @1marcelfilms 1 year ago +1

    The box asks for more ram
    The box asks for another gpu
    The box asks for internet access

    • @HighYield
      @HighYield  1 year ago

      What's in the box? WHAT'S IN THE BOX???

  • @Alexander_l322
    @Alexander_l322 1 year ago

    I literally saw this on South Park yesterday and now it's recommended to me on YouTube.

  • @jagadeesh_damarasingu

    When thinking about the huge capital costs involved in setting up an AI hardware farm, is it possible to take advantage of the shared computing power of a public peer network, like we do now with blockchain nodes?

  • @glenyoung1809
    @glenyoung1809 1 year ago +2

    I wonder how fast ChatGPT would have trained on a Cerebras CS-2 system with their Wafer scale 2 architecture?

  • @n8works
    @n8works 1 year ago +1

    15:30 You say that the inference hardware must scale exponentially, but that must be hyperbole right? At the very most it's 1 to 1 and I'm sure there are creative ways to multiplex. The interesting thing to see would be transactions/sec for a single cluster instance.

    • @HighYield
      @HighYield  1 year ago

      I meant in relation to its users. If users increase exponentially, so do the hardware requirements. Since you are like comment no. 5 about this, I realize I should have said it differently.
      A point that could play into that question is AI scaling, but that's not what I was talking about.

    • @n8works
      @n8works 1 year ago

      @@HighYield Ahh, I understand what you were saying, yes. It scales with users in some way. The more users, the more hardware, to some degree.

  • @zyxwvutsrqponmlkh
    @zyxwvutsrqponmlkh 1 year ago

    For general AI we really need to be perpetually training during inference.
    If you want this stuff to run cheaply, open source it; folks have gotten LLaMA to run on an RPi.

  • @vincentyang8393
    @vincentyang8393 1 year ago

    Great talk! Thanks.

  • @willemvdk4886
    @willemvdk4886 1 year ago

    Of course the hardware and infrastructure behind this application are interesting, but what I find even more interesting is how this is done in software. How are all these GPUs clustered? How is the workload actually divided and balanced? How do they maximize performance during training? And how in the world is the same model used by thousands of GPUs to serve inference for many, many users simultaneously? That's mind-boggling to me, actually.

  • @miroslawkaras7710
    @miroslawkaras7710 1 year ago

    Could quantum computers be used for AI training?

  • @DSTechMedia
    @DSTechMedia 1 year ago

    AMD made a smart move in acquiring Xilinx, and it mostly went unnoticed at the time. But it could pay off heavily in the long run.

  • @gab882
    @gab882 1 year ago

    It would be both amazing and scary if AI neural networks ran on quantum computers or other advanced computers in the future.

  • @robinpage2730
    @robinpage2730 1 year ago

    How powerful would a model be that could be trained on a gaming laptop's GTX 1650 Ti?
    How about a natural language compiler that translates English input into executable machine code, like GCC?

  • @lolilollolilol7773
    @lolilollolilol7773 1 year ago +1

    AI progress is far more software-bound than hardware-bound. The deep learning algorithms are incapable of logical reasoning and thus of knowing whether a proposition is true or not. That's the real breakthrough that needs to happen. Once deep learning gains this capability, we will really be confronted with superintelligence, with all the massive consequences that we are not really ready to face.

  • @mrpicky1868
    @mrpicky1868 1 year ago +1

    What are the most advanced models, like Megatron, actually doing? Anyone know?

  • @legion1791
    @legion1791 1 year ago +2

    I would love to have a local and unlocked offline chatGPT

  • @albayrakserkan
    @albayrakserkan 1 year ago

    Great video, looking forward to AMD MI300.

  • @lucasew
    @lucasew 1 year ago

    Fun fact: the DGX A100 has a configuration with quad-socket EPYC 7742 CPUs, 8 A100 GPUs and 2TB of RAM.
    Source: I know someone who works with one.
    He said it works nicely for Blender renders too, but the focus is tensor number crunching using PyTorch.

    • @HighYield
      @HighYield  1 year ago

      All I could find is this dual socket config: images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-a100-80gb-datasheet.pdf
      But a quad one would be much nicer :D

  • @TMinusRecords
    @TMinusRecords 1 year ago +2

    15:39 Exponentially? How? Why not linearly

  • @ethanroland6770
    @ethanroland6770 1 year ago

    Curious about a gpt4 followup!

  • @LukaszStafiniak
    @LukaszStafiniak 1 year ago +1

    Hardware requirements increase linearly for inference, not exponentially.

  • @stefanbuscaylet
    @stefanbuscaylet 1 year ago +2

    Does anyone have any references on how big the storage required for this was? Was it a zillion SSDs or was it all stored on HDDs?

    • @dougchampion8084
      @dougchampion8084 1 year ago

      The training data itself is pretty small in relation to the compute required to process it. Text is tiny.

    • @stefanbuscaylet
      @stefanbuscaylet 1 year ago

      @@dougchampion8084 I feel like that is oversimplifying things. When there are over 10K cores distributed over a large network and the training data is "all of Wikipedia and tons of other data", there has to be quite a bit of disaggregated storage for that, and every node seems to have some local/fast NAND SSD storage. As far as I can tell they mostly use the CPUs to orchestrate and feed the data to the GPUs, and the GPUs then feed the data back to the CPUs to be pushed to storage. It would be nice if someone just mapped this all out along with capacity and bandwidth needs.
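      For a sense of scale, the GPT-3 paper (arxiv.org/abs/2005.14165) reports training on roughly 300 billion tokens; converting that to raw text size (the bytes-per-token figure is a rule-of-thumb assumption) shows the dataset itself is tiny next to the compute, though how it is staged and streamed to thousands of GPUs is the open question raised above.

```python
# Rough size of GPT-3's training text. ~300B training tokens is from the GPT-3
# paper; ~4 bytes of raw English text per token is an assumed rule of thumb.
tokens = 300e9
bytes_per_token = 4
terabytes = tokens * bytes_per_token / 1e12
print(f"~{terabytes:.1f} TB of raw text")   # on the order of a single terabyte
```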

  • @drmonkeys852
    @drmonkeys852 1 year ago

    My friend is actually already training the smallest version of GPT on 2 A100s for his project in our ML course

    • @HighYield
      @HighYield  1 year ago +1

      That's really interesting, I wonder how much time the training takes. And having access to 2x A100 GPUs is also nice!

    • @drmonkeys852
      @drmonkeys852 1 year ago

      @@HighYield Yeah, it's from our uni. We still have to pay for time on it unfortunately, but it's pretty cheap still. He estimates it'll cost around $30, which is not bad.

    • @gopro3365
      @gopro3365 1 year ago

      @@drmonkeys852 $30 for how many hours

  • @omer8519
    @omer8519 1 year ago +1

    Anyone got chills when they heard the name megatron?

  • @fletcher9328
    @fletcher9328 3 months ago

    Great video!

  • @yogsothoth00
    @yogsothoth00 1 year ago

    Power efficiency comes into play as well. Will AMD be able to compete with slower hardware? Only if they can beat Nvidia in the overall value proposition.

  • @oxide9717
    @oxide9717 1 year ago +1

    Let's Gooo, 🔥🔥🔥

  • @GraveUypo
    @GraveUypo 1 year ago

    The best part is they've already made very similar AIs that are open source and that you can run on your own computer, which is way preferable to handing over data to OpenAI and paying by the character, which is kinda absurd. And they used ChatGPT to train them, lol.

  • @maniacos2801
    @maniacos2801 1 year ago

    What we need are locally run AI models. Optimisations will have to be made for this, but it is a huge flaw that this type of high-speed interactive knowledge is in the hands of a few multi-billion global players. And we all know by now, "Open"AI is anything but open. This is what scares me most about this whole development. In the early days of the internet, everyone could run a server at home, or people could get together and run dedicated hardware in some co-location. With AI this is impossible because no one can afford this kind of hardware.
    If chat AI is the new internet, we need public access to the technology, otherwise only a few will be in control of such huge power to decide what information should be available and what should be filtered or even altered.

  • @simonlyons5681
    @simonlyons5681 1 year ago

    I am interested to know the hardware requirements to run inference for a single user. What do you think?

    • @HighYield
      @HighYield  1 year ago

      I think it's VRAM-bound. So you might still need a full Nvidia DGX/HGX A100 server, not because of the raw computing power, but because of the VRAM capacity. Maybe 4x A100 GPUs would work too, depending on how much smaller ChatGPT is compared to GPT-3.
      It's really hard to say since we don't have official numbers.
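      A quick sketch of why VRAM capacity is the limiter; 175B parameters is the published GPT-3 size, while ChatGPT's actual parameter count is not public, so treat this as an upper-bound estimate.

```python
import math

# Back-of-the-envelope VRAM sizing for serving a GPT-3-class model in FP16.
params = 175e9               # GPT-3 parameter count (ChatGPT's is unknown)
bytes_per_param = 2          # FP16 weights
weights_gb = params * bytes_per_param / 1e9    # ~350 GB of weights alone
a100_vram_gb = 80
min_gpus = math.ceil(weights_gb / a100_vram_gb)
print(f"~{weights_gb:.0f} GB of weights -> at least {min_gpus}x A100 80GB")
# Activations and the attention KV cache add more on top, which is why a full
# 8-GPU DGX/HGX node is a plausible minimum serving unit.
```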

  • @auriplane
    @auriplane 1 year ago

    Love the new thumbnail!

  • @MrEnyecz
    @MrEnyecz 1 year ago +1

    Why does the hardware requirement for inference increase exponentially with the number of users? That should be linear, shouldn't it?

    • @HighYield
      @HighYield  1 year ago

      Ofc you are right, it was just a term I used in conjunction with the exponential user growth of ChatGPT.

  • @zwenkwiel816
    @zwenkwiel816 1 year ago +1

    everyone always asks what is ChatGPT? but no one ever asks how is chatGPT? :(

  • @MM-24
    @MM-24 1 year ago

    Any pricing analysis to go with this?

    • @HighYield
      @HighYield  1 year ago

      That's currently outside of my capabilities, as such large projects are priced very differently from off-the-shelf hardware.

  • @JazevoAudiosurf
    @JazevoAudiosurf 1 year ago

    I think it's a very simple equation: more layers and thus more params lead to better abstraction and deeper understanding. The brain would not be so huge if it wasn't necessary. We need to scale transformers up until they reach a couple of trillion params, and for that we need stuff like the H100 and whatever they announce at GTC next month. Transformers are probably enough to solve language. Combined with CoT, and as papers have shown, they will surpass humans.

  • @seraphin01
    @seraphin01 1 year ago

    Great video, thank you.
    I've been trying to ask ChatGPT about its hardware, but obviously you don't get the answer haha.
    Those who think we're already reaching the top of AI right now are grossly mistaken.
    Those teraflops we're talking about for the new architecture will sound ridiculously crap in a few years' time, just like a top-end GPU from 2010 wouldn't even be good enough for a cheap smartphone nowadays.
    And with focus turning to AI-only hardware now, it's just gonna improve exponentially for a while.
    Although, like the guys at OpenAI stated, don't expect GPT-4 to be Skynet-level AI; the results might look like minor improvements at first glance, but the cost, reliability, speed and accuracy of those models will improve a lot before going to the next phase, which is actual artificial INTELLIGENCE.
    By 2030 the world won't be the same as it is now, that's granted imo, and most people are not ready for it.

  • @paulchatel2215
    @paulchatel2215 1 year ago

    You don't need that much power to train ChatGPT. You can't compare the full training of an LLM (GPT-3) with an instruct finetune (ChatGPT). Remember that Vicuna, which has performance similar to ChatGPT, was trained for only about $300 by instruct-finetuning the LLM LLaMA, and other open source chatbots have been trained on single-GPU setups. So it's unlikely that OpenAI needed a full datacenter to train ChatGPT; the data collection was the hardest part here. Maybe they did use one, but then the training would have taken almost no time at all, and it seems wasteful to use 4000+ GPUs.

  • @CrypticConsole
    @CrypticConsole 1 year ago

    Based on the fact that we have almost no mainstream multimodal models, I think we are near the start of the curve.

  • @DeadCatX2
    @DeadCatX2 5 months ago

    When I look at AI right now, I liken ChatGPT to the first transistor radio. It was the first time that the unwashed masses had access to the magic unlocked by science and engineering. With that said, imagine being born in the era where you see the transistor radio with wonder as a child, and then how everything progresses with television, home computers, the internet, and mobile phones. That is what my father has seen in his lifetime, and that is the kind of exponential improvement I expect a 12-year-old of today to see over the course of their life. And that's my conservative guess, because the growth of science and engineering is exponential itself.