Stanford MLSys Seminars

Video

Pixie CEO Zain Asgar - How I Started Pixie
325 views, 2 years ago
Stanford PhD Albert Gu - Why S4 Works
2.4K views, 2 years ago
Stanford PhD Albert Gu on the Research Journey behind S4
814 views, 2 years ago
Stanford PhD Albert Gu Presents S4's Impressive Performance
1.1K views, 2 years ago
Baharan Mirzasoleiman - How Structure Helps in Machine Learning
1.1K views, 2 years ago
Baharan Mirzasoleiman - Dangers of Scaling Up Machine Learning
291 views, 2 years ago
Baharan Mirzasoleiman on Fast and Efficient Machine Learning with CRAIG
392 views, 2 years ago
Baharan Mirzasoleiman - The Problems with Big Data in Machine Learning
511 views, 2 years ago
Comet CEO Gideon Mendels on why industry is behind academia in machine learning
528 views, 2 years ago
Comet CEO Gideon Mendels on what Tesla has to overcome for full self-driving
327 views, 2 years ago
Stanford MLSys Seminar Episode 0: ML + Systems
40K views, 3 years ago

Comments

  • @lpang · 16 days ago

    I am glad you talked about inverse lithography technology (ILT), which I named twenty years ago, and I am still working on it using GPU acceleration. BTW, I also got my PhD from Stanford.

  • @ostrov11 · 20 days ago

    ... some sort of revelations from an ML junior

  • @LazyDotDev · 25 days ago

    Great talk, but why didn't anyone ask questions about competition? What is to prevent Nvidia, AMD, or Intel from producing niche chips like this? With their R&D teams, quality assurance systems, warranties, and supply chains, they have likely thought of this, and if not, they should be able to deploy a more competitive and reliable solution fast. That being said, I really appreciate Gavin breaking down the history here; I learned a lot of new things.

    • @manonamission2000 · 18 days ago

      Corporations tend to move slowly... it is less expensive (relatively, in $ and time) for a nimble company to attempt to innovate like this... also, the gamble is that the Sohu platform becomes so appetizing that it ends up as an acquisition target... again, both are simply bets... not without risk.

    • @LazyDotDev · 18 days ago

      @manonamission2000 Sure, you could argue some leaders like Blockbuster moved slowly when the rising leader Netflix transitioned to online and on-demand content. However, unlike on-demand streaming services, Gen AI is the most revolutionary technology of our time, and if this direction were so promising and yet as simple as creating a niche chip focused solely on transformers, then you'd think Intel and AMD, with their massive R&D teams, would already be doing it to get an edge on Nvidia. These serious business questions should have been asked. I'll do more research, but it's hard to take any of this seriously if such a basic question could not be asked/answered.

  • @briancase6180 · 25 days ago

    Dude, you're at Stanford; I think students know what an inverter does. This was an ML seminar talk? How? And how did this have anything to do with the topics explicitly raised in the abstract? Just asking... And, BTW, HBM isn't the only type of memory that's relevant, especially for inference, which is, BTW, the focus of his company.

  • @peaceworld5885 · a month ago

    Awesome, I think this model will succeed one day, and the transformer will lose! Remember my comment! Sheldon!

  • @kvotheosem-sangue · a month ago

    Explained so clearly! The paper gets confusing when it gets into the math because the material is so dense; thanks for extending it to a video format.

  • @radicalrodriguez5912 · a month ago

    great presentation. thanks

  • @rfernand2 · a month ago

    This is a presentation that Ben "threw together" at the last minute? Amazingly well done!

  • @MatijaGrcic · 2 months ago

    Great talk, thanks for sharing.

  • @samsgregson · 2 months ago

    What is the paper being referred to at 55:40? "Step"?

  • @ppujari · 2 months ago

    He describes his company for approximately 10 minutes instead of talking about MLOps.

  • @laurenpinschannels · 2 months ago

    Related to this, I'd recommend looking up the story "a disneyland without children" by strataoftheworld.

  • @user-el2vz9cb1t · 2 months ago

    Great stuff.

  • @nauy · 2 months ago

    Nice history lesson. Nothing about the ‘next 100x’ promised in the title.

  • @kenchang3456 · 2 months ago

    I enjoyed the discussion and experience sharing. Thank you very much.

  • @sabrango · 2 months ago

    Amazing

  • @muhannadobeidat · 2 months ago

    Good presentation; everyone that has tried this has reached similar conclusions. It is great to see that confirmation and a similar thought process here.

  • @kevon217 · 2 months ago

    Thanks for the great walkthrough. Looking forward to reading these papers.

  • @nathanhelmburger · 3 months ago

    I'm not sure it makes sense to describe LLMs as lossless compressors. Wouldn't it be more accurate to say they are lossy compressors which asymptote towards becoming lossless as you train them? Ah, watched further and now see it a different way, but am still puzzled. Maybe you could anchor a different term, and say that for a given level of training you can perfectly reconstruct an uncompressed message from a compressed message, and the thing that improves as training continues is the ratio of uncompressed to compressed. But then, as other commenters mention, you talk about the integral of the training loss curve. I don't get why the early and intermediate losses are relevant instead of only the end loss you can achieve. Ah, got clarification at 56:39. It makes sense to consider the integral of the loss curve only for the first epoch.

    • @StanfordMLSysSeminars · 2 months ago

      The simplest way to see an LLM as a lossless compressor is to construct an arithmetic code over the predicted probabilities. That LLMs are good at compression is not really surprising, either; it comes from the fact that there's a KL divergence embedded within the cross-entropy loss used in training (and KL(P||Q) quantifies the inefficiency of Q being used to code for P). A small sketch of this view follows below.
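
To make the arithmetic-coding point in the reply above concrete, here is a minimal Python sketch. It is illustrative only: the probabilities and the 50,000-token vocabulary are made-up numbers, not from the talk. An arithmetic coder driven by a model's next-token probabilities spends roughly -log2 p(token | context) bits per token, so the ideal compressed size is the sum of those terms, and lower cross-entropy means better lossless compression.

```python
import numpy as np

# Illustrative sketch (made-up numbers): an arithmetic coder driven by a model's
# next-token probabilities spends about -log2 p(token | context) bits per token,
# so the ideal compressed size of a sequence is the sum of those terms.

def ideal_code_length_bits(token_probs):
    """token_probs: probability the model assigned to each token that actually occurred."""
    return float(np.sum(-np.log2(np.asarray(token_probs))))

# A model that is confident about the true tokens compresses them cheaply...
print(ideal_code_length_bits([0.9, 0.8, 0.95]))   # ~0.55 bits for 3 tokens
# ...while a uniform model over a 50,000-token vocabulary pays ~15.6 bits/token.
print(ideal_code_length_bits([1 / 50_000] * 3))   # ~46.8 bits for 3 tokens
```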

  • @420_gunna · 3 months ago

    danfu cooked in this one

  • @420_gunna · 3 months ago

    Snippy responses 😒

  • @JaisidhSinghBAI · 3 months ago

    Awesome work. I was looking for a resource to explain butterfly matrices and their usage and came across this talk. Invaluably helpful and an incredible contribution to deep learning.

  • @jayasimhatalur5503 · 3 months ago

    Synthetic data generation FTW

  • @smsubham342 · 3 months ago

    Can we also have the slides?

  • @m.d.4979 · 3 months ago

    Hello! Great talk! I am currently studying your SSM-related works. They are amazing! Please share your ideas, challenges, and outcomes for applying your Mamba model to human (sports athlete) action forecasting. Thank you for your kind reply!

  • @ppujari · 3 months ago

    This talk is more about Gemini than MLSys. I was expecting more on MLSys.

  • @Gerald-iz7mv · 3 months ago

    Hi, do you have any links to benchmarks you can run to measure latency and throughput for different models and frameworks, etc.?

  • @Karl-Asger · 4 months ago

    Great video thanks

  • @for-ever-22 · 4 months ago

    These videos are amazing

  • @user-nx9nr3jn1g · 4 months ago

    Stanford MLSys

  • @vicaya · 4 months ago

    37:40 - as you already realized, LLMs (and the transformer architecture in general) are memory constrained, so the extra FLOPS are wasted until TSMC productizes SOT-MRAM. Groq with SRAM is a more realistic short-term approach for small models.

  • @sucim · 4 months ago

    Very interesting and well presented!

  • @jaytau · 4 months ago

    Would it be possible to use an external mic for the speaker and the person who asks the question? It's quite challenging to hear.

  • @georgehart5182 · 4 months ago

    It's cool, but this is going to be a long road. The main problem is software at the IR level (e.g. CUDA), not necessarily hardware. There are many companies that can make interesting transistor permutations and have been doing it for a long time, and they are not magically "accelerating superintelligence". This is a software ecosystem problem more than anything else. Good luck.

  • @sucim · 5 months ago

    Great talk and even greater work!!

  • @420_gunna · 5 months ago

    Ben continues to be a stud 💪💪💪 Thanks Stanford students/faculty for putting these online, they're among the best learning opportunities for people on the sidelines 😄

  • @420_gunna · 5 months ago

    (After finishing) -- What an awesome video! Data-centric modeling is awesome. Thanks MLSys for putting this on YouTube.

  • @420_gunna · 5 months ago

    Ludwig da goat 🐐

  • @andrewm4894 · 6 months ago

    Love this! Thanks!

  • @maximliu · 6 months ago

    Great presentation! Wondering if there are any papers, tutorials, or other literature on similar topics? The talk was kind of quick; I need to read more specifics from the literature. Any pointer would be appreciated. Thanks!

    • @BenjaminFSpector · 6 months ago

      I blew through a ton of different topics in the course of the talk, so it really depends on what you're looking for. If you want more on making the most of an H100, NVIDIA has fairly good docs on both the CUDA programming model and the specific features of the H100, but actually using them can be tricky, so your best bet is probably to read the CUTLASS repo and see how they do things. If you want more on hardware design, I'm not sure there are great alternatives to taking a class. Hardware design seems to me like an awful lot of work -- writing good RTL is hard enough, but the whole EDA stack is a bit of a nightmare. If you want more on semiconductor manufacturing, I'd highly recommend the Asianometry YT channel, which has a lot of really excellent content. Otherwise, some of my main sources for this talk were SemiAnalysis ($500/yr, but I like it enough that I pay for it even from a grad student stipend), Bill Dally's HC2023 talk, and various coursework, particularly 6.172 from MIT for performance engineering. (It's on OCW at ocw.mit.edu/courses/6-172-performance-engineering-of-software-systems-fall-2018/video_galleries/lecture-videos/ and while it's focused on CPU performance engineering, many of the principles apply across both.) Hope this helps!

    • @prasannaprabhakar1323 · 6 months ago

      @BenjaminFSpector Thanks a ton, man! What you have shared here is gold. I really appreciate it.

  • @420_gunna · 6 months ago

    Dan Fu == The Rizzler

  • @jjh5474 · 6 months ago

    Thank you for sharing this insightful video. In the introduction of Mamba, it says "parallelizable training"; can you explain how parallel training is possible in an autoregressive model?

    • @robertjflynn4206 · 6 months ago

      Teacher forcing

    • @icriou · 5 months ago

      Follow this video and you will get a hands-on understanding of why an AR model can be trained in parallel: czcams.com/video/kCc8FmEb1nY/video.html

    • @matthewnorton2315 · 5 months ago

      I think you might be looking for the "selective scan" part of Mamba. In section 3.3.2 of the paper arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf, they say "To avoid the sequential recurrence, we observe that despite not being linear it can still be parallelized with a work-efficient parallel scan algorithm (Blelloch 1990; Martin and Cundy 2018; Smith, Warrington, and Linderman 2023)". In short, they use a well-known parallel algorithm trick to calculate a prefix sum; see en.wikipedia.org/wiki/Prefix_sum#Parallel_algorithms and you'll notice the similarity (a generic sketch of the idea follows below). Hope this helps!
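
As a generic illustration of the parallel-scan point in the reply above, here is a minimal Python sketch. This shows the standard prefix-scan trick for a linear recurrence, not Mamba's actual selective-scan kernel: the update h_t = a_t * h_{t-1} + b_t can be expressed with an associative combine operator, and any associative operator can be evaluated with a work-efficient parallel scan (Blelloch 1990).

```python
import numpy as np

# Sketch only: a linear recurrence h_t = a_t * h_{t-1} + b_t (the kind of state
# update an SSM applies per time step), rewritten as a scan over (a_t, b_t) pairs.
# The combine operator below is associative, which is what lets a parallel
# prefix scan evaluate the whole sequence in O(log T) depth instead of O(T).

def combine(left, right):
    a1, b1 = left
    a2, b2 = right
    # Composing h -> a1*h + b1 followed by h -> a2*h + b2 gives
    # h -> (a1*a2)*h + (a2*b1 + b2), so the operator is associative.
    return a1 * a2, a2 * b1 + b2

def recurrence_sequential(a, b, h0=0.0):
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return np.array(out)

def recurrence_as_scan(a, b, h0=0.0):
    # Inclusive scan with `combine`; written serially here for clarity, but the
    # same operator plugs into any tree-structured (parallel) scan implementation.
    acc, out = (1.0, h0), []
    for pair in zip(a, b):
        acc = combine(acc, pair)
        out.append(acc[1])
    return np.array(out)

a, b = np.random.rand(8), np.random.rand(8)
assert np.allclose(recurrence_sequential(a, b), recurrence_as_scan(a, b))
```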

  • @420_gunna · 6 months ago

    martin is a based skyrim nord 👍😍

  • @Mpumzar · 6 months ago

    Wow, great information! I am trying to pivot my career to modelling and AI.

  • @420_gunna · 6 months ago

    Ben has W white boy rizz

  • @BR-hi6yt · 6 months ago

    Makes no sense, I'll ask ChatGPT for a better explanation.

  • @truehighs7845 · 6 months ago

    The stack is a cluster-fuck, pun intended.

  • @jawadmansoor6064 · 6 months ago

    arXiv link please?

    • @backtofocused438 · 6 months ago

      Indeed! It is such wonderful work and such a fantastic way to learn; I would have expected that for such a fantastic scientific exploration of this.

    • @StanfordMLSysSeminars · 6 months ago

      Added to the description!

  • @voncolborn9437 · 7 months ago

    Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how, no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.

  • @suleimanshehu5839 · 7 months ago

    Please create a video on fine-tuning a MoE LLM, such as the Mixtral 8x7B MoE LLM, using LoRA adapters within your framework.