[ML News] Chips, Robots, and Models

  • uploaded 22. 05. 2024
  • OUTLINE:
    0:00 - Intro
    0:19 - Our next-generation Meta Training and Inference Accelerator
    1:39 - ALOHA Unleashed
    3:10 - Apple Inks $50M Deal with Shutterstock for AI Training Data
    4:28 - OpenAI Researchers, Including Ally of Sutskever, Fired for Alleged Leaking
    5:01 - Adobe's Ethical Firefly AI was Trained on Midjourney Images
    5:52 - Trudeau announces $2.4 billion for AI-related investments
    6:48 - RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
    7:15 - CodeGemma - an official Google release for code LLMs
    7:24 - Mistral AI: Cheaper, Better, Faster, Stronger
    8:08 - Vezora/Mistral-22B-v0.1
    9:00 - WizardLM-2, next-generation state-of-the-art LLM
    9:31 - Idefics2, the strongest Vision-Language-Model (VLM) below 10B!
    10:14 - BlinkDL/rwkv-6-world
    10:50 - Pile-T5: Trained T5 on the Pile
    11:35 - Model Card for Zephyr 141B-A39B
    12:42 - Parler TTS
    13:11 - RHO-1: Not all tokens are what you need
    14:59 - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
    References:
    / 1780263768968273923
    ai.meta.com/blog/next-generat...
    soumithchintala/s...
    deepnewz.com/tech/apple-inks-...
    TolgaBilge_/statu...
    javilopen/status/...
    paulg/status/1781...
    / 1776706907295846628
    www.cbc.ca/news/politics/fede...
    arxiv.org/pdf/2404.07839
    huggingface.co/blog/codegemma
    mistral.ai/news/mixtral-8x22b/
    MistralAILabs/sta...
    huggingface.co/Vezora/Mistral...
    huggingface.co/Vezora/Mistral...
    WizardLM_AI/statu...
    _philschmid/statu...
    huggingface.co/BlinkDL/rwkv-6...
    blog.eleuther.ai/pile-t5/
    huggingface.co/HuggingFaceH4/...
    huggingface.co/MaziyarPanahi/...
    reach_vb/status/1...
    arxiv.org/pdf/2404.07965
    arxiv.org/pdf/2404.05719
    sambanova.ai/blog/samba-coe-t...
    www.microsoft.com/en-us/resea...
    twelve_labs/statu...
    drive.google.com/file/d/1Av5j...
    arxiv.org/pdf/2404.12387
    arxiv.org/abs/2404.12241
    arxiv.org/pdf/2404.12241
    / 1780650122382049596
    audiodialogues.github.io/
    os-world.github.io/
    ai.meta.com/blog/openeqa-embo...
    arxiv.org/pdf/2404.07503
    arxiv.org/pdf/2404.06654
    amanrsanger/statu...
    huggingface.co/datasets/xai-o...
    github.com/PygmalionAI/aphrod...
    github.com/jina-ai/reader/?ta...
    r.jina.ai/x.com/elonmusk
    r.jina.ai/github.com/...
    github.com/rogeriochaves/lang...
    mvpatel2000/statu...
    github.com/databricks/megablocks
    github.com/nus-apr/auto-code-...
    github.com/nus-apr/auto-code-...
    karpathy/status/1...
    karpathy/status/1...
    github.com/BasedHardware/Friend
    argmaxinc/status/...
    awnihannun/status...
    / 1776399292036501898
    pytorch.org/blog/torchtune-fi...
    If you want to support me, the best thing you can do is share the content :)
  • Science & Technology

Comments • 73

  • @lone0017
    @lone0017 22 days ago +80

    wow, you are really productive on YouTube lately, good for us

    • @EdFormer
      @EdFormer 22 days ago +7

      Couldn't agree more. I find that my appetite for machine learning research is a strictly monotonically increasing function of Kilcher's productivity on YouTube. As a postdoc, that's quite significant lol

    • @lone0017
      @lone0017 22 days ago +5

      @@EdFormer I find Kilcher's videos to be the path of least resistance for me to keep up with ML research. As a working person trying to get into a PhD programme, that's extremely valuable :D

    • @EdFormer
      @EdFormer 22 days ago

      @@lone0017 you're getting the right skeptical and evidence-based perspective for your goals while being brought up to date too, which most other channels in this day and age really don't provide. Best of luck with your applications!

    • @makhalid1999
      @makhalid1999 21 days ago +1

      Don't jinx it

  • @the_primal_instinct
    @the_primal_instinct 22 days ago +45

    1:58 No, Yannic, the difference between high cost robots and low cost robots is their price.

  • @andreamarisio
    @andreamarisio 21 days ago +18

    Here are the missing chapter timings!
    15:42 Samba
    17:06 Vasa-1
    17:55 Pegasus-1
    18:17 Reka
    19:58 AI safety introduction v0.5
    20:20 Longer context benchmark X post
    21:17 New Audio Dialogue Dataset
    22:33 OSWorld: Agents Environment
    24:39 OpenEQA: Embodied QA Dataset
    26:41 Best Practices and Lessons Learned on Synthetic Data for Language Models
    27:07 RULER: real context size paper
    29:57 Aman's post: SWE-bench contamination supposition
    31:28 RealWorldQA Dataset
    32:34 Repositories
    36:13 X Posts
    38:09 torchtune: fine-tuning library from PyTorch

  • @TheTruthOfAI
    @TheTruthOfAI 17 days ago +2

    A scientific channel, looking at papers. Yessss sir

  • @thenoblerot
    @thenoblerot 22 days ago +4

    Holy cow, you're turning out the videos. Thank you so much for sharing your expertise

  • @OperationDarkside
    @OperationDarkside 22 days ago +11

    "YouTube, Audio, No"
    The content I'm here for.

  • @meselfobviouslyme6292
    @meselfobviouslyme6292 22 days ago +2

    Thank you, Mr. Yannic, for giving us the latest updates in this edition of Machine Learning news.

  • @lexer_
    @lexer_ 22 days ago +3

    I would be really interested in a video about the different cutting-edge fine tuning methods and how they differ.

  • @RollingcoleW
    @RollingcoleW 21 days ago

    Keep it up! Forgot about this series but like it!

  • @jayl9053
    @jayl9053 22 days ago +1

    Another certified Tuesday banger 😎

  • @herp_derpingson
    @herp_derpingson 22 days ago +3

    39:00 Well you see... the difference between high cost robots and low cost robots is the cost.

  • @wwkk4964
    @wwkk4964 22 days ago +2

    Best AI youtube channel!

  • @luke2642
    @luke2642 22 days ago +2

    An excellent video. I wonder if the RHO concept is more profound than you give it credit for. It seems inherently better if you could filter out bad tokens, identify the most meaningful tokens, and not necessarily need synthetic data. Token quality is more important than token quantity for humans; why shouldn't it be the same for LLMs?
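
    The token-filtering idea discussed above (RHO-1's selective language modeling) can be sketched in a few lines: keep only the tokens whose loss under the current model most exceeds a reference model's loss. The function name, keep fraction, and toy losses below are illustrative, not taken from the paper's code.

```python
def select_tokens(current_loss, reference_loss, keep_fraction=0.6):
    """Return sorted indices of tokens kept for the training loss."""
    excess = [c - r for c, r in zip(current_loss, reference_loss)]
    k = max(1, int(len(excess) * keep_fraction))
    # Tokens with the highest excess loss are the ones the model still
    # needs to learn; low or negative excess is noise or already memorized.
    ranked = sorted(range(len(excess)), key=lambda i: excess[i], reverse=True)
    return sorted(ranked[:k])

cur = [2.1, 0.3, 4.0, 1.2, 3.5]   # toy per-token loss, current model
ref = [2.0, 0.2, 1.0, 1.1, 1.0]   # same tokens under a reference model
print(select_tokens(cur, ref, keep_fraction=0.4))  # → [2, 4]
```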

  • @mohamedhatem325
    @mohamedhatem325 17 days ago +1

    very good video

  • @pladselsker8340
    @pladselsker8340 22 days ago +1

    I would love to have your point of view on what Extropic is building.

  • @axelmarora6743
    @axelmarora6743 22 days ago +2

    25:20 😂😂said so casually!

  • @kaikapioka9711
    @kaikapioka9711 21 days ago +1

    Thx yan!

  • @mahdipourmirzaei1048
    @mahdipourmirzaei1048 22 days ago +1

    I am looking for a paper review video about Rho-1!

  • @wiktorm9858
    @wiktorm9858 22 days ago +1

    Very informative video

  • @EsotericAI
    @EsotericAI 21 days ago

    gpt2-chatbot could be OpenAI's GPT-5. I guess it automatically pre-processes input text as a low color-depth image (containing the text), so any text prompt is fed into the model as text+image. The ASCII-art performance is a tell of something like that.

  • @chispun2
    @chispun2 22 days ago +1

    In most cases we don't know what the training data for LLMs is. How can we be sure those benchmarks were not part of their training data?

    • @therainman7777
      @therainman7777 22 days ago

      You can’t, that’s why you need to come up with novel evals of your own creation, which you never publish anywhere.

    • @john_blues
      @john_blues 22 days ago

      "Teaching to the test," we used to call it in K-12 education.
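
      A minimal version of the contamination check this thread is asking about, when the training corpus is available, is verbatim word n-gram overlap between an eval item and a training document. Everything below (names, n=4, the toy strings) is illustrative, not any benchmark's official decontamination procedure.

```python
def ngrams(text, n):
    """Set of word n-grams in a lowercased text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_fraction(eval_item, corpus_doc, n=4):
    """Fraction of the eval item's n-grams found verbatim in the doc."""
    e = ngrams(eval_item, n)
    return len(e & ngrams(corpus_doc, n)) / len(e) if e else 0.0

question = "what is the capital of france and when was it founded"
leaked = "trivia dump: what is the capital of france and when was it founded answer paris"
clean = "a short story about a cat that lives in a lighthouse"
print(overlap_fraction(question, leaked))  # 1.0: fully contained, likely contaminated
print(overlap_fraction(question, clean))   # 0.0: no verbatim overlap
```

A real pipeline would normalize punctuation and scan the whole corpus, but the signal is the same: high verbatim overlap means the benchmark may have been "taught to the test."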

  • @tomCatzy
    @tomCatzy 22 days ago +1

    Since I began following this channel, my AI comprehension has skyrocketed lately.

  • @pedrogorilla483
    @pedrogorilla483 22 days ago +1

    Yes!

  • @calvingrondahl1011
    @calvingrondahl1011 22 days ago +1

    You are fun Yannic 😎

  • @neq825
    @neq825 21 days ago

    The robot arm used by ALOHA is the Trossen Robotics ViperX 300. It costs $6,129.95.

  • @schilling3003
    @schilling3003 21 days ago

    I just wish there were better options for DIY local/edge inference. Basically nothing since the Google Coral Edge TPU 5+ years ago.

  • @JerryFederspiel
    @JerryFederspiel 20 days ago

    The last fabric manipulation demo I saw, maybe last year, looked pretty similar... but that video was sped up 64x. The Google demo was at actual speed, tremendously better.

  • @damienteney
    @damienteney 22 days ago +2

    Any chance you can release these as podcasts again? I have to rip the youtube videos and convert to mp3 every time to get them on my music player.

    • @therainman7777
      @therainman7777 22 days ago

      Who owns a music player in 2024? Not trying to be a dick. But can’t you use your phone?

  • @joe_limon
    @joe_limon 22 days ago +1

    RHO-1 seems like a perfect application for mobile AIs

  • @Timotheeee1
    @Timotheeee1 21 days ago +3

    8:10 The 22B model is nonsense; it performs worse than a 3B model. Makes sense, since averaging the experts just gets you a randomly initialized FFN with some extreme outlier weights.
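
    The averaging argument can be illustrated numerically: the entrywise mean of several independent weight matrices shrinks toward zero-mean noise (roughly 1/sqrt(n) of the original scale), so no single expert's learned structure survives. This is a toy sketch with random matrices, not the actual Mistral weights.

```python
import random

def avg_experts(experts):
    """Entrywise mean of several expert weight matrices."""
    n = len(experts)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(e[i][j] for e in experts) / n for j in range(cols)]
            for i in range(rows)]

def rms(m):
    """Root-mean-square of all matrix entries."""
    vals = [x for row in m for x in row]
    return (sum(v * v for v in vals) / len(vals)) ** 0.5

random.seed(0)
# 8 "experts", each a 64x64 matrix of independent N(0, 1) entries.
experts = [[[random.gauss(0, 1) for _ in range(64)] for _ in range(64)]
           for _ in range(8)]
merged = avg_experts(experts)
# Averaging 8 independent weight sets shrinks the scale by ~1/sqrt(8),
# leaving near-random noise rather than any one expert's structure.
print(round(rms(experts[0]), 2), round(rms(merged), 2))
```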

  • @donmiguel4848
    @donmiguel4848 22 days ago +1

    @0:33 Int operations are ops, not flops.

  • @leeme179
    @leeme179 22 days ago +1

    OpenAI signed a deal with Microsoft, I think in 2018; it took them a really long time to create their own chip.

  • @SteffenProbst-qt5wq
    @SteffenProbst-qt5wq 21 days ago

    16:04
    gpt2-chatbot is a top-performing, not (yet) public new model. A recent (30.04) X post from Sam Altman hints at exactly this. It is currently being tested in the LMSYS Chatbot Arena.
    It appears to be a GPT-2 model trained using OpenAI's industry-leading training methods.

  • @ariaden
    @ariaden 22 days ago +2

    1. Low cost robots: Wake me up when they become good enough at sorting garbage to make plastic recycling profitable.

  • @PCPTMCROSBY
    @PCPTMCROSBY 22 days ago

    good show today.

  • @john_blues
    @john_blues 22 days ago +1

    Mmmm, chips. Oh, those kind of chips. Can't eat those.😞

    • @KEKW-lc4xi
      @KEKW-lc4xi 22 days ago

      TLC will find someone to challenge that notion.

  • @ACLozMusik
    @ACLozMusik 21 days ago

    So, the typical UI is optimized for non-technical human users for accessibility, since they are the expected end users. We have UIs for technical people with a mix of performance and usability.
    What would a UI designed for models/machines and optimized for performance look like? How performant could a model trained on it become? It's a bit of a loop, thinking about a UI optimized for a model and a model optimized for a UI. In the end, the model could become a more powerful UI, getting closer to the user through natural language and conversation and also closer to the machine doing the actual work.
    Basically another layer of high-/low-level language.

  • @jeffwads
    @jeffwads 22 days ago +1

    GPT-2 solves the Aunt Agatha riddle correctly. Even Claude Opus fails to do that.

    • @therainman7777
      @therainman7777 22 days ago +6

      Using well-known riddles and puzzles is a really terrible test of intelligence. The model could easily just be recalling the solution from its training data.

  • @dogme666
    @dogme666 22 days ago +1

    vroom vroom vroom gpu vroom vroom vroom

  • @biesman5
    @biesman5 21 days ago

    Did you hear that George Hotz hacked the RTX 4090 driver to enable P2P? I'm pretty interested in the implications this has for running local LLMs, really fascinating stuff. It was on Hacker News' front page a few weeks ago!

  • @TheRev0
    @TheRev0 22 days ago +1

    and so on and what not

  • @LouisChiaki
    @LouisChiaki 21 days ago

    "Has anyone heard of TensorFlow lately?" Google engineers who are still using TF: crying behind the scenes.

  • @TiagoTiagoT
    @TiagoTiagoT 20 days ago

    Oh, and talking about ideas I don't have the money to test (see my other comment here for another crazy idea): could someone please try to train a model with training-time random ablation of both parameter count and quantization level, all in the same training run? The final model would then be optimized from the get-go to work at whatever ablation level of parameter count and quantization the derived versions end up having, essentially training not only for the outputs but also for sorting its own parameters and bits by how important they are.
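
    The quantization half of this idea resembles quantization-aware training with a randomly chosen bit width per step. Below is a hedged sketch of just that ingredient: a fake-quantization function snapping weights to a grid, with the bit width drawn at random each step. All names, ranges, and levels are made up for illustration; a real setup would apply this inside the forward pass with a straight-through gradient.

```python
import random

def fake_quantize(w, bits):
    """Snap each weight to a uniform grid with 2**bits levels over [-1, 1]."""
    levels = 2 ** bits - 1
    return [round((x + 1) / 2 * levels) / levels * 2 - 1 for x in w]

random.seed(1)
w = [0.73, -0.42, 0.05]                 # toy weights
for step in range(3):
    bits = random.choice([2, 4, 8])     # random ablation level this step
    w_q = fake_quantize(w, bits)        # forward pass would use w_q
    print(step, bits, [round(x, 3) for x in w_q])
```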

  • @FredPauling
    @FredPauling 21 days ago

    Canada investing in EhI

  • @TiagoTiagoT
    @TiagoTiagoT 20 days ago

    Could someone with money to burn on datacenter waste heat please test whether it would work to have a model that is trained to re-write a certain fraction of its own weights at inference time (half, a third, 10 percent, etc.)? Sort of like, instead of predicting tokens directly, it first predicts some of its own weights based on the inputs. Possibly also compare whether it's better to keep just certain layers unfrozen, just certain neurons unfrozen, let it decide what to freeze at training time, or not freeze anything in particular and just let it "predict" not only the weight values but also which neurons to change. Or some variation of this idea that makes a little more sense, if what I described has details that don't really match how the pieces fit together.
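
    One toy reading of this idea is a small hypernetwork that predicts a fraction of the main layer's weights from the input before the forward pass, while the rest stays frozen. Everything below is an illustrative, untrained sketch; all shapes and names are assumptions, not any published method.

```python
import random

random.seed(0)
IN, OUT = 3, 2
# Frozen main-layer weights (OUT x IN), randomly initialized.
frozen = [[random.gauss(0, 1) for _ in range(IN)] for _ in range(OUT)]
# Tiny "hypernetwork": maps the input to a replacement for row 0.
hyper = [[random.gauss(0, 0.1) for _ in range(IN)] for _ in range(IN)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(x):
    predicted_row = [dot(x, h) for h in hyper]   # rewrite a fraction (row 0)
    weights = [predicted_row] + frozen[1:]       # remaining rows stay frozen
    return [dot(x, row) for row in weights]

print(forward([1.0, 0.5, -0.2]))
```

In training, the hypernetwork's parameters would be learned end-to-end through the token loss, which is roughly what "predicting its own weights" would mean in practice.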

  • @nahuelpiguillem2949
    @nahuelpiguillem2949 20 days ago

    5:59 $2.4 billion for some AI program, $100 million for housing problems. I guess we know what the government's priorities are.

  • @YRTB2830
    @YRTB2830 22 days ago +1

    just tried Idefics, kinda sucks on zero-shot tasks to be honest...

  • @NeuroScientician
    @NeuroScientician 22 days ago +1

    Ilya used to make cool tutorials, hope he hasn't been Jack Ma-ed :D

  • @codewithtj
    @codewithtj 21 days ago

    Can you please make a video on the AI Report 2024?

  • @calvingrondahl1011
    @calvingrondahl1011 22 days ago

    I like therefore I am…

  • @eladwarshawsky7587
    @eladwarshawsky7587 20 days ago

    just casually calling out tensorflow lol

  • @leeme179
    @leeme179 22 days ago +1

    How many Google robots does it take to hang a shirt...

    • @KEKW-lc4xi
      @KEKW-lc4xi 22 days ago +2

      The ALOHA project is pretty neat. If I had an extra $20k lying around I'd build my own and tinker too. It's all open source.

  • @memegazer
    @memegazer 7 days ago

    Yannic... these companies are designing chips, they are not making them.
    There is a real bottleneck there: there are only about three major chip manufacturers, and geographically their fabs are very centralized.
    It is a major limitation on supply and a major source of geopolitical tension.
    But I digress.

  • @ZooDinghy
    @ZooDinghy 22 days ago

    I don't get Shutterstock. Aren't they digging their own grave? Selling a photo for 1€ so that nobody ever has to buy it from them again? That doesn't make sense to me.

    • @AvastarBin
      @AvastarBin 21 days ago

      I think their logic is: if someone is gonna dig my grave, I might as well be paid for it

    • @ZooDinghy
      @ZooDinghy 21 days ago

      @@AvastarBin With $875 million in revenue, one would assume they would focus on building that thing themselves. Their financial target for 2027 is $1.2 billion, according to the Q4/2023 earnings call. The first thing I would do is find my biggest competitor, suggest a merger, and bet everything on the card that they will be the ones with the largest high-quality corpus, and then sue anyone out there who uses their pictures for training.

  • @hansdietrich1496
    @hansdietrich1496 21 days ago

    Dear Yannic, with all due respect: This green screen causes eye cancer. Please just put _ANYTHING_ there :) Thanks!

  • @jeffspaulding43
    @jeffspaulding43 22 days ago +1

    I think "ally of Sutskever" is thrown into the title because our brains autocorrect it to "Ilya Sutskever". Stupid clickbaiters.

  • @kalilinux8682
    @kalilinux8682 22 days ago +1

    Some of the news seems old

  • @MoFields
    @MoFields 22 days ago +4

    I think bro misunderstands the concept of green screen 😂😂😂😎🟩