Francois Chollet recommends this method to solve ARC-AGI

  • Published 10 June 2024
  • ARC Prize is a $1,000,000+ public competition to beat and open source a solution to the ARC-AGI benchmark.
    Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras).
    --
    Website: arcprize.org/
    Twitter/X: / arcprize
    Newsletter: Signup @ arcprize.org/
    Discord: / discord
    Try your first ARC-AGI tasks: arcprize.org/play

Comments • 34

  • @el_chivo99
    @el_chivo99 1 month ago +13

    Chollet has been my favorite voice on AI for 5+ years and I don't see that changing!

  • @davefaulkner6302
    @davefaulkner6302 1 month ago +4

    So he is saying that LLMs guiding a search over hyperparameter space will get us to AGI? Seems a little simplistic to me ... and there are better ways to search that space.
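
    For context, the method Chollet has advocated is usually framed as LLM-guided program search rather than hyperparameter search: the model proposes candidate programs, and each candidate is checked against the task's example pairs. A minimal sketch of that loop, assuming a hypothetical llm_propose_programs() call in place of any real model API:

    # Hypothetical sketch of LLM-guided program search for one ARC task.
    # llm_propose_programs() stands in for a code-generating model call;
    # it is an assumption here, not a real API.

    def llm_propose_programs(task_examples, n=50):
        """Ask a code LLM for n candidate transform functions as source strings."""
        raise NotImplementedError  # placeholder for an actual model call

    def run_candidate(source, grid):
        """Execute one candidate program (which must define transform(grid))."""
        namespace = {}
        exec(source, namespace)
        return namespace["transform"](grid)

    def solve_task(task_examples, test_input):
        """Return the output of the first program consistent with every example."""
        for source in llm_propose_programs(task_examples):
            try:
                if all(run_candidate(source, x) == y for x, y in task_examples):
                    return run_candidate(source, test_input)
            except Exception:
                continue  # discard candidates that crash
        return None  # no consistent program found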

  • @2394098234509
    @2394098234509 1 month ago +1

    Love this

  • @BigFatSandwitch
    @BigFatSandwitch 1 month ago +4

    The question is, if someone manages to get a high score on ARC-style problems, will they really share the code for such a small amount of money, rather than using that result to raise money from venture capital firms for their startup?

    • @ARCprize
      @ARCprize  1 month ago +3

      We encourage an open source solution to ARC-AGI!

  • @-mwolf
    @-mwolf 1 month ago +1

    These recent code diffusion models sound similar to this idea 🤔

  • @wwkk4964
    @wwkk4964 1 month ago +2

    Couldn't this be achieved fairly easily by employing a diffusion-based LLM reasoning module with accurate image-captioning capabilities that ignore irrelevant details when feeding the LLM at inference or test time?

    • @Hohohohoho-vo1pq
      @Hohohohoho-vo1pq 1 month ago +2

      Stop overusing the term LLM. GPTs are not LLMs. LLMs are GPTs.

    • @wwkk4964
      @wwkk4964 1 month ago

      @@Hohohohoho-vo1pq Please refer to the presentation here. The author of the research calls it an LLM. czcams.com/video/kYtvqbgCxFA/video.htmlsi=nrFEIED7mmZAE_Zf

    • @RecursiveTriforce
      @RecursiveTriforce 1 month ago +6

      @@Hohohohoho-vo1pq
      LLMs can be GPTs.
      GPTs can be LLMs.
      GPT is the architecture. LLM is the size.
      They neither imply nor contradict each other.

    • @Hohohohoho-vo1pq
      @Hohohohoho-vo1pq 1 month ago +2

      @@RecursiveTriforce Reducing GPTs to "mere LLMs" is very misleading. People don't even understand what it means when they say that.

  • @fayezsalka
    @fayezsalka 1 month ago +6

    But we, as humans, don't do tree search when solving ARC. We solve it in one shot, almost immediately, without trying / searching through different solutions, because the spatial patterns look very obvious, no?
    Large multimodal models with the ability to input AND output images natively will be able to solve this in one shot. Case in point: GPT-5 with native image output.

    • @stevo-dx5rr
      @stevo-dx5rr 1 month ago +2

      I’m not a researcher, but the notion that ‘this is obviously not what humans do’ seems moot given that the same can be said about transformers.

    • @ARCprize
      @ARCprize  1 month ago +9

      > We solve it in one shot
      > Spatial patterns look very obvious
      The human brain is very good at using 'intuition' to prune a search space. Though it may happen quickly, there are many (maybe infinite) possibilities for the decisions humans could make, yet we prune them down to just a few very quickly.
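
      One way to read "intuition pruning a search space" in code is a beam search, where a learned scoring function keeps only the few most promising candidates at each step. A rough sketch, with expand() and score() as hypothetical placeholders for the candidate generator and the learned prior:

      # Rough sketch of search pruned by a learned prior (beam search).
      # expand() and score() are hypothetical placeholders.

      import heapq

      def beam_search(start, expand, score, beam_width=3, depth=4):
          """Keep only the beam_width highest-scoring partial solutions per step."""
          beam = [start]
          for _ in range(depth):
              candidates = [c for state in beam for c in expand(state)]
              if not candidates:
                  break
              # the "intuition": prune everything but a handful of candidates
              beam = heapq.nlargest(beam_width, candidates, key=score)
          return max(beam, key=score)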

    • @stevo-dx5rr
      @stevo-dx5rr 1 month ago

      @@ARCprize What do you think of MIT's “Introduction to Program Synthesis” course as a starting point?

    • @fayezsalka
      @fayezsalka 1 month ago +1

      @@ARCprize This implies that multimodal models could do much better if there were an internal “for loop” that let them iterate through different solutions in the hidden-space manifold before decoding into output. The only question is how to train such a model and what dataset / loss objective to use.
      Alternatively, could we imitate such a process by having the “thinking” happen in the decoded output in an autoregressive fashion? We humans have the ability to “think out loud” as one option, and having an LLM think in the decoded output space might make it easier and more familiar to train.
      (Basically, is chain of thought considered one crude form of discrete search?)
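
      Chain-of-thought can indeed be viewed as a crude discrete search in the decoded output space: sample several "think out loud" traces and keep the answer they most often agree on (roughly the self-consistency trick). A hedged sketch, with sample_with_cot() as a hypothetical model call returning a reasoning trace and a final answer:

      # Sketch of chain-of-thought sampling as a crude discrete search
      # (self-consistency style). sample_with_cot() is a hypothetical call
      # returning (reasoning_text, final_answer).

      from collections import Counter

      def solve_by_sampling(task_prompt, sample_with_cot, n_samples=16):
          """Sample several reasoning traces and vote on the final answer."""
          answers = []
          for _ in range(n_samples):
              _reasoning, answer = sample_with_cot(task_prompt, temperature=0.8)
              answers.append(answer)
          return Counter(answers).most_common(1)[0][0]  # most frequent answer wins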

    • @shawnvandever3917
      @shawnvandever3917 1 month ago +1

      We don't do it in one shot. We go through 100s or 1000s of prediction updates to answer a question. We update our mental models in real time while this is happening.

  • @bladekiller2766
    @bladekiller2766 1 month ago +3

    This is how Stockfish (Chess Engine) works.
    Not sure whether it will lead to AGI.
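
    For readers who don't know the engine: Stockfish combines deep tree search with an evaluation function that scores positions. A toy depth-limited minimax sketch of that pattern (evaluate(), legal_moves(), and apply_move() are placeholders, not anything from Stockfish itself):

    # Toy sketch of the "Stockfish pattern": deep search plus an evaluation
    # function. evaluate(), legal_moves(), and apply_move() are placeholders.

    def minimax(position, depth, maximizing, evaluate, legal_moves, apply_move):
        """Depth-limited minimax; real engines add alpha-beta pruning and much more."""
        moves = legal_moves(position)
        if depth == 0 or not moves:
            return evaluate(position)
        values = [
            minimax(apply_move(position, m), depth - 1, not maximizing,
                    evaluate, legal_moves, apply_move)
            for m in moves
        ]
        return max(values) if maximizing else min(values)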

    • @ARCprize
      @ARCprize  1 month ago +3

      We'd love a submission that tried this approach to see how it goes - super interesting

  • @bedev1087
    @bedev1087 1 month ago +1

    Did Francois say he would like to see a solution which can solve these puzzles without having been trained on a lot of "ARC-like" input/output pairs?
    This benchmark seems to exist as a subset of the permutation group of operations on coloured grids
    (expanding/contracting grid, extruding masses, rotating masses, filling in holes, adding single colours, etc…)
    If the “core knowledge” claim is true about ARC, then the discovery of the correct set of “core knowledge” matrix operations could be used to synthetically generate a dataset.
    You could then sample thousands of games from a pre-trained policy network given only the first training input, and reinforce the trajectories closest to the revealed answer.
    Then sample games given the 1st training input, the first training output, and the 2nd training input, and reinforce again before the test set.
    Or is this not allowed?
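
    A rough sketch of the synthetic-data idea above: compose a short random program out of a handful of grid operations and record the input/output pairs it produces. The operation set here is an illustrative guess at "core knowledge" primitives, not the hidden generators behind the real benchmark:

    # Illustrative sketch of generating synthetic ARC-like tasks from grid ops.
    # The OPS list is a guess at "core knowledge" primitives, not the real ones.

    import random
    import numpy as np

    OPS = [
        lambda g: np.rot90(g),                          # rotate the grid
        lambda g: np.fliplr(g),                         # mirror left/right
        lambda g: np.repeat(np.repeat(g, 2, 0), 2, 1),  # scale up 2x
        lambda g: np.where(g == 1, 2, g),               # recolour one colour
    ]

    def random_task(n_pairs=3, grid_size=4, n_ops=2):
        """Compose a random program of ops and record its input/output pairs."""
        program = random.sample(OPS, n_ops)
        pairs = []
        for _ in range(n_pairs):
            x = np.random.randint(0, 3, size=(grid_size, grid_size))
            y = x.copy()
            for op in program:
                y = op(y)
            pairs.append((x, y))
        return pairs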

    • @ARCprize
      @ARCprize  1 month ago +1

      Francois said he would *like* to see a solution that isn't trained on a bunch of input/output ARC tasks, but there isn't a rule that says this isn't allowed.
      As long as your submission only makes 2 final attempts per task, you can use it. This means you can test and iterate on the example pairs as much as you'd like.

    • @bedev1087
      @bedev1087 1 month ago +1

      @ARCprize Hey, thanks for the reply! :)
      You guys obviously have the generating operations locked away, so this method would only be able to train on "ARC-like" operations.
      So can I ask what his intuition is on a model being able to center on "core knowledge" priors from training on tasks orthogonal to ARC?
      Thanks for making such a great opportunity for the community 👍

  • @gustavnilsson6597
    @gustavnilsson6597 1 month ago +2

    Perhaps we want the model to search actively, as humans do, by manipulating its environment.
    I don't think this problem is going to be easy to solve.

  • @tycrenshaw6968
    @tycrenshaw6968 1 month ago +2

    I don't know if he is nervous or what, but he is totally red and it looks like it is hurting a lot.

    • @ARCprize
      @ARCprize  1 month ago +4

      Francois is on fire, with knowledge.

  • @harithnawalage9355
    @harithnawalage9355 1 month ago +1

    It's a good idea that the solution has to be made open source.

    • @ARCprize
      @ARCprize  1 month ago +2

      Awesome, yes, this is to counteract the recent trend of closed AI research.