Modular Learning and Reasoning on ARC

  • Published 8 Sep 2024
  • Speakers: Dr. Andrzej Banburski and Simon Alford (Poggio Lab)
    Abstract: Current machine learning algorithms are highly specialized to whatever they are meant to do - e.g. playing chess, picking up objects, or recognizing objects. How can we extend this to a system that could solve a wide range of problems? We argue that this can be achieved by a modular system - one that can adapt to solving different problems by changing only the modules chosen and the order in which those modules are applied to the problem. The recently introduced ARC (Abstraction and Reasoning Corpus) dataset serves as an excellent test of abstract reasoning. Well suited to the modular approach, its tasks depend on a set of inbuilt human Core Knowledge priors. We implement these priors as the modules of a reasoning system and combine them using neural-guided program synthesis. We then discuss our ongoing efforts to extend execution-guided program synthesis to a bidirectional search algorithm via function inverse semantics.
  • Science & Technology
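The modular approach described in the abstract can be sketched in miniature. This is not the speakers' actual system (which builds on DreamCoder with neural guidance); it is a hedged illustration in which a few hypothetical grid primitives stand in for Core Knowledge priors, and a brute-force enumeration stands in for the learned search:

```python
# Minimal sketch of modular program search over grid primitives.
# The primitives (flip_h, flip_v, rot90) and the enumerative loop are
# illustrative stand-ins for the priors and neural-guided synthesis
# described in the talk.
from itertools import product

def flip_h(g):
    """Mirror a grid (a tuple of row-tuples) left-right."""
    return tuple(row[::-1] for row in g)

def flip_v(g):
    """Mirror a grid top-bottom."""
    return tuple(reversed(g))

def rot90(g):
    """Rotate a grid 90 degrees clockwise."""
    return tuple(zip(*reversed(g)))

MODULES = {"flip_h": flip_h, "flip_v": flip_v, "rot90": rot90}

def apply_seq(names, g):
    """Run a sequence of module names on a grid."""
    for n in names:
        g = MODULES[n](g)
    return g

def synthesize(examples, max_depth=3):
    """Return the first module sequence consistent with all (input, output) pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(MODULES, repeat=depth):
            if all(apply_seq(names, i) == o for i, o in examples):
                return list(names)
    return None

# One training pair whose output is the left-right mirror of the input.
task = [(((1, 2), (3, 4)), ((2, 1), (4, 3)))]
print(synthesize(task))  # → ['flip_h']
```

The real system replaces the exhaustive `product` loop with a neural network that ranks which modules to try, which is what makes deeper programs tractable.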

Comments • 6

  • @sdmarlow3926
    @sdmarlow3926 2 months ago +2

    From the Q&A around the 22-minute mark: The tasks are designed to avoid brute-force methods, and don't require "world knowledge" or language as a prior. But more than testing for simple cognitive skills, the point is to have someone build a system that can "see" some new pattern and store it as a new ability. Definitions and benchmarks are not enough if your only goal is to meet that definition or score high on those benchmarks. There is no honor system when it comes to building "AGI", because everyone just takes shortcuts. A system that is actually dynamic, and can go from ARC to Atari 2600 games to playing Doom in the span of a week, would use much the same definitions and benchmarks as everyone else... but would be ACTUALLY different. Of course, saying it's an architecture problem implies all of ML/DL is on the wrong path, which many will take issue with. ;p

  • @brandomiranda6703
    @brandomiranda6703 2 years ago +1

    46:20 Current (or future) work after their initial DreamCoder baselining on ARC (Abstraction and Reasoning Corpus): execution-guided, bidirectional search for program synthesis - i.e., how to search for programs the way humans do?
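The bidirectional idea mentioned at 46:20 can be sketched as a meet-in-the-middle search: expand forward from the input grid, expand backward from the output grid by applying each primitive's inverse, and stop when the two frontiers meet. The primitives and depth limit here are illustrative assumptions, not the talk's actual implementation:

```python
# Hedged sketch of bidirectional program search via function inverses
# (in the spirit of the "inverse semantics" discussed in the talk).
# Requires Python 3.9+ for the dict merge operator `|`.

def flip_h(g):
    """Mirror a grid (tuple of row-tuples) left-right; its own inverse."""
    return tuple(row[::-1] for row in g)

def rot90(g):
    """Rotate 90 degrees clockwise."""
    return tuple(zip(*reversed(g)))

def rot270(g):
    """Rotate 90 degrees counter-clockwise (the inverse of rot90)."""
    return rot90(rot90(rot90(g)))

# Each primitive paired with its exact inverse.
INVERSES = {flip_h: flip_h, rot90: rot270, rot270: rot90}
PRIMS = {f.__name__: f for f in INVERSES}

def run(prog, g):
    """Apply a list of primitive names to a grid."""
    for name in prog:
        g = PRIMS[name](g)
    return g

def bidirectional(inp, out, half_depth=2):
    """Expand forward from `inp` and backward from `out` (via inverses);
    a program exists once the two frontiers share a grid."""
    fwd = {inp: []}   # grid -> program producing it from inp
    bwd = {out: []}   # grid -> program mapping it to out
    for _ in range(half_depth):
        fwd = {f(g): p + [f.__name__] for g, p in fwd.items() for f in INVERSES} | fwd
        bwd = {INVERSES[f](g): [f.__name__] + p for g, p in bwd.items() for f in INVERSES} | bwd
        for g in fwd.keys() & bwd.keys():
            return fwd[g] + bwd[g]
    return None

inp = ((1, 2), (3, 4))
out = ((1, 3), (2, 4))          # rot90 then flip_h of inp
print(bidirectional(inp, out))  # a valid 2-step program
```

Searching both directions means a depth-2k program only needs two depth-k frontiers, which is the exponential saving the execution-guided approach is after.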

  • @brandomiranda6703
    @brandomiranda6703 2 years ago +1

    Doesn't Francois C. have a definition of AGI (informed by cognitive priors) and construct his ARC benchmark based on it? Question based on the discussion at around 24:20.

  • @DavenH
    @DavenH 1 year ago

    26:26 "you can do about 80% of the tasks solved" in the hand-picked subsample of the training set, not the test set... It's not generalizing. Its performance on the test set is undisclosed and not state of the art.

  • @googleyoutubechannel8554
    @googleyoutubechannel8554 9 months ago +1

    ARC seems like a bunch of random tasks that are heavily biased toward humans interacting with a human-scale 3D environment. I can imagine a near-infinite array of other patterns that could form ARC tasks, but don't... because the researchers didn't include them, for no other reason than that the researchers are humans with eyeballs that take in information basically as a 2D array, and so are biased towards certain types of patterns. There doesn't seem to be any framework, even a rudimentary one, underpinning ARC tasks other than 'this particular researcher thought they were a good idea'? This is the first LLM benchmark I've looked into, and I have a sinking feeling the whole field is like this....
    *Example of one of a huge set of patterns these human researchers didn't pick, but which could be just as 'valid' as an ARC task (if you have no framework for validity, which they don't): a sequence of increasing numbers in binary that are Huffman-encoded.

    • @sdmarlow3926
      @sdmarlow3926 2 months ago

      The tasks are built around "simple" cognitive priors, such as counting, flipping or mirroring, and directionality (that lines and shapes extend in different directions). Across the hundreds of tasks there are only a handful of these priors (the point of ARC 2 is to have no single "operation" appear more than once across all the samples).