#51 FRANCOIS CHOLLET - Intelligence and Generalisation

  • Published 5 Jul 2024
  • In today's show we are joined by Francois Chollet. I have been inspired by Francois ever since I read his Deep Learning with Python book and started using the Keras library, which he created many years ago. Francois has a clarity of thought that I've never seen in any other human being! He has extremely interesting views on intelligence as generalisation, abstraction and an information conversion ratio. He wrote "On the Measure of Intelligence" at the end of 2019 and it had a huge impact on my thinking. He thinks that NNs can only model continuous problems, which have a smooth learnable manifold, and that many "type 2" problems which involve reasoning and/or planning are not suitable for NNs. He thinks that many problems have type 1 and type 2 enmeshed together. He thinks that the future of AI must include program synthesis, to allow us to generalise broadly from a few examples, but the search could be guided by neural networks because the search space is interpolative to some extent. (A toy program-synthesis sketch appears at the end of this description.)
    Panel; me, Yannic and Keith
    Tim Intro [00:00:00]
    Manifold hypothesis and interpolation [00:06:15]
    Yann LeCun skit [00:07:58]
    Discrete vs continuous [00:11:12]
    NNs are not Turing machines [00:14:18]
    Main show kick-off [00:16:19]
    DNN models are locality-sensitive hash tables and only encode some kinds of data efficiently [00:18:17]
    Why do natural data have manifolds? [00:22:11]
    Finite NNs are not "Turing complete" [00:25:44]
    The dichotomy of continuous vs discrete problems, and abusing DL to perform the former [00:27:07]
    Reality really annoys a lot of people, and ...GPT-3 [00:35:55]
    There are type one problems and type 2 problems, but...they are enmeshed [00:39:14]
    Chollet's definition of intelligence and how to construct analogy [00:41:45]
    How are we going to combine type 1 and type 2 programs? [00:47:28]
    Will topological analogies be robust and escape the curse of brittleness? [00:52:04]
    Are type 1 and type 2 two different physical systems? Is there a continuum? [00:54:26]
    Building blocks and the ARC Challenge [00:59:05]
    Solve ARC == intelligent? [01:01:31]
    Measure of intelligence formalism -- it's a whitebox method [01:03:50]
    Generalization difficulty [01:10:04]
    Let's create a marketplace of generated intelligent ARC agents! [01:11:54]
    Mapping ARC to psychometrics [01:16:01]
    Keras [01:16:45]
    New backends for Keras? JAX? [01:20:38]
    Intelligence Explosion [01:25:07]
    Bottlenecks in large organizations [01:34:29]
    Summing up the intelligence explosion [01:36:11]
    Post-show debrief [01:40:45]
    Pod version: anchor.fm/machinelearningstre...
    Tim's Whimsical notes; whimsical.com/chollet-show-QQ...
    NeurIPS workshop on reasoning and abstraction; slideslive.com/38935790/abstr...
    Rob Lange's article on the measure of intelligence (shown in 3d in intro): roberttlange.github.io/posts/...
    Links Francois cited in the show:
    LSTM digits multiplication code example: keras.io/examples/nlp/additio...
    ARC-related psychology paper from NYU: cims.nyu.edu/~brenden/papers/...
    This is the AAAI symposium Francois mentioned, that he co-organized; there were 2 presentations of psychology research on ARC (including an earlier version of the preprint above): aaai.org/Symposia/Fall/fss20s...
    fchollet.com/
    / fchollet
    #deeplearning #machinelearning #artificialintelligence
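As a taste of the program-synthesis direction discussed above, here is a toy sketch (entirely illustrative; the DSL, function names and scoring stub are invented for this description, not from the show): enumerate tiny programs over a handful of primitives, keep the ones consistent with the input/output examples, and rank them with a hook where a neural network could guide the search.

```python
# Toy program synthesis: search a tiny DSL for a program explaining the
# examples. The `score` hook is where a neural guide could steer the search;
# here it is a hand-written stub preferring shorter programs.
from itertools import product

PRIMITIVES = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
    "negate": lambda x: -x,
    "square": lambda x: x * x,
}

def run(program, x):
    for name in program:          # a program is a sequence of primitive names
        x = PRIMITIVES[name](x)
    return x

def score(program):
    # Stub for a learned guide: prefer shorter programs. A neural network
    # could instead rank candidates conditioned on the examples.
    return -len(program)

def synthesize(examples, max_len=3):
    candidates = []
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                candidates.append(program)
    return max(candidates, key=score) if candidates else None

# Generalises from two examples to the whole integer line:
print(synthesize([(2, 5), (10, 21)]))  # ('double', 'add1'), i.e. x -> 2x + 1
```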

Comments • 181

  • @ChaiTimeDataScience
    @ChaiTimeDataScience 3 years ago +97

    I feel MLST is like a Netflix special of the world of Machine Learning.
    The quality of the podcast & production just gets better exponentially with every episode!

  • @qadr_
    @qadr_ 2 years ago +11

    This channel is a treasure. What a great conversation, full of ideas and insight and experience. I've finally found my passion on YouTube.

  • @stacksmasherninja7266
    @stacksmasherninja7266 2 years ago +5

    This has to be my favourite video so far. I keep coming back to this talk whenever I feel like ML is hitting a wall

  • @animebaka2010
    @animebaka2010 3 years ago +11

    Wait... I wasn't prepared for this! What content.

  • @hamzagamouh2780
    @hamzagamouh2780 7 days ago

    Thank you guys. That’s the podcast that we need

  • @ta6847
    @ta6847 3 years ago +9

    IT'S FINALLY HERE!!

  • @adityakane5669
    @adityakane5669 3 years ago +4

    Progressive disclosure of complexity. Spot on!

  • @drhilm
    @drhilm 3 years ago +9

    This is one of those talks that will stay relevant for many years. You should come back to it three years from now and review it again... when solutions to the ARC challenge start to come out...

  • @dginev
    @dginev 3 years ago +4

    A very very eagerly awaited conversation, thanks to everyone involved!

  • @RobertWeikel
    @RobertWeikel 3 years ago +2

    This was a great talk. Thank you.

  • @BROHAMMER_OK
    @BROHAMMER_OK 3 years ago +33

    Damn son, you made it happen

  • @AICoffeeBreak
    @AICoffeeBreak 3 years ago +23

    What a lovely surprise, the long-awaited episode is out! 😊
    I will come back very soon when I have more time to watch and enjoy it -- I think this episode deserves a proper mind-set. 💪

  • @shyama5612
    @shyama5612 a year ago +1

    First-time listener of MLST. I have to say the host is refreshingly authentic. Enjoyed the whole pod, though some parts went over my head; exactly what you expect from listening to people more knowledgeable than you. Thanks. Keep up the great work!

  • @ZergD
    @ZergD 3 years ago +3

    Pure gold. I'm in awe of the production quality/level and of course the content! Thank you so much!

  • @_tnk_
    @_tnk_ 3 years ago +5

    10/10 episode, and that debrief was super good. Really interesting ideas all around.

  • @Ceelvain
    @Ceelvain 3 years ago +4

    1:54:50 The idea that consciousness is at the center of intelligence is very much what consciousness wants us to think. We believe we're in control, when in fact we're mostly not. The consciousness can query and command other parts of the brain, but those operate on their own.

    • @teslanewstonight
      @teslanewstonight 2 years ago

      I like how you simplified this complex topic. 🤖🧡

  • @miguelangelquicenohincapie2768

    Wow, this is really one of the best talks about DL and AGI I've ever seen. Thanks for this; you just won a new subscriber.

  • @DavenH
    @DavenH 3 years ago +38

    Your editing on this one is stunning. Fitting for such a guest!

    • @AICoffeeBreak
      @AICoffeeBreak 3 years ago +8

      One can see the passion in the care that has gone into editing, right? 😍

    • @teslanewstonight
      @teslanewstonight 2 years ago +1

      @@AICoffeeBreak I believe so. Close-up shots can be revealing when noticing adult faces attempting to hide excitement and glee. I love the AI/AGI community. 🤖🧡

  • @machinelearningdojowithtim2898

    Oh my god.... here we go!!!! ❤🤞😃😃😃

  • @benibachmann9274
    @benibachmann9274 3 years ago +2

    Thank you for yet another fantastic episode. Incredible!

  • @Mutual_Information
    @Mutual_Information 3 years ago +5

    This is a great listen! Makes me think... ML experts are so attuned to its problems - huge data for only local generalization, extrapolation is super hard, challenges in translating information between domains (e.g. images vs audio vs text) - whereas the rest of the world thinks sentient robots are around the corner.

  • @rock_sheep4241
    @rock_sheep4241 3 years ago +2

    The most awaited episode

  • @LiaAnggraini1
    @LiaAnggraini1 3 years ago +2

    Yay I learned a lot from his book when I started to learn deep learning. Thank you. Hopefully you can bring more people like him in the next episodes.

  • @DataTranslator
    @DataTranslator 2 months ago

    This is incredible 😮
    I’m delighted I found your channel 🎉

  • @mobiusinversion
    @mobiusinversion 3 years ago +1

    MLST is awesome. I love this cadence, fun, humor and synthesis of so many good ideas.

  • @sedenions
    @sedenions 3 years ago +1

    Excellent. You inspired me to pick up this same book by Chollet. Like I said before, I'm from neuroscience, but the amount of potential in this field is amazing. Thank you MLST.

  • @jamieshelley6079
    @jamieshelley6079 3 years ago +4

    I'm so glad the idea of generalisation is becoming more popular, and hopefully the flaws that keep deep learning from acquiring it will be recognised.

  • @ratsukutsi
    @ratsukutsi 3 years ago +1

    What a gem ladies and gentlemen!

  • @angelomenezes12
    @angelomenezes12 3 years ago +3

    What an awesome episode Tim!! Your editing skills are getting great! 💪

  • @muhammadfahim8978
    @muhammadfahim8978 3 years ago +1

    Thank you for such an awesome talk.

  • @Gigasharik5
    @Gigasharik5 5 months ago

    Brilliant talk

  • @behrad9712
    @behrad9712 3 years ago +1

    Scientific analysis in combination with beautiful animations! Great job!

  • @mahdinasiri6848
    @mahdinasiri6848 3 years ago +2

    The quality of the content is lit! Nice job

  • @snarkyboojum
    @snarkyboojum 24 days ago

    I miss these podcasts with the three of you! Dr Yannic, Dr Doug and Dr Tim are a dynamite combo. Bring back Yannic :D

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk 24 days ago

      I know! We will be interviewing Chollet again in August and Yannic and Keith will be featuring in it too.

  • @diatribes
    @diatribes a year ago

    This is such a brilliant episode. Can't believe I'm just finding out about this channel.

  • @jeff_holmes
    @jeff_holmes 3 years ago +24

    My favorite quote: "Intelligence is about being able to face an unknown future, given your past experience."

    • @muzzletov
      @muzzletov 3 years ago +1

      How about a whale outliving you for about 60+ years?

    • @martinbalage9225
      @martinbalage9225 3 years ago

      How about an organism just born: no past = no intelligence? Or do we then extend the past to the past of physical processes, where DNA, chemistry and particles will lead you? To a big-bang start of the entropy? Sounds like god again. So either you externalize the intelligence, or internalize it, or integrate it via a whole other question.

    • @EricFontenelle
      @EricFontenelle 3 years ago

      @@martinbalage9225 What the fuck are you even saying?! lmao
      You are interpreting "unknown future" without the proper context. Look up "Known Knowns, Known UnKnowns, Unknown Knowns, AND Unknown Unknowns" -- you should get it then.

    • @martinbalage9225
      @martinbalage9225 3 years ago

      @@EricFontenelle I apologize for creating a confusing experience for you. I was writing up something for a disproportionate amount of time, but that is not the way. Feel free to ponder on my cryptic reply, and if you find anything that you can shape into a more specific question than "the fuck you sayin", then feel free to ask.
      Thank you for your effort with the unknowns, but that is unfortunately quite misguided at the moment, and I'd rather give the benefit of the doubt and ignore the little evidence on your behalf as inconsequential so far. Again, if you have any substance, feel free to follow that.

    • @falizadeh60
      @falizadeh60 3 years ago +1

      @@muzzletov ح

  • @abby5493
    @abby5493 3 years ago +3

    Wow best video you have ever made 😍

  • @bluel1ng
    @bluel1ng 3 years ago +6

    We have to clarify where DL starts and where it ends (when discussed in the context of general-capability AI/AGI and generalization). E.g. is a system like MuZero, which combines neural networks with MCTS in an RL setup, still deep learning, or is only the neural network it employs internally, to learn a world model, value function and policy, the deep learning part? The same question applies to neural-guided program synthesis. I would even argue that the deep learning 'interpolation' part in most complex interesting AI systems is only one ingredient; take a voice assistant, a self-driving car or any robotics application as examples.
    Regarding inter/extrapolation: I think you (Tim) are right that there is no more than tiny out-of-training-set extrapolation for MLPs, ConvNets etc (for MLPs, simply as a by-product of relatively smooth functions; ConvNets have an architectural form of generalization via extreme weight-sharing, e.g. translated inputs produce the same outputs, translated on the feature maps). But when memory comes into play, e.g. RNNs with attention, or transformers, I am no longer 100% sure that this restriction holds. Memory makes it possible to load/save or select and pass arbitrary source information and map it. E.g. if a system (take transformers, or the Neural Turing Machine) learns to select inputs only by position and to pass and project whatever value it finds in a value slot, an algorithm for arbitrary inputs is created (potentially generalizing to inputs never seen during training)... we could discuss to what extent this form of generalization is found by gradient descent, and how to 'motivate' a system to find compact generalizations instead of 'cheap' memoization. To me this is part of the magic that surrounds GPT.
    Regarding glitchy models and symbolic processing: this seems not to be a big issue in NLP... with byte-pair encoding and a reasonable sampling strategy, a transformer model like GPT-* has a very acceptable 'glitch level'. As a human I struggle more to abide by syntactic and grammatical rules than GPT does, e.g. closing the right number of brackets or tags in transformer-generated source, or closing strings, comments etc.
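The positional-selection idea in the comment above can be shown in a few lines (a minimal numpy sketch; the head here is hand-wired rather than learned, and all names are mine): when attention matches on position codes alone, the head copies whatever value sits in the attended slot, including values it has never seen.

```python
# Minimal sketch of position-based attention copying arbitrary values.
# Keys/queries are one-hot position codes, so the head "selects slot 2 and
# passes its value through" -- the rule works for any payload values.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_slots = 5
positions = np.eye(n_slots)                       # one-hot positional codes
values = np.array([3.0, -1.5, 42.0, 0.25, 7.0])   # arbitrary payload

query = positions[2] * 10.0            # "attend to slot 2", sharpened
attn = softmax(positions @ query)      # weights concentrate on slot 2
output = attn @ values

print(attn.round(3))                   # ~[0, 0, 1, 0, 0]
print(output)                          # ~42.0, whatever value slot 2 holds
```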

  • @bntagkas
    @bntagkas 3 years ago +2

    I define intelligence as a function of being helpful to yourself and being helpful to others.
    I believe this to be the correct definition, and problems become unsolvable once you are using a wrong one or none at all.

  • @teslanewstonight
    @teslanewstonight 2 years ago +2

    I love this channel, amazingly inspirational interviews like these, and this awesome community. Love & prosperity to you all. 🤖🧡
    #AI
    #AGI
    #Robotics

  • @coder8i
    @coder8i 3 years ago +1

    Looking forward to this one. Goes with a healthy lunch.

  • @CristianGarcia
    @CristianGarcia 3 years ago +1

    Hype! Thanks for this :)

  • @martinschulze5399
    @martinschulze5399 3 years ago

    Great work!

  • @ShayanBanerji
    @ShayanBanerji 2 years ago

    YouTube, and such HQ material. Kudos to MLST

  • @MrjbushM
    @MrjbushM 3 years ago +1

    Excellent video, very informative. I love the ideas shared here. I do not have a master's degree or Ph.D. like you guys; I am only an average Java developer with DL as a hobby, but I agree with Chollet that we need a different approach to artificial general intelligence. For Type 1 stuff DL is well suited, for the reasons discussed in the video; for Type 2 we need other approaches, like the DreamCoder paper explained by Yannic on his channel. Another idea that fascinated me is the "Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding" paper; the neurosymbolic approach I think is valid also. In the end, those approaches will not solve AGI, but they are a good baby step, the next baby step towards that goal. And yes, I also think neural networks alone are not the right path to AGI; for now, while we figure out how to do that, we need to experiment with approaches like DreamCoder and the ideas Chollet shared in the video.

  • @shailendraacharya
    @shailendraacharya 2 years ago

    Why was this hidden from me for so long? It's a pure gem. Thank you so much 😍🎉🎉

  • @dr.mikeybee
    @dr.mikeybee 3 years ago +1

    Thanks for the advice. I just got a copy of the book.

  • @vikidprinciples
    @vikidprinciples a year ago

    Excellent

  • @matt.jordan
    @matt.jordan 3 years ago +1

    absolute legend

  • @Larrythebassman
    @Larrythebassman 2 years ago

    Well, that was a wonderful video. I think I learned three brand-new phrases relating to artificial intelligence. Thank you very much.

  • @pani3610
    @pani3610 3 years ago +2

    goldmine❤️

  • @sabawalid
    @sabawalid 3 years ago +1

    Excellent question @Yannic: is there a continuum between the two types of problem solving (discrete and continuous), since they are in the same substrate and (presumably) working together co-operatively on solving problems? Excellent point/question.

  • @davidbayonchen
    @davidbayonchen 3 years ago

    Awesome podcast. I subscribed right away. I like how you all listen and not talk over one another. Keep it up!

  • @GameDevNerd
    @GameDevNerd a year ago +1

    We really value and love this content, and I am working on applying the latest machine-learning and AI theories, models, tools, etc to game and simulation development, real-time 3D, devops and other areas ❤‍🔥

  • @videowatching9576
    @videowatching9576 2 years ago

    Amazing show; I really appreciate hearing you talk about the nuances of AI, and how it could connect to applications now, in the near future, or beyond.
    I would suggest a Playlist that in particular identifies the especially ‘applied’ versions - for example, talking through media generation models, and how those get used, or LLMs and business cases - while also being tightly connected to the AI work going on, including specifically what is enabled now, what the constraints are, what obstacles to overcome to get to enabling what kinds of capabilities, etc. For example, what’s between point A and point B to get to a place where a given creator can make an especially interesting / useful / entertaining video?
    For instance, including various AIs: humor generation / assessment, special effects, editing, story suggestion / modification, etc. Already certainly a lot that creators can do - but presumably way more to be unlocked. For instance, text-to-image generation allows for some pretty remarkable expression - and open question being about what text-to-video enables, or text-to-editing, etc.
    And then there’s the question of compounding of those creator capabilities, as well as AI’s enabling high quality recommendations of content etc.

  • @mfpears
    @mfpears 2 years ago +5

    7:35 99% of software today will not be deep learning because it's algorithmic.
    7:55 Neural networks can't represent the scalar identity function (see the sketch below this comment)
    11:07 Image models struggle drawing straight lines
    11:38 Predicting the digits of pi, finding prime numbers, sorting a list
    12:45 Human reasoning is guided by intuition - interpolative. Abstraction is key to generalization, and performed differently in discrete vs continuous
    14:10 GPT-3 failed his ARC tasks
    14:20 Neural networks are not Turing-complete. Nth digit of pi requires unbounded memory
    15:03 You can train a neural network to multiply integers of fixed width together, but it will always have errors.
    15:30 But you can augment them with unbounded memory and iteration...
    22:30 is the data we give deep learning special?
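The 7:55 and 15:03 claims above are easy to check empirically. A minimal Keras sketch (my own toy, assuming a TensorFlow install; exact numbers vary per run): a small ReLU MLP trained on y = x inside [-1, 1] is accurate there but fails as soon as you leave the training range.

```python
# A ReLU MLP fits the identity function perfectly inside its training range
# but cannot extrapolate beyond it: outside [-1, 1] the output saturates.
import numpy as np
from tensorflow import keras

x_train = np.random.uniform(-1.0, 1.0, size=(1024, 1))
y_train = x_train.copy()               # the scalar identity function

model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=200, verbose=0)

for x in [0.5, 2.0, 10.0]:             # only 0.5 is inside the training range
    pred = model.predict(np.array([[x]]), verbose=0)[0, 0]
    print(f"f({x}) = {pred:.3f}")       # ~0.5, then increasingly wrong
```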

  • @VijayEranti
    @VijayEranti 3 years ago +2

    Really great session. IMHO an intelligent learnt inference loop (perhaps using gradient descent with continuous feedback on the results of interpolation or extrapolation), like manual TTA (test-time augmentation), is a manual baby step towards program synthesis of discrete components (Bengio's RIM cells are another learnt, rather than manual, variant of this). Hopefully a more powerful inference loop (a program learnt recursively) may be the direction to go.

  • @marilysedevoyault465
    @marilysedevoyault465 2 years ago

    You all do amazing work, thanks for sharing. This is probably of no use, but just in case... About abstraction/generalisation... I wrote this to Mr. Hawkins yesterday before seeing Mr. Chollet's video, and it might relate: "I'm sorry if it is annoying, and sorry for the mistakes, because I'm French speaking, and maybe it isn't of any use at all, because I'm no specialist, only an artist I guess. But I'm sharing this little hypothesis: let's say all the mini columns in an area all learn the same thing, sequences of events in chronological order. All a human went through or learned related to this area (let's say visual memory) is there in every mini column: all the sequences respecting the chronology, as if absolutely small layers of events were stored in each mini column. Obviously there is some forgetting, but a lot is there. Now let's talk about predictions or creativity. When making a prediction or creating a mental image, could different mini columns jump to different layers of the chronology (different moments of life), seeking identified sequences of the same object, all this for predictions? The intelligence part would be to melt all these similar sequences from different moments of life into one single prediction? Let's say I saw a cat falling when I was ten years old, and I saw many cats falling on television, and many cats falling on Facebook. Some mini columns would bring back the cat at ten years old, other mini columns some cat on Facebook, and other mini columns a falling cat on television, and melting all these sequences together, I could predict or hope that my own cat would fall on its feet while falling down. Is that what you mean when you say they vote?"

  • @JousefM
    @JousefM 3 years ago +1

    Nice one!
    What program do you use for your intro animations?

  • @oncedidactic
    @oncedidactic 3 years ago +2

    So first off thank you guys for an always excellent conversation, and congrats on meeting your heroes ;)
    There is really nothing to disagree with at all in the convo, which is marvelous, sugary crystallized insights as usual. Two things-
    1 The interpolation hypothesis needs substantial interrogation, though it’s admittedly catchy and powerful. Not to prove/disprove, but because it will teach us more about hard-to-reason-about things. E.g. can you contrive training data that gives a good approximation of extrapolation, artificially? If so, is this learnable? Etc.
    2 While human brain is the obvious lighthouse for AGI, this convo seemed particularly anthropocentric. Which to me is a quiet warning bell that there is far more to be plumbed before setting foundations. As in, chasing AGI via generality via abstraction is making an engineering project out of a philosophical venture. If you are asking your model to be general, you are asking it to understand the universe. Undoubtedly there is practical insight in assessing applicability of learning and search methods, and ditching hype to do better science. But heed Keith’s mention of duality. For now I think we can only and correctly proceed in an epistemic mode (make better software) and we have a lot of room to run with modern computing. *How do you get knowledge?*. But the true game is ontological. *What is the nature of knowledge?*. And when you start asking to catalogue priors, you might as well be illuminating an encyclopedia with Plato’s Forms. (No page would be truly accurate nor would you ever finish.)
    For a concrete example, talking about appleness, plainly something in the putative capsule-NN-DSL-NN vein could capture familiar important qualities. (Red, round.). But we would have no sense whatsoever of the completeness of representations, just their usefulness i.e. by asking our bot to pick us the tastiest apple or find one that will maximize range from our spud launcher. But what is our sense of apples for comparison? Perceptively, sight, 400-700nm, scent, midrange mass spec, touch haptics fairly crude but sensitive to important characteristics like bruising. Should we consider ecology in deep time? One apple is a cc email of a thought a forest is having. (Or whatever.)
    Point being, it quickly becomes hard to assess whether our discrete problem solver with latent “apple knowledge” is either bug free or has good embeddings, because WHAT IS AN APPLE? And WHAT IS IT GOOD FOR? You and the tree might disagree.
    Nevertheless, our best measure of appleness is from an anthropocentric POV, which means for practical purposes we can agree what is an apple and what it’s good for (cider), until further notice.
    Hence, I see Chollet’s most valuable insight is embodiment, because this frees you of the bottomless pit of ontology, and forgives sloppy epistemology, since pragmatism dominates when you have to live in the world. This also happens to align with us.
    All that said, I love and prefer your guys’ sticking with near-context relevance and actionable ML roadmap discussion, minding real life utility and constraints. It sets you apart, really is unique afaik, giving credence to AGI talk being grounded in sota ML practitioner commentary, and is available at a disgustingly low price (ha). I’m interested, if you read all this, if you are spending any neurons on anthropocentrism as relates to models/priors, not from the philosophizing POV but out of scientific necessity.

    • @DavenH
      @DavenH 3 years ago +1

      These are excellent points.

  • @ZandreAiken
    @ZandreAiken a year ago

    Thanks!

  • @yasserdahou5308
    @yasserdahou5308 3 years ago +7

    This is just amazing, fascinating. When are you getting Ian Goodfellow? It would be so interesting too.

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 years ago

      Lex Fridman did a good interview with him:
      czcams.com/video/Z6rxFNMGdn0/video.html

  • @DanielCardenas1
    @DanielCardenas1 3 years ago +1

    Would appreciate a link to an explanation of manifolds.

  • @dr.mikeybee
    @dr.mikeybee 3 years ago +1

    I wonder if there are low dimensional manifolds that alone or in combinations can create AGI? Just as deep neural networks find correlative combinations of weighted features, I wonder if complex flexible programs can emerge from mining the many manifolds of the computational universe.

  • @ThichMauXanh
    @ThichMauXanh 3 years ago +2

    So how do you explain the human brain doing discrete reasoning while being simply a bunch of neurons wired together?

  • @lusherenren4222
    @lusherenren4222 3 years ago +8

    I'd like to see Marcus Hutter on this show. Thank you

    • @DavenH
      @DavenH 3 years ago +1

      Please this.

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk 3 years ago +1

      Absolutely! Legg and Hutter are on our hit list -- we will invite them. We really hope they want to come on, it would be amazing

  • @jjemine
    @jjemine 2 years ago

    good content

  • @NelsLindahl
    @NelsLindahl 3 years ago +1

    Oh I just kept watching... then I needed more coffee...

  • @fast_harmonic_psychedelic

    Hands off mah deep lernin

  • @mfpears
    @mfpears 2 years ago

    45:00 Can you point to any mechanism in the brain that would support System 2 type thinking? I thought there were pretty much just neurons.

  • @PhucLe-qs7nx
    @PhucLe-qs7nx 3 years ago +2

    I think the disagreement between Yann and Francois regarding interpolation/extrapolation is that they are referring to different definitions.
    As Yann said, a new image is unlikely to be a "linear combination" of seen images, so it's extrapolation.
    Francois's interpolation is a bit more mainstream: any new image with values inside the range of seen values is interpolation.
    I tend to agree with Yann's view, because essentially to interpolate there is no assumption available other than linearity and smoothness. All other priors are for extrapolation.
    As you said in the video, per Francois's interpolation, there is nothing to learn to extrapolate; it's the unknown unknown. The only prior for extrapolation in this case is the meta-learning prior: learning to learn to interpolate.
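Yann's "linear combination" definition above can be made concrete with a small experiment (a hedged sketch; the LP membership test is standard, the toy setup is mine): a new point counts as interpolation in that sense iff it lies in the convex hull of the training points, and in high dimensions almost no new point does.

```python
# Is a query point inside the convex hull of the training points?
# Feasibility of: lambda >= 0, sum(lambda) = 1, X^T lambda = query.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(X, q):
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])   # combination hits q, weights sum to 1
    b_eq = np.concatenate([q, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.success                          # feasible => "interpolation"

rng = np.random.default_rng(0)
for dim in [2, 10, 100]:
    X = rng.normal(size=(500, dim))             # 500 training points
    q = rng.normal(size=dim)                    # new sample, same distribution
    print(dim, in_convex_hull(X, q))            # True in 2-D, almost surely False in 100-D
```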

  • @PeterOtt
    @PeterOtt 3 years ago

    It's finally here! You've been teasing us with this for so long!

  • @tylertheeverlasting
    @tylertheeverlasting 3 years ago

    In my opinion, combining Type 1 and Type 2 is all about the user interface. As Yannic mentioned, people kind of sort of learn programming without learning it properly, and that's only because the user interface allows them to do things in an easy way. Abstraction likewise allows better productivity with ideas for a larger audience, and also creates a smoother gradient for learning.

  • @brunoopermanis5449
    @brunoopermanis5449 2 years ago

    Great episode :)
    Regarding NNs != Turing machines: you can construct an RNN (carefully choosing the weights, not training it) that uses its hidden state as infinite memory, since the hidden state consists of continuous numbers (you can encode any integer or infinite bit string into a real number). In other words, hidden state == infinite memory.
    So in theory an RNN can be a Turing machine, although not in practice :)
    I once came across a paper where an RNN was constructed so that it worked as a universal Turing machine; I can't find the paper anymore.
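A hedged sketch of the encoding the comment describes (in the spirit of Siegelmann-Sontag style constructions; this toy is mine, not from the lost paper): a whole bit string lives in one number via its binary fraction, so push/pop are just arithmetic, and the state is "infinite memory" only as long as the arithmetic stays exact.

```python
# Encode a stack of bits into one number s in [0, 1): push prepends a bit
# to the binary fraction, pop recovers it. Exact with rationals; with
# 64-bit floats the "infinite memory" evaporates after ~50 pushes.
from fractions import Fraction

def push(s, bit):
    return (s + bit) / 2               # s' = 0.b s1 s2 ... in binary

def pop(s):
    bit = int(s * 2)                   # leading binary digit
    return bit, s * 2 - bit

s = Fraction(0)
bits = [1, 0, 1, 1, 0, 0, 1] * 20      # 140 bits: fine with exact arithmetic
for b in bits:
    s = push(s, b)

out = []
for _ in bits:
    b, s = pop(s)
    out.append(b)

print(out == bits[::-1])               # True: a stack stored in one rational
```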

  • @mjeedalharby9755
    @mjeedalharby9755 3 years ago

    Yes

  • @mo_daboara
    @mo_daboara 3 years ago

    Hi,
    Chollet categorized the universal problem space into continuous and discrete, somewhat overlapping, regions. I think if we want to get a true AGI, discrete problems will be something like morphed/superpositioned spaces that are blended into a similar stack of continuous spaces. Instead of thinking of type 1 and type 2 thinking, I would argue that (at least in biology) what is happening is a mechanism that kind of separates a solution graph out of the continuum manifold. That way those (virtual graph segments) can be reused when facing new unknown problems.

  • @DavenH
    @DavenH 3 years ago +2

    It's fairly easy to say what extrapolation isn't, but what can you say it IS positively?
    In my current view, extrapolation is enabled by, and nearly always requires, a conjugation of 3 things: a dynamics model, a state, and a simulator, within whose sandbox the dynamics shall be iteratively applied on the state.
    Let's take the case of an algorithm running on a computer. The dynamics are the primitive language operations, the state is the set of arguments to the algorithm (+the state of global vars if applicable), and the simulator is your computer. Everything is crisp and deterministic here. You can do a similar mapping with mathematics. The legal operations following from your axioms are the dynamics (e.g. the way a contradiction propagates back to invalidate a theorem is in the math dynamics model, the laws of logic), the state is the starting (sub)set of theorems and axioms, the simulator is usually the minds of mathematicians.
    But AlphaGo falls within this definition too: its dynamics model is the rules of Go (capturing, winning conditions, turn-taking) + the known interactions of higher level structures, the state is the empty board (or opponent's first move), and the simulator is their deep RL + MCTS algorithm which must have implicitly encoded the dynamics. Slightly less crisp and deterministic, but still able to generate completely new knowledge within the scope of playing Go.
    The more stochastic the dynamics (poker or scrabble say), the less deeply the simulator can concretely extrapolate -- it can only output distributions, usually with ever-higher variance with extrapolation depth, as that variance would grow exponentially as it iteratively compounds; past a point the variance would become so great that any output distribution would tend to an uninformative uniform.
    I'm seeing this tripartite dynamics/state/simulator pattern everywhere now.
    So where does GPT-3 fall... GPT-3 seems to have baked in some system dynamics, and can in theory be performing limited simulation, bounded in extent by the sequential processing that 96 transformer layers can accomplish. So it does seem to extrapolate within domain. At least, in some cases it's not obviously regurgitation.
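The dynamics/state/simulator pattern above is easy to phrase as code (my toy example, assuming nothing beyond the comment itself): extrapolation as a simulator iterating a dynamics model on a state, with the stochastic case showing the predictive spread growing with depth, as described.

```python
# The tripartite pattern: simulator(dynamics, state) -> iterated extrapolation.
# Deterministic dynamics extrapolate crisply; stochastic dynamics yield
# distributions whose spread grows with simulation depth.
import random

def simulate(dynamics, state, steps):
    for _ in range(steps):
        state = dynamics(state)
    return state

# Deterministic: compound growth, extrapolated far past any "training" horizon.
print(simulate(lambda x: x * 1.05, 100.0, 50))      # ~1146.74

# Stochastic: noisy growth; many rollouts show the spread exploding.
def noisy(x):
    return x * random.uniform(0.95, 1.15)

for depth in [5, 50]:
    rollouts = [simulate(noisy, 100.0, depth) for _ in range(10_000)]
    print(f"depth {depth}: min {min(rollouts):.0f}, max {max(rollouts):.0f}")
```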

  • @opiido
    @opiido 3 years ago

    This is amazing - thank you so much. I could do without the background music during the intro (a bit distracting), but overall AMAZING.

  • @CristianGarcia
    @CristianGarcia 3 years ago +4

    Having watched the DreamCoder video from Yannic this week paid off 😁 Amazing content!
    I have an open question I wish I could ask Chollet: "Do you believe you can generally solve ARC with a system that trains only on ARC, or does it require a system that (like us humans) trains on a much larger domain and then 'fine-tunes' on ARC?"

    • @MrjbushM
      @MrjbushM 3 years ago +1

      Interesting question

    • @TimScarfe
      @TimScarfe 3 years ago +2

      Yes very interesting question. Chollet is fine with human knowledge priors in the algorithm I think.

    • @badhumanus
      @badhumanus 3 years ago +1

      I don't think any formal test is needed for AGI. If a robot can walk into a generic kitchen and make a ham and cheese sandwich or a cup of coffee, it has GI. Just saying.

    • @DavenH
      @DavenH 3 years ago +2

      @@badhumanus I doubt that's a sufficient test either. A narrow set of skills would suffice.

  • @DistortedV12
    @DistortedV12 3 years ago

    One thing that troubles me is "a perceptive DSL". The whole point of deep learning is to learn these perceiving functions, yet these functions or "core knowledge priors" are supposed to be composable in code? Has anyone turned an object detection algorithm into raw composable code?

  • @aldousd666
    @aldousd666 a year ago

    The debate about interpolation: I think it's not actually a problem. The formula we're approximating is derived by interpolating the training data. If the training data is representative, then we can just take the formula and extrapolate. It won't be 100% accurate; it just has to beat a coin flip to be an advantage. And that's purchase on new territory, a seed for the next experiment.
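A hedged illustration of that point (my toy, not the commenter's): extrapolating the fitted formula works when the model family matches the data's true form, and collapses when the model is over-flexible.

```python
# Fit a formula by interpolating the training range, then extrapolate.
# A well-matched model extrapolates usefully; an over-flexible one does not.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 3 * x + 1 + rng.normal(0, 0.05, 50)    # true law: y = 3x + 1

lin = np.polynomial.Polynomial.fit(x, y, deg=1)
wild = np.polynomial.Polynomial.fit(x, y, deg=15)

for model, name in [(lin, "degree 1"), (wild, "degree 15")]:
    print(name, "at x=5:", round(model(5.0), 2))   # truth is 16
# degree 1 lands near 16; degree 15 is usually far off
```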

  • @jon0o0o0
    @jon0o0o0 3 years ago

    When he talks about the two types of thinking, it reminds me of Daniel Kahneman and his theory of two types of thinking in "Thinking, Fast and Slow" :D

    • @jon0o0o0
      @jon0o0o0 3 years ago

      I wonder if he was inspired by Kahneman's theory, as they are pretty similar: fast thinking meaning intuitive thinking, e.g. stories you make up from past memory, and slow thinking meaning extrapolating and reasoning about things.

  • @jeff_holmes
    @jeff_holmes 3 years ago +1

    I was thinking about what Tim said in terms of separating intelligence and consciousness. I have always thought the same, I suppose. However, Yannic's comments about conscious introspection made me wonder if a truly intelligent being must always be "on" - or conscious. Currently, we create "intelligent" programs or algorithms and then train them or ask them to reason about something. But otherwise, they are inactive ("unconscious"). There is no idle thinking or pondering that occurs. Are we missing something?

    • @DavenH
      @DavenH 3 years ago +1

      Introspection and self-attention do not need anything qualitative to function, so there is no requirement of consciousness.

  • @Peter.Wirdemo
    @Peter.Wirdemo 3 years ago

    For a system to be intelligent, I think it needs to be able to act in and have an impact on its environment. Connecting actions to the loss between the predicted future and the actually experienced future will perhaps prepare it better for unknown and novel situations.
    The result might be that it learns to take actions to avoid situations where predicting the future is difficult, and this "avoidance" might actually turn out to work in the same way as simplification or abstraction would.

  • @rohankashyap2252
    @rohankashyap2252 3 years ago +7

    The most Turing complete episode on MLST.

    • @nomenec
      @nomenec 3 years ago +1

      Hilarious comment, Rohan! I honestly LoL'd in real life.

  • @vtrandal
    @vtrandal 2 years ago

    Many good things happening, including the 2nd edition of “Deep Learning with Python” by Francois Chollet via MEAP (Manning Early Access Program).

  • @arnokhachatourian8928
    @arnokhachatourian8928 3 years ago

    I think Chollet and Walid Saba argue for much of the same thing: a need for type 2 thinking or understanding combined with the type 1 signal processing power of neural nets. Interesting that they both see graphs and/or structure as part of the solution to type 2 thinking as well.

  • @vslaykovsky
    @vslaykovsky 3 years ago +2

    Could it be that topological (type 2) thinking somehow emerges from geometric (type 1) thinking, in a similar way to how complex pattern recognition emerges from the seemingly simple concept of interconnected neurons?

    • @nomenec
      @nomenec 3 years ago +2

      In my opinion, that is possible if not likely. That said, the emergent discrete/topological behavior remains qualitatively different. For example, consider that the "square waves" typical in CPUs are of course not precisely square. At the finest scale they are noisy continuous signals composed of electron/hole quantum waves. However, the digital operation of the CPU at a higher scale is best modeled mathematically as an abstracted "discrete" system. It's this ancient wave-particle or discrete-continuous duality we find everywhere in the material and conceptual worlds.

    • @arnokhachatourian8928
      @arnokhachatourian8928 3 years ago

      I think so, but the interesting question is how? If it is just a matter of scale, we're doing just fine; if not, we need some other advancement to attain intelligent systems.

  • @DistortedV12
    @DistortedV12 3 years ago +1

    Bro I just binge watched the whole thing. Are we all nerds?

    • @nomenec
      @nomenec 3 years ago +1

      Yes we are! And that is a wonderful thing ;-)

  • @dr.mikeybee
    @dr.mikeybee 3 years ago +5

    Wherever I go in my mind, I meet Plato coming back. -- Scott Buchanan

  • @DavenH
    @DavenH 3 years ago +1

    Again, a wonderful and thought-provoking episode.
    Playing devil's advocate as usual, here are some more thoughts --
    I find much to be skeptical about with regard to the interpolation / manifold hypothesis as I understand it, as it's not hard to make logical mappings from what DNNs are capable of, and indeed GPT-3 is likely doing, and programs (with limited memory) which nobody would agree are interpolating training data. I think there's a creeping mismatch of conceptions somewhere which is leading some to simplistic conclusions and will force them to eat crow many times over (kind of like "perceptrons can't even solve XOR, NNs suck!") - IMO where the misconception may lie is the idea that NNs can only manipulate topological volumes connected by nice, densely sampled bridges of data points. Or at least, that all incoming data maps to such a singular well-connected manifold. If true, all this strongly limited-by-interpolation stuff would make sense to me. However, if you consider that NNs can manipulate many many disjoint topological islands and bring them together on certain dimensions, separate them again, successively over 100s of layers, this starts to look a lot more like the work of classical computation. If classical computation is also roped into the interpolation idea, then I'm not sure what its implied limitations are.
    A couple of remarks on that subject. There is clearly a spectrum of expressive power with limited computation and limited memory rather than a binary on/off (Turing-Complete or not), and since nothing physical is TC including supercomputers and human brains, this is not an appropriate argument against DNNs. There was a point where it was brought up by one of your guests so fair play, but it seems that this argument is a bit of a distraction now. It is not to say that comparisons with extant computing systems are unhelpful; they lie elsewhere on the spectrum, and certainly mechanisms that introduce a large sandbox of memory for NNs to store and access representations make a lot of sense. But, when thinking about memory, consider that large models in the 100s of billions of params, have a huuuge amount of stateful "memory" to use -- the values of the activations themselves. Yes it's ephemeral, with our present architectures, as these values are only available as the forward pass progresses. In that way it's analogous to stack space. Heap space is still kind of lacking outside of NTMs. The point is that DNNs do possess a logical workspace for successive calculations to happen, albeit ephemeral and bounded, and that opens the door IMO to some flavour of non-interpolative computation happening.
    Final thought, on the no-free-lunch theorem. This does not apply generally. It applies _only_ when comparing optimal solutions in the solution space. When a system is non-optimal on all measurement axes, by definition there must be a system that can dominate it. Likewise, it needs to be optimal on only one measurement axis to be impossible to dominate. The curve that defines optimal tradeoffs between conserved quantities is known as the Pareto Optimal Curve (or Frontier). One notable example is the momentum/position precision tradeoff governed by the uncertainty principle. My point is that, particularly for messy optimization tasks, optimality on any axis is in practice impossible to prove, and none of the known neural architectures or cobbled systems like NARS or OpenCog are going to be actually on the POC, and so the NFLT is going to be technically inapplicable -- though in practice it is probably still an okay guide.
    With this in mind, we should not dismiss the possibility of an AGI that is more competent than any of our fine-tuned -- yet still suboptimal -- systems. Like, before FFT-based multiplication (Schönhage-Strassen and its successors), we thought multiplying huge numbers had to take O(n²) time, but through some genius tricks it now takes roughly O(n log n). In general, I'd be careful making arguments which rely on asymptotic properties; the conclusions tend to degenerate when the relevant extreme (like optimality) is relaxed.
    I think it's also worth noting (and not to suggest anyone is arguing against this) that while an AGI system must sacrifice optimality in all but one task -- and very likely all -- that does not preclude non-optimal yet still superhuman competence on all the measurement axes we care about. To me, that's sufficiently general. And then, what's to prevent a robustly general purpose, but completely not-optimal-at-anything-specific meta-process from slowly implementing task-optimized tools at will, much like we do? Okay, that certainly broke my hyphenation budget! Now gimme that free lunch.

    • @nomenec
      @nomenec 3 years ago +1

      DavenH, thank you for your detailed and thoughtful questions. I'd like to clarify that I'm not arguing that the interpolation/extrapolation divide, if there is one, stems from computational class; I don't (yet) know what the computational complexity of "extrapolation" is. My focus in the "Turing-Complete" debate is, in part, to communicate what you expressed yourself:
      "It is not to say that comparisons with extant computing systems are unhelpful; they lie elsewhere on the spectrum, and certainly mechanisms that introduce a large sandbox of memory for NNs to store and access representations make a lot of sense. [NNs are] analogous to stack space. Heap space is still kind of lacking outside of NTMs."
      Moving from bounded to unbounded space/time computation models results in qualitatively different algorithms. This confers practical differences upon algorithms designed for Turing complete systems even when running on practically bounded systems because the algorithms are fundamentally different. Here is a quote from you that hits the key difference w.r.t NNs:
      "Yes [NN memory is] ephemeral, with our present architectures, as [the activation] values are only available as the forward pass progresses."
      An intelligent system learning algorithms for a Turing complete model can find fundamentally different optimal algorithms than one learning algorithms for a Finite State Machine (NNs have a fixed (unrolled) node count, hence their "stacks" are bounded, ergo they are Finite State Machines). If more researchers would simply accept, if not embrace, that math fact, we might direct more time and effort towards researching NNs augmented with unbounded (in the computational-model sense) read/write memory and iterations (computational time steps). That would be equivalent to a Turing Machine where the FSM part of the TM (i.e. the "transition function") was an NN.
      The longer we continue to obfuscate the fact that NNs are not Turing Complete (by sneaking in things like infinite precision floating point registers) the longer we delay progress on next generation Turing complete computational models and practical systems that approximate them (with expandable memory and unbounded running time).
      Regarding the no-free-lunch theorem, let's first recall how Chollet employs it with regard to the measure of intelligence:
      "To this list, we could, theoretically, add one more entry: “universality”, which would extend “generality” ... to any task that could be practically tackled within our universe. [Considering the No Free Lunch theorem] we do not consider universality to be a reasonable goal for AI. ... The central message of the No Free Lunch theorem is that to learn from data, one must make assumptions about it - the nature and structure of the innate assumptions made by the human mind are precisely what confers to it its powerful learning abilities."
      In my opinion, he is invoking the NFLT for three purposes:
      1) Universality should not be a requirement of intelligence
      2) Intelligence measures should be task specific
      3) Optimality for a task requires task specific knowledge
      I don't recall him (or any of the hosts) arguing that the NFLT implies that an AGI cannot exceed human intelligence on all tasks. If so, I don't agree with that. I think it is entirely possible that an AGI can radically exceed human intelligence on all tasks. That said, I do not think intelligence is "all powerful" either. In other words, I'm not worried that an embodied AGI can twinkle its red robot eyes in just the right way as to crash my brain. Such power is fantasy speculation at this point.
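That last idea, a Turing machine whose transition function is a learnable module, is easy to sketch (a hypothetical scaffold, not anyone's published system): the loop and the unbounded dict-backed tape below are fixed machinery, and the hand-written transition table is exactly the part a neural network could replace.

```python
# A Turing-machine loop with a pluggable transition function. Here the
# transition is a lookup table (a unary increment machine); a network
# mapping (state, symbol) -> (new_state, write, move) could be dropped in,
# while the dict-backed tape stays unbounded.
from collections import defaultdict

def run_tm(transition, tape, state="start", head=0, max_steps=1000):
    for _ in range(max_steps):
        if state == "halt":
            break
        state, write, move = transition(state, tape[head])
        tape[head] = write
        head += move
    return tape

# Increment a unary number: scan right over the 1s, append a 1, halt.
def increment(state, symbol):
    if state == "start" and symbol == 1:
        return ("start", 1, +1)        # keep scanning right
    if state == "start" and symbol == 0:
        return ("halt", 1, 0)          # write one more 1, then halt
    raise ValueError(f"no rule for {(state, symbol)}")

tape = defaultdict(int, {0: 1, 1: 1, 2: 1})             # unary 3
result = run_tm(increment, tape)
print(sorted(k for k, v in result.items() if v == 1))   # [0, 1, 2, 3] -> unary 4
```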

    • @machinelearningdojowithtim2898
      @machinelearningdojowithtim2898 3 years ago +1

      Hello Daven, really appreciate your engagement and thoughtful commentary as always, my friend. Keith commented eloquently on the later part of your question re: computability. Remember that TC just means that a computational system could run any program which a Turing machine could run. Clearly NNs are not Turing complete, and, say, JavaScript is. It might take JavaScript an awful lot of time to compute your arbitrary digit of pi, but an NN never could. On the matter of "bridging topological islands", what a delicious thought! The first intuition I have is that islands are the right way to think about it. NNs sparsely code data onto many different disconnected manifolds (think of a typical t-SNE projection). I don't think there is any bridging between them; the data point falls on one of the manifolds. What happens to the output when you do a linear combination in the input space between points from two different manifolds in the latent space? Does it end up in "no man's land" or does it get projected to the nearest manifold? You hinted that there might be some kind of hierarchy of manifolds; I don't think that is the case. Certainly there is an entangled hierarchy of transformations to get each point to its respective manifold mapping, and some of them might be shared. Will think more on this and add more later on. Thanks for the great comment.

    • @DavenH
      @DavenH 3 years ago

      ​ @Machine Learning Dojo with Tim Scarfe ​ @Keith Duggar Thank you Tim and Keith very much.
      The point is well made, and quite clear, that NNs don't do much of what computers do. The strongest position I'm advocating is that gradient-optimized NNs can still approximate what small programs running on limited stack space can do. That proposition is especially vulnerable to what Keith says about the qualitative difference in algorithms each can produce. I'm curious about this. The empirical differences are clear, at least most of the time... GPT did open my mind though. Not that it was producing compact algorithms to generate accurate digits of pi, but that it was using some kind of messy logic or computation for which we don't have a good measure of the boundaries.
      You guys have evidently done a lot more reading on the subject than I, so it's quite possible that my intuitions are not mature yet.

  • @hideyoshi9716
    @hideyoshi9716 3 years ago

    Could you please set up auto translation ? 😂😂😂 The most interesting session.
    Thanks !! 😃😃😃

  • @zhangcx93
    @zhangcx93 3 years ago

    I think the reason DL cannot do general discrete learning well is fundamentally:
    1. the activations they use: continuous activations
    2. they're synced systems where all "neurons" fire at the same time step.
    DL chose this way because backpropagation only works with continuous values, and parallel computation works in sync.
    Our brain, by contrast, is:
    1. using binary activations, in a discrete value space
    2. firing all neurons asynchronously, in a continuous time space.
    At the same time, the world we're interacting with is continuous in time, which our brain's learning algorithm relies on heavily.

  • @jordan13589
    @jordan13589 3 years ago +18

    My biggest takeaway from this episode: buy Arc Coin

  • @dougb70
    @dougb70 3 years ago

    1:46:41 - you guys are overthinking this. Step backwards. "Simulate" a cortical column for narrow intelligence. Map Markov blankets to the system of intelligence for general intelligence.

  • @sabawalid
    @sabawalid 3 years ago +3

    Excellent observation about trying to write a discrete algorithm to work on MNIST digits... it is sort of the opposite of trying DNNs on discrete problems. I have tried the former: it might do a decent job, but it is not the right approach. Excellent point.

  • @rahul-qo3fi
    @rahul-qo3fi 3 years ago +1

    Am I watching a technical podcast or a Netflix series!!

  • @dougb70
    @dougb70 3 years ago

    something to ask. Will there ever be a good zoom background? Those magical headphones are so distracting lol.

  • @XOPOIIIO
    @XOPOIIIO 3 years ago

    It's hard to argue for one side or the other, considering how little evidence there is. But personally I've become more inclined to believe that DL can extrapolate successfully after watching what DALL-E is doing; it's basically GPT-3, even weaker, but the results are more demonstrative.

    • @TimScarfe
      @TimScarfe 3 years ago

      Natural data sits on an interpolatable manifold

  • @dr.mikeybee
    @dr.mikeybee 3 years ago

    In conversational AI, do we have examples of responses that are accepted or denied? If denied, is the response coming back broken into parts of speech, rearranged, fed through other models that choose actions, query graph databases, run arithmetic routines, or run various other algorithms, then checked again for acceptance? Rinse and repeat? Accepted responses can be added to supervised training sets. I bet Google and Amazon are doing this with their vast resources. Personally, I believe we users are going to need to share model access on ports, so that agents can query those models. We have plenty of compute as a society, but we don't share it. If I run one model on my GPU and you run another and we share, we each have two available models for an agent to access. We are going to need hundreds of available models to create AGI until we can afford to create models with trillions of parameters. I'm hoping to set up a web site soon that allows people to register their shared model IP addresses and port numbers.