#50 Dr. CHRISTIAN SZEGEDY - Formal Reasoning, Program Synthesis

  • Added on 5 Aug 2024
  • Dr. Christian Szegedy from Google Research is a deep learning heavyweight. He discovered adversarial examples, developed one of the first deep object detection algorithms and the Inception architecture, and co-invented BatchNorm. He thinks that if you had bet on computers and software in 1990, you would have been as right as betting on AI now. But he also thinks that we have been programming computers the same way since the 1950s, and that there has been a huge stagnation ever since. Mathematics is the process of taking a fuzzy thought and formalising it. Could we automate that? Could we create a system that acts like a superhuman mathematician, but that you can talk to in natural language? This is what Christian calls autoformalisation. Christian thinks that automating much of what we do in mathematics is the first step towards software synthesis and building human-level AGI: mathematical ability is the litmus test for general reasoning ability. Christian has a fascinating take on transformers too.
    With Yannic Lightspeed Kilcher and Dr. Mathew Salvaris
    Whimsical Canvas with Tim's Notes:
    whimsical.com/mar-26th-christ...
    Pod version: anchor.fm/machinelearningstre...
    Tim Introduction [00:00:00]
    Show Kick-off [00:09:12]
    Why did Christian pivot from vision to reasoning? [00:12:07]
    Autoformalisation [00:12:47]
    Kepler conjecture [00:17:30]
    What are the biggest hurdles you have overcome? [00:20:11]
    How does something as fuzzy as DL come into mathematical formalism? [00:23:05]
    How does AGI connect to autoformalisation? [00:30:32]
    Multiagent systems used in autoformalisation? Create an artificial scientific community of AI agents! [00:36:42]
    Walid Saba -- the information is not in the data [00:41:58]
    Is generalization possible with DL? What would Francois say? [00:45:02]
    What is going on in a neural network? (Don't Miss!) [00:47:59]
    Inception network [00:52:42]
    Transformers negate the need for architecture search? [00:55:58]
    What do you do when you get stuck in your research? [00:58:08]
    Why do you think SGD is not the path forward? [00:59:59]
    Is GPT-3 on the way to AGI? [01:02:01]
    Is GPT-3 a hashtable or a learnable program canvas? [01:05:01]
    What worries Christian about the research landscape? [01:07:14]
    The style that research is conducted [01:11:10]
    Layerwise self supervised training [01:13:59]
    Community Questions: The problem of reality in AI ethics [01:15:33]
    Community Questions: Internal covariate shift and BatchNorm [01:20:03]
    Community Questions: What is so special about attention? [01:23:08]
    Jürgen Schmidhuber [01:24:18]
    Community Question: Data efficiency and is it possible to "learn" inductive biases? [01:27:13]
    Francois's ARC challenge, is inductive learning still relevant? [01:31:13]
    A Promising Path Towards Autoformalization and General Artificial Intelligence [Szegedy]
    link.springer.com/chapter/10....
    Learning to Reason in Large Theories without Imitation [Bansal/Szegedy]
    arxiv.org/pdf/1905.10501.pdf
    Mathematical Reasoning via Self-supervised Skip-tree Training [Rabe .. Szegedy]
    openreview.net/pdf?id=YmqAnY0...
    LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning [Wu..Szegedy]
    arxiv.org/abs/2101.06223v1
    Deep Learning for Symbolic Mathematics [Lample]
    arxiv.org/pdf/1912.01412.pdf
    It’s Not What Machines Can Learn, It’s What We Cannot Teach [Yehuda]
    arxiv.org/pdf/2002.09398.pdf
    Investigating the Limitations of Transformers with Simple Arithmetic Tasks [Nogueira]
    arxiv.org/pdf/2102.13019.pdf
    Provable Bounds for Learning Some Deep Representations [Arora]
    arxiv.org/pdf/1310.6343.pdf
    Neural nets learn to program neural nets with fast weights [Schmidhuber]
    people.idsia.ch/~juergen/fast...
    How does Batch Normalization Help Optimization? [Ilyas]
    gradientscience.org/batchnorm/
    How to Train Your ResNet 7: Batch Norm
    myrtle.ai/learn/how-to-train-...
    Training a ResNet to 94% Accuracy on CIFAR-10 in 26 Seconds on a Single GPU [Kuhn]
    efficientdl.com/how-to-train-...
    en.wikipedia.org/wiki/HOL_Light
    en.wikipedia.org/wiki/Coq
    en.wikipedia.org/wiki/Kepler_...
    en.wikipedia.org/wiki/Feit%E2...
    We used a few clips from the ScaleAI interview with Christian - • Interview with Christi...

Comments • 37

  • @hellofromc-1374
    @hellofromc-1374 3 years ago +17

    get Schmidhuber here!!

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  3 years ago +8

      Please can you all tell him you want to see him on the show! We were soooo close to getting him on, he just needs a little bit of convincing 😃😃

  • @machinelearningdojowithtim2898

    First! Woohoo! This was a brilliant conversation with Christian, what a legend! 💥🙌👍

  • @danielalorbi
    @danielalorbi 3 years ago +14

    8:25 - "Too much rigor led to rigor mortis"
    Bars.

  • @stalinsampras
    @stalinsampras 3 years ago +18

    Congratulation on the 50th Episode. I'm a huge fan of this podcast/videocast. Looking forward to the 100th episode, I'm sure with the rate of improvement you guys have shown in video editing, you could be making mini docuseries relating to AI. Congrats and all the best.

  • @daveman683
    @daveman683 3 years ago +4

    Every time I listen to this podcast, I come away with new ideas.

    • @daveman683
      @daveman683 3 years ago +1

      I would also recommend Dr. Sanjeev Arora as a guest request. I have gone through all his lectures and they are absolutely amazing for grounding a lot of theoretical understanding about deep learning.

    • @daveman683
      @daveman683 3 years ago

      @@things_leftunsaid Yes, a typo on my part. I really enjoyed his lectures. They opened up my perspective.

  • @abby5493
    @abby5493 3 years ago +5

    Loving the graphics, it gets better and better 😍

  • @tensorstrings
    @tensorstrings 3 years ago +1

    Awesome! An episode I've been waiting for!!! Can't wait for next week either.

  • @rohankashyap2252
    @rohankashyap2252 3 years ago +1

    Absolute legend Christian, the best episode!

  • @AICoffeeBreak
    @AICoffeeBreak 3 years ago +4

    Thanks for answering my question! I think Christian Szegedy made a great point on bias: people make little case studies around very particular questions but do not analyse AI feedback loops in general (including nonproblematic ones). Wouldn't this be a place where AI could collaborate with sociology?

  • @_tnk_
    @_tnk_ 3 years ago +1

    Great session!

  • @user-yn8rg2xv4w
    @user-yn8rg2xv4w 3 years ago +9

    AMAZING INTERVIEW !!!!!! .... When are you guys going to interview Ben Goertzel and Christos Papadimitriou??

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  3 years ago +2

      Thanks for letting us know about Christos Papadimitriou, he looks great! We did try to invite Ben on Twitter, I think.

  • @coder8i
    @coder8i 3 years ago

    The Whimsical canvas is really nice for following along with the conversation.

  • @dinasina3558
    @dinasina3558 3 years ago +1

    NNs are hashtables. Like humans too. We don't compute multiplication; we memorize multiplication tables in school.

  • @HappyMathDad
    @HappyMathDad a year ago

    If we are able to solve the formalization problem that Dr. Szegedy is working on, I think we also solve his concern, because we could sidestep our ignorance of current deep networks: they would just become a vehicle for getting to formalisms.

  • @janosneumann1987
    @janosneumann1987 3 years ago +1

    Awesome interview, really enjoyed the show, learned a lot. I liked the question "so what do you think is going on in the deep learning model?" :)

  • @HappyMathDad
    @HappyMathDad a year ago

    AGI used to be 10 years away; that's improvement.

  • @keithkam6749
    @keithkam6749 3 years ago

    RE: multiagent systems in auto-formalization:
    This reminded me of an interactionist theory of reasoning from Sperber and Mercier's 'The Enigma of Reason' - It's a psychology book so maybe not the ML/AI crowd's cup of tea but many of the ideas are very applicable:
    The core idea proposed is that we humans do not have what Kahneman describes as 'system 2', logical thinking. (in 'Thinking fast and slow' he proposes that we have two types of cognition, system 1 = fast cheap intuitions, system 2 = slow expensive logical deduction). Instead, Sperber and Mercier suggest that all we have are intuitions - the ability to pattern match. Specifically, our ability to reason is actually intuitions about reasons, combined with intuitions for evaluating the validity of reasons.
    They argue that the primary purpose of reasoning from an evolutionary perspective is not to generate new knowledge from existing knowledge, but instead to generate emergent consensus and allow for cooperation between non-related individuals.
    1. Alice wants to convince Bob of something - e.g. a new idea, a proposal to do something together, a justification for an action.
    2. Of course, Bob would not accept everything Alice proposes. If that was the case they would be gullible and easily taken advantage of.
    3. However, it is not beneficial to reject everything Alice proposes, since the knowledge could be useful (for Bob or for both of them).
    4. To get around this, Alice proposes clear to follow, logical reasons that Bob then has to evaluate.
    Perhaps the key to reasoning in an ML context would be this generative, adversarial process, combined with an ability to direct attention to existing knowledge bases or new experiments.

  • @dr.mikeybee
    @dr.mikeybee 3 years ago

    Tim, you are asking one of my questions in somewhat different wording. Here's my query: in CNNs, pooling effectively changes the size of the sliding window, so that the model learns larger and larger features. Is there something like this in transformers?
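    A minimal PyTorch sketch (an added illustration, not from the episode) of the mechanism the question describes: each pooling step halves the resolution, so the same 3x3 convolution at a later layer effectively covers a larger patch of the original image. In a transformer, by contrast, self-attention can relate any two positions at every layer, so there is no built-in pooling hierarchy of this kind by default.

```python
import torch
import torch.nn as nn

# Stacked conv + pooling: each 2x2 max-pool halves the resolution, so a 3x3
# kernel at a later layer covers a larger patch of the input image.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # receptive field:  3x3 pixels
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # receptive field:  8x8 pixels
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # receptive field: 18x18 pixels
)

x = torch.randn(1, 3, 32, 32)   # a toy 32x32 RGB image
print(cnn(x).shape)             # torch.Size([1, 64, 8, 8])
```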

  • @willd1mindmind639
    @willd1mindmind639 3 years ago

    Human reasoning starts at the earliest stages of existence because it takes place in a different part of the brain than, for example, the visual cortex. The visual cortex, for all intents and purposes, is a visual encoding system that converts light waves into neuronal networks, and those neuronal networks represent all the visual details of the real world that are stitched together by the brain into a coherent mental picture or projection of the real world.

    So what happens, just in terms of the idea of "visual intelligence", is that the features of the real world become inputs to the higher abstract reasoning parts of the brain as neural network "features", which in turn become conceptual entities used in logic and reasoning. So a dog is recognized because of the features of fur (which in turn is a collection of features representing individual hair shapes), plus the shape of the body parts (snout, ears, body pose, legs), plus the number of legs, plus the tail, etc. Now the trick is that each of those feature collections (clouds of neural network data) are inputs, as parameters, to a higher-order part of the brain that does reasoning and logic, and the weighting of parameters and the relationships between parameters for understanding happen there, not in the visual cortex. And the power of that is that it isn't a static set of parameters or a fixed set of weights. It is variable, which is how logic and reasoning are expressed. That ability then carries over to every other aspect of human intelligence.

    We see this in humans in the sense that if I draw a stylistic outline of a dog with some of the characteristic shape features of a dog, humans recognize it instantly as a dog (i.e. a dog emoji), because the thinking area of the brain recalls that the shape of the emoji, even in a single color, matches the features found in real dogs and, as a parameter into the analysis and reasoning area, can be evaluated as "shape of a dog" in a logical, conceptual way, even though there is no actual dog there, as opposed to some pen/marker shapes on paper, brush strokes in a painting or pixels on a computer screen. In fact the brain can handle the idea that the shape of a dog is drawn on paper with a pen as part of this reasoning and understanding process. That is because the feature encoding of the shapes themselves is separate from the reasoning part.

    Meaning these aren't hidden layers of logic as expressed in most monolithic machine models, which are expected to compress all "thinking" into a single logical output based on hidden parameters and hidden weights. The difference is that these things, like feature layers or weights, aren't hidden in the brain. The conceptual ability of the brain to reason is an extension of the fact that once the features are encoded by the brain (analog/digital conversion, biologically), they become separate from the actual source and the real world. And the higher-order parts of the brain use that to make sense of the world, just as your view of the world is actually a projection of the world based on neural data, which is no longer the same as the light waves that triggered those neurons.

  • @Isinlor
    @Isinlor 3 years ago

    Does anyone know what is being mentioned at 1:14:55? It's somehow connected to infinite RNNs without back-propagation through time.

  • @dr.mikeybee
    @dr.mikeybee 3 years ago

    Is there a way to do transfer learning with transformers? Can the slow weights be re-trained without starting from scratch?
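    Fine-tuning a pretrained transformer is standard practice, so the pretrained weights do not need to be thrown away. A minimal sketch (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint) in which the pretrained encoder is frozen and only a new classification head is trained:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)           # adds a fresh classification head

# Freeze the pretrained encoder; only the head's parameters stay trainable.
for param in model.bert.parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

batch = tokenizer(["an example sentence", "another one"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])

loss = model(**batch, labels=labels).loss        # forward pass through the frozen encoder
loss.backward()                                  # gradients flow only into the head
optimizer.step()
```

Unfreezing the encoder and training it with a small learning rate is the other common variant; either way the slow weights are re-trained without starting from scratch.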

  • @bethcarey8530
    @bethcarey8530 3 years ago

    Great question Tim, based on Walid's interview - can transformers 'get there' with enough data for language understanding? I take Christian's answer to be 'yes': transformers are currently too small, but they can get there.
    This is at odds with symbolic AI advocates & inventors because, as Walid says, not everything we use to generalize from is 'in the data'. And there is a gold-standard blueprint brains follow to be able to generalize, which provides our 'common sense', whatever our native language.

    • @bethcarey8530
      @bethcarey8530 3 years ago

      I'd love to know what Christian, or any transformer guru, believes is enough training data to necessarily produce the reasoning required for natural language. My math could be wrong, but GPT-3 used ~225x10^9

  • @dosomething3
    @dosomething3 3 years ago +1

    TL;DR: Math is a closed system, or as close as possible to being closed, which makes it simpler for a neural network to process than any other system. Reality is the furthest thing from a closed system, hence the most difficult for a neural network to process.

    • @machinelearningdojowithtim2898
      @machinelearningdojowithtim2898 3 years ago +2

      I think it's tempting to think it is, but it only is for things you know already -- you still need to make conjectures for things you don't know, and this is very open-ended short of more general frames of reference than the ones we currently have. It is precisely for this reason that Christian thinks that mathematics is the litmus test for AGI. I hope Christian will chime in with a comment on this because I think it gets to the core of the work.

  • @ratsukutsi
    @ratsukutsi 3 years ago

    Excuse me, gentlemen, I have a question to ask: where is Mr Chollet at this time? I'm curious to know.

  • @SimonJackson13
    @SimonJackson13 3 years ago

    APL to syntax tree translators?

  • @sabawalid
    @sabawalid 3 years ago

    The conclusion of the paper "Investigating the Limitations of Transformers with Simple Arithmetic Tasks" is that "models cannot learn addition rules that are independent of the length of the numbers seen during training". This is expected, because if you fix the number of digits then the space of that function is finite (it is a finite table). Addition of varying-length numbers is infinite, and -- again, as in language -- when the space is infinite, as it is for most real problems in cognition, DL has nothing to offer. It is INFINITY all over again that makes ANY data-driven approach nothing more than a crude approximation that can always be adversarially attacked.
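    A tiny sketch (an added illustration, not from the paper) of the contrast being drawn: fixed-length addition can in principle be memorised as a finite lookup table, whereas addition over numbers of arbitrary length needs an actual carrying procedure that generalises beyond any finite set of training examples.

```python
from itertools import product

# Finite regime: every 2-digit addition fits in a 100 x 100 lookup table.
table = {(a, b): a + b for a, b in product(range(100), repeat=2)}
assert table[(42, 58)] == 100

# Infinite regime: digit-by-digit addition with carry works for any length.
def add_digits(x: str, y: str) -> str:
    x, y = x.zfill(len(y)), y.zfill(len(x))   # pad to equal length
    result, carry = [], 0
    for dx, dy in zip(reversed(x), reversed(y)):
        carry, digit = divmod(int(dx) + int(dy) + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

assert add_digits("999999999999", "1") == "1000000000000"
```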

  • @sabawalid
    @sabawalid 3 years ago

    What??? "Mathematics is the process of taking a fuzzy thought and formalizing it" - I can't believe I heard that. Mathematics exists independently of physical reality - thus mathematics is not invented, it is "discovered". We do not "invent" mathematical theorems, we discover them and then learn how to prove them. As just a simple example, we did not invent the fact that the sum 1 + 2 + 3 + ... + n = n(n+1)/2 - we simply discovered that fact.
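    For reference, the usual pairing argument behind that identity, written out as a short LaTeX sketch:

```latex
% Write the sum forwards and backwards and add term by term:
\begin{align*}
  S  &= 1 + 2 + \dots + (n-1) + n \\
  S  &= n + (n-1) + \dots + 2 + 1 \\
  2S &= \underbrace{(n+1) + (n+1) + \dots + (n+1)}_{n\ \text{terms}} = n(n+1)
        \quad\Longrightarrow\quad S = \frac{n(n+1)}{2}
\end{align*}
```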

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  3 years ago

      I am pretty sure Christian would agree with you - I think you misunderstood. We still make conjectures to discover the mathematics. The arithmetic series you cited is a great example of something Christian would want to use deep learning to discover (if we didn't know it already). The autoformalization stuff just means converting language and text into abstract syntax trees.