AlphaGeometry: Solving olympiad geometry without human demonstrations (Paper Explained)

  • added 19 Jun 2024
  • #deepmind #alphageometry #llm
    AlphaGeometry is a combination of a symbolic solver and a large language model by Google DeepMind that tackles IMO geometry questions without any human-generated training data.
    OUTLINE:
    0:00 - Introduction
    1:30 - Problem Statement
    7:30 - Core Contribution: Synthetic Data Generation
    9:30 - Sampling Premises
    13:00 - Symbolic Deduction
    17:00 - Traceback
    19:00 - Auxiliary Construction
    25:20 - Experimental Results
    32:00 - Problem Representation
    34:30 - Final Comments
    Paper: www.nature.com/articles/s4158...
    Abstract:
    Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning [1-4], owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges [1,5], resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
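The loop the abstract describes (a symbolic deduction engine that saturates over known facts, with a language model proposing auxiliary constructions whenever deduction stalls) can be sketched roughly as follows. This is an illustrative toy, not DeepMind's code: `deduce`, `propose_construction`, and the string-based fact encoding are all stand-ins for the paper's DD+AR engine and trained model.

```python
def deduce(facts):
    """Toy forward-chaining engine: apply simple if-then rules to a fixpoint."""
    rules = {
        ("A", "B"): "C",       # if A and B are known, conclude C
        ("C", "AUX"): "GOAL",  # reaching the goal needs an auxiliary fact AUX
    }
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules.items():
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def propose_construction(facts):
    """Stand-in for the language model: suggest one auxiliary construction."""
    return "AUX"

def solve(premises, goal, max_constructions=2):
    """Alternate symbolic deduction with LM-proposed constructions."""
    facts = set(premises)
    for _ in range(max_constructions + 1):
        facts = deduce(facts)
        if goal in facts:
            return facts
        facts.add(propose_construction(facts))  # LM step when deduction stalls
    return facts

print("GOAL" in solve({"A", "B"}, "GOAL"))  # → True
```

The point of the sketch: pure deduction from the premises alone never reaches the goal; the single LM-suggested fact unlocks it, which is the role auxiliary points play in the paper.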
    Authors: Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He & Thang Luong
    Links:
    Homepage: ykilcher.com
    Merch: ykilcher.com/merch
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: ykilcher.com/discord
    LinkedIn: / ykilcher
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

Comments • 71

  • @YannicKilcher • 5 months ago +8

    OUTLINE:
    0:00 - Introduction
    1:30 - Problem Statement
    7:30 - Core Contribution: Synthetic Data Generation
    9:30 - Sampling Premises
    13:00 - Symbolic Deduction
    17:00 - Traceback
    19:00 - Auxiliary Construction
    25:20 - Experimental Results
    32:00 - Problem Representation
    34:30 - Final Comments

  • @paxdriver • 5 months ago +28

    Thanks Yan, I love these paper review videos.

  • @AndyHOTlife • 4 months ago +9

    I remember that back in school most of the people my age had difficulties with geometry specifically due to these extra constructions. Some had such difficulties with these that several jokes inevitably emerged. It is very interesting to see that, similar to school pupils, an ML model also struggles with these. But unlike school pupils with their geometry homework, an ML model living in the cloud has enough compute power to propose and reason about several constructions in the blink of an eye. Thank you for the video!

  • @Buddharta • 4 months ago +23

    Geometry facts:
    The first triangle is isosceles, not equilateral.
    The triangles A and B at around 14:00 are congruent (have the same angles).
    Four points are cyclic (lie on a circle) if the sum of each pair of opposite angles is 180°.

    • @warrenarnold • 4 months ago +3

      Good thing you kept attention in class coz that thing is still coming to replace you 💔

    • @costadekiko • 4 months ago

      Adding here that opposite angles summing to 180 is only one of the criteria for cyclic quadrilaterals, and is the one that applies for EADH (two right angles E and D).
      The one that most easily applies to EBCD: a side of the quadrilateral is “under” the other two points with equal angles (in this case BC is “under” E and D with right angles).
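The cyclic-quadrilateral fact discussed in this thread is easy to check numerically. A small illustrative script (not from the paper): place four points on a circle in order and verify that each pair of opposite interior angles sums to 180°.

```python
import math

def angle_at(p, a, b):
    """Interior angle (in degrees) at vertex p between rays p->a and p->b."""
    v1 = (a[0] - p[0], a[1] - p[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Four points on the unit circle, in circular order (parameter angles in degrees).
ts = [10, 80, 200, 300]
A, B, C, D = [(math.cos(math.radians(t)), math.sin(math.radians(t))) for t in ts]

sum_AC = angle_at(A, B, D) + angle_at(C, B, D)  # opposite vertices A and C
sum_BD = angle_at(B, A, C) + angle_at(D, A, C)  # opposite vertices B and D
print(round(sum_AC, 6), round(sum_BD, 6))  # both ≈ 180.0
```

Changing `ts` to any other four angles in circular order keeps both sums at 180°, which is the inscribed-angle theorem at work.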

    • @RajarshiBose • 4 months ago

      @Buddharta Congruent doesn't only mean triangles with the same angles, though congruent triangles do have the same angles. If two triangles are congruent, you can place one completely on the other (they are equal in every respect: sides and angles). If only the angles are the same, the triangles are called similar.
      Two similar triangles might not be congruent.

    • @Buddharta • 4 months ago

      @@RajarshiBose true, the triangles are similar and not congruent. I confused the definitions

  • @DrDanielCho-Kee • 3 months ago +2

    Great video, very insightful! What programs do you use to record your videos? I love the paper backdrop with annotations, with you at the bottom!

  • @meinbherpieg4723 • 2 months ago +1

    This video is criminally underappreciated. You have succinctly covered some of the most nuanced aspects of cutting edge AI research. Thank you.

  • @piratepartyftw • 4 months ago +27

    I expect this time next year (or sooner?), we'll be reading a similar paper where someone applied this kind of method to mathematics more broadly, leveraging something like the Lean language. It might be Coq or Isabelle instead, though, as those languages have more mature "autoprovers" of the kind used in this paper.

    • @piratepartyftw • 4 months ago +6

      My money's on Lean, long term. It's growing fastest and has the best developer ecosystem, and it's got a cohesive single library of proofs that can act as training data. The thing it's missing is a good autoprover and a good tokenizer (it's hard to tokenize because it uses tons of unicode), but both are actively being developed.
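For readers wondering what the machine-verifiable formats mentioned in this thread look like, a minimal illustrative Lean 4 example (unrelated to the paper; `add_comm'` is just a name chosen here, and the proof reuses the core library's `Nat.add_comm`):

```lean
-- A trivial theorem stated and proved in Lean 4.
-- Libraries like Mathlib contain thousands of such lemmas; that corpus is
-- the kind of proof training data the comment above refers to.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The translation cost the paper's abstract mentions is exactly the work of restating informal human proofs in this style.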

    • @HughBlackstone-tm6bw • 4 months ago

      That's on God

  • @mahiainti678 • 4 months ago +9

    Very interesting paper. While quite limited, this approach leaves no space for any hallucinations/inaccuracies, which is a great plus. I'm interested in how/where it can be further applied.

    • @asdfasdfasdf1218 • 4 months ago +2

      Not much further yet, because it's narrow: it can only find auxiliary points for 2D geometry problems. I'm sure DeepMind is working on AI for math in general, though, but it'll likely come in increments, step by step. It'll definitely be huge if it can ever master all of math, not just 2D geometry.

  • @-mwolf • 4 months ago +8

    There might be an application in chip design actually...?

  • @googleyoutubechannel8554 • 4 months ago +2

    The search space for these geometry problems, if you go about it even marginally reasonably, seems very small compared with the scale of data processing available in modern times for, say, 1024x1024 diffusion. It seems very reasonable to 'solve' these tightly constrained problems.

  • @LeePenkman • 5 months ago +3

    I'm thinking most problem domains, even code, can be treated as following long reasoning chains, but they would likely involve fuzzy review steps (like LLM code review) as well to help speed things up.

  • @boffo25 • 4 months ago +3

    Beautiful lesson

  • @ai-interview-questions • 4 months ago +1

    Thank you, Yan!

  • @oncedidactic • 4 months ago +2

    Imagine a Go board of infinite size with randomly initialized "walls" defining a usable play area. This is alphaGo for that.

  • @greengoblin9567 • 5 months ago +8

    If the large language model is somehow able to change the symbolic engine and the synthetic data, it can be adapted to any problem. For example, the language model could continuously switch at will between different symbolic engines it has created, for better logical reasoning at inference.

    • @wege8409 • 5 months ago +9

      The researchers in the paper said that readers should be advised that it may be difficult to adapt this solution to other domains. Geometry has really nice properties that allow them to do it this way. I'm sure we'll figure out something soon, though.

    • @greengoblin9567 • 5 months ago +3

      @@wege8409 I am talking about other domains of math. Basically, you can have a library of symbolic engines that are finely tuned, and you can toggle between them.

    • @wurstelei1356 • 4 months ago

      @@greengoblin9567 The paper also stated that it required a lot of compute power to get to that level. It would be interesting if the weights of this were released. I can find the GitHub page with code, but no weights. Then one could try to adapt it to other domains.

    • @Gingnose • 4 months ago

      @@wege8409 I believe geometry carries over into many fields of mathematics though

    • @andreaterlizzi • 4 months ago +1

      @@greengoblin9567 The problem is that other domains of math (e.g. calculus, linear algebra, differential topology) are very different from Euclidean geometry. The latter is probably a self-complete logical system, and is MUCH less expressive than e.g. calculus, in Gödelian terms. This is because the other (and all the relevant) domains of math are probably logically incomplete by Gödel's theorem, so it's impossible to have a set of statements you can sample from to get reasonable coverage of the domain. All the things that AlphaGeometry does with, say, 90% accuracy, a symbolic solver could have done efficiently with 100% accuracy (and that's why they could exploit it for training the LLM).

  • @lerpmmo • 3 months ago

    Can we apply the same concept to train it to write software, like Devin AI?

  • @AntonMilan • 4 months ago

    Is the sole purpose of the LLM to suggest which construct to sample at each step? If so, why use an LLM at all? Is it because it is able to better approximate the optimal sampling strategy?

  • @kaikapioka9711 • 5 months ago

    Thz!

  • @onurozkan1077 • 4 months ago

    I think that reverse engineering could largely benefit from this approach (going from assembly to a language), especially if the target language is known (and imperative). Then a compiler becomes a solver and statements/expressions become premises (constructs, if you will) in that sense. However, I may be wrong...

  • @albinoameise • 1 month ago

    Can someone shed some light on why to use an LLM instead of some Q-learning approach with a fixed action space? I don't see why AlphaGeometry relies on LLMs to search for the best action...

  • @robertfontaine3650 • 4 months ago +1

    Using agents and LLMs to drive tactical changes to achieve objectives seems to be everyone's trick of the week.

  • @paxdriver • 5 months ago +2

    Isn't that an "isocolese" or something like that?

    • @user-wd8wx5md5z • 5 months ago +1

      Yeah, an isosceles triangle. Yannic is not a high-school math teacher hahaha

    • @ekstrapolatoraproksymujacy412 • 4 months ago +1

      ​@@user-wd8wx5md5z Yannic is not a native English speaker. I'm not one either, and I didn't know it was called that in English either, but of course we both know what it is lol

    • @paxdriver • 4 months ago

      It might actually be equilateral; I'm not a high-school math teacher either, and I am a native English speaker lol

  • @volkerengels5298 • 4 months ago

    Can't think that proper - but this thing may help reasoning...

  • @theosalmon • 5 months ago +5

    ... suggest you capture a new profile picture around 8:20.

  • @AngouSana69 • 5 months ago +3

    Every video makes my eyes water, I'm not used to light.

    • @Hexanitrobenzene • 4 months ago

      Yeah, I use grey background for pdfs. I don't know any trick for videos, though...

  • @continuallearning8366

    ARC-AGI sent me here

  • @luke2642 • 4 months ago

    This video has saddle geometry: it starts off with an exciting premise, then rapidly becomes incredibly disappointing (a classical algorithm would be fine), then recovers at the end, if the compute budget really was modest.

  • @amafuji • 5 months ago +5

    Stable Diffusion occasionally solves non-Euclidean geometry. It's horrifying

    • @drdca8263 • 5 months ago

      Like, hyperbolic geometry, or spherical geometry, or...?

    • @leolrg2783 • 4 months ago +1

      Riemannian geometry

    • @drdca8263 • 4 months ago

      @@leolrg2783 hm? That’s more of a general topic than a specific geometry though?
      Like, Euclidean geometry, spherical geometry, and hyperbolic geometry, are all described within it?

    • @Hexanitrobenzene • 4 months ago +2

      @amafuji
      Could you provide some links? Because your statement is rather bold.

    • @timmygilbert4102 • 4 months ago +1

      People... That's a joke about how mangled image generations are 😂

  • @user-if1ly5sn5f • 4 months ago

    You should look into sacred geometry. It sounds weird, but if you understand it (and it's simple), then you can use its understanding, like the AI, to see things not in our current view. Kinda like how the Wright brothers saw a plane in the pieces of reality, assisted by math and language and such to help find it.

  • @nevokrien95 • 4 months ago

    This is ridiculously impressive. The olympiad draws top young mathematicians, so an AI doing it is a HUGE breakthrough by itself

  • @nikitastaf1996 • 5 months ago +1

    Take my words with a grain of salt; I have deduced this with Claude's help. They are not solving the IMO per se. What they're saying with this paper: we can now reliably generate a shitton of synthetic data (at least for geometry). They use an extremely small model (around 1 GB in size), which is stupid, to help with an essentially brute-force approach to solving geometry. My predictions: if we apply this approach to a large model, and on top of it apply something like chain of thought (which they essentially do in this paper, but with a brute-force solver), it will have significantly higher performance on geometry problems than any human alive.

  • @atypocrat1779 • 5 months ago +7

    0:17: 📐 Innovative paper introduces Alpha Geometry model for solving challenging geometry problems without human demonstrations.
    5:18: 🔍 Solving complex geometry problem without human demonstrations requires specific language model and training.
    10:27: ⚙ Automated geometry problem solving using algebraic reasoning and a small set of representations.
    15:57: 🔍 Logical deduction and proof process in geometry problem solving.
    20:59: ⚙ Symbolic deduction and traceback in problem solving process.
    25:58: 📝 Advanced system solves math Olympiad problems with lengthy proofs, outperforming previous methods.
    31:32: ⚙ Challenges in adapting math problems to a specialized language for proof steps and the correlation of problem difficulty with the number of required steps.
    Recap by Tammy AI

    • @vzxvzvcxasd7109 • 5 months ago +2

      I was gonna say, that reply was way too quick.
      AI

    • @notu483 • 5 months ago

      Is Tammy better than Harpa?

  • @nevokrien95 • 4 months ago

    Isn't 25 75% of 30...?

    • @Hexanitrobenzene • 4 months ago

      No? 25/30 = 5/6 ≈ 0.8333...

    • @ShA-ib1em • 3 months ago +2

      The researchers seem to say that their model can solve 75% of the problems, and then they say they solved 25 out of the 30 problems in the competition.
      However, what they seem to mean is that 75% of all the problems can be represented in the language they chose to use.
      They included only the problems that can be represented in that language,
      and they solved 25 of the 30 problems they tested the model on...
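The two numbers in this thread, checked quickly (under the commenter's interpretation: 75% is the share of problems expressible in the paper's language, while 25/30 is the solve rate on the translated test set):

```python
# 25 solved out of 30 attempted is the solve rate; it is not where
# the 75% figure comes from (that one concerns representability).
solved, attempted = 25, 30
solve_rate = solved / attempted
print(f"{solve_rate:.1%}")  # → 83.3%
```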

  • @ruchirgarg2725 • 4 months ago +1

    Get a grip on geometry sensei.

  • @moeinhasani8718 • 4 months ago

    I wonder why a group like this still cares about getting their papers published in journals. They were done with this project in April, and because of the paper, they only released the code in December. That's roughly 9 months during which the community could have had access to the source code, but they waited for journal acceptance.

  • @telesniper2 • 4 months ago +1

    Wow, they discovered how to do backtracking, but the hard way. LOL, nubs. I love "research" that is just nubbing for stock pumps

    • @lilithstenhouse267 • 4 months ago

      Isn't the key that they managed to reduce the search space on a problem with an infinite search domain down far enough that it can be backtracked? It's certainly not widely applicable yet, but it's potentially a very powerful premise

    • @Hexanitrobenzene • 4 months ago

      @@lilithstenhouse267
      Yes, @telesniper2 is too dismissive for some reason. Brute force search would run into combinatorial explosion very quickly. LLMs are precisely for reducing the search space, that is, suggesting promising solutions.