AlphaGeometry: Solving olympiad geometry without human demonstrations (Paper Explained)
- Added 19 Jun 2024
- #deepmind #alphageometry #llm
AlphaGeometry is a combination of a symbolic solver and a large language model by Google DeepMind that tackles IMO geometry questions without any human-generated training data.
OUTLINE:
0:00 - Introduction
1:30 - Problem Statement
7:30 - Core Contribution: Synthetic Data Generation
9:30 - Sampling Premises
13:00 - Symbolic Deduction
17:00 - Traceback
19:00 - Auxiliary Construction
25:20 - Experimental Results
32:00 - Problem Representation
34:30 - Final Comments
Paper: www.nature.com/articles/s4158...
Abstract:
Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
Authors: Trieu H. Trinh, Yuhuai Wu, Quoc V. Le, He He & Thang Luong
Links:
Homepage: ykilcher.com
Merch: ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n - Science & Technology
Thanks Yan, I love these paper review videos.
I remember that back in school most of the people my age had difficulties with geometry specifically due to these extra constructions. Some had such difficulties with these that several jokes inevitably emerged. It is very interesting to see that, similar to school pupils, an ML model also struggles with these. But unlike school pupils with their geometry homework, an ML model living in the cloud has enough compute power to propose and reason about several constructions in the blink of an eye. Thank you for the video!
Geometry facts:
The first triangle is isosceles, not equilateral.
The triangles A and B at around 14:00 are congruent (have the same angles).
Four points are cyclic (lie on a circle) if opposite angles sum to 180°.
Good thing you paid attention in class because that thing is still coming to replace you 💔
Adding here that opposite angles summing to 180 is only one of the criteria for cyclic quadrilaterals, and is the one that applies for EADH (two right angles E and D).
The one that most easily applies to EBCD: a side of the quadrilateral is “under” the other two points with equal angles (in this case BC is “under” E and D with right angles).
@Buddharta Congruent doesn't just mean triangles with the same angles, though congruent triangles do have the same angles. If two triangles are congruent, you can place one completely on top of the other (they are equal in every respect: sides and angles). If only the angles are the same, the triangles are called similar.
Two similar triangles might not be congruent.
@@RajarshiBose true, the triangles are similar and not congruent. I confused the definitions
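The cyclic-quadrilateral criterion from this thread is easy to check numerically. A minimal sketch (the function names and tolerance are my own, not from the paper or the video):

```python
import math

def angle(a, b, c):
    """Angle at vertex b, in degrees, of the path a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def is_cyclic(a, b, c, d, tol=1e-6):
    """Quadrilateral a-b-c-d (vertices in order) is cyclic
    iff a pair of opposite angles sums to 180 degrees."""
    return abs(angle(d, a, b) + angle(b, c, d) - 180.0) < tol

# A square is trivially cyclic; a generic quadrilateral is not.
print(is_cyclic((0, 0), (1, 0), (1, 1), (0, 1)))  # True
print(is_cyclic((0, 0), (2, 0), (2, 1), (0, 3)))  # False
```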
Great video very insightful! What programs do you use to record your videos? I love the paper backdrop with annotations with you at the bottom!
This video is criminally underappreciated. You have succinctly covered some of the most nuanced aspects of cutting edge AI research. Thank you.
I expect this time next year (or sooner?), we'll be reading a similar paper where someone applied this kind of method to mathematics more broadly, leveraging something like the Lean language. It might be Coq or Isabelle instead, though, as those languages have more mature "autoprovers" of the kind used in this paper.
My money's on Lean, long term. It's growing fastest and has the best developer ecosystem, and it's got a cohesive single library of proofs that can act as training data. The thing its missing is a good autoprover and a good tokenizer (it's hard to tokenize because it uses tons of unicode), but those are both actively being developed.
That's on God
Very interesting paper. While quite limited, this approach leaves no space for any hallucinations/inaccuracies which is a great plus. I'm interested how/where it can be further applied.
Not much further yet because it's narrow, it can only find auxiliary points for 2D geometry problems. I'm sure deepmind is working on AI for math in general though, but it'll likely be in increments, step-by-step. But it'll definitely be huge if it can ever master all of math, not just 2D geometry.
There might be an application in chip design actually...?
The search space for these geometry problems, if you go about it even marginally reasonably, seems very small compared with the scale of data processing needed these days for, say, 1024x1024 diffusion. It seems very reasonable to 'solve' these tightly constrained problems.
I'm thinking most problem domains, even code, can be treated as following long reasoning chains, but they would likely also involve fuzzy review steps, like LLM code review, to help speed things up.
Beautiful lesson
Thank you, Yan!
Imagine a Go board of infinite size with randomly initialized "walls" defining a usable play area. This is alphaGo for that.
If the large language model is somehow able to change the symbolic engine and the synthetic data, it can be adapted to any problem. For example, the language model could continuously switch at will between different symbolic engines it has created, for better logical reasoning at inference.
The researchers in the paper said that readers should be advised that it may be difficult to adapt this solution to other domains. Geometry has really nice properties that allows them to do it this way. I'm sure we'll figure out something soon though.
@@wege8409 I am talking about other domains of math. Basically, you can have a library of symbolic engines that are finely tuned, and you can toggle between them.
@@greengoblin9567 The paper also stated that it required a lot of compute power to get to that level. It would be interesting if the weights were released. I can find the GitHub page with code, but no weights. Then one could try to adapt it to other domains.
I believe geometry carries over to many fields of mathematics though @@wege8409
@@greengoblin9567 The problem is that other domains of math (e.g. calculus, linear algebra, differential topology) are very different from Euclidean geometry. The latter is probably a self-complete logical system, and is MUCH less expressive than e.g. calculus, in Gödelian terms. This is because the other (and all the relevant) domains of math are probably logically incomplete by Gödel's theorem, so it's impossible to have a set of statements that you can sample from to get reasonable coverage of the domain. All the things that AlphaGeometry does with, say, 90% accuracy, a symbolic solver could have done efficiently with 100% accuracy (and that's why they could exploit it for training the LLM).
Could we apply the same concept to train a model to write software, like Devin AI?
Is the sole purpose of the LLM to suggest which construct to sample at each step? If so, why use an LLM at all? Is it because it is able to better approximate the optimal sampling strategy?
Thz!
I think that reverse engineering could benefit greatly from this approach (going from assembly to a language), especially if the target language is known (and imperative). Then a compiler becomes a solver, and statements/expressions become premises (constructs, if you will) in that sense. However, I may be wrong...
Can someone shed some light on why an LLM is used instead of some Q-learning approach with a fixed action space? I don't see why AlphaGeometry relies on LLMs to search for the best action...
using agents and llms to drive tactical changes to achieve objectives seems to be everyone's trick of the week.
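To the questions above about the LLM's role: per the paper, the division of labor can be sketched as a simple loop in which the symbolic engine exhausts its deductions and the language model is only consulted to propose the next auxiliary construction. Everything below is a toy stand-in (a transitivity-only "engine" and a hard-coded "LLM"), not DeepMind's implementation:

```python
def deduce_closure(premises):
    """Toy symbolic engine: closes a fact set under one rule,
    transitivity of equality: eq(x, y) and eq(y, z) => eq(x, z)."""
    facts = set(premises)
    changed = True
    while changed:
        changed = False
        for (t1, a, b) in list(facts):
            for (t2, c, d) in list(facts):
                if t1 == t2 == "eq" and b == c and ("eq", a, d) not in facts:
                    facts.add(("eq", a, d))
                    changed = True
    return facts

def llm_propose(premises):
    """Stand-in for the language model: proposes one auxiliary premise
    (in the real system, a new point construction)."""
    return ("eq", "b", "c")  # hard-coded for this toy example

def prove(premises, goal, max_constructions=3):
    """Alternate exhaustive deduction with LLM-suggested constructions."""
    premises = set(premises)
    for _ in range(max_constructions + 1):
        if goal in deduce_closure(premises):
            return True
        premises.add(llm_propose(premises))  # auxiliary construction step
    return False

# Stuck without the construction eq(b, c); solvable once the "LLM" adds it.
print(prove({("eq", "a", "b"), ("eq", "c", "d")}, ("eq", "a", "d")))  # True
```

The point of using a learned proposer rather than Q-learning over a fixed action space is that the construction space is effectively infinite, so the model's job is to make a tiny number of promising suggestions.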
Isn't that an "isocolese" or something like that?
Yeah, an isosceles triangle. Yannic is not a high-school math teacher hahaha
@@user-wd8wx5md5z Yannic is not a native English speaker. I'm not one either, and I didn't know it was called that in English, but of course we both know what it is lol
It might actually be equilateral. I'm not a high-school math teacher either, and I am a native English speaker lol
It can't think properly yet - but this thing may help reasoning...
... suggest you capture a new profile picture around 8:20.
Every video makes my eyes water, I'm not used to light.
Yeah, I use grey background for pdfs. I don't know any trick for videos, though...
ARC-AGI sent me here
This video has saddle geometry: it starts off with an exciting premise, then rapidly becomes incredibly disappointing (a classical algorithm would be fine), then recovers at the end, if the compute budget really was modest.
Stable Diffusion occasionally solves non-euclidian geometry. It's horrifying
Like, hyperbolic geometry, or spherical geometry, or...?
Riemannian geometry
@@leolrg2783 hm? That’s more of a general topic than a specific geometry though?
Like, Euclidean geometry, spherical geometry, and hyperbolic geometry, are all described within it?
@amafuji Could you provide some links? Because your statement is rather bold.
People... that's a joke about how mangled diffusion image generations are 😂
Should look into sacred geometry. It sounds weird, but it's simple to understand, and then you can use that understanding, like the AI, to see things that aren't in our current view. Kind of like how the Wright brothers saw a plane in the pieces of reality, assisted by math and language, to help find it.
This is ridiculously impressive. The olympiad is for top mathematicians, so an AI doing it is a HUGE breakthrough by itself.
Take my words with a grain of salt; I deduced this with Claude's help. They are not solving the IMO per se. What they say with this paper: we can now reliably generate a huge amount of synthetic data (at least for geometry). They use an extremely small model (around 1 GB in size), too weak on its own, to guide what is essentially a brute-force approach to solving geometry. My prediction: if we apply this approach to a large model and, on top of it, something like chain of thought (which they essentially do in this paper, but with a brute-force solver), it will have significantly higher performance on geometry problems than any human alive.
0:17: 📐 Innovative paper introduces Alpha Geometry model for solving challenging geometry problems without human demonstrations.
5:18: 🔍 Solving complex geometry problem without human demonstrations requires specific language model and training.
10:27: ⚙ Automated geometry problem solving using algebraic reasoning and a small set of representations.
15:57: 🔍 Logical deduction and proof process in geometry problem solving.
20:59: ⚙ Symbolic deduction and traceback in problem solving process.
25:58: 📝 Advanced system solves math Olympiad problems with lengthy proofs, outperforming previous methods.
31:32: ⚙ Challenges in adapting math problems to a specialized language for proof steps and the correlation of problem difficulty with the number of required steps.
Recap by Tammy AI
I was gonna say, that reply was way too quick.
Ai
Is Tammy better than Harpa?
Isn't 25 75% of 30...
No? 25/30 = 5/6 ≈ 0.8333...
The researchers seem to say that their model can solve 75% of the problems, and then they say they have solved 25 out of the 30 problems in the competition.
However, they seem to mean that 75% of all the problems can be represented in the language they chose to use.
They then say they included only the problems that can be represented in that language,
and they solved 25 of the 30 problems they tested the model on...
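For the record, the two percentages in this thread measure different things: ~75% is the share of IMO geometry problems the formal language can represent at all, while 25/30 is the solve rate on the translated test set:

```python
solved, tested = 25, 30
print(f"{solved / tested:.1%}")  # prints "83.3%", not 75%
```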
Get a grip on geometry sensei.
I wonder why a group like this still cares about getting their papers published in journals. They were done with this project in April, and because of the paper, they only released the code in December. That's roughly nine months the community could have had access to the source code, but they waited on journal acceptance.
Wow, they discovered how to do backtracking, but the hard way. LOL noobs. I love "research" that is just noobing for stock pumps.
Isn't the key that they managed to reduce the search space on a problem with an infinite search domain down far enough that it can be backtracked? It's certainly not widely applicable yet, but it's potentially a very powerful premise
@@lilithstenhouse267 Yes, @telesniper2 is too dismissive for some reason. Brute-force search would run into combinatorial explosion very quickly. LLMs are precisely for reducing the search space, that is, suggesting promising solutions.