Do you think that ChatGPT can reason?

  • Published 7 Sep 2024

Comments • 362

  • @luke.perkin.inventor
    @luke.perkin.inventor Před 25 dny +12

    My notes from this episode:
    Formal languages have interpreters and can accept any grammatically correct code.
    The world is the interpreter for natural languages.
    We can't tell the difference between internally reasoning from first principles and retrieval.
    Planning is an example of reasoning, e.g. stacking blocks to produce a certain sequence or shape. Swap the words 'stack' and 'unstack' for 'fist' and 'slap' and GPT-4 fails.
    Reasoning is defined from a logical perspective: deductive closure over base facts. You don't just need to match the query-answer distribution, you need to compute the deductive closure. Transitive closure, for example, is a small part of deductive closure.
    People stop at the first interesting result from an LLM. For example, it can do a rotation cipher of 13 (ROT13) but not any other shift. If you can execute the general principle, you should be able to do any shift (see the sketch after these notes).
    Ideation requires shallow knowledge of wide scope.
    Distributional properties versus instance-level correctness. LLMs and diffusion models are good at one and not the other.
    When an LLM critiques its own solutions, its accuracy goes down - it hallucinates errors and incorrect verifications.
    Companies tell us they have million-word contexts, but the models make errors an intelligent child wouldn't make on a ten-word prompt.
    They're good at 'style', not 'correctness'. Classical AI was better at correctness, not style.
    Teach-a-man-to-fish example - LLMs need 1 fish, 2 fish, 3 fish... up to N fish.
    A 'general advice taker' is roughly equivalent to the goal of general AI.
    'LLM-Modulo' - LLMs guess, and a bank of external verifiers verifies. Back prompts, chain of thought, etc.
    Agentic systems are worthless without planning. It's not interchangeable - toddlers can operate guns, cows with a plan can't answer the phone.
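The ROT13 point generalizes: a rotation cipher is the same few steps for every shift, so a system executing the principle (rather than recalling memorized ROT13 pairs) should handle any N. A minimal, illustrative Python sketch of that general procedure:

```python
import string

def rot_n(text: str, n: int) -> str:
    """Caesar/rotation cipher with an arbitrary shift n.
    The identical procedure covers n=13 (ROT13) and every other shift."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k = n % 26
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)

print(rot_n("Attack at dawn", 13))  # ROT13
print(rot_n("Attack at dawn", 7))   # any other shift works the same way
```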

  • @trucid2
    @trucid2 Před měsícem +206

    I've worked with people who don't reason either. They exhibit the kind of shallow non-thinking that ChatGPT engages in.

  • @user-qg8qc5qb9r
    @user-qg8qc5qb9r Před měsícem +22

    Introduction and Initial Thoughts on Reasoning (00:00)
    The Manhole Cover Question and Memorization vs. Reasoning (00:00:39)
    Using Large Language Models in Reasoning and Planning (00:01:43)
    The Limitations of Large Language Models (00:03:29)
    Distinguishing Style from Correctness (00:06:30)
    Natural Language vs. Formal Languages (00:10:40)
    Debunking Claims of Emergent Reasoning in LLMs (00:11:53)
    Planning Capabilities and the Plan Bench Paper (00:15:22)
    The Role of Creativity in LLMs and AI (00:32:37)
    LLMs in Ideation and Verification (00:38:41)
    Differentiating Tacit and Explicit Knowledge Tasks (00:54:47)
    End-to-End Predictive Models and Verification (01:02:03)
    Chain of Thought and Its Limitations (01:08:27)
    Comparing Generalist Systems and Agentic Systems (01:29:35)
    LLM Modulo Framework and Its Applications (01:34:03)
    Final Thoughts and Advice for Researchers (01:35:02)
    Closing Remarks (01:40:07)

  • @NunTheLass
    @NunTheLass Před měsícem +40

    Thank you. He was my favorite guest that I watched here so far. I learned a lot.

  • @DataTranslator
    @DataTranslator Před měsícem +17

    His analogy of GPT to learning a second language makes 100% sense to me.
    I’m a nonnative speaker of English; yet I mastered it through grammar first and adding rules and exceptions throughout the years.
    Also, concepts were not the issue; but conveying those concepts was initially very challenging.🇲🇽🇺🇸

  • @espressojim
    @espressojim Před měsícem +8

    I almost never comment on YouTube videos. This was an excellent interview and very informative. I'd love to hear more from Prof. Subbarao Kambhampati, as he did an amazing job of scientific storytelling.

  • @AICoffeeBreak
    @AICoffeeBreak Před 9 dny +1

    Thanks for having Prof. Kambhampati! I got to experience him first hand at this year's ACL where he also gave a keynote. What a great character! 🎉

  • @memetb5796
    @memetb5796 Před měsícem +15

    This guest was such a pleasant person to listen to: there is an indescribable joy in listening to someone who is clearly intelligent and a subject-matter expert, a joy that just can't be gotten anywhere else.

  • @jonashallgren4446
    @jonashallgren4446 Před měsícem +9

    Subbarao had a great tutorial at ICML! The general verification-generation loop was very interesting to me. Excited to see more work in this direction that optimises LLMs with verification systems.
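For readers unfamiliar with that loop: the pattern discussed in the episode is roughly generate-then-verify with back-prompting. A rough sketch under assumed interfaces (`llm_propose` and `verify` are placeholders, not any particular library's API):

```python
from typing import Callable, Optional

def generate_and_verify(prompt: str,
                        llm_propose: Callable[[str], str],
                        verify: Callable[[str], Optional[str]],
                        max_rounds: int = 5) -> Optional[str]:
    """LLM-Modulo-style loop: the LLM guesses, an external sound verifier
    checks, and any error message is fed back as a 'back prompt'."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = llm_propose(prompt + feedback)
        error = verify(candidate)      # None means the candidate passed
        if error is None:
            return candidate
        feedback = f"\nThe previous answer failed verification: {error}\nPlease revise."
    return None                        # no verified answer within the budget
```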

  • @sammcj2000
    @sammcj2000 Před 29 dny +2

    Fantastic interview. Prof Kambhampati seems to be not just wise but governed by empathy and scepticism, which is a wonderful combination.

  • @Hiroprotagonist253
    @Hiroprotagonist253 Před 29 dny +2

    For natural languages the world is the interpreter. What a profound statement 🤯. I am enjoying this discussion so far!

  • @rrathore01
    @rrathore01 Před měsícem +14

    Great interview!! Some of the examples given here provide evidence that LLMs are not learning the underlying logic: the colored blocks, 4x4 matrix multiplication, the chain-of-thought issues.
    Best quote: I need to teach LLMs how to fish 1 fish, then how to fish 2 fish, then 3 fish, and so on, and they would still fail at the task of fishing N fish for an N they have not seen before.

  • @thenautilator661
    @thenautilator661 Před měsícem +27

    Very convincing arguments. I haven't heard it laid out this succinctly and comprehensively before. I'm sure Yann LeCun would be in the same camp, but I recall not being persuaded by LeCun's arguments when he made them on Lex Fridman's podcast.

    • @edzehoo
      @edzehoo Před měsícem +9

      Basically there's a whole bunch of "scientists and researchers" who don't like to admit the AGI battle is being won (slowly but surely) by the tech bros led by Ilya and Amodei. AI is a 50-year-old field dominated in the past by old men, and is now going through recent breakthroughs made by 30-year-olds, so don't be surprised that there's a whole lot of ego at play pouring cold water on significant achievements.

    • @bharatbheesetti1920
      @bharatbheesetti1920 Před měsícem +8

      Do you have a response to Kambhampati's refutation of the Sparks of AGI claim? @edzehoo

    • @kman_34
      @kman_34 Před měsícem +11

      @@edzehoo I can see this being true, but writing off their points is equally defensive/egotistical

    • @JD-jl4yy
      @JD-jl4yy Před měsícem

      ​@@edzehoo Yep.

    • @jakobwachter5181
      @jakobwachter5181 Před měsícem +4

      @@edzehoo Ilya and Amodei are 37 and 41 respectively, I wouldn't call them "young", per se. Research on AI in academia is getting outpaced by industry, and only capital rivalling industry can generate the resources necessary to train the largest of models, but academics young and old are continuously outputting content of higher quality than most industry research departments. It's not just ego, it's knowing when something is real and when it is smoke and mirrors.

  • @dr.mikeybee
    @dr.mikeybee Před měsícem +38

    Next word prediction is the objective function, but it isn't what the model learns. We don't know what the learned function is, but I can guarantee you it isn't log-odds.
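To make the distinction concrete: the training objective is next-token cross-entropy, while the internal function the network learns in order to minimise it is a separate, open question. A toy NumPy sketch of the objective itself (random logits standing in for a real model):

```python
import numpy as np

vocab_size = 5
tokens = np.array([0, 3, 1, 4])                         # toy training sequence
logits = np.random.randn(len(tokens) - 1, vocab_size)   # fake model outputs for positions 0..T-2

def next_token_loss(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Average negative log-likelihood of each next token (the objective)."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    targets = tokens[1:]                                 # position t predicts token t+1
    return float(-np.mean(np.log(probs[np.arange(len(targets)), targets])))

print(next_token_loss(logits, tokens))
```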

    • @ericvantassell6809
      @ericvantassell6809 Před měsícem

      croissants vs. yogurt

    • @Lolleka
      @Lolleka Před měsícem +12

      At the end of the day, the transformer is just a kind of modern Hopfield network. It stores patterns, it retrieves patterns. It's the Chinese room argument all over again.

    • @memegazer
      @memegazer Před měsícem +4

      @@Lolleka
      Not really.
      You can point to rules and say "rules can't be intelligent or reason."
      But when it is the NN that makes those rules, and the humans in the loop don't know what those rules are well enough to prevent hallucination or solve the alignment problem, then that is not the Chinese room anymore.

    • @xt-89907
      @xt-89907 Před měsícem +4

      Research on mechanistic interpretability is starting to show that LLMs tend to learn some causal circuits and some memorization circuits (i.e., grokking). So they are able to learn some reasoning algorithms, but there's no guarantee of it. Plus, sequence modeling is weak on some kinds of graph algorithms necessary for certain classes of logical reasoning.

    • @synthclub
      @synthclub Před měsícem +1

      @@memegazer not hotdog, hotdog!

  • @elgrego
    @elgrego Před měsícem +6

    Bravo. One of the most interesting talks I’ve heard this year.

  • @aitheignis
    @aitheignis Před měsícem +14

    I love this episode. In science, it's never about what can be done or what happens in the system; it's always about the mechanism that leads to the event (basically, how the event happens). What is severely missing from all the LLM talk today is discussion of the underlying mechanism. Work on mechanism is the key piece that will move all of this deep neural network work from engineering feat to actual science. To know the mechanism is to know the causality.

    • @stevengill1736
      @stevengill1736 Před měsícem +2

      ...yet they often talk about LLM mechanism as a "black box", to some extent insoluble...

  • @HoriaCristescu
    @HoriaCristescu Před měsícem +2

    What you should consider is the environment-agent system, not the model in isolation. Focusing on models alone is a bad direction to take; it makes us blind to the process of external search and exploration, without which we cannot talk about intelligence and reasoning. The scientific method we use also has a very important experimental-validation step; not even humans could reason or be creative absent an environment.

  • @weftw1se
    @weftw1se Před měsícem +27

    Disappointing to see so much cope from the LLM fans in the comments. Expected, but still sad.

    • @yeleti
      @yeleti Před měsícem

      They are rather AGI hopefuls. Who's not a fan of LLMs including the Prof ;)

    • @weftw1se
      @weftw1se Před měsícem

      @@yeleti yeah, I think they are very interesting / useful but I doubt they will get to AGI with scaling alone.

  • @XOPOIIIO
    @XOPOIIIO Před měsícem +4

    Reasoning requires looped thinking, sorting through the same thoughts from different angles. Feedforward NNs have an input, an output, and a fixed stack of layers in between; their result is akin to intuition, not reasoning. That's why they give better results if you simulate looped thinking by feeding the model's output back to itself to create a reasoning-like step-by-step process.

  • @timcarmichael
    @timcarmichael Před měsícem +15

    Have we yet defined intelligence sufficiently well that we can appraise it and identify its hallmarks in machines?

    • @stevengill1736
      @stevengill1736 Před měsícem

      I think if we qualify the definition of intelligence as including reasoning, then yes.
      I'd rather use the term sentience - now artificial sentience...that would be something!

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      @@johan.j.bergman Sure, but that's a bit like saying that we don't need to understand aerodynamics or lift to evaluate airplanes, and can just judge them on their utility and ability to fly ... which isn't entirely unreasonable if you are ok leaving airplane design up to chance and just stumbling across better working ones once in a while (much as the transformer architecture was really a bit of an accidental discovery as far as intelligence goes).
      However, if we want to actively pursue AGI and more intelligent systems, then it really is necessary to understand intelligence (which will provide a definition) so that we can actively design it in and improve upon it. I think there is actually quite a core of agreement among many people as what the basis of intelligence is - just no consensus on a pithy definition.

    • @jakobwachter5181
      @jakobwachter5181 Před měsícem

      @@johan.j.bergman A spatula serves a helpful purpose that no other cooking tool is able to replace in my kitchen, so I find it incredibly useful. Turns out they are rather mass market too. Should I call my spatula intelligent?

    • @Cammymoop
      @Cammymoop Před měsícem

      no

    • @rey82rey82
      @rey82rey82 Před měsícem

      The ability to reason

  • @snarkyboojum
    @snarkyboojum Před měsícem +10

    Great conversation. I disagree that LLMs are good for idea generation. In my experience, they're good at replaying ideas back to you that are largely derivative (based on the data they've been trained on). The true 'inductive leaps', as the Professor put it, aren't there in my interactions with LLMs. I use them as a workhorse for doing grunt work with ideas I propose, and even then I find them lacking in attention to detail. There's a very narrow range they can work reliably in, and once you go outside that range, they hallucinate or provide sub-standard (compared to human) responses.
    I think the idea that we're co-creating with LLMs is an interesting one that most people haven't considered - there's a kind of symbiosis where we use the model and build artefacts that future models are then trained on. This feedback loop across how we use LLMs as tools is interesting. That's the way they currently improve. It's a symbiotic relationship - but humans are currently providing the majority of the "intelligence", if not all of it, in this process.

    • @larsfaye292
      @larsfaye292 Před měsícem +3

      What a fantastic and succinct response! My experience has been _exactly_ the same.

    • @sangouda1645
      @sangouda1645 Před měsícem

      That's exactly it. They start to act as a good creative partner at the Nth iteration, after explaining things back and forth and giving feedback, but once it gets the hang of it, it really acts like a student wanting to get a good score from a teacher :)

    • @notHere132
      @notHere132 Před měsícem

      We need an entirely new model for AI to achieve true reasoning capability.

  • @swarnavasamanta2628
    @swarnavasamanta2628 Před měsícem +4

    The feeling of understanding is different from the algorithm of understanding that's being executed in your brain. The feeling of something is created by consciousness, while that something might already be going on in your brain. Here's a quick thought experiment: try adding two numbers in your mind; you can easily do it and get an answer. Not only that, but you have a feeling of understanding the addition algorithm in your head. You know how it works, and you are aware of it being executed and of the steps you're performing in real time. But imagine if you did not have this awareness/consciousness of the algorithm in your head. That's how LLMs can be thought of: they have an algorithm, and it executes and outputs an answer, but they are not aware of the algorithm itself or that it is being performed, and they have no agency over it. Doing something and perceiving that you are doing something are completely different.

    • @prasammehta1546
      @prasammehta1546 Před měsícem

      Basically they are soulless brain which they actually are :P

  • @jakobwachter5181
    @jakobwachter5181 Před měsícem +2

    Rao is wonderful, I got the chance to briefly chat with him in Vancouver at the last AAAI. He's loud about the limitations of LLMs and does a good job of talking to the layman. Keep it up, loving the interviews you put out!

  • @sofoboachie5221
    @sofoboachie5221 Před 25 dny

    This is probably the best episode I have watched here, and I watch this channel as a podcast. Fantastic guest.

  • @JurekOK
    @JurekOK Před měsícem +3

    29:38 this is an actual breakthrough idea addressing a burning problem; it should be discussed more!

  • @Redx3257
    @Redx3257 Před měsícem +4

    Yea this man is brilliant. I could just listen to him all day.

  • @whiteycat615
    @whiteycat615 Před měsícem +7

    Fantastic discussion! Fantastic guy! Thank you

  • @Thierry-in-Londinium
    @Thierry-in-Londinium Před měsícem +1

    This professor is clearly one of the leaders in his field. When you reflect on and dissect what he is sharing, it stands up to scrutiny!

  • @oscarmoxon102
    @oscarmoxon102 Před měsícem +16

    There's a difference between in-distribution reasoning and out-of-distribution reasoning. If you can make the distribution powerful enough, you can still advance research with neural models.

    • @SurfCatten
      @SurfCatten Před měsícem +3

      Absolutely true. As an example I tested its ability to do rotation ciphers myself and it performed flawlessly. Obviously the reasoning and logic to do these translations was added to its training data since that paper was released.

    • @NextGenart99
      @NextGenart99 Před měsícem

      Easy, it's all about prompting. Try this prompt with the PlanBench test: "Base your answer on methodical analysis of the given data, without making unfounded assumptions. Avoiding unfounded assumptions is very important. Base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions.
      Be cautious of overthinking or adding unnecessary complexity to problems.
      Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally.
      Re-check the reasoning against each piece of information before concluding."

  • @shyama5612
    @shyama5612 Před měsícem +1

    Sara Hooker said the same about us not fully understanding what is used in training - the low frequency data and memorization of those being interpreted as generalization or reasoning. Good interview.

  • @CoreyChambersLA
    @CoreyChambersLA Před měsícem +1

    ChatGPT simulates reasoning surprisingly well using its large language model for pattern recognition and prediction.

  • @user-fh7tg3gf5p
    @user-fh7tg3gf5p Před měsícem +1

    This discussion makes totally clear what we can expect from LLMs, and the irrefutable reasons for it.

  • @NextGenart99
    @NextGenart99 Před měsícem +1

    Easy, it's all about prompting. Try this prompt with the PlanBench test: "Base your answer on methodical analysis of the given data, without making unfounded assumptions. Avoiding unfounded assumptions is very important. Base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions.
    Be cautious of overthinking or adding unnecessary complexity to problems.
    Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally.
    Re-check the reasoning against each piece of information before concluding."

  • @scottmiller2591
    @scottmiller2591 Před měsícem +1

    Good take on LLMs and not anthropomorphizing them. I do think there is an element of "What I do is hard, what others do is easy" to the applications of LLMs in creativity vs. validation, however.

  • @user-fh7tg3gf5p
    @user-fh7tg3gf5p Před měsícem +6

    Such a sharp mind for a senior man.

  • @therainman7777
    @therainman7777 Před 29 dny +1

    We already do have a snapshot of the current web, and snapshots for every day prior. It's the Wayback Machine.

  • @willd1mindmind639
    @willd1mindmind639 Před měsícem +1

    Reasoning in humans is about using abstractions or a general understanding of concepts to arrive at a result. A perfect example is math problems. Most humans use shortcuts to solve math calculations, which can be a form of reasoning. In a computing sense, reasoning would be calculating a math answer without using the ALU (the arithmetic logic circuits on the CPU). In a GPT context it would mean arriving at a result without having the answer (and question) already in the training distribution. So, for example, a human using reasoning can add two plus two as follows: 2 is a number representing the quantity of items in a set that can be counted, so 2 plus 2 becomes 1, 2, 3, 4 (counting up 2 places and then counting up 2 more places, with 4 being the answer). Something like that is not possible on a CPU, and ChatGPT would not be able to do it either, because it can't generalize that idea of counting to the addition of any two numbers. If it could, without every combination of numbers written out using the counting method in its training data (or distribution), then it would be reasoning.
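The counting procedure described here is fully general; a minimal sketch, just to make the point concrete, in which the same few steps cover any pair of non-negative integers without a memorized table:

```python
def add_by_counting(a: int, b: int) -> int:
    """Addition as repeated counting: start at a and count up b times."""
    total = a
    for _ in range(b):
        total += 1          # one counting step ("the next number")
    return total

print(add_by_counting(2, 2))   # counts 3, 4 -> returns 4
```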

  • @prabhdeepsingh5642
    @prabhdeepsingh5642 Před měsícem +1

    Leaving the debate about reasoning aside, this discussion was a damn good one. Learned a lot. Don't miss out on this one due to some negative comments. It's worth your time.

  • @KRGruner
    @KRGruner Před 10 dny

    Great stuff! ACTUAL non-hype commentary on AI and LLMs. I am familiar with Chollet and ARC, so no big surprises here but still, very well explained.

  • @Neomadra
    @Neomadra Před měsícem +24

    LLMs definitely can do transitive closure. Not sure why the guest stated otherwise. I tried it out with completely random strings as object names and Claude could do it easily. So it's not just retrieving information.

    • @autingo6583
      @autingo6583 Před měsícem +5

      this is supposed to be science. i hate it so much when people who call themselves researchers do not really care for thoroughness, or even straight out lie. don't let them get away with it.

    • @jeremyh2083
      @jeremyh2083 Před měsícem +12

      It struggles with it if you create something it’s never seen before. It’s a valid point on his part.

    • @st3ppenwolf
      @st3ppenwolf Před měsícem +6

      Transitive closures can be done from memory. It's been shown these models perform badly with novel data, so he still has a point.

    • @SurfCatten
      @SurfCatten Před měsícem +3

      And it was also able to do a rotation cipher of any arbitrary length when I just tested it. There are definite limitations but what they can do is far more complex than simply repeating what's in the training data. I made a separate post but I just wanted to add on here that it can also do other things that he specifically said it can't.

    • @gen-z-india
      @gen-z-india Před měsícem

      Ok, everything they speak is guess work, and it will be so until deep learning is there.

  • @rolandhochmuth1951
    @rolandhochmuth1951 Před měsícem +2

    I originally tried: Let a = b and b = c, does a = c? The response cited the transitive property and said yes, a = c. So then I tried something like 'Let asdf = hjkl and let hjkl = vbnm, does asdf = vbnm?' to try to eliminate a potential reference to a description of the transitive property that uses a, b, and c as variable names. The response was not able to identify the transitive property in this case and didn't know how to treat the = sign. After a few iterations I modified the prompt to: 'Let asdf, hjkl and vbnm be variables. Let = be equality. Given asdf = hjkl and hjkl = vbnm, does asdf = vbnm?' The response was:
    You're getting closer! Now we have a clearer setup.
    Since we've defined "=" as equality and the variables are unspecified, we can indeed apply the transitive property.
    Yes, given asdf = hjkl and hjkl = vbnm, then asdf = vbnm.
    The transitive property holds true in this case because we're dealing with standard equality.

    • @rolandhochmuth1951
      @rolandhochmuth1951 Před měsícem

      Extending this model a bit more. Let asdf, hjkl, vbnm and zxcv be variables. Let = be equality. Given asdf = hjkl, hjkl = vbnm, and vbnm = zxcv, does asdf = xxcv?
      Model
      2.5s
      Yes, you're absolutely right!
      Following the transitive property:
      asdf = hjkl
      hjkl = vbnm
      vbnm = zxcv
      Therefore:
      asdf = zxcv
      Let me know if you have any other logic puzzles you'd like to try!
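For comparison, the closure being probed in these prompts is mechanical once the relation is written down explicitly; a small illustrative sketch, reusing the made-up variable names from the prompts above, of computing a transitive closure symbolically:

```python
from itertools import product

def transitive_closure(pairs: set) -> set:
    """Keep adding (a, d) whenever (a, b) and (b, d) are already known,
    until nothing new can be derived (a tiny piece of deductive closure)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

facts = {("asdf", "hjkl"), ("hjkl", "vbnm"), ("vbnm", "zxcv")}
print(("asdf", "zxcv") in transitive_closure(facts))   # True
```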

  • @vishalrajput9856
    @vishalrajput9856 Před měsícem +2

    I love Rao's work and he's funny too.

  • @annette4718
    @annette4718 Před měsícem +1

    This is a very refreshing episode. Lots of complex topics synthesized into easily digestible insights

  • @techw4y
    @techw4y Před měsícem +1

    I agree fully with the points here. LLMs are good at the "creative" side of language and media, though it's not really the same creativity as humans'. However, it's best to use that capability of LLMs to construct responses in an acceptable manner, while the actual data comes from authoritative sources and the metrics come from reliable calculations based on formulas, calculators or rule engines.
    Btw, I have given below a better-written, professional version of my post above, courtesy of Google Gemini. I could not have said it any better.
    I concur with the assessment presented. Large language models (LLMs) excel at generating creative language and media, albeit distinct from human creativity. Leveraging this capability, LLMs can effectively construct responses in an appropriate manner, while sourcing data from authoritative references and deriving metrics from reliable calculations based on formulas, calculators, or rule engines. This approach optimizes the strengths of both LLMs and traditional information systems for a comprehensive and accurate solution.

  • @user-gr9ql7wx3c
    @user-gr9ql7wx3c Před měsícem

    This is the first time I'm seeing either of the two people in the video, and I'm hooked. Lots of hard-hitting and salient points from the guest, and kudos to the interviewer for steering the discussion.

  • @markplutowski
    @markplutowski Před měsícem +8

    If the title says "people don't reason", many viewers think it makes the strong claim "ALL people don't reason", when it is actually making the weaker claim "SOME people don't reason". That title is factually defensible but misleading. One could be excused for interpreting this title as claiming "ChatGPT doesn't reason (at all)", when it is actually claiming "ChatGPT doesn't reason (very well)".
    One of the beauties of human language is that the meaning of an utterance derived by the listener depends as much on the deserialization algorithm used by the listener as on the serialization algorithm employed by the speaker. The YouTube algorithm chose this title because the algorithm "knows" that many viewers assume the stronger claim.
    Nonetheless, be that as it may, this was a wonderful interview. Many gems of insight on multiple levels, including historical, which I enjoyed. I especially liked your displaying the title page of an article that was mentioned. Looking forward to someone publishing "Alpha Reasoning: No Tokens Required".
    I would watch again.

    • @阳明子
      @阳明子 Před měsícem +2

      Professor Kambhampati is making the stronger claim that LLMs do not reason at all.

    • @markplutowski
      @markplutowski Před měsícem

      ​@@阳明子 1:20:26 "LLMs are great idea generators", which is such an important part of reasoning, he says, that Ramanujan was great largely because he excelled at the ideation phase of reasoning. 16:30 he notes that ChatGPT 4.0 was scored at 30% on a planning task. 1:23:15 he says that LLMs are good for style critiques, therefore for reasoning about matters of style, LLMs can do both ideation and verification.

    • @阳明子
      @阳明子 Před měsícem

      @@markplutowski 3:14 "I think the large language models, they are trained essentially in this autoregressive fashion to be able to complete the next word, you know, guess the next word. These are essentially n-gram models."
      11:32 Reasoning VS Retrieval
      17:30 Changing predicate names in the block problem completely confuses the LLMs
      32:53 "So despite what the tenor of our conversation until now, I actually think LLMs are brilliant. It's just the brilliant for what they can do. And just I don't complain that they can't do reason, use them for what they are good at, which is unconstrained idea generation."

    • @markplutowski
      @markplutowski Před měsícem +1

      @@阳明子 Ok, I see it now. I originally misinterpreted his use of a double-negative there where he says "And just I don't complain that they can't do reason".
      That said, he contradicts himself by admitting that they can do a very limited type of reasoning (about matters of style), and are weakly capable of planning (which is considered by many as a type of reasoning, although he seems to disagree with that), and can be used for an important component of reasoning (ideation).
      But yeah, I see now that you are correct - even though there are these contradictions he is indeed claiming "that they can't do reason".

  • @ACAndersen
    @ACAndersen Před měsícem +1

    His argument is that if you change the labels in classical reasoning tests, the LLM fails to reason. I tested GPT-4 on the transitive property with the following made-up prompt: "Komas brisms Fokia, and Fokia brisms Posisos, does Komas brism Posisos? To brism means to contain." After some deliberation it concluded that yes, the statement holds true. Thus there is some reasoning there.

    • @hashp7625
      @hashp7625 Před měsícem +1

      How did you test his primary point on this topic - that the GPT 4 training data is so large that it has been trained on common statements like this and that answering true is a likely distribution?

  • @yafz
    @yafz Před měsícem +1

    Excellent, in-depth interview! Thanks a lot!

  • @GarthBuxton
    @GarthBuxton Před měsícem +3

    Great work, thank you.

  • @SurfCatten
    @SurfCatten Před měsícem +10

    Claude just deciphered a random biography, in rotation cipher, for me. All I told him was that it was a Caesar cipher and then gave him the text. I didn't tell him how many letters it was shifted or rotated by and I didn't use rot13. I tried it three times with three different shift values and it translated it perfectly each time. There's no way that Claude has memorized every single piece of information on the internet in cipher form. Don't know if it's "reasoning" but it is certainly applying some procedure to translate this that is more than just memorization or retrieval. ChatGPT also did it but it had some errors.
    Instead of criticizing other scientists for being fooled and not being analytical enough maybe you should check your own biases.
    I have found it true that it can't do logic when a similar logic problem was not in its training data but it definitely can generalize even when very different words are used.

    • @quasarsupernova9643
      @quasarsupernova9643 Před měsícem +3

      The benefits of scale create an illusion of reasoning. Perhaps that suffices for many really useful applications, but sooner or later we are going to hit a brick wall and start missing the ability to actually reason, rather than pretend to reason via retrieval at scale...

    • @SurfCatten
      @SurfCatten Před měsícem +4

      @@quasarsupernova9643 That's a perfectly reasonable statement. However, my comment was that I tested it and it did generalize and use logic to solve a cipher that the speaker had just said it could not do unless it had memorized it, which is impossible if you think about it. The amount of information contained in these models is infinitesimal compared to that in their training data. The idea that it can explain all jokes simply because it read a website explaining them is so simplistic compared to the way LLMs operate that it's absurd. The same goes for the idea that it can translate a particular cipher because it read a website containing every single English word translated into cipher text using ROT13. So I tested it specifically not using ROT13, and using an obscure long biography with lots of non-English names, etc., and it had no problem not only identifying the specific shift used in the cipher but then applying it.

    • @Neomadra
      @Neomadra Před měsícem +2

      It could also be that they used synthetic data to train the model specifically for this task. For this specific task, creating synthetic data is trivial. Unfortunately, none of the major players reveal the training data they use, so it's hard to know when a model truly generalizes. That said, I tested the transitive-closure task, and using completely random strings as objects it nailed it with ease. So at least it has learned a template to solve unseen problems, which I consider at least a weak form of reasoning.

  • @ej3281
    @ej3281 Před 28 dny

    Very nice to hear from an LLM guy that hasn't lost his mind. He's simply wrong about LLMs being useful for unconstrained idea generation, but as far as his other views go, very enjoyable to watch.

  • @markplutowski
    @markplutowski Před měsícem +1

    1:31:09 - 1:31:32. “People confuse acting with planning“ . “We shouldn’t leave toddlers alone with a loaded gun.” this is what frightens me : agent based systems let loose in the wild without proper controls. A toddler AI exploring the world, picking up a loaded gun and pulling the trigger.

  • @siddharth-gandhi
    @siddharth-gandhi Před měsícem +2

    Hi! Brilliant video! Much to think about after listening to hyper-scalers for weeks. One request: can you please cut down on the clickbait titles? I know you said it's for the YT algorithm, but if I want to share this video with, say, PhD students, MS students or profs, no one takes a new channel seriously with titles like this one (it just feels clickbaity for a genuinely good video). Let the content speak for itself. Thanks!

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  Před měsícem +3

      I am really sorry about this, we will change it to something more academic when the views settle down. I’ve just accepted it as a fact of youtube at this point. We still use a nice thumbnail photo without garish titles (which I personally find more egregious)

    • @siddharth-gandhi
      @siddharth-gandhi Před měsícem +1

      @@MachineLearningStreetTalk Thanks for understanding! 😁

  • @jeremyh2083
    @jeremyh2083 Před měsícem +8

    Those people who assume AGI is going to be achieved have never done long-term work inside any of the major GPT systems. If you want a quick and dirty test, tell it to create a fiction book: first have it outline 15 chapters with 10 sections per chapter, and then have it start writing that book. Look at it in detail and you will see that section after section it loses sight of essentially every detail. It does a better job if you are working inside a universe another author has already made, and the worst job if you are creating a brand new universe, even if you have it define the universe.

    • @mattwesney
      @mattwesney Před měsícem

      sounds like you're bad at prompting

    • @jeremyh2083
      @jeremyh2083 Před měsícem

      @@mattwesney lol it does, doesn’t it, but you haven’t tried it and I have.

  • @phiarchitect
    @phiarchitect Před měsícem +2

    what a wonderfully exuberant person

  • @luke.perkin.inventor
    @luke.perkin.inventor Před měsícem +1

    Great episode and fantastic list of papers in the description!

  • @plasticmadedream
    @plasticmadedream Před měsícem +2

    A new bombshell has entered the villa

  • @Jukau
    @Jukau Před měsícem +3

    What is the bombshell? This is absolutely clear and known... it would be a bombshell if it could reason.

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  Před měsícem +2

      Read the comments section here, I wish it was clear and known. It's subtle and requires a fair bit of CS knowledge to grok unfortunately.

  • @swarupchandra1333
    @swarupchandra1333 Před měsícem

    One of the best explanations I have come across

  • @alexandermoody1946
    @alexandermoody1946 Před měsícem

    Not all manhole covers are round.
    The square manhole covers that have a two piece triangular tapered construction are really heavy.

  • @briandecker8403
    @briandecker8403 Před měsícem +11

    I love that this channel hosts talks by the best experts in the field and generates comments from the lowest Dunning-Kruger keyboard cowboys.

  • @johnkost2514
    @johnkost2514 Před měsícem

    This aligns nicely with the work Fabrice Bellard has been doing using Transformers to achieve SOTA lossless compression in his NNCP algorithm.
    Coincidence? I think not!
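The prediction/compression connection behind NNCP-style compressors is that a better next-symbol predictor yields shorter codes: an ideal entropy coder spends about -log2 p(symbol | context) bits per symbol. A toy sketch of that accounting (`predict_next` is a placeholder, not NNCP's actual interface):

```python
import math

def code_length_bits(tokens, predict_next) -> float:
    """Shannon code length of a sequence under a predictive model."""
    bits = 0.0
    for i, tok in enumerate(tokens):
        p = predict_next(tokens[:i], tok)   # model's probability for the true next token
        bits += -math.log2(p)
    return bits

# Stand-in model: uniform over a 4-symbol alphabet, i.e. 2 bits per symbol.
uniform = lambda prefix, tok: 0.25
print(code_length_bits("abca", uniform))    # 8.0 bits
```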

  • @davidcummins8125
    @davidcummins8125 Před měsícem +1

    Could an LLM, for example, figure out whether a request requires a planner, a math engine, etc., transform the request into the appropriate format, use the appropriate tool, and then transform the results for the user? I think that LLMs provide a good combination of UI and knowledge base. I was suspicious myself that in the web data they may well have seen joke explanations, movie reviews, etc., and can lean on that. I think that LLMs can do better, but it requires memory and a feedback loop, in the same way that embodied creatures have them.
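That routing idea can be sketched independently of any particular model; in the outline below every callable is a placeholder (hypothetical names, not a real API), with the LLM only classifying and translating while a sound external tool does the actual planning or arithmetic:

```python
from typing import Callable, Dict

def route_request(request: str,
                  classify: Callable[[str], str],
                  tools: Dict[str, Callable[[str], str]],
                  to_tool_format: Callable[[str, str], str],
                  to_natural_language: Callable[[str], str]) -> str:
    """Pick a tool, reformat the request for it, run it, and render the result."""
    tool_name = classify(request)                 # e.g. "planner" or "math_engine"
    formal_query = to_tool_format(request, tool_name)
    raw_result = tools[tool_name](formal_query)   # the sound external solver does the work
    return to_natural_language(raw_result)
```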

  • @DataJuggler
    @DataJuggler Před měsícem +2

    0:18 When I was 4 years old, I was often stuck at my parents' work. The only entertaining thing for me to do was play with calculators or adding machines. I memorized the times table because I played with calculators a lot. My parents would spend $8 at the drug store to keep me from asking why the sky is blue and other pertinent questions. I was offered the chance to skip first grade after kindergarten, and my parents said no. Jeff Bezos is the same age as me, and also from Houston. His parents said yes to skipping first grade. I brought this up to my parents forever, until they died.

  • @pruff3
    @pruff3 Před měsícem +13

    I memorized all the knowledge of humans. I can't reason but I know everything humans have ever put online. Am I useful? Provide reason.

    • @fburton8
      @fburton8 Před měsícem +4

      What proportion of “all the knowledge of humans” do current models have access to?

    • @pruff3
      @pruff3 Před měsícem

      @@fburton8 all of it, well everything on the open Internet so most books, most poetry, lots of art, papers, code, etc.

    • @albin1816
      @albin1816 Před měsícem +1

      Extremely useful. Ask any engineer who's addicted to ChatGPT / Copilot / OpenAI API at the moment for their daily workflows.

    • @malakiblunt
      @malakiblunt Před měsícem +1

      but i make up 20% of my answers - can you tell which ?

    • @n4rzul
      @n4rzul Před měsícem +1

      @@malakiblunt So do you... sometimes...

  • @jamad-y7m
    @jamad-y7m Před měsícem +2

    ChatGPT is definitely not sentient. A lot of the time when I'm talking to it, it will just repeat the same answers over and over again without realizing that I've already told it it is wrong.

    • @fburton8
      @fburton8 Před měsícem

      What really bugs me is when ChatGPT clearly doesn't know something but keeps on acting like it does. Its lack of self-awareness and humility is really unhelpful.

    • @krox477
      @krox477 Před měsícem +1

      What is your definition of sentience?

  • @MateusCavalcanteFonseca
    @MateusCavalcanteFonseca Před měsícem

    Hegel said a long time ago that deduction and induction are different aspects of the same process, the process of acquiring knowledge about the world. Great talk.

  • @JG27Korny
    @JG27Korny Před měsícem +1

    I think there is a broad misconception. LLMs are LLMs; they are not AGI (artificial general intelligence).
    Each AI has a world model. If the question fits the world model, it will work. It is like asking a chess AI engine to play checkers.
    That is why multimodal models are the big thing: they train not just on a corpus of text but on images too. Those visually trained AI models will solve the stacking problem at minute 19:00.
    It is not that ChatGPT does not reason. It reasons, but not as a human does.

  • @user-mn6bb6gi6v
    @user-mn6bb6gi6v Před měsícem

    Pertinently pinpointed; one killed 'the beast'. LLMs are just wonderful 'bibliothèques vivantes' (living libraries), quite great tools that save time by ignoring any educated protocols.

  • @anuragshas
    @anuragshas Před měsícem +1

    The 'On the Dangers of Stochastic Parrots' paper still holds true.

  • @lystic9392
    @lystic9392 Před 29 dny

    I think I have a way to allow almost any model to 'reason'. Or to use reasoning, anyway.

  • @VanCliefMedia
    @VanCliefMedia Před měsícem

    I would love to see his interpretation of the most recent GPT-4 release with structured output, and of creating reasoning through that output.

  • @SkyGodKing
    @SkyGodKing Před měsícem +1

    Whenever I do tests I always change the names and make things unique, such that it's never encountered them before, and it seems to reason just fine. So I don't think that argument holds weight, but I guess maybe I need to check his paper.

  • @wtfatc4556
    @wtfatc4556 Před měsícem +3

    GPT is like a reactive mega-Wikipedia...

  • @Neomadra
    @Neomadra Před měsícem +11

    I agree but Claude 3.5 does! ;)

    • @vpn740
      @vpn740 Před měsícem +3

      no, it doesn't.

    • @Neomadra
      @Neomadra Před měsícem +1

      ​@@vpn740It's a joke.

  • @tylermoore4429
    @tylermoore4429 Před měsícem

    This analysis treats LLMs as a static thing, but the field is evolving. Neurosymbolic approaches are coming; a couple of these are already out there in the real world (MindCorp's Cognition and Verses AI).

  • @pallharaldsson9015
    @pallharaldsson9015 Před měsícem

    16:44 "150% accuracy [of some sort]"? It's a great interview with the professor (the rest of it is good), who knows a lot; good to know we can all make such mistakes...

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      I processed it as dry humor - unwarranted extrapolation from current performance of 30%, to "GPT 5" at 70%, to "GPT 10" at 150%. Of course he might have just misspoken. Who knows.

  • @MoeShlomo
    @MoeShlomo Před měsícem +2

    People typically assume that LLMs will always be "stuck in a box" that is determined by their training data. But humans are of course quite clever and will figure out all sorts of ways to append capabilities analogous to different brain regions that will allow LLMs to effectively "think" well enough to solve increasingly-challenging problems and thereby self-improve. Imagine equipping a humanoid robot (or a simulated one) with GPT6 and Sora3 to allow it to make predictions about what will happen based on some potential actions, take one of those actions, get feedback, and integrate what was learned into its training data. My point is that people will use LLMs as a component of a larger cognitive architecture to make very capable systems that can learn from their actions. And of course this is just one of many possible paths.

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem +1

      Sure, there will be all sorts of stuff "added to the box" to make LLMs more useful for specific use cases, as is already being done - tool use, agentic scaffolding, specialized pre-training, etc, but I don't think any of this will get us to AGI or something capable of learning a human job and replacing them. The ability for lifelong learning by experimentation is fundamentally missing, and I doubt this can be added as a bolt-on accessory. It seems we really need to replace gradient descent and pre-training with a different more brain-like architecture capable of continual learning.

    • @eyoo369
      @eyoo369 Před měsícem

      @@benbridgwater6479 Yes, agree with that. Anything that doesn't resemble the human brain will not bring us to AGI. While LLMs are very impressive and a great first step into a paradigm shift, they are ultimately a hack route to the current level of intelligence; there are still so many levels of reasoning missing, even from SOTA models like Claude 3.5 and GPT-4o. For me the roadmap to general intelligence is defined by the way it learns, not necessarily by what a model outputs after pre-training. To be more specific: true AGI would be giving a model roughly the same amount of data a human gets exposed to in a lifetime and having it perform like a median human. Throwing the world's data at it and scaling the parameters into the billions/trillions, although impressive, is far away from AGI.

  • @notHere132
    @notHere132 Před měsícem +5

    I use ChatGPT every day. It does not reason. It's unbelievably dumb, and sometimes I have trouble determining whether it's trying to deceive me or just unfathomably stupid. Still useful for quickly solving problems someone else has already solved, and that's why I continue using it.

  • @GoodBaleadaMusic
    @GoodBaleadaMusic Před měsícem +3

    ChatGPT is smarter than EVERYONE I work with.

  • @PaoloCaminiti-b5c
    @PaoloCaminiti-b5c Před 22 dny +1

    I'm very skeptical of this. Aristotle inferred logic by looking at rhetorical arguments; LLMs could be extracting those features already while building their model to compress the corpus of data, and that seems equivalent to propositional logic. It seems this researcher puts too much accent on agents needing to be capable of mathematical proof, whose utility for agents - including humans - is not well established.

  • @quebono100
    @quebono100 Před měsícem +1

    Wow a good one thank you

  • @jerosacoa
    @jerosacoa Před měsícem +2

    I don't agree with it, and this is why : Reasoning is a complex cognitive process that's challenging to define precisely. To understand it better, let's start by examining a neuron, the basic unit of our brain's information processing system.
    A neuron resembles a highly energy-efficient, non-linear information processor with multiple inputs and adaptive connectivity (neuroplasticity). It's tree-like in structure, with dendrites branching out to receive signals and an axon to transmit them.
    Modern AI, including large language models like ChatGPT, attempts to mimic certain aspects of neural processing. While these models use mechanisms like self-attention, backpropagation, and gradient descent - which differ from biological neural processes - the underlying inspiration comes from the brain's information processing capabilities.
    It's important to note that the hardware differences between biological brains and artificial neural networks necessitate different implementations. For instance, our biological processors inherently incorporate time as a dimension, allowing us to process temporal sequences through recurrent connections. AI models have found alternative ways to handle temporal information, such as positional encodings in transformer architectures.
    Given these parallels and differences, we should reconsider the question: Do humans "reason" in a fundamentally different way than AI? The answer isn't straightforward. Creativity and reasoning are both poorly defined terms, which complicates our understanding of these processes in both humans and AI.
    Large language models, including ChatGPT, process vast amounts of factual information and can combine this information in novel ways to produce responses that often appear reasoned and creative. While the underlying mechanisms differ from human cognition, the outputs can demonstrate logical coherence, factual accuracy, and novel insights.
    Therefore, it may be more productive to view AI reasoning not as an all-or-nothing proposition, but as a spectrum of capabilities. These models can certainly perform tasks that involve logical inference, factual recall, and the synthesis of information - key components of what we often consider "reasoning."
    It seems to me that the argument that LLMs cannot reason may be based on an overly narrow definition of reasoning that doesn't account for the nuanced ways in which these models process and generate information. As our understanding of both human cognition and AI capabilities evolves, we may need to refine our definitions of reasoning and creativity to better reflect the complex reality of information processing in both biological and artificial systems...

    • @MachineLearningStreetTalk
      @MachineLearningStreetTalk  Před měsícem +2

      There is something to what you are saying for sure, but in this context, the professor is talking about a specific form of reasoning i.e. deductive closure.

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      He defines what he means by reasoning - deductive closure - things that can be deduced from a base set of knowledge. If an LLM could reason, then it wouldn’t depend on being additionally trained on things that are deducible from other data in the training set

    • @jerosacoa
      @jerosacoa Před měsícem

      @@benbridgwater6479 ...well.... I don't agree! It's fundamentally about data. Humans possess a vast reservoir of data, which enables us to extrapolate further insights. If your datasets are incomplete, your understanding will be limited. Moreover, alignment and security measures are in place to impose these limitations.. therefore you wouldn't know unless you were sam's buddy! :) But it's only a matter of time until self-sufficient AI systems, models, and LLMs can generate synthetic data to evolve independently-this is already happening as you may know.
      Firstly, it's essential to recognize that reasoning, even deductive reasoning, involves logical inference, factual recall, and the synthesis of information-domains where large language models have shown remarkable capabilities. For example, studies have demonstrated that models like GPT-3 and GPT-4 perform well on analogical reasoning tasks, even surpassing human abilities in areas like abstract pattern recognition and logical inference when appropriately guided. The outputs of AI reasoning can closely resemble those produced by humans, especially when employing "chain-of-thought" "reasoning", a method used by the latest models.
      In structured testing scenarios, such as medical exams, ChatGPT has outperformed human candidates by accurately and contextually addressing complex questions, showcasing its ability to apply deductive reasoning effectively. Is this zero-shot luck, or is there more to it? Recently, an AI model secured a silver medal in mathematics, and we're already discussing models with PhD-level expertise across various fields, making R&D fully automatic. Achieving such feats necessitates a more precisely defined concept of "reasoning." as this one won't cut it!
      The vagueness of these terms creates a false sense of security. All of this can be countered by examining the broader context of AI reasoning capabilities, as evidenced in numerous studies and experiments. As I see it, we are heading towards a potentially substantial and irrevocable mess. Regardless of the reasons, the reality is that we are dealing with concepts that defy our full comprehension, making the initial safe deployment of a better-trained, fully aligned supermodel nearly "impossible." Once such a system is deployed and causes harm-and believe me, it will-you won't get a second chance to review the code or understand why it happened initially. We must approach this with utmost seriousness and not dismiss these concerns as merely complex decision trees inside a fortune teller's crystal ball.
      Indeed, the rabbit hole runs deep, Alice.

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      @@jerosacoa Data and reasoning are distinct things. Data by itself is just static memories (e.g. pretrained weights, or in-context tokens). Reasoning is the dynamic ability to actually use that data to apply it to new, and novel, situations.
      Prof. Rao's definition of reasoning as consisting of (or including?) "deductive closure" over a body of data seems necessary if not sufficient! If I tell you a bunch of facts/rules, then, if you have the ability to reason, you should be able to combine those facts/rules in arbitrary combinations to deduce related facts that are implied by the base facts I gave you (i.e. are part of the closure), even if not explcitly given. For example, if I told you the rules of tic tac toe, then you should be able to reason over those rules to figure out what would be winning or losing moves to make in a given game state. I shouldn't have to give you dozens of examples in addition to the rules. LLMs are not like this, and most fail at this task (if they are now succeeding it is only due to additional training data, since the LLM architecture has not changed).
      However, one really shouldn't need examples to realize that LLMs can't reason! Reasoning in general requires an open-ended number of steps, working memory to track what you are doing, and the ability to learn from your mistakes as you try to solve the problem you are addressing. Current transformer-based LLMs such as ChatGPT simply have none of these ... Input tokens/embeddings pass through a fixed number of transformer layers until they emerge at the output as the predicted next token. It's just a conveyor-belt pass-thru architecture. There is no working memory unless you want to count temporary internal activations, which will be lost as soon as the next input is presented (i.e. by the next token), and there is no ability to learn, other than very weak in-context learning (requiring much repetition to be effective), since transformers are pre-trained using the learning mechanism of gradient descent, which is not available at inference time. AGI will require a new architecture.
      Prompting and scaffolding techniques such as "think step by step" or "chain of thought" can make up for some of the transformer's architectural limitations, by allowing it to use its own output as working memory, and to build upon its own output in a (potentially at least) unlimited number of steps, but there is no getting around the lack of inference-time learning ability. The current approach to addressing this lack of reasoning ability seems to be to use RL pre-training to pre-bake more "fixed reasoning templates" into the model, which is rather like playing whack-a-mole, and has anyway already been tried with the expert system CYC.
      I don't discount your concerns about AGI, but LLMs are not the thing you need to fear - it'll be future systems, with new architectures, than do in fact support things like reasoning and continual run-time learning.

    • @jerosacoa
      @jerosacoa Před měsícem

      ​@@benbridgwater6479 I, being the Black Sheep, respectfully disagree with your assessment. :)
      While it is true that data and reasoning are distinct, your assertion that LLMs can't reason because they lack certain traditional cognitive attributes overlooks significant developments and capabilities these models have demonstrated. The lack of long-term memory is just a detail that should not be significant to how we perceive the inference process.
      Firstly, reasoning as defined by Prof. Rao, i.e. "deductive closure," is indeed a component of reasoning. However, it's essential to note that LLMs, particularly advanced ones like GPT-4, have shown remarkable abilities in logical inference, factual recall, and synthesis of information, akin to human reasoning, as stated before, and I think we both agree on that.
      From my POV, your point about the architecture of LLMs being a "conveyor-belt pass-thru" is valid to an extent.. BUT.. :) this does not negate their reasoning capabilities. Techniques like "chain-of-thought" prompting allow these models to use their outputs as working memory, enabling them to perform multi-step reasoning processes effectively. This method significantly enhances their ability to solve problems that require an open-ended number of steps.
      Moreover, while current transformer-based LLMs do not have traditional working memory or real-time learning capabilities (as a feature, not a bug - due to security and superalignment), they can still achieve impressive feats of reasoning through their pre-trained weights and in-context learning. This includes tasks that involve combining given facts to deduce new information, as seen in the previously mentioned studies where these models matched or surpassed human performance on analogical reasoning tasks.
      Additionally, it's worth mentioning that LLMs can indeed improve themselves to some extent. They can utilize techniques such as Low-Rank Adaptation (LoRA), which allows models to quickly adapt and be fine-tuned with a smaller amount of data and computational resources (a rough sketch of the LoRA idea follows after this comment). This can significantly enhance their performance and reasoning capabilities without requiring complete retraining.. so yes.. this limitation is there by design.
      Furthermore, integrating concepts from ARC-AGI and neurosymbolic AI can bridge some of the gaps you mentioned. ARC-AGI focuses on creating systems that combine statistical and symbolic AI approaches to achieve more robust and comprehensive reasoning capabilities. Neurosymbolic AI integrates neural networks with symbolic reasoning, leveraging the strengths of both to enhance the cognitive abilities of AI systems. These advancements are paving the way for more sophisticated and capable AI models that approach human-like reasoning more closely. For this, DeepMind's recent AlphaProof and AlphaGeometry are good examples.
      Regarding the need for new architectures for AGI, it is true that achieving AGI will likely require advancements beyond current LLM architectures. However, this does not diminish the significant reasoning capabilities that current LLMs have demonstrated. These models are continually evolving, and enhancements such as reinforcement learning (RL) and advanced prompting techniques have already shown promise in bridging some of the gaps you mentioned.
      So, while it is my understanding that concerns about AGI and future systems are valid, dismissing the reasoning capabilities of current LLMs overlooks the substantial progress made.. and can pave the road to a false sense of security. These models, through sophisticated training and advanced techniques, have demonstrated a form of reasoning that is more advanced than mere static data recall, indicating a significant step towards more complex cognitive functions. The rabbit hole... Ben.. indeed, runs deep, and current LLMs are already navigating it with surprising adeptness.. :)
      ...but.. then again.. we may be talking about different things. LLMs at this moment are more evolved than GPT-4... or GPT-4o... (AI development moves very fast.. a lot happens in a week)
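      To make the LoRA point above concrete, here is a minimal NumPy sketch of the idea (toy shapes and scaling, not any particular library's API): the pretrained weight W stays frozen, and only the small low-rank factors A and B would be trained, so adaptation touches far fewer parameters than full retraining.

      import numpy as np

      rng = np.random.default_rng(1)
      d_out, d_in, r, alpha = 32, 32, 4, 8                   # toy dimensions plus LoRA rank and scale
      W = rng.normal(size=(d_out, d_in))                     # frozen pretrained weight (never updated)
      A = rng.normal(size=(r, d_in)) * 0.01                  # trainable low-rank factor
      B = np.zeros((d_out, r))                               # B starts at zero, so the adapter begins as a no-op

      def forward(x):
          # Effective weight is W + (alpha / r) * B @ A; only A and B would receive gradient updates.
          return W @ x + (alpha / r) * (B @ (A @ x))

      x = rng.normal(size=(d_in,))
      print(np.allclose(forward(x), W @ x))                  # True at initialization: behaviour is unchanged until training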

  • @idck5531
    @idck5531 Před měsícem

    It's possible that LLMs do not reason, but they sure are very helpful for coding. You can combine and generate code easily and advance much faster. Writing scripts for my PhD is 10x easier now.

  • @MrBillythefisherman
    @MrBillythefisherman Před měsícem

    The prof seems to be saying that we do something different when we reason than when we recall. Is there any evidence from the processes or structure of the brain that this is the case? It always seems as if people are saying they know how the human brain works when, to my knowledge at least, we haven't really a clue beyond the fact that neurons fire signals to other neurons via synapses and that we have dedicated parts of the brain for certain functions.

  • @PhysicalMath
    @PhysicalMath Před měsícem

    I've been working with it for a year. It can't reason. It still forgets how to do things it's done before more than once.

  • @luisluiscunha
    @luisluiscunha Před měsícem

    Maybe in the beginning, with Yannick, these talks were properly named "Street Talk". They are more and more Library of the Ivory Tower talks, full of deep "philosophical" discussions that I believe will all be considered pointless. I love the way Heinz Pagels described how the Dalai Lama avoided entering into arguments of this kind about AI. When asked his opinion about a system he could talk to as if to a person, he just said "sit that system in front of me, on this table, then we can continue this talk". This was in the 80s. Even to be profoundly philosophical you can think in a very simple and clear way. That is a way of thinking epistemologically most compatible with Engineering, which ultimately is where productive cognitive energy should be spent.

  • @life42theuniverse
    @life42theuniverse Před měsícem

    The most likely response to logical questions is logical answers.

  • @hartmut-a9dt
    @hartmut-a9dt Před měsícem +1

    great interview !

  • @TastyGarlicBread
    @TastyGarlicBread Před měsícem +6

    There are a lot of people in this comment section who probably can't even do basic sums, let alone understand how a large language model works. And yet, they are very happy to criticize.
    We are indeed living in an Idiocracy.

  • @television9233
    @television9233 Před měsícem +2

    I've read some of Prof Subbarao's work from ASU. Excited for this interview.

  • @nodistincticon
    @nodistincticon Před měsícem +2

    fantastic guest! his intro statements are so clear and informative. awesome content!!

  • @mattabrahamson8816
    @mattabrahamson8816 Před měsícem +1

    Contrary to his claims, gpt4o & sonnet do generalize to different cipher shifts.
    gpt4o:
    It looks like the text "lm M pszi csy" might be encoded using a simple substitution cipher, such as the Caesar cipher. This type of cipher shifts the letters of the alphabet by a fixed number of positions.
    To decode it, let's try different shifts and see if any of them make sense.
    For example, if we shift each letter by 4 positions backward (a common shift in Caesar ciphers):
    - l -> h
    - m -> i
    - M -> I
    - p -> l
    - s -> o
    - z -> v
    - i -> e
    - c -> y
    - s -> o
    - y -> u
    So, "lm M pszi csy" becomes "hi I love you."
    This decoded message makes sense as a simple phrase. If you have any additional context or need further decoding, please let me know!

    • @BrianPeiris
      @BrianPeiris Před měsícem +1

      Great, you got one sample. Now run it a hundred times each for different shifts and report back.
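      Generating ground truth for that kind of systematic check takes only a few lines of Python; a rot-n helper like the sketch below (using the phrase quoted in the parent comment) produces the ciphertext for every shift, so a model's answers could be scored across all 25 shifts instead of judged from a single sample.

      def rot(text, n):
          # General Caesar cipher: shift alphabetic characters by n positions, preserving case.
          out = []
          for ch in text:
              if ch.isalpha():
                  base = ord('A') if ch.isupper() else ord('a')
                  out.append(chr((ord(ch) - base + n) % 26 + base))
              else:
                  out.append(ch)
          return ''.join(out)

      plaintext = "hi I love you"                            # the example phrase from the comment above
      for n in range(1, 26):                                 # every nontrivial shift
          print(n, rot(plaintext, n))                        # n=4 gives "lm M pszi csy"; each line is one test case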

  • @stevengill1736
    @stevengill1736 Před měsícem +2

    I understand that it's not entirely accurate, and doesn't apply to all machine learning tech, but I love the expression "stochastic parrot" - it strikes my funny bone for some reason....
    But the good professor prompted a couple questions, one being does Google translate sound like ESL (English as 2nd language) to native speakers of other languages, that is, are some languages a little stilted or "foreign sounding", or does Google translate sound like a native speaker in every language it translates to?
    And the second question is, what's so amazing about GPT-x answering all the questions on standardized exams? The LLM has all the internet data, including the correct answers to the exams! It's like having the ultimate cheat sheet....

  • @simpleidea2825
    @simpleidea2825 Před měsícem

    When the first steam engines were tested, people were saying it was a ghost. But now you see...

  • @akaalkripal5724
    @akaalkripal5724 Před měsícem +12

    I'm not sure why so many LLM fans are choosing to attack the Professor, when all he's doing is pointing out huge shortcomings, and hinting at what could be real limitations, no matter the scale.

    • @alansalinas2097
      @alansalinas2097 Před měsícem +2

      Because they don't want the professor to be right?

    • @therainman7777
      @therainman7777 Před 29 dny

      I haven’t seen anyone attacking him. Do you mean in the comments to this video or elsewhere?

  • @techpiller2558
    @techpiller2558 Před měsícem +2

    He is highly educated and faculty somewhere, yes? Yet he still forgets, or doesn't understand or comprehend or digest, that LLMs are not just "predicting the next word" but "approximating the function that produces the training data text", which is intelligence itself, including logic. This is why I believe, at the moment, that if you train an LLM enough, the errors will be reduced.

  • @hayekianman
    @hayekianman Před měsícem +1

    The Caesar cipher thing is already working for any n for Claude 3.5, so I dunno.

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      Sure - different data set. It may be easy to fix failures like this by adding corresponding training data, but this "whack-a-mole" approach to reasoning isn't a general solution. The number of questions/problems one could pose to a person or an LLM is practically infinite, so the models need to be able to figure out answers for themselves.

    • @januszinvest3769
      @januszinvest3769 Před dnem

      ​@@benbridgwater6479so please give one example that shows clearly that LLMs can't reason

  • @human_shaped
    @human_shaped Před měsícem +1

    This is a silly argument now. It's easy to construct reasoning questions that are guaranteed novel, and they can be answered. They're not brilliant at reasoning, but they can do it. Even though predicting tokens is the high level goal, in order to do that effectively they need to develop some "understanding" and "reasoning" that is not expressed in n-gram statistical models. The reason I say silly *now* is that it has so concretely been demonstrated so often now. I find that people still saying this sort of thing just haven't spent much time with the best models doing novel work.

    • @benbridgwater6479
      @benbridgwater6479 Před měsícem

      The models learn the patterns in the training set, including reasoning patterns, which they can predict/generate just as easily as anything else, but clearly there are no weight updates occurring at runtime - nothing new being learnt (other than weak, ephemeral, in-context learning) - and anyway the model has no innate desire (cf. human curiosity, boredom) to learn even if it could do so, and at runtime its prediction errors go unnoticed/unrewarded (in an animal brain this is the learning signal!). When someone says "LLMs can't do X", it needs to always be understood as "... unless X was in the training set".
      Prof. Rao's definition of reasoning as "deductive closure", while a bit glib (incomplete), does capture what they are missing. While the model can certainly utilize reasoning patterns present in the training set, you can't in general just give it a set of axioms and expect it to generate the deductive closure of everything derivable from those axioms, and perhaps somewhat surprisingly this even includes things like the rules of a game like "tic-tac-toe", where one might have guessed that the simple reasoning patterns needed to perform the closure would have been learnt from the training set (a small worked illustration of what deductive closure involves follows below).
      It seems people WANT to believe that the future is here, and massive confirmation bias is at play. People pay attention to the impressive things these models can do based on what they learnt from the training set, and just ignore what they can't do because of architectural limitations. Of course the companies building the models play into this by playing "whack-a-mole" and adding new benchmark-beating training data to each new model. As an old-timer, it reminds me of CYC (the ultimate expert system experiment, with decades of effort spent adding new rules - cf. training data - to it). "Scale it and it'll become sentient" was the wishful thinking back then, just as it is today.
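      As a concrete illustration of what "deductive closure" asks for, here is a tiny forward-chaining sketch over made-up facts and rules (purely illustrative): starting from the base facts, the rules are applied repeatedly until no new facts appear, and that fixed point is the closure - something a few lines of conventional code compute mechanically, but which, as the comment above argues, one cannot in general count on an LLM to enumerate from the axioms alone.

      # Toy base facts and rules; the closure is reached when applying the rules adds nothing new.
      facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

      def apply_rules(known):
          # parent(x, y) -> ancestor(x, y);  ancestor(x, y) & ancestor(y, z) -> ancestor(x, z)
          new = set()
          for rel, x, y in known:
              if rel == "parent":
                  new.add(("ancestor", x, y))
          for r1, x, y in known:
              for r2, y2, z in known:
                  if r1 == "ancestor" and r2 == "ancestor" and y == y2:
                      new.add(("ancestor", x, z))
          return new

      closure = set(facts)
      while True:
          derived = apply_rules(closure)
          if derived <= closure:                             # nothing new: the deductive closure is complete
              break
          closure |= derived

      print(sorted(closure))                                 # includes ('ancestor', 'ann', 'cal'), stated by no single base fact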

  • @LionKimbro
    @LionKimbro Před měsícem +2

    "Why are manhole covers round?" "It's not that they are round. It's that they are circles. They keep the demons in... Or, ... maybe, out."