Large Language Models and Generative Grammar

  • Added: 31. 05. 2024
  • This video covers a recent topic of interest that is not addressed in the textbook itself. It looks at the question of whether LLMs like ChatGPT constitute a real challenge to Generative Grammar, particularly with respect to the issues of Universal Grammar and Structural Dependence (i.e., are the principles underlying human syntax deterministic and rule-based, or are they probabilistic and based in statistical frequency?). The conclusion drawn is that LLMs, while impressive and sophisticated, don't replicate the properties of human language COMPETENCE. They are trained on data sets far larger than anything available to children learning a language, and they make incorrect inferences about structure. By contrast, they are extremely good at PERFORMANCE. So they effectively model something distinct from Generative Grammar, and neither approach negates the other. (A toy sketch of the structure-dependence point appears below this list.)
    ©2023, Andrew Carnie
    Please purchase the book: www.wiley.com/en-us/Syntax%3A...
  • Howto & Style
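
  A toy sketch of the structure-dependence point mentioned in the description (not from the video itself; the hand-supplied bracketing, word lists, and function names are illustrative assumptions). It contrasts a purely linear "front the first auxiliary" heuristic with the structure-dependent rule that fronts the main-clause auxiliary:

    # Illustrative only: the bracketing of the subject phrase is supplied by hand.
    def front_first_aux_linear(words, auxiliaries=("is", "can", "will")):
        """Linear heuristic: move the first auxiliary in the string to the front."""
        for i, w in enumerate(words):
            if w in auxiliaries:
                return [w] + words[:i] + words[i + 1:]
        return words

    def front_main_clause_aux(subject_phrase, main_aux, predicate):
        """Structure-dependent rule: front the auxiliary that follows the whole subject."""
        return [main_aux] + subject_phrase + predicate

    declarative = "the man who is tall is happy".split()

    print(" ".join(front_first_aux_linear(declarative)))
    # -> "is the man who tall is happy"   (ungrammatical: the embedded aux was fronted)

    print(" ".join(front_main_clause_aux(
        subject_phrase=["the", "man", "who", "is", "tall"],
        main_aux="is",
        predicate=["happy"])))
    # -> "is the man who is tall happy"   (grammatical: the main-clause aux was fronted)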

Comments • 9

  • @joelthomastr
    @joelthomastr 13 days ago

    Thank you very much. I've been in need of something like this to understand the generativist critique of LLMs

  • @CarnieSyntaxthEdition
    @CarnieSyntaxthEdition 1 year ago +3

    Links Mentioned in the Video:
    How LLMs work: www.analyticsvidhya.com/blog/2023/03/an-introduction-to-large-language-models-llms/
    Challenges of AI ethics: www.technologyreview.com/2021/05/20/1025135/ai-large-language-models-bigscience-project/
    Ted Chiang in the New Yorker: www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web?fbclid=IwAR0XVYzrGDLDMlkvr6ajV9nAqwt50bX2lX7v4nOGgNRptU21oPOt80Vqxdg
    Gary Marcus’ Blog: garymarcus.substack.com/p/gpt-4s-successes-and-gpt-4s-failures
    Also Gary Marcus: garymarcus.substack.com/p/noam-chomsky-and-gpt-3
    Noam Chomsky in the NYT: www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

  • @dominiks5068
    @dominiks5068 9 months ago

    The A-condition example is amazing! I used a similar example on Bing AI and it also fell for it

  • @the_linguist_ll
    @the_linguist_ll 1 year ago +2

    Genuinely curious what you think of RRG, FDG, and SFG if anything

    • @CarnieSyntaxthEdition
      @CarnieSyntaxthEdition 1 year ago +3

      That’s really a question that takes more than a comment to answer. I teach a whole graduate course on non-Chomskyan theories. But I’d say that functionalist theories have more in common with formalist theories than they do with the stochastic folk, despite the fact that they often align themselves with stochastic learning. Oversimplifying *massively*, the major difference, in my view, between formalist theories like GG and functionalist approaches is the direction of causality between form and function. For example, formalist theories typically take structure to determine meaning/discourse function, while functionalist theories take function to determine form. Again, that’s a MASSIVE oversimplification. But what they have in common is the notion of structure, especially in theories like RRG. LLMs have no abstract structure, so they are a very different fish from either GG or functionalist grammars.

    • @dylanlow4871
      @dylanlow4871 7 months ago

      I'd like to add a little to this discussion from an SFG perspective (particularly the Cardiff Model developed by Robin Fawcett and his colleagues), which, as Andrew Carnie pointed out in his comment, is among those functionalist theories that take function to determine form.
      Yet I actually find myself agreeing with most of the critique of LLMs offered in this video, so at least in broad strokes Generative Grammar (GG) and SFG share points of agreement. In SFG, the counterpart to GG's competence-performance distinction is the distinction between potential and instance. The idea is simple: the systems of potential are the engines that generate instances, and Cardiff SFG in particular proposes theoretical models for both syntactic potential (structured paradigmatically as systems of alternative choices) and syntactic instances (structured syntagmatically as trees of units which realize specific choices). So, like GG, Cardiff SFG proposes that syntactic instances (i.e., generated sentences) are hierarchically structured, and it would agree that LLMs fall short in modelling this crucial aspect of language.
      However, an important point of divergence is how GG and SFG model competence/potential. From my understanding (I could be wrong here), most versions of GG lack a paradigmatic theory and instead illustrate competence by showing how well-formed sentences can be derived (e.g., from D-structure to S-structure in G&B, or via Merge in Minimalism) through the application of some proposed set of rules or transformations. On this view, the rules are the backbone of a speaker's competence. Optimality Theory (OT) is an exception within GG in that it attempts to model the generative process as a choice among candidates rather than rule application, somewhat similar to SFG.
      But SFG (and OT, for that matter) is more aligned with the stochastic approach inherent in today's LLMs, at least with regard to modelling potential/competence (see the toy sketch after this comment). When you view generative potential as selection among realization choices, the question of which choices are statistically more probable is inevitable. This is the bedrock of Roman Jakobson's linguistic notion of markedness (incidentally a key concept in both SFG and OT). The stochasticity of linguistic choice is also in line with what is generally known about the human brain, particularly habit formation and the Hebbian learning principle that "neurons that fire together, wire together." That said, the sheer volume of data required to train an LLM like ChatGPT indicates a clear discontinuity between how LLMs model learning and how learning actually takes place in humans.
      Thus it seems to me that SFG and GG are not in completely separate worlds (as the stark lack of interaction in the literature might imply). Neither denies structure dependence in syntax, and both would agree that LLMs could benefit from some knowledge of syntactic structure. Both are also concerned with generating well-formed sentences that make sense (no space to discuss this further, though). It also seems to me that it is when one takes a paradigmatic perspective that statistical learning naturally arises as a central concern, as it has in GG via OT (particularly the literature on OT's underlying connectionist architecture, e.g., Smolensky & Legendre's The Harmonic Mind, 2006).
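
      A toy Python sketch of the "potential as weighted choices, instance as realization" idea described above (my own illustration, not actual Cardiff SFG machinery; the features, probabilities, and realizations are invented):

        import random

        # Each "system" maps feature options to probabilities (invented numbers).
        POTENTIAL = {
            "MOOD":     {"declarative": 0.8, "interrogative": 0.2},
            "POLARITY": {"positive": 0.9, "negative": 0.1},
        }

        # Realization rules: a joint choice of features is realized as a string.
        REALIZATION = {
            ("declarative", "positive"): "the cat is sleeping",
            ("declarative", "negative"): "the cat is not sleeping",
            ("interrogative", "positive"): "is the cat sleeping",
            ("interrogative", "negative"): "is the cat not sleeping",
        }

        def select(system):
            """Sample one option from a system according to its probabilities."""
            options, weights = zip(*system.items())
            return random.choices(options, weights=weights, k=1)[0]

        def generate_instance():
            """Make one choice per system (the potential), then realize it (the instance)."""
            choices = tuple(select(POTENTIAL[name]) for name in ("MOOD", "POLARITY"))
            return choices, REALIZATION[choices]

        random.seed(0)
        print(generate_instance())
        # e.g. (('declarative', 'positive'), 'the cat is sleeping')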

  • @vincentduenas8964
    @vincentduenas8964 1 year ago +1

    Is there any prospect of reconciliation between LLMs and human language learning in the foreseeable future? It appears that scientific linguistics is in the middle of an AI-versus-human war, so to speak, in which Big Data (quantum computing?) will dictate the future of the evolution of human language by simply overpowering the human language faculty, even though generative grammar models human language so effectively.

    • @CarnieSyntaxthEdition
      @CarnieSyntaxthEdition 1 year ago +2

      I have no clue! If you had asked me six months ago whether an LLM would ever be as good as ChatGPT is now, I would have laughed you out of the room. So clearly I’m not good at predicting the future.

    • @deadeaded
      @deadeaded 1 year ago +1

      There might be a path for reconciliation/synthesis via a discipline called Geometric Deep Learning. GDL provides a principled way to endow neural networks with innate priors based on the symmetries you expect to see in your data.
      The most famous example comes from image classification. We know that images have translational invariance: a cat is still a cat if you shift it three pixels to the left. You could, in principle, throw a generic neural network at the problem and hope that it would "learn" translational invariance on its own, but it turns out the better way to do it is to build translational invariance into the network itself (a so-called "convolutional neural network"). GDL generalizes this principle to other types of symmetries.
      The transformer architecture that LLMs are built on also has a symmetry built into it, but it's pretty rudimentary. If you could reformulate universal grammar as a collection of symmetries, you could, at least in principle, use GDL to build a neural network architecture that has universal grammar built into it as an innate prior. (A rough sketch of the invariance point is below.)
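
      A rough numpy sketch of the translation-invariance example above (my own illustration; the "cat" pattern and filter values are made up). Convolution alone is translation-equivariant (the feature-map peak shifts with the input), and adding global pooling makes the final response translation-invariant:

        import numpy as np

        def conv1d_valid(signal, kernel):
            """1-D valid cross-correlation: slide the kernel over the signal."""
            n = len(signal) - len(kernel) + 1
            return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

        pattern = np.array([1.0, 2.0, 1.0])   # stand-in for a "cat" feature
        kernel = np.array([1.0, 2.0, 1.0])    # detector tuned to that pattern

        x = np.zeros(12)
        x[2:5] = pattern                      # pattern near the left edge
        x_shifted = np.zeros(12)
        x_shifted[5:8] = pattern              # same pattern shifted three positions right

        fm = conv1d_valid(x, kernel)
        fm_shifted = conv1d_valid(x_shifted, kernel)

        print(np.argmax(fm), np.argmax(fm_shifted))   # 2 5: the peak moves with the input
        print(fm.max(), fm_shifted.max())             # 6.0 6.0: pooled response is identical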