The Debate Over “Understanding” in AI’s Large Language Models

  • Published 21 Apr 2024
  • Melanie Mitchell, Santa Fe Institute
    Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language, and the physical and social situations language encodes, in any important sense. I will describe arguments that have been made for and against such understanding and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
    Short Bio: Melanie Mitchell is Professor at the Santa Fe Institute. Her current research focuses on conceptual abstraction and analogy-making in artificial intelligence systems. Melanie is the author or editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her 2009 book Complexity: A Guided Tour (Oxford University Press) won the 2010 Phi Beta Kappa Science Book Award, and her 2019 book Artificial Intelligence: A Guide for Thinking Humans (Farrar, Straus, and Giroux) was shortlisted for the 2023 Cosmos Prize for Scientific Writing.
    cbmm.mit.edu/news-events/even...
  • Science & Technology

Comments • 27

  • @breaktherules6035 · 22 days ago · +5

    EXCELLENT insights! Thank you so much for sharing!

  • @andytroo · 18 days ago · +2

    22:35 - How many of those tasks require knowledge of a word's exact spelling? LLMs are only passed the encoded tokens, and may not be aware of spelling in a way that allows, e.g., acronym tasks.
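
A minimal sketch of the tokenization point above, assuming the `tiktoken` package ("cl100k_base" is the encoding used by GPT-4-class models); it shows that the model's input is opaque subword IDs rather than letters, which is why spelling- and acronym-style tasks can be harder than they look:

```python
# Sketch: an LLM sees subword token IDs, not individual characters.
# Assumes the tiktoken package is installed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-family byte-pair encoding

for word in ["understanding", "acronym", "NASA"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")

# Common words often collapse to one or two token IDs, so the letter
# sequence u-n-d-e-r-s-... is never directly visible to the model.
```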

  • @ZelosDomingo · 20 days ago · +3

    It seems like tokenization would really mess with one's ability to do some of these tests. I don't know how much the format of something like that would even be preserved. It also makes me wonder how much the lack of physical 3D movement data in training would impact some of these reasoning tasks; you can even notice in her language about concepts how much "spatial" reasoning is involved.
    It seems like to do one of these tests fairly, you would have to completely homogenize the way the test taker takes it.
    It brings disabilities to mind: you wouldn't necessarily expect someone who was born not only blind but completely unable to process visual data in any way we would recognize to solve visual tasks, unless they generalize well from the mediums they do know.

    • @NullHand · 19 days ago · +1

      I once saw a research paper on decoding the actual nerve signals sent from the mammalian retina to the brain.
      The "visual data" turned out to have been pre-processed into something like six channels of what could be described as very stripped-down image data.
      There was a "channel" of mostly just high-contrast edges, and a "channel" of cells that had recent luminosity changes.
      There were some "channels" that were apparently still in "WTF is this?" status.
      So I would not be surprised if information flow in the human brain turns out to be far more "tokenized" than we assume.

  • @novantha1 · 18 days ago · +1

    In my opinion, understanding is actually pretty clear. In humans, a very useful skill in language acquisition is circumlocution: referring to a concept without using its name. Now, for a text-only LLM it's possible that could be done by regurgitating training data directly (what are the odds that some common turns of phrase show up on Wikipedia, or that a dictionary found its way into a dataset?), but in a multimodal LLM I think the ability to verify similar patterns of neuron activity for analogous inputs across modalities is pretty indicative of strong understanding and generalization. In other words, I think the strength of understanding can be measured, roughly, as the number of unique inputs that implement a given pattern of behavior in the FFN, or that lead to the same or similar output in the language head.

    • @alexanderbrown-dg3sy · 18 days ago

      Agreed. This is literally the basis for the formation of internal world models. To me, the world model in itself is confirmation of a deep contextual understanding, setting aside bottlenecks in that understanding (knowledge conflicts, hallucinations, etc.). That is an architectural and data issue, though; FYI, temporal self-attention makes a world of difference. The model needs native temporal embeddings.
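
A rough sketch of the measurement idea in the thread above, assuming the Hugging Face `transformers` package; the model name, the circumlocution sentences, and the cosine-similarity criterion are illustrative placeholders, not a validated metric of understanding:

```python
# Sketch: do different phrasings of the same concept land on nearby
# internal representations? Assumes torch + transformers are installed.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into one vector per input."""
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1)  # (1, dim)

# Two circumlocutions of "umbrella" versus an unrelated sentence.
a = embed("a thing you hold over your head to stay dry in the rain")
b = embed("a folding canopy on a stick that keeps the rain off you")
c = embed("the stock market fell sharply on Tuesday")

cos = torch.nn.functional.cosine_similarity
print("paraphrase similarity:", cos(a, b).item())
print("unrelated similarity: ", cos(a, c).item())
```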

  • @ArtOfTheProblem · 20 days ago · +3

    Can you post the discussion?

  • @kellymoses8566 · 15 days ago

    It would be interesting to see the difference between LLMs trained on non-fiction, realistic fiction, and fantasy.

  • @legathus · 9 days ago

    26:18 -- Those results are highly suspicious. I suspect there's confirmation bias in the human scores. The workers don't want to lose their qualifications, and so won't perform tasks that are too difficult; there may therefore be a drop-off in participation or submission if a human worker feels they may be in error on the more complex tasks. Furthermore, the human workers were given "room to think", whereas the prompting of the LLMs suggests they were not. I suspect allowing GPT-4 to use step-by-step reasoning would improve its score across the board, and dramatically more so if it's allowed to write a Python script to solve the problem.
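
A minimal sketch of the comparison suggested above, assuming the `openai` Python client and an API key in the environment; the model name and the example task are placeholders:

```python
# Sketch: the same task asked directly vs. with explicit "room to think".
# Assumes the openai package (v1+) and OPENAI_API_KEY are set up.
from openai import OpenAI

client = OpenAI()
task = "If you rearrange the letters of 'listen', what common English word do you get?"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

direct = ask(task)  # answer immediately, no reasoning requested
stepwise = ask(task + "\nThink step by step, then give your final answer.")
print("direct:   ", direct)
print("step-wise:", stepwise)
```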

  • @SynaLinks · 16 days ago

    Really good talk :)

  • @XalphYT · 21 days ago · +1

    25:06 I consider myself to be reasonably intelligent, but I am absolutely stumped by Problem No. 1. How are you supposed to evaluate the three blocks of letters below the alphabet? Are the two blocks on the first line supposed to serve as an example? Are you supposed to consider all three blocks together? Does the order of the blocks matter? I suspect that there is something implied here that I am missing.

    • @voncolborn9437 · 21 days ago

      Read the blocks left to right. Notice that the second block matches the alphabet with the jumbled letters. The second row is the test: match the similar sequence, replacing the 'l'.

    • @alexmolyneux816 · 20 days ago

      fghij?

    • @NextGenart99 · 15 days ago · +2

      Nice try, ChatGPT.
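
For anyone still puzzling over the slide, a toy solver for the plain-alphabet version of these letter-string analogies; the example item "abcd -> abce, ijkl -> ?" is illustrative rather than the actual problem shown, and a scrambled-alphabet variant would simply pass the permuted alphabet instead:

```python
# Sketch: infer a per-position alphabet shift from source -> target and
# apply it to the probe string (plain-alphabet letter-string analogy).
import string

def solve(source: str, target: str, probe: str,
          alphabet: str = string.ascii_lowercase) -> str:
    shifts = [
        (alphabet.index(t) - alphabet.index(s)) % len(alphabet)
        for s, t in zip(source, target)
    ]
    return "".join(
        alphabet[(alphabet.index(c) + d) % len(alphabet)]
        for c, d in zip(probe, shifts)
    )

print(solve("abcd", "abce", "ijkl"))  # -> "ijkm": only the last letter advances
```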

  • @lycas09 · 16 days ago

    The tasks where LLMs fail are either useless (not covered by much training data) or based on vision capabilities (where these systems are still a lot worse).

  • @electric7309 · 21 days ago

    Melanie Mitchell, ILY

  • @mordokai597 · 20 days ago · +1

    Lol! The difference in performance between the humans doing the test for free and the people being paid is about the same jump in performance you get from ChatGPT when you just give it a prompt vs. when you tell it "I'll give you $20 if you do a good job" xD

  • @seventyfive7597 · 16 days ago · +1

    So why did her methodology fail to work here? We have to go back to basics, because you simply can't skip them:
    1) Humans are repetition machines: they repeat and recombine their experiences. You can see it especially in the arts, where we call it inspiration; humans take "inspiration" from their life experiences and recombine it.
    2) AI is the same: these systems too are repetition machines that recombine experiences, but their experiences are different from humans'.
    3) Hence, for a fair comparison, you may not test humans on subjects they have not experienced, and the same goes for AI. However, her entire testing methodology was based on experiences that only humans have had.
    Basically, she almost got it when she said that a child learns to wear socks under shoes from experience, but then did not narrow her tests to experiences common to both AI and humans, rendering them a curiosity of translation, but not of understanding.

  • @optmanii · 14 days ago · +1

    AI's understanding of the world is different from a human being's.

  • @AlgoNudger · 18 days ago · +2

    LeCun is overrated in the AI community. 🤭

    • @J_Machine · 17 days ago · +3

      Nope

    • @AlgoNudger · 15 days ago

      @J_Machine C'mon. 😂

    • @J_Machine · 15 days ago

      @AlgoNudger You don't understand anything about AI 🤦‍♂️

    • @AlgoNudger · 9 days ago

      @J_Machine Now you sound like a stochastic parrot. 🤭

    • @J_Machine · 9 days ago

      @AlgoNudger If there is a stochastic parrot here, it must be you 😁😁😁😁

  • @netscrooge · 19 days ago

    Mitchell is great! Love her work. But LeCun's reckless, self-serving comments should not be elevated so high. It's like a TV news program hosting a flat-Earther to give both sides of the story.