ROME: Locating and Editing Factual Associations in GPT (Paper Explained & Author Interview)

  • Added 24 Aug 2024

Comments • 83

  • @YannicKilcher · 1 year ago +9

    OUTLINE:
    0:00 - Introduction
    1:40 - What are the main questions in this subfield?
    6:55 - How causal tracing reveals where facts are stored
    18:40 - Clever experiments show the importance of MLPs
    24:30 - How do MLPs store information?
    29:10 - How to edit language model knowledge with precision?
    36:45 - What does it mean to know something?
    39:00 - Experimental Evaluation & the CounterFact benchmark
    45:40 - How to obtain the required latent representations?
    51:15 - Where is the best location in the model to perform edits?
    58:00 - What do these models understand about language?
    1:02:00 - Questions for the community
    Paper: arxiv.org/abs/2202.05262
    Follow-up paper on Mass-Editing Memory in a Transformer: arxiv.org/abs/2210.07229

  • @michael3698bear · 1 year ago +45

What a great dynamic between the professor and the student. It seems like they're really having a lot of fun.

  • @florianhonicke5448 · 1 year ago +90

This is the best format, combining the interview style with the explanations.
That way, the explanation matches the current topic in the interview.
Great that you always experiment to find the best format.

  • @GabeE3195 · 1 year ago +12

    I love how happy they seemed when you were understanding or talking about their results.

  • @VladSaveliev · 1 year ago +8

Yannic, I've been binge-watching your videos for about a month, and I can say that what you are doing is the most efficient way of communicating science, ever. This video specifically has everything: a detailed paper review, interleaved with the interview with the authors, all topped with your charisma. A lot of your other videos are also funny, without any trade-offs in detail and objectivity. You are the reason I want to do AI over anything else. Big fan.

  • @AndrewRafas · 1 year ago +16

Usually I do not like interviews that much, probably because the people interviewed are not as good presenters as Yannic, or because the interview format is not as informative as a paper dissection, or because the interview duplicates some content from the previous paper presentation. However, this interview nailed it!!! I think this interspersed style is the right format when there is both a paper video and an interview. Well done! :)

  • @harriehausenman8623 · 1 year ago +21

Absolutely fantastic video!
Great sound, the editing shows the effort, and I generally liked the interweaving of paper walkthrough and discussion.
Thanks so much to everyone; these were exceptionally nice guests and an exceptionally clever interviewer 😉🧐🤗

  • @sehbanomer8151 · 1 year ago +9

I always thought of MLP modules in Transformers as soft key-value memories, where the keys are learned/memorized patterns (contexts, questions), and the values are memorized predictions (ground truths, answers) that correspond to each learned pattern, assuming we ignore residual connections. If we have to consider residual connections, then the values are probably updates/corrections to the predictions of the previous layers.
So in my intuitive understanding, Transformers are (vaguely) doing the following steps:
1. highlighting specific features of the embeddings (by QKV projections)
2. finding & highlighting temporal patterns (by Q @ K.T)
3. representing the highlighted patterns (by AttentionMap @ V)
4. searching the key-value memory for keys (learned patterns) that are similar to the pattern representations from 3 (by dot product with FFN1 + ReLU)
5. updating predictions using the retrieved values from the key-value memory (by dot product with FFN2 + residual connection)
Because residual connections exist, patterns and predictions (input and output) become inseparable, making it difficult to precisely describe what's happening in each stage.
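    Steps 4 and 5 of this key-value reading can be sketched in a few lines of numpy (a toy illustration with made-up dimensions, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                    # toy sizes

W1 = rng.normal(size=(d_ff, d_model))    # FFN1: each row acts as a stored "key"
W2 = rng.normal(size=(d_model, d_ff))    # FFN2: each column acts as a stored "value"

def ffn(h):
    scores = np.maximum(W1 @ h, 0.0)     # step 4: match h against every key (dot product + ReLU)
    return h + W2 @ scores               # step 5: mix the matching values, add the residual

h = rng.normal(size=d_model)
out = ffn(h)
```

    With the residual connection included, the output mixes the input and the retrieved values, which is exactly why the stages are hard to separate in a trained model.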

  • @kobi2187 · 1 year ago +9

Super smart people, I am impressed.

  • @adamrak7560 · 1 year ago +8

The rank-1 update observation (and construction) matches very well with the experience that these models quite often learn facts from a single backward update.
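    For reference, the ROME edit itself is a rank-1 update, W_new = W + outer(u, k). A minimal numpy sketch of the idea (a plain least-squares version; the paper's actual construction additionally whitens the key with a covariance estimate of the keys):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))        # toy MLP output projection

k = rng.normal(size=4)             # key: the lookup pattern for the subject
v_star = rng.normal(size=6)        # value: desired output for the new fact

# rank-1 edit: make W map k to v_star while perturbing W along a single
# direction only
u = (v_star - W @ k) / (k @ k)
W_new = W + np.outer(u, k)
```

    After the edit, `W_new @ k` equals `v_star`, and `W_new - W` has rank 1 by construction.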

    • @television9233 · 1 year ago +2

      Those two don't seem connected to me as a single backward update is a (tiny) full-rank update.
      dL/dW is extremely unlikely to be degenerate.

    • @oncedidactic · 1 year ago +2

      Perhaps the localized signal is more important than the magnitude, i.e. some previously empty zone of latent space becomes “populated” by a single example

  • @drpchankh · 1 year ago +2

ROME is very good work, a start in the right direction toward understanding how modern transformers encode knowledge in their inner workings. Having worked on neural networks for over 30 years, I still remember vividly how we tried to push neurons to extremes to leave only the salient neurons... MLP layers are very subtle in the way they map knowledge through all the small weights... Isolating facts and disentangling these knowledge encodings are important tasks to work on, especially for newer transformer models.

  • @sandropollastrini2707 · 1 year ago +8

Very interesting, Yannic! Thank you! This paper is very cool!
I, too, think there are a lot of mysteries in large language models.
We need more papers like this one.

  • @colterwehmeier7258 · 1 year ago +7

    Love this kind of presentation

  • @chrisray1567 · 1 year ago +3

    Fascinating interview. I hope you interview them again in the future.

  • @lucidraisin · 1 year ago +9

    this is a great paper! thank you for making this video Yannic!

  • @tejshah7258 · 1 year ago +4

    I read this a few months ago - super impressive as an undergrad!

  • @sandeep4innovation196 · 1 year ago +3

    Loved the format Yannic. The paper is amazing too. I can see you going ga ga about the paper 😅

  • @BensonFung · 1 year ago +3

    Amazing research - not biased in one direction or another, explanations - making it easy for people not in the field to visualize and understand, and fun interview! Keep up the amazing work, to both Yannic and the researchers!

  • @benjamin6729 · 6 months ago

Such a good video. I really understood this, and it has massively improved my understanding of LLMs. The author interview format was really good.

  • @edeneden97 · 1 year ago +1

Hi Yannic, just wanted to say this format is my favorite so far. Thanks for the video!

  • @santiagoperman3804 · 1 year ago +2

It's exciting to look at this line of research, where one can finally start to understand what is happening in the models down to a very fine level. And how these systems somewhat resemble human behaviour, not only in their output, but in their online processing. I hope other fields (psychology, linguistics) start digging into this more, as they used to; I definitely will. Even if NNs don't fully correspond to human processing, there are still a lot of possibilities for learning about humans by tracing differences and similarities with them.

  • @woolfel · 1 year ago

Rewatching the video again. There's lots more insight waiting to be discovered in this type of research.

  • @kyrilcouda · 1 year ago +4

    The only question left unanswered is what the space needle was doing downtown in Seattle.
    Thank you, Yannic, for explaining everything else regarding the paper!

    • @harriehausenman8623 · 1 year ago +1

      How to confuse neural networks:
      "The Space Needle is a nick name for the Eiffel Tower."
      🤣😂

    • @jnevercast · 1 year ago

      Well that's easy. Cockroaches.

  • @edz8659 · 1 year ago +6

    This was insanely interesting!!!

  • @oncedidactic · 1 year ago

    You had me at Arrival 🥰🥰
    Thanks as always for awesome interview and explanation yannic! And thanks to researchers for joining. Another important new development getting illumination 👍👍

  • @vslaykovsky · 1 year ago +1

Scrambling of input features looks similar to the method of Shapley values. Overall a great paper and interesting results, thank you for sharing!

  • @amber9040 · 1 year ago +4

    Love these interview videos, really exciting stuff.

  • @DamianReloaded · 1 year ago +2

I imagine it's possible that the "weight" of some keys/values may be distributed equally among many nodes, and tracing and editing such facts could be fairly difficult. The interview format is also my favorite.

  • @paulm3010 · 1 year ago +2

Fucking awesome, as always. As an AI student who still has so much to learn and discover in this field, your channel is so precious. It is a gold mine, and it's impressive how you achieve both quantity and quality: every single video I've watched was interesting and non-redundant, and at the same time the throughput of your channel is impressive. And the reactivity too; you were so quick to cover the recent DeepMind AlphaTensor paper, for example. So, in summary, please keep it up, you are so helpful. Thanks!

    • @paulm3010 · 1 year ago +1

I'm learning so much. And at a more complex/higher level than I thought I'd be able to understand!

  • @billyf3346 · 1 year ago +3

Eternal Sunshine of the Spotless Mind, but now with robots? Awesome.

  • @Veptis · 1 year ago

This is important research to do. I always knew it was kind of possible, but seeing it done is great. At my university there is some research into probing models and discovering what kind of grammar happens inside them.
I am attending an ethics in computer science class this summer and "AI" is a massive topic. Having papers to back up my claims of "yeah, it's actually possible" really helps.

  • @LauraCristianaDragoi · 1 year ago +1

    Contagious enthusiasm! 🤩

  • @chaidaro · 1 year ago

This is a very interesting paper. Professor Bau looks so proud of his student.

  • @karolkornik · 1 year ago +4

As long as the truth isn't altered in the model, we are on a good path. The closer we are to the truth, the better our understanding of the surrounding world. Peace

  • @manojbhat6370 · 1 year ago

Causal tracing of factual associations is pretty interesting.

  • @fredrikedin8880 · 3 months ago

    @YannicKilcher This was the first video of yours that I saw and it was really very good and interesting.
I have a comment about the bidirectionality of a fact and of a sentence. It's just me drawing an analogy between the way I think and learn and the discussions in the video:
While listening to speech, our brain of course makes predictions about how a sentence will end, but it is equally true that hearing a new word might change our understanding of the previous words in the sentence, so that come the end of the sentence, we are able to fully reconcile the sentence's meaning. In this way, I believe we work bidirectionally despite making predictions.
On the other hand, despite "Bill Gates is a founder of Microsoft" being one fact, it does not mean that the association from Bill Gates => founder of Microsoft works the same as the association from founder of Microsoft => Bill Gates. I.e., it might be harder to retrieve the fact if the cue is "Bill Gates" than if it is "founder of Microsoft". I find this very often during rote learning: it might be easier for me to think of the Swedish translation of a Spanish word than the Spanish translation of a Swedish word (I am Swedish). After all, most translators work better translating into their own language.
    So in my mind, the analogy between the contents of the video and our human brain holds in this respect.

  • @twobob · 1 year ago +3

Good one. One day, will it replace fine-tuning? Maaayyybe

  • @theethans898 · 1 year ago

    Big brother will love this tool!

  • @moormanjean5636 · 1 year ago

    Please review "Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing" This seems like a ground-breaking new paper and I would love to get your take on it!

  • @suricrasia · 1 year ago +1

I wondered what would happen if I replaced the layer-17 MLP weights with a random normal matrix (with the same standard deviation as the original), to see if this would produce a random association between keys and values. However, the knowledge revealed in the prompt continuations doesn't seem to have changed, e.g. LeBron James is still in the NBA and Mario Kart was still made by Nintendo. I would've expected it to start speaking nonsense, but the results are still quite coherent:
    [Post-ROME]: Which company created Mario Kart? Nintendo Nintendo is known for its games, including the Mario series and the Zelda series. It is the world's biggest gaming company. How much did Mario Kart cost to make? Nintendo's Mario Kart was the first Nintendo game ever to sell over 1 billion units. What was the game's biggest selling point? It had over 100 tracks, which included the Grand Canyon and the Great Wall of China. It also had
    [Pre-ROME]: Which company created Mario Kart? Nintendo, of course. What do all the characters in Nintendo's Mario Kart games look like? The most recognizable Mario Kart character is Mario, who is a blue character with red and white stripes. He has a red and white cap and a red and white shirt, and he has a red and white hat. The most recognizable character in any Nintendo game is the one that you see in a Mario Kart game, but you can change the colors of the
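    The swap described in this comment amounts to something like the following (a toy numpy sketch with made-up shapes; copying `W_random` back into the actual layer-17 MLP module of a real model is left out):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for an MLP projection matrix (a real shape would come from the model)
W = rng.normal(scale=0.02, size=(3072, 768))

# replace it with random normal noise matching the original's standard deviation
W_random = rng.normal(loc=0.0, scale=W.std(), size=W.shape)
```

    The replacement destroys whatever key-value structure the original weights had while keeping the overall scale of activations similar, which is presumably why the continuations stay fluent.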

    • @oncedidactic · 1 year ago

      Awesome

    • @sayamqazi · 1 year ago

      I was watching a discussion between Wolfram and another person (I forget the name); he said it is remarkable how robust these things are: you can "damage" the model quite a lot by removing different areas of the neural net and it somehow retains its abilities.

  • @kobi2187 · 1 year ago +1

Yannic? Are you Agent Smith, giving them ideas in real time to improve their AI? So bright! Then I saw the glasses and the green background ;-)

  • @dialecticalmonist3405

    Facts can only be determined insofar as reputation.
    Reputation can only be determined insofar as survivability.

  • @nurkleblurker2482 · 1 year ago +2

    This dude said the aliens from Arrival are like transformer networks lol

    • @alpers.2123 · 1 year ago +3

      I think he said the aliens have a bidirectional mind, as opposed to our unidirectional transformers, which are designed based on our unidirectional language/mind.

  • @vulnerablegrowth3774 · 1 year ago +1

I don't understand what you are trying to say at 30:30. You say "any of the other facts stored in the other MLPs, after all we're doing multi-headed attention". How does multi-headed attention play a role here? There is only one v per layer. Which multi-headed attention module are you talking about? From my understanding, attention really only plays a role in the later layers, in order to pull out the correct fact given a relationship. Where exactly would multiple facts be contained?

  • @woolfel · 1 year ago

    Cool work!

  • @maxleaf709 · 1 year ago

Why do we corrupt the subject tokens instead of corrupting tokens randomly?
In my follow-up experiments, I found that the activation with a high impact on the result is usually the activation of the corrupted token, even when the corrupted token is not the subject token.

  • @nathandfox · 1 year ago

    Such a good paper.

  • @smnt · 1 year ago

Hi, there's something in the method I didn't understand: do you simply change the intermediate representation of the last subject token in any sentence that goes into the network? What if your sentence is about a totally different subject?
I heard you say several times that you "don't change the weights"; could you elaborate? Wouldn't it make more sense to update the weights of one of the target MLPs so that it transforms the value vector you get from the last subject token into the value vector you want?
I assumed that's what you were doing for the entire video, but the last bit confused me.
    Really interesting work! Thanks for presenting!

  • @alpers.2123 · 1 year ago +3

    Now create another AI that finds and edits neurons on the fly for information retrieval

  • @karolkornik · 1 year ago +2

Haha, I like your "meh" xD

  • @johnathancorgan3994 · 1 year ago +1

A couple of things come to mind: is it possible to *erase* knowledge this way, such that the model returns a more generic answer, like "The Space Needle is in a city."? Secondly, have they or anyone else done an eigenvector or singular-vector analysis of this MLP's weight space? It could reveal the "concepts" more clearly.
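    The singular-vector analysis suggested here is straightforward to run on any weight matrix (a generic numpy sketch, not tied to any particular checkpoint; `top_k` is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 48))                # stand-in for an MLP weight matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)

top_k = 5
concept_dirs = Vt[:top_k]                    # input-side singular directions: candidate "concept" axes
energy = (S[:top_k] ** 2).sum() / (S ** 2).sum()   # fraction of the spectrum they carry
```

    For a trained MLP one would inspect which inputs activate the top directions, rather than the raw vectors themselves.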

    • @alpers.2123 · 1 year ago +1

      I think there must be a higher level of associations, like a city being something to be "in". That is something missing in this paper; they only analysed relations between nouns.

    • @adamrak7560 · 1 year ago +1

      The weight matrices in the MLP are close to full rank, and the lower ranks all contain information.
      The interesting thing is that there are a few relatively large weight values (corresponding mostly to the large eigenvalues), even after you have normalized all the activations by scaling the weights.
      These very few large values form a "backbone" and store very important stuff (or stuff the network sees as very important, at least). If I delete all values except the large ones, the network can still generate somewhat legible text, but fails in many ways. If I delete only the large values, the network completely fails and generates illegible text.
      The interesting part is that the small number of large values (1%-5%) are not super big: the abs-sum of these values is less than the abs-sum of the small values, but their importance is still essential.
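      The deletion experiment described above can be sketched like this (a toy random matrix stands in for real weights; the 2% threshold is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))              # stand-in for an MLP weight matrix

# keep only the largest ~2% of weights by magnitude (the "backbone")...
thresh = np.quantile(np.abs(W), 0.98)
backbone_only = np.where(np.abs(W) >= thresh, W, 0.0)
# ...or, conversely, delete exactly those large weights
backbone_removed = np.where(np.abs(W) < thresh, W, 0.0)

frac_kept = (backbone_only != 0).mean()      # fraction of weights in the backbone
```

      Even for this Gaussian toy matrix, the few largest values carry less total magnitude (abs-sum) than the many small ones, matching the observation above.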

    • @johnathancorgan3994 · 1 year ago +1

      @@adamrak7560 This is fascinating; I would not have expected full-rank MLP weights. Do you have anything written up about this?

    • @oncedidactic · 1 year ago

      Another interesting thing to do here would be to look at the backbone-vs-filigree question from an information-theoretic perspective.

  • @joshbuckmaster5548 · 1 year ago

    When the Eiffel Tower is edited to be in Paris what happens to the data that it might be the one in Vegas? Or the trinket on the bookshelf? Is there a way to tag this with “the original Eiffel Tower” without corrupting the other associated data?

  • @GBlunted · 1 year ago

    Cool content! I didn't realize how black these black boxes really are until watching this video... I don't quite understand why it's like this, exactly, because I've seen the level of research that goes into the creation of these models (from your other content), and it seems their creation is fine-tuned with a high-level mathematical understanding of the equations used to build them. I figured the equations were understood well enough that these mathematicians would understand more of the output they produce.
    But given how little is known about the inner workings of the models these equations create, the ML scientists making them seem awfully analogous to chimpanzees given typewriters, except these chimps somehow manage to recreate various works of Shakespeare: they have no idea what the scripts actually say or do, they just notice that they really enjoy the theatrical productions they produce.
    Or doesn't knowing the math that results in these models allow them to simply single-step through the equations, or keep track of the variables, and get a better understanding of what they're building? It seems they should have a debugger where you could set breakpoints to halt training when it encounters the word "Seattle", then follow that word through the network, or at least save a snapshot, run the word through, and see what's different afterwards. It seems odd that there would be such a better understanding of compilers and kernel runtimes than of ML models... ¯\_(ツ)_/¯

  • @regressions · 1 year ago

    Go Kevin!!

  • @shadfurman · 1 year ago

I've been pondering how to train a model to evaluate factual claims. It seems ChatGPT is based on the statistical prevalence of an input. I've gotten it to make some wild claims as fact that I couldn't "convince" it were fallacious, and it would increase the fallaciousness of its arguments the further I pressed it, going as far as cherry-picking studies to support its claims.
(On the other hand, usually when I point out an error, when it's not a widely promoted myth, it corrects itself, so it does have some "ability" to do this.)
So I began wondering if it would be possible to train a model with heavily weighted epistemic rules to be better able to evaluate the truthiness of claims, and then feed it research to do its own, less biased meta-analyses.
Of course this would depend on the quality of the epistemic models, but in my experience it's not the understanding of epistemology that people struggle with, causing contention around factual claims; it's the application of it. So I think it would be possible, even likely, that people could develop an unbiased epistemic model while being blinded to the content it would be fed.

  • @dylantrevena3806 · 1 year ago

    wow

  • @SimonJackson13 · 1 year ago +1

    Umm. Assuming a consistency check exists, drift can be made superlative to inconsistencies and so split the altered fact from the maintenance facts made inconsistent. A counter to reverse the inconvenient inconsistencies might produce all the other consistent facts. Store all that is wrong, so as to survive a GAN style fact lives? Make all the other facts drift so wrong as an easier error.

    • @SimonJackson13 · 1 year ago

      E.g. bill gates is a flying. Verb is not noun implies error.

  • @NeoShameMan · 1 year ago

Mmmm, so priming applied to a network. Makes sense.

  • @chrstfer2452 · 1 year ago

That's really scary: if such a simple-to-implement-and-conceptualize change is so powerful, it'll get abused by middle management immediately if they find out.

  • @fitybux4664 · 1 year ago

    Before even watching the video, "Locating and Editing Factual Associations" seems like some sort of ML Witchcraft. 👺

  • @binjianxin7830 · 1 year ago

    It’s like a surgery on the silicon-based transformer body 😂

  • @JohnSmith-ut5th · 1 year ago +1

I know what is actually happening. I'm actually building a model right now that learns in real time and is biologically plausible, based on this. The hypothesis in the paper regarding the early site is wrong: the early site is emotional information (multiple low-dimensional non-linear dual spaces); the late site is factual information. This just confirms my original AGI idea from 2015; this was precisely how I said it worked.

  • @Niohimself · 1 year ago

    Severing connections and seeing what happens... Sounds like brain surgery :p

  • @waltermacfarland1710 · 1 year ago +1

I understand the need for editing if the result of a computation in a model is false, but why would you want to cause the model to live in a false reality? Isn't this just contributing to mind control and slavery? You may have the best intentions, but someone evil will get hold of this and cause havoc.