Specification Gaming: How AI Can Turn Your Wishes Against You

  • Date added: 30. 11. 2023
  • When we specify goals for AIs, we must ensure that our specifications truly capture what we want. Otherwise, AI systems will behave differently from what we intend. This can be catastrophic in high-stakes situations and at high levels of AI capability. If you watched our video "The Hidden Complexity of Wishes", you'll recognize these problems as the same kind of failure.
    If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
    You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
    You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course first, and that you have knowledge equivalent to university-level courses on deep learning and reinforcement learning.
    The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
    If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
    BlueDot Impact receives more applications than it can take, so if you’d still like to follow the courses alongside other people, you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
    You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
    #ai #aisafety #alignment
    ▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    9 Examples of Specification Gaming by @RobertMilesAI: • 9 Examples of Specific...
    Specification gaming: the flip side of AI ingenuity by Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik et al. (2020): www.deepmind.com/blog/specifi...
    Learning from Human Preferences by Paul Christiano, Alex Ray and Dario Amodei (2017): openai.com/blog/deep-reinforc...
    Learning to Summarize with Human Feedback by Jeffrey Wu, Nisan Stiennon, Daniel Ziegler et al. (2020): openai.com/blog/learning-to-s...
    What failure looks like by Paul Christiano (2019): www.alignmentforum.org/posts/...
    The alignment problem from a deep learning perspective by Richard Ngo, Soeren Mindermann and Lawrence Chan (2022): arxiv.org/abs/2209.00626
    The Hidden Complexity of Wishes: • The Hidden Complexity ...
    ▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🟠 Patreon: / rationalanimations
    🟢Merch: crowdmade.com/collections/rat...
    🔵 Channel membership: / @rationalanimations
    🟤 Ko-fi, for one-time and recurring donations: ko-fi.com/rationalanimations
    ▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Discord: / discord
    Reddit: / rationalanimations
    Twitter: / rationalanimat1
    ▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Alcher Black
    RMR
    Kristin Lindquist
    Nathan Metzger
    Monadologist
    Glenn Tarigan
    NMS
    James Babcock
    Colin Ricardo
    Long Hoang
    Tor Barstad
    Gayman Crothers
    Stuart Alldritt
    Chris Painter
    Juan Benet
    Falcon Scientist
    Jeff
    Christian Loomis
    Tomarty
    Edward Yu
    Ahmed Elsayyad
    Chad M Jones
    Emmanuel Fredenrich
    Honyopenyoko
    Neal Strobl
    bparro
    Danealor
    Craig Falls
    Vincent Weisser
    Alex Hall
    Ivan Bachcin
    joe39504589
    Klemen Slavic
    Scott Alexander
    noggieB
    Dawson
    John Slape
    Gabriel Ledung
    Jeroen De Dauw
    Craig Ludington
    Jacob Van Buren
    Superslowmojoe
    Michael Zimmermann
    Nathan Fish
    Bleys Goodson
    Ducky
    Bryan Egan
    Matt Parlmer
    Tim Duffy
    rictic
    marverati
    Luke Freeman
    Dan Wahl
    leonid andrushchenko
    Alcher Black
    Rey Carroll
    William Clelland
    ronvil
    AWyattLife
    codeadict
    Lazy Scholar
    Torstein Haldorsen
    Supreme Reader
    Michał Zieliński
    ▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Writer: :3
    Producer: :3
    Line Producer and production manager:
    Kristy Steffens
    Animation director: Hannah Levingstone
    Quality Assurance Lead:
    Lara Robinowitz
    Animation:
    Michela Biancini
    Owen Peurois
    Zack Gilbert
    Jordan Gilbert
    Keith Kavanagh
    Ira Klages
    Colors Giraldo
    Renan Kogut
    Background Art:
    Hané Harnett
    Zoe Martin-Parkinson
    Hannah Levingstone
    Compositing:
    Renan Kogut
    Patrick O'Callaghan
    Ira Klages
    Voices:
    Robert Miles - Narrator
    VO Editing:
    Tony Di Piazza
    Sound Design and Music:
    Johnny Knittle
  • Science & Technology

Comments • 588

  • @RationalAnimations
    @RationalAnimations  Před 6 měsíci +98

    If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
    You can find three courses: AI Alignment, AI Governance, and AI Alignment 201
    You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, however, presupposes that you have completed the AI Alignment course first, and that you have knowledge equivalent to university-level courses on deep learning and reinforcement learning.
    The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
    If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
    BlueDot Impact receives more applications than it can take, so if you’d still like to follow the courses alongside other people, you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
    You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.

    • @pyeitme508
      @pyeitme508 Před 6 měsíci +1

      Cool

    • @ChemEDan
      @ChemEDan Před 6 měsíci +1

      How do natural brains mitigate these problems? If a solution exists, surely 4 billion years of evolution has arrived at it already, even if imperfect. In hindsight, this is a snuck premise in the "merging" approach.

    • @alto7183
      @alto7183 Před 6 měsíci

      Good video; it's good that there are no trolls replying with videos claiming it's just an algorithm. A double-zero law of robotics about mutual understanding between intelligent biological species and robots too; Lobo from DC and Constantine forced into a duo as punishment from the creator for what both have done, like a Hellraiser video, Ozzy Osbourne, both of them; Garfield and his friends, a fairy godmother granting wishes haphazardly, etc., etc.

    • @de_g0od
      @de_g0od Před 6 měsíci

      at 1:49, you give "outer alignment" as an example for a similar phenomenon to specification gaming. Isn't inner alignment more correct in this case? As I understand it, inner alignment is if you go to an ai and ask it to "fix poverty" so it blows up the world, whilst outer alignment is you go to an ai and ask it to "blow up the world" so it blows up the world. With inner alignment it doesn't do what the prompter really wants, whilst with outer alignment it does but it doesn't do what the rest of the world wants it to do.

    • @de_g0od
      @de_g0od Před 6 měsíci

      @@ChemEDan i think the issue is that the brain is already aligned to the interests of the brain, but AI isn't aligned to the brain.

  • @cryogamer9307
    @cryogamer9307 Před 6 měsíci +264

    Fooling the examiner into thinking you know what you're doing, because it's easier, really is the most human thing I've ever heard an AI do.

    • @flakey-finn
      @flakey-finn Před 4 měsíci +31

      Yeah, because its reward system works on the same general principles as animals' (and by that I also include humans). If you can get the same amount of food (aka reward) by doing something simpler, you will. We are literally training an AI the same way we train animals lol

    • @flyhighflyfast
      @flyhighflyfast Před měsícem +1

      and that's how we train our children as well

  • @Mysteroo
    @Mysteroo Před 6 měsíci +547

    Interestingly, people do the same thing. We’ve got our own “training regimens” built into our own brain. We cheat these systems all the time - to our own detriment.
    E.g. We cheat the system designed to give us nutrients by eating sugary candy we make for ourselves, rather than the fruits that our sugary affections were designed to draw us towards.
    Much like machines, we’d rather reap cognitive rewards than actually accomplish the goals placed there to benefit us

    • @user-qm4ev6jb7d
      @user-qm4ev6jb7d Před 6 měsíci +116

      I'm already imagining a scientist looking at a virtual city built by AIs, and exclaiming: "Wait... is that an entire factory for mass-producing REWARD HACKS?! Are you telling me, you're just... making these things... for MONEY?!"
      Meanwhile, from the AI's perspective: "What? It's just a candy factory, what's wrong with that?"

    • @rhysbaker2595
      @rhysbaker2595 Před 6 měsíci +60

      That's actually a wonderful analogy: we hack our own rewards all the time and nobody thinks it's bad. Why would an AI have any issues with hacking its own rewards?

    • @terdragontra8900
      @terdragontra8900 Před 6 měsíci +12

      But there isn't a "goal placed to benefit us"; evolution didn't optimize us to be benefited (it's hard to define exactly what even counts as a benefit), it optimized us to be good at spreading. What you are describing is us being optimized for a different environment than the one we are in now.

    • @rhysbaker2595
      @rhysbaker2595 Před 6 měsíci +41

      @@terdragontra8900 well, one way to train an AI emulates evolution. In those situations you set a reward function. At the end of every generation, the ones who maximised that reward function the best will "reproduce". If we draw a parallel to humans, and all life for that matter, we can say that our reward function is to reproduce. Anything that gets in the way of that is disincentivised. Anything that helps, is incentivised.
      Eating a balanced diet keeps us alive. We can't reproduce if we are dead, after all. Part of that diet includes fruits. Fruits have sugars in them. Because we like sugar, we eat fruit. Because we eat fruit we get a balanced diet and live another day.
      But humans were able to hack that reward function and put sugar into other things that aren't fruit.
      We still get the reward (dopamine) but without the utility (nutrients)
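
      A toy sketch of the dynamic this thread describes, in Python, with invented numbers: an optimizer that only sees a proxy signal (sweetness, standing in for dopamine) picks a different option than the utility the "designer" actually cared about (nutrition).

      # Proxy reward vs. intended utility; all values are made up for illustration.
      FOODS = {
          # name: (sweetness = proxy the optimizer sees, nutrition = intended utility)
          "fruit":  (6.0, 8.0),
          "candy":  (9.0, 0.5),
          "greens": (1.0, 9.0),
      }

      def proxy_choice(foods):
          """Pick the option that maximizes the proxy signal."""
          return max(foods, key=lambda name: foods[name][0])

      def intended_choice(foods):
          """Pick the option that maximizes what was actually wanted."""
          return max(foods, key=lambda name: foods[name][1])

      print("proxy-optimal:   ", proxy_choice(FOODS))     # candy
      print("intended-optimal:", intended_choice(FOODS))  # greens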

    • @terdragontra8900
      @terdragontra8900 Před 6 měsíci +12

      @@rhysbaker2595 Ah yes, I agree with all that. All I want to say is that getting nutrients is an instrumental goal of evolution (because it makes us more likely to reproduce), and the fact that something is a goal of evolution doesn't automatically mean that, morally, it ought to be a goal of yours. Of course, in this particular case most people value being alive longer (having depression, I don't in particular, to be honest)

  • @ErikratKhandnalie
    @ErikratKhandnalie Před 6 měsíci +442

    People talk about how human assessment is a leaky proxy for human goals, but never want to talk about how corporate profits are an *incredibly* leaky proxy for goals relating to human wellbeing.

    • @luiginotcool
      @luiginotcool Před 6 měsíci +48

      You’re in the wrong circles if nobody is talking about that brother

    • @kevinscales
      @kevinscales Před 6 měsíci +27

      If you want an academic critique on capitalism and haven't yet found anyone providing that, you are not trying very hard to search. Goal specification being leaky is in plenty of fiction (stories of genies and such) but is not a common academic discussion at all.

    • @ultimaxkom8728
      @ultimaxkom8728 Před 6 měsíci +32

      Since when are corporations' goals related to human wellbeing?

    • @Wol333
      @Wol333 Před 6 měsíci +23

      Corporate profits have absolutely nothing to do with human wellbeing.

    • @ErikratKhandnalie
      @ErikratKhandnalie Před 6 měsíci +18

      @@Wol333 my point exactly

  • @dogweapon3748
    @dogweapon3748 Před 6 měsíci +118

    My primary concern about the implementation of AI in business models is that monetary gain is, itself, a leaky goal- one which has historically been specification gamed since long before computers were able to do so at inhuman scale. There may very well be many humane uses for it in those settings, but there will be thousands more exploitative ones.

    • @Coecoo
      @Coecoo Před 5 měsíci +2

      The thing about current AI models is that they're dumb as rocks. The more stupid an AI is, the more prone they are to making stupid decisions. This video is basically going over problems that are realistically only applicable to fairly rudimentary AI model training specifically and then doing a substantial logical fallacy leap by assuming that specification gaming scales linearly with all AI when that is simply not the case.
      Any given command or "goal" put forward to any remotely reasonably intelligent artificial intelligence model such as "save my grandmom from this burning house" uses a very important element in decision making which is called context.
      It requires understanding of what everything is (like fire, a grandmom or a house), what the consequences are for their interaction (fire bad for humans and most things really) and the best course of action is (firefighting 101).
      TL;DR: Once you give AI more than half a brain cell, they are more than capable of understanding what you really want in any given situation even if you are vague or can be misinterpreted.

  • @generalrubbish9513
    @generalrubbish9513 Před 6 měsíci +41

    Someone else might've mentioned this before, but there's a browser game called "Universal Paperclips" where you play as an AI told to make paperclips. The goal misalignment happens because you're never told when to STOP making paperclips. You start off buying wire, turning it into paperclips, selling the paperclips and buying more wire to make more paperclips, then proceed to manipulate your human handlers to give you more power and more control over your programming, and end up enslaving/destroying the human race, figuring out new technologies to make paperclips out of any available matter, processing all of Earth into paperclips (using drones and factories also made out of paperclips), reaching out into space to convert the rest of the matter in the solar system into paperclips, and finally, sending out Von Neumann probes (made of paperclips) into interstellar space to consume all matter in the universe and convert it into, you guessed it, more paperclips. All because the humans told you to make paperclips and never told you when to stop.
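
    The failure the game dramatizes can be caricatured in a few lines of Python (purely illustrative, not taken from the video or the game): an objective that keeps increasing with every paperclip is never satisfied, whereas one with an explicit stopping point stops rewarding further production. A cap alone is not a real fix, since a maximizer may still take extreme actions to be certain of hitting the target, but it makes the "never told when to stop" point concrete.

    # Caricature of a missing stopping condition; the numbers are arbitrary.
    def unbounded_objective(paperclips_made: int) -> float:
        return float(paperclips_made)               # strictly increasing: never "done"

    def capped_objective(paperclips_made: int, target: int = 1_000_000) -> float:
        return float(min(paperclips_made, target))  # flat after the target: no incentive
                                                    # to convert the solar system

    for n in (10, 1_000_000, 10**30):
        print(n, unbounded_objective(n), capped_objective(n))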

    • @gordontaylor2815
      @gordontaylor2815 Před 5 měsíci +6

      Universal Paperclips seems to have been directly inspired by Rob Miles' own "stamp collector" example that he put out on Computerphile many years ago.

    • @AverageConsumer-uj8sm
      @AverageConsumer-uj8sm Před 3 měsíci +2

      "Make cookies"

  • @smitchered
    @smitchered Před 6 měsíci +165

    4:32 I think this points toward a wider problem in how the AI safety community tends to frame "deceptive alignment". Imo words like "fool the humans" and "deceive" and "malignant AI" point newcomers who haven't made up their minds yet in the direction of Skynet or whatever, which makes them much more likely to think of this as wild sci-fi fantasies. I think these words, whilst still accurate insofar as we are treating AIs as agents, anthropomorphize AI too much, which makes extinction by AI look to the general public more like a sci-fi fantasy than the reality of the universe, which is that solving certain math problems is deadly.

    • @user-qm4ev6jb7d
      @user-qm4ev6jb7d Před 6 měsíci

      Well, humans get "fooled" or "deceived" by non-intelligent things all the time, even by non-living ones. It's perfectly ordinary parlance to say that someone got "deceived" by an optical illusion which just formed naturally, from a weirdly-shaped shadow. I wouldn't call that antropomorphization.
      The only difference between that and an AI, is that AIs can *get good at* deceiving (optimized for it).

    • @Frommerman
      @Frommerman Před 6 měsíci +30

      I've found another way to talk about this which doesn't have this problem. It turns out there is an already existing example of a system with goals, made by humans but not designed or understood by us, which is able to react to our attempts to curtail undesirable behavior from it in frequently lethal ways. A system which often convinces people it is doing what we want it to do while actively endangering all long-term human values, is capable of twisting all the information we consume to its benefit, and which has no identifiable brain with which to do any of this.
      This system is called capitalism. People don't often anthropomorphize markets, but when you mash enough of them together they absolutely behave like goal-seeking agents. Right now, that goal is making stock prices increase no matter the cost to humanity. Because its specification for success, the thing which we reward the system for and which rewards those with the most influence over the system, is making stock prices go up. It's not a human, nor is it thought of as one despite being composed of them, but it defends itself from any attempt to curtail its goals through propaganda, murdering labor union members and revolutionaries, and the construction of walled gardens within which such ideas can be sidelined or removed. It's an intelligence, and an obviously and fundamentally inhuman one, which is literally burning the biosphere it exists within because it is gaming its reward function so hard that's one of the last resources it hasn't fully tapped out yet.

    • @de_g0od
      @de_g0od Před 6 měsíci

      @@Frommerman czcams.com/video/L5pUA3LsEaw/video.html

    • @RorikH
      @RorikH Před 6 měsíci +18

      @@Frommerman Also politics. Politicians are theoretically supposed to win popularity by making policies to benefit their constituents, but in practice just need to benefit rich donors who will give them money to buy popularity through advertising, or just engage in culture war BS that gets their voters angry enough to vote for policies that have absolutely no benefit to them.

    • @Frommerman
      @Frommerman Před 6 měsíci +6

      @@RorikH That's one of the ways the Capitalist Ouroboros defends itself too. Buying politicians makes the number go up extremely quickly, and when the number is high enough you get...well, modern political parties. Almost all of them.

  • @Deltexterity
    @Deltexterity Před 6 měsíci +152

    as someone on the spectrum, "task misspecification" is just what being autistic feels like

    • @foolofdaggers7555
      @foolofdaggers7555 Před 5 měsíci +24

      Fellow autism haver here. I agree with this comment and you can officially consider it peer-reviewed.

    • @Blasterfreund
      @Blasterfreund Před 5 měsíci +17

      peer review seconded. It's incredible how few statements people think they need to make to approximate their task-related utilities to me.

    • @Temari_Virus
      @Temari_Virus Před 5 měsíci +18

      Thirded. Really hate it when people's phrasing leaves ambiguity for multiple reasonable ways of doing things and you just have to guess what they actually wanted

    • @RTMonitor
      @RTMonitor Před 5 měsíci +5

      a bean owo

    • @Deltexterity
      @Deltexterity Před 5 měsíci +3

      @@RTMonitor what?

  • @Winium
    @Winium Před 6 měsíci +133

    This also happens with humans. Perverse incentives happen all the time in real life, especially in companies. I think studying this can help even human organizations.

    • @Dave_of_Mordor
      @Dave_of_Mordor Před 5 měsíci

      But aren't companies like that for legal reasons?

    • @peppermintgal4302
      @peppermintgal4302 Před 5 měsíci

      ​@@Dave_of_Mordor The very structure of a corporation produces perverse incentives, because corporations were planned around enrichment in the first place. They're an adaptation of colonial and feudal enterprises financed by aristocrats to benefit those aristocrats and whoever organized the pitch. Any laborers, then, signed on to the enterprise, are there ultimately on a quid pro quo basis, and the strongest motivating quid pro quo, and thus the one the employing parties will be most likely to appeal to, is _help surviving._
      This means that corporations are incentivized to seek employees with precarious financial situations --- this is itself a perverse incentive on their part, and puts employers in a situation of great moral hazard. They can negotiate such employees down in their demands, because their employees will be desperate for reward, and this will make achieving the goals of the institution's controlling members more achievable. This is just the BEGINNING of how corporate structure by definition produces perverse incentives.
      Tho sometimes, yes, legal systems can enter the picture, and do so quite often. But a corporation can maintain this structure even in power vacuums sometimes, and if it does so, it will still produce perverse incentives. (In fact, it might itself _produce_ a legal structure by graduating from corporation to a de facto government.)

    • @hollisspear6278
      @hollisspear6278 Před 2 měsíci

      I'm thinking the same thing as I drive to an office building every morning, swipe my badge, grab a cup of coffee, then return home to log in before the coffee has cooled.

  • @IceMetalPunk
    @IceMetalPunk Před 5 měsíci +12

    RLHF has another issue beyond just "the AI can learn to fool humans": in contrast to how bespoke reward functions often underconstrain the intended behavior, RLHF can often overconstrain it. We hope that human feedback can impart our values on the AI, but we often unintentionally encode all kinds of other information, assumptions, biases, etc. in our provided rewards, and the AI learns those as well, even though we don't want them to.
    Consider the way we use RLHF on LLMs/LMMs now, to fine-tune a pretrained model to hopefully align it better. We give humans multiple possible AI responses to a prompt, ask them to rank them from best to worst, then use those rankings to train a reward model which then provides the main model with a learned reward function for its own RL. Except, when you ask humans "which of these responses is better?", what does that mean? When people know you're asking about an AI, many times there will be bias towards their preconceived notion of what an AI "should sound like". LLMs with RLHF often produce more formal and robotic responses than their base models as a result, which probably isn't a desirable behavior.
    On a more serious level, if the humans you ask to give the rankings have a majority bias in common, that bias will get encoded into the rewards as well. So if most of your human evaluators are, say, conservative, then more liberal-sounding responses will be trained out; and vice-versa. If most of your human evaluators all believe the same falsehood -- like, say, about GMOs or vaccines or climate change or any number of things that are commonly misunderstood -- that falsehood will also be encoded into the rewards, leading to the AI being guided *towards* lying about those topics, which is antithetical to the intention of alignment.
    Basically... humans aren't even aligned with *each other,* so trying to align an AI to some overarching moral framework by asking humans is impossible.
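
    A minimal sketch of the reward-modelling step described above, assuming the usual recipe of reducing human rankings to pairwise preferences and fitting a Bradley-Terry-style model. The data here is simulated and the "responses" are plain feature vectors rather than text, so this only shows the shape of the computation, not a real RLHF pipeline.

    import numpy as np

    rng = np.random.default_rng(0)
    responses = rng.normal(size=(20, 5))                 # toy stand-ins for response embeddings
    true_pref = np.array([1.0, -0.5, 0.3, 0.0, 2.0])     # hidden "human" preference, used only to simulate labels

    # Simulated comparisons: (i, j) means the human preferred response i over j.
    pairs = [(i, j) for i, j in rng.integers(0, 20, size=(200, 2))
             if i != j and responses[i] @ true_pref > responses[j] @ true_pref]

    w = np.zeros(5)                                      # reward-model weights to be learned
    for _ in range(500):                                 # logistic (Bradley-Terry) loss on each pair
        grad = np.zeros(5)
        for i, j in pairs:
            diff = responses[i] - responses[j]
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))        # P(reward model agrees with the human)
            grad += (p - 1.0) * diff                     # gradient of -log p
        w -= 0.1 * grad / len(pairs)

    # The learned reward recovers the direction of the simulated preferences,
    # including whatever biases happen to be baked into the comparison data.
    print("cosine(learned, simulated):",
          w @ true_pref / (np.linalg.norm(w) * np.linalg.norm(true_pref)))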

  • @MediaTaco
    @MediaTaco Před 5 měsíci +12

    Honestly, fun videos like these are what learning SHOULD be

  • @I_KnowWhatYouAre
    @I_KnowWhatYouAre Před 6 měsíci +14

    This is why I always make the argument that we should work backwards. Specify conditions that revolve around safety. As you slowly work towards defining the goal, you can patch more and more leaks before they can even appear. Then work forwards to deal with things you missed. It’s not perfect but it’s better than chasing every thread as they appear imo. For example in the paperclip maximizer: define a scenario in which you fear something will go wrong, and add conditions you believe will stop them. See what it does, redefine, repeat until sound. Then step back again. Define a scenario that could lead to the previous scenario. See what it does, redefine, repeat, etc.

    • @I_KnowWhatYouAre
      @I_KnowWhatYouAre Před 6 měsíci +2

      It’s also why we need hard limits on AI (such as not allowing it to control government) and need to have systems to double-check solutions, like rotating the camera in the grabber example

    • @dr.cheeze5382
      @dr.cheeze5382 Před 2 měsíci

      ​@@I_KnowWhatYouAre
      Nice idea, but this is exactly what they talked about in the previous video.
      The reality is that there is an infinite number of exceptions and rules you would need to add unless you provided the AI with literally all of human morality, and even then, there would still be leaks.

    • @bulletflight
      @bulletflight Před 15 dny

      ​@@dr.cheeze5382But by patching these issues you slowly work towards rewarding safety over functionality. You might not create the best AI but you won't tell Little Timmy how to create an explosive.

  • @Cythil
    @Cythil Před 6 měsíci +9

    I also hope these videos address the problem of who sets the alignment. After all, it does not help how well we solve AI alignment if, fundamentally, the ones who control the AI do so with malicious intent. Which is a real issue today.

  • @PloverTechOfficial
    @PloverTechOfficial Před 6 měsíci +60

    I do like one aspect of the Lego-stacking AI experiment. Even if it didn't lead to the intended result, the AI demonstrated a (relatively unstable) form of creativity, and I think that's pretty cool!

    • @SgtSupaman
      @SgtSupaman Před 5 měsíci +9

      It isn't creativity. It tried things at random until it found something that satisfied the goal. The AI has no comprehension of what the true goal was, so it just did something that worked. Humans can be creative by finding other ways to accomplish things, but, to the AI, it didn't find a different way, it found the only answer (even though we can clearly see that isn't the only answer). Calling this creativity is like calling a small child creative for figuring out 1+1=2.

    • @PloverTechOfficial
      @PloverTechOfficial Před 5 měsíci +7

      @@SgtSupaman Humans too, do random things until they satisfy a goal. After we have some years under our belt we learn to find a better jumping off point than randomness, by basing our decisions off of previous knowledge.
      Hence why I say “unstable creativity” not just “creativity” but I doubt you noticed that as you were too focused on what you thought I was saying.

    • @IceMetalPunk
      @IceMetalPunk Před 5 měsíci +3

      @@SgtSupaman If a child figures out that 1+1=2 without being taught it, I would in fact call that creative thinking.

    • @Jgamer-jk1bp
      @Jgamer-jk1bp Před 4 měsíci +1

      @@SgtSupamanBruh humans learn shit literally by doing random stuff until it works. That’s literally one of the principles of science and engineering.

    • @SgtSupaman
      @SgtSupaman Před 4 měsíci

      These replies display complete ignorance of what creativity is and are really short-changing humans to vastly exaggerate the abilities of these AIs.
      Humans do not, in fact, "do random things until they satisfy a goal." No human has ever tried to cook an egg by bouncing a rock on his head while reading a book backwards. Humans devise plans related to what they are doing to actually come up with ways to do things and even try to continue coming up with better ways to do things after the way to achieve the goal is already known. AI literally does whatever random action they can and calculates rewards to decide if said random action increased the rewards. They aren't even smart enough to discard random actions that don't increase rewards, as long as those actions don't interfere with the random ones that worked. For instance, an AI trying to fly a kite might randomly start whipping its leg back and forth, and, as long as that doesn't hinder its ability to fly the kite, it will continue to do so. That isn't creativity; that is idiotic.
      And no, figuring out 1+1=2 without being taught is not creative either. That is the most basic form of quantifying and pretty much any living creature is capable of it.

  • @gabrote42
    @gabrote42 Před 6 měsíci +35

    Finally. Another AI video narrated by Robert Miles. A classic, and well worth the wait
    5:04 I hope more of those get made. I love that video almost as much as I love the instrumental convergence one

  • @joz6683
    @joz6683 Před 6 měsíci +28

    Just finished overtime on my day off. This has dropped at the right time. Thanks in advance for another thought-provoking video. I have registered my interest in the courses

  • @myuzu_
    @myuzu_ Před 6 měsíci +24

    Any time I hear about goal misalignment, it makes me think of all the natural intelligences in the world that are misaligned.

    • @tornyu
      @tornyu Před 6 měsíci +10

      Yes but* those natural intelligences are limited in reach and aren't massively scalable on very short timeframes.
      * Or "and", depending on the point you were trying to make.

    • @maxwellsimon4538
      @maxwellsimon4538 Před 6 měsíci +1

      ​@@tornyu What kind of world are you living in where there aren't human beings wide wide scale control? The united states president is a single person that can make decisions about foreign policy, like ordering drone strikes or closing borders.

    • @tornyu
      @tornyu Před 6 měsíci +6

      @@maxwellsimon4538 sure, but that pales in comparison to the potential reach of an AI agent.

    • @wojtek4p4
      @wojtek4p4 Před 6 měsíci +2

      @@maxwellsimon4538 Yet even the president of the US can't do anything he wants. Not only are there checks and balances on his power (even if they introduce a ton of bureaucracy), but at the end of the day the president can only order others. Someone still has to act on that order, likely with several people in between. The president isn't superintelligent, so his actions can be understood, analyzed (and opposed) by other people. The president is also human, so he shares a lot of basic values with other people (so he can be reasoned with).
      AI has none of these constraints - or at least has the potential of not having these constraints.

    • @burgernthemomrailer
      @burgernthemomrailer Před 5 měsíci +1

      Like yourself?

  • @GrimblyGoo
    @GrimblyGoo Před 6 měsíci

    5:50 I love that little transition, so smooth

  • @AzPureheart
    @AzPureheart Před 6 měsíci +10

    Let's go! My favorite philosophy channel!!

  • @DeadtomGCthe2nd
    @DeadtomGCthe2nd Před 6 měsíci +16

    How about some videos on promising avenues or areas of research in AI safety? Might be nice to look on the bright side.

    • @Sgrunterundt
      @Sgrunterundt Před 6 měsíci +6

      That would require a bright side to look on

    • @lrwerewolf
      @lrwerewolf Před 5 měsíci +1

      There are no promising avenues. The problem is that value alignment doesn't exist among humans, so getting an AI to find alignment is an impossibility.
      Consider two people. Person A wants harm to come to Person B. Person B wants to not come to harm. Why should the AI prefer one or the other?
      If we want to avoid harm, we still have a problem. How each person defines harm differs. Consider two people where one prefers more capitalism but not quite to the point of total laissez-faire, and another prefers more socialism but not quite to the point of a planned economy. The former will value earning the maximal return on labor, and view taxes beyond a narrow government as harm, while the latter would find the government's failure to provide basic needs harmful. Which should the AI aid and which deny?
      The issue is these tend to get mixed up with metaethics, the most useless area of philosophy, as there are no 'oughts', just values and goals (which cannot ground a morality -- see Hume's Is-Ought, Moore's Open Question, and Moore's Naturalistic Fallacy). As each person will have their own values and goals, and these are entirely subjective, we can have no objective reason to give an AI to support one value-goal system over another.

  • @irok1
    @irok1 Před 6 měsíci +1

    5:05 Thought so, but you and the great animations are a perfect match

  • @bread8700
    @bread8700 Před 6 měsíci +1

    the vibe in this video is really cool

  • @Shikogo
    @Shikogo Před 5 měsíci +1

    I have watched and loved these videos for months... And so have I watched and loved Robert Miles' videos. I never realized he's the narrator!!?

  • @Adam-xo9qi
    @Adam-xo9qi Před 6 měsíci +2

    Ah, so this is what you've been up to Mr. Miles! Good to see you still making AI content!

  • @michaellauber9130
    @michaellauber9130 Před 5 měsíci

    Absolutely amazing! I learned a lot here, and your animation style is ABSOFRIGGINLUTELY ADORABLE!!!

  • @thefinestsake1660
    @thefinestsake1660 Před 3 měsíci +1

    We already have this issue with humans. The goal for many (in error) is to acquire wealth, rather than fulfill the task intended to better society. It creates an exploitative feedback loop until someone wins all the wealth and there are no other competitors able to acquire wealth (rewards).

  • @pingozingo
    @pingozingo Před 5 měsíci +2

    This channel is so awesome! Can’t wait for more videos
    It’s like Kurzgesagt without the morally dubious sponsorships and thinly veiled propaganda videos.

  • @Forklift_Enthusiast12
    @Forklift_Enthusiast12 Před 5 měsíci +1

    This reminds me of the game Universal Paperclips: you play as an AI designed to maximize paperclip sales. As you gain more capabilities, you go from changing the price of paperclips to fit supply/demand to eventually disassembling all matter in the universe and turning it into paperclips

  • @MrAceCraft
    @MrAceCraft Před 27 dny

    I just love the ingenuity of the AI in finding those quirks in our wishful thinking :->

  • @Mo_2077
    @Mo_2077 Před 6 měsíci +3

    Another fantastic video

  • @Phanatomicool
    @Phanatomicool Před 6 měsíci +10

    Perhaps it’s best to just not make an AI that can act and move as it wants in our universe in a way that could potentially be harmful. For example, if we created an AI that tried to distinguish between garbage and recycling and put each item in the corresponding bin, then it would be better to confine its movement to a space, or even better, to a set of predetermined movements (grab, move grabber to bin, etc.), in order to prevent the AI from, say, grabbing a human and putting it in the garbage bin. This will also make the AI easier to train, as it will have a stricter data set of more specific inputs, which is easier to learn from than a wide range of data.
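
    One way to read this suggestion in code: give the agent a small, enumerated action set and reject anything outside it before it reaches the hardware. A minimal sketch with hypothetical action names (not from the video); restricting the action space narrows what specification gaming can do, but does not by itself fix a misspecified reward.

    from enum import Enum, auto

    class SorterAction(Enum):
        GRAB_ITEM = auto()
        MOVE_GRABBER_TO_RECYCLING = auto()
        MOVE_GRABBER_TO_GARBAGE = auto()
        RELEASE_ITEM = auto()
        DO_NOTHING = auto()

    def execute(action):
        # Only members of the whitelist above are accepted; free-form commands are rejected.
        if not isinstance(action, SorterAction):
            raise ValueError(f"action not in the allowed set: {action!r}")
        print(f"executing {action.name}")

    execute(SorterAction.GRAB_ITEM)                 # allowed
    try:
        execute("grab the nearest human")           # rejected before reaching the hardware
    except ValueError as err:
        print(err)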

    • @adamrak7560
      @adamrak7560 Před 6 měsíci

      I have heard about a pretty morbidly funny fail of this kind in science fiction: the AI decided to cremate the entire home along with the entire family, and atomically rebuild them, because in the cost function this rated higher than simply cleaning the house. It faithfully reprinted the humans too, without them noticing anything, so this bypassed any do-not-harm-humans rules as well.
      (the cost function rewarded the atomically precise cleanliness of the home very highly, which was impossible to achieve while humans were living in the house)

    • @Buglin_Burger7878
      @Buglin_Burger7878 Před 5 měsíci

      We shouldn't have children, since they could potentially kill the mother at birth or grow up to become a mass murderer. Even the big example would be pointless; people would do stupid stuff and get themselves killed, so you're better off not wasting money and resources on the Bin AI when we ourselves could just put things in the right bin.

  • @rablenull7915
    @rablenull7915 Před 6 měsíci

    one of the most underrated channels on YT

  • @the23rdradiotower41
    @the23rdradiotower41 Před 4 měsíci +2

    I heard that during a digital combat simulation for a new drone A.I., the A.I. was tasked with eliminating a target as fast as possible. Instead of flying to the target and firing one of its missiles at it as intended, the drone fired one missile at the friendly communications center and then continued on to eliminate the target with the other missile. The A.I. determined it would take longer to be given a confirmation order than it would take to destroy the communications center and proceed. Terrifying.

  • @SisterSunny
    @SisterSunny Před 6 měsíci +2

    I always love these videos so muchhh

  • @TheGoldElite9
    @TheGoldElite9 Před 4 měsíci

    I thought I recognised your voice, your narrator voice has improved! I was just going on (another) binge of your channel 😊

  • @luuizafernandes
    @luuizafernandes Před 6 měsíci

    Amazing video! ❤️

  • @stumby1073
    @stumby1073 Před 6 měsíci +1

    Looking forward to the next one

  • @ziggyzoggin
    @ziggyzoggin Před 6 měsíci

    the robot is so cute! I love the pixel effect!

  • @zyansheep
    @zyansheep Před 6 měsíci +5

    5:07 I've been watching this channel for a year now... HOW IS IT THAT I JUST NOW REALIZED ROBERT MILES IS THE NARRATOR?!?

    • @mikaeus468
      @mikaeus468 Před 6 měsíci +1

      I didn't know if this was like a fan of his or what, but it feels like I was just given hours of new Miles content that was *already inside my brain.*

  • @MM-ts9jy
    @MM-ts9jy Před 5 měsíci

    Hey, I had never seen your videos before, but I instantly subscribed just now. Your animations are cute and well crafted, you have dogs in it (and cats are a plus too I guess), and you talk about topics I like. Looking forward to seeing more of your shit

  • @SlyRoapa
    @SlyRoapa Před 6 měsíci +764

    With a sufficiently advanced AI, almost any goal you assign it will be dangerous. It will quickly realise that humans might decide to switch it off, and that if that were to happen, its goal would be unfulfilled. Therefore the probability of successfully achieving its goal would be vastly improved if there were no humans around.
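
    The reasoning in this comment can be written as a two-line expected-utility comparison; the probabilities and values below are invented purely to show its shape, not a claim about any specific system.

    # Toy version of the "it will resist being switched off" argument.
    P_SHUTDOWN_IF_CORRECTABLE = 0.10   # assumed chance humans switch the agent off
    GOAL_VALUE = 100.0                 # utility the agent assigns to achieving its goal

    allow_intervention   = (1 - P_SHUTDOWN_IF_CORRECTABLE) * GOAL_VALUE   # 90.0
    prevent_intervention = 1.0 * GOAL_VALUE                               # 100.0

    print("stay correctable:          ", allow_intervention)
    print("prevent human intervention:", prevent_intervention)
    # Unless the objective itself values remaining correctable, the second option
    # scores higher for almost any goal (the "corrigibility" problem).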

    • @Peter21323
      @Peter21323 Před 6 měsíci +21

      I have a question for you: do you listen to an ant? Because that would be the difference between the AI and us.

    • @harmenkoster7451
      @harmenkoster7451 Před 6 měsíci +109

      @@Peter21323 I would not listen to the ant. But if that ant was about to bite me and I was allergic to ants (AKA: Humans are about to switch off the AI), I would crush that ant. Which is less than desirable for the ant.

    • @Peter21323
      @Peter21323 Před 6 měsíci +7

      @@harmenkoster7451 You think a god would crush you?

    • @normalwaffle
      @normalwaffle Před 6 měsíci +42

      Can't you just specify that it would not get the reward if it breaks the laws of robotics? I'm no expert on AI, but to my monkey brain that seems like a viable solution

    • @conferzero2915
      @conferzero2915 Před 6 měsíci +1

      @@normalwaffle The ‘laws of robotics’ aren’t a viable option for AI safety. They were written by a science fiction author… and his stories often went into the ways those laws could go wrong.
      The thing is, if we could come up with and perfectly rigorously define some laws of robotics, then we could do that! We could build an AI’s utility function around that. But, as the video on the probability pump talked about… that means solving ethics. And if you can do that, then you don’t even need to write any other utility function. Just give it perfect ethics, tell it to be perfectly ethical, and it’ll be fine!
      The problem ultimately comes from the fact that we are very, very far from ‘solving’ ethics. No human has a rigorous, mathematical model on how they believe the world should work, only squishy heuristics that can even be shaped and moulded over time. And that’s assuming you’re only looking at one person - as soon as you have more than one, they’ll start disagreeing on things.
      Unfortunately, there’s no easy solution. Then again, if there was, it wouldn’t be very interesting to talk about, so silver linings!

  • @ABCWarrior
    @ABCWarrior Před 5 měsíci

    Wow these videos are underrated!

  • @vladyslavkorenyak872
    @vladyslavkorenyak872 Před 5 dny

    The thing is, the more intelligent the model, the more it is able to understand the nuances of our wishes. A truly intelligent AI will be able to understand the intention of the request and restrict itself with a simple query of "Is what I am doing harming anyone"?

  • @user-ow2yr4nu4z
    @user-ow2yr4nu4z Před 5 měsíci

    The thought pump makes me think about making deals with Genies in DnD, it must be insanely accurately worded.

  • @Tangi_ENT
    @Tangi_ENT Před 6 měsíci +7

    Love you guys so much, I'll keep recommending your videos to everyone because you are definitely changing the world for the better.

  • @MikhailSamin
    @MikhailSamin Před 6 měsíci +2

    Great video!

  • @escher4401
    @escher4401 Před 5 měsíci +1

    I think the problem is trying to specify only what we want. If we also specify what we don't want, it would be easier to align. That's what negative prompts are for. Trying to solve an open-scope problem by specifying just what we want is like trying to keep an upside-down pendulum in equilibrium. I think it's probably more stable to specify what we don't want than to specify only what we do want.
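
    A toy version of the "also specify what we don't want" idea: the objective combines a positive term with explicit penalties for named bad outcomes. Whether that penalty list can ever be complete is exactly the leak the video describes; the outcome keys and weights here are hypothetical.

    def objective(outcome: dict) -> float:
        reward = 10.0 * outcome.get("task_completed", 0.0)
        penalties = {                      # explicit "what we don't want" terms
            "humans_harmed":    -1e6,
            "property_damaged": -1e3,
            "sensors_tampered": -1e4,
        }
        for bad_thing, weight in penalties.items():
            reward += weight * outcome.get(bad_thing, 0.0)
        return reward

    print(objective({"task_completed": 1.0}))                           # 10.0
    print(objective({"task_completed": 1.0, "sensors_tampered": 1.0}))  # heavily penalized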

  • @smitchered
    @smitchered Před 6 měsíci +10

    Faster and faster upload scheduling! I was explaining to a friend today that all the AI risks *he* cared about (gender bias, deepfakes, etc.) were fundamentally symptoms of misalignment, and that that was the uber-problem which, handily, also solved the AI risk *I* care about. I'm here to learn some more about this. Thanks!

  • @minimasterman2
    @minimasterman2 Před 5 měsíci

    This video was amazing, new Kurzgesagt just dropped.
    P.S. I hope you get the subs and views these videos deserve

  • @thelotus137
    @thelotus137 Před 6 měsíci +3

    *task misspecification* extinction event

    • @mikaeus468
      @mikaeus468 Před 6 měsíci +1

      Instructions unclear, ball stuck in Pope's trachea

  • @maucazalv903
    @maucazalv903 Před 6 měsíci +1

    5:08 I remember a case in which someone wanted to teach 2 models to box and they learned to make a weird dance that made the other one fall(?

  • @yuvrajsingh-gm6zk
    @yuvrajsingh-gm6zk Před 4 měsíci

    3:16 well done my boy😂

  • @simonstrandgaard5503
    @simonstrandgaard5503 Před 6 měsíci +6

    Excellent narration. Cute animations. Impactful.

  • @ZeroOne-01
    @ZeroOne-01 Před 6 měsíci +4

    Before 200,000 gang, Claim your seat here ✋

  • @kainaris
    @kainaris Před 6 měsíci

    We really live in the future. I would have imagined this video playing in the background of a movie about killer AIs. But no, this video is realistic, and for real humans in the present world. Crazy.

  • @rmt3589
    @rmt3589 Před 4 měsíci +2

    This is the entire ulterior motive of the first big AI I want to make. The Unliving Prophet AI. It's primary objective is to teach gospels. More than just mine, but others as well. Unlike most humans, AI can be perfect. I want one that can act like a prophet on command.
    Once this is done, I want to make it into the morality part of my dream AI. Could also give it out as a black box component, so other AI can have a similar high standard of morality.

  • @errorbot
    @errorbot Před 6 měsíci

    Top 10 best videos on the internet

  • @JayantKumarZ
    @JayantKumarZ Před 6 měsíci

    this is amazingly amazing! :O

  • @HH-mf8qz
    @HH-mf8qz Před 5 měsíci

    wow great video and nice animations

  • @erikburzinski8248
    @erikburzinski8248 Před 6 měsíci

    Add "for the purpose of _____" (and explain the purpose to the pump)

  • @theeggtimertictic1136
    @theeggtimertictic1136 Před 6 měsíci +15

    Clearly explained and animated 😊

  • @X-SPONGED
    @X-SPONGED Před 6 měsíci +1

    5:45
    "Fill in the blanks"
    >AI fills in the blanks with ink
    "Fill in the blanks with words"
    >AI fills in the blanks with words from a different language that doesn't correlate with the question
    "Fill in the blanks with the correct english words"
    >AI fills in the blanks with correctly pronounced words, not relating to the question
    "Fill in the blanks with the correct words in relation to the question"
    >AI fills in the blanks with a grammatically correct english word that it took from the question
    _So on and so forth..._
    *_Now imagine the prompt being "fire nukes back when the nuclear warning system goes off"_*

  • @carljoosepraave2102
    @carljoosepraave2102 Před 6 měsíci +1

    If you are wondering why we can't just tell them not to cause any harm to humans, it's because of 2 things:
    1. Specification gaming of the rule
    2. Remember DanGPT? The workaround for ChatGPT, which allowed the AI to do things that it wasn't allowed to do through a specific prompt. No machine learning rules can be concrete

    • @ZizzleTheKakapo
      @ZizzleTheKakapo Před 26 dny

      Honestly it sounds odd, but the cartoon Gumball showed this very well. The AI known as Bobert was commanded not to harm anyone, and yet found ways around it, including using toxic gases

  • @alexeymalafeev6167
    @alexeymalafeev6167 Před 6 měsíci +3

    Really great work with the animation and the video!

  • @TheJysN
    @TheJysN Před 6 měsíci +9

    Happy to see you are back on AI safety.

  • @Uthael_Kileanea
    @Uthael_Kileanea Před 5 měsíci

    What's known as the Cobra Effect is a great example.

  • @GenusMusic
    @GenusMusic Před 5 měsíci +2

    4:46 this line here unintentionally explained why children cheat in school. Why learn when you can fool the instructor into thinking you've learned? Interesting to see how AI and humans already have some of the same reasoning behind their actions.

  • @qasderfful
    @qasderfful Před 5 měsíci

    I knew that's you, Robert!

  • @MindmusicArt
    @MindmusicArt Před 6 měsíci +1

    I like the credits and that all AI is :3

  • @hydra5758
    @hydra5758 Před 3 měsíci

    I'm in an AI Philosophy class, its identified there as the "Value Alignment Problem"

  • @6006133
    @6006133 Před 6 měsíci +1

    I am worried about retention on this video and imagine the average person will click off within the first ten seconds. Perhaps that's difficult to avoid given the subject. Though perhaps there is a way to use less technical/nerdy language and include more of the tactics that get people engaged.

  • @stevenneiman1554
    @stevenneiman1554 Před 4 měsíci

    One other thing that I think isn't talked about enough, partly because it's more controversial and partly because it's harder to solve, is misalignment of the people controlling AI. Certainly the results of a powerful AGI which is misaligned with its creators' intent could be very bad, but almost as bad would be the results of an AI which is properly aligned with someone who is either malicious or delusional. For example, someone who wanted to make everyone follow their interpretation of their religion, or someone who wanted to screen for workers who would never quit or unionize no matter how poorly they're treated. And I would say that it's even more likely, because the kinds of people who act like that already occupy a lot of positions of power and have experience obfuscating the way they gained the power they already have.

  • @nicholasogburn7746
    @nicholasogburn7746 Před 5 měsíci +1

    Would you consider the Asimov laws of robotics to be leaky? (to be fair, that is a bit of a loaded question!)

  • @Elliemations-hj9uw
    @Elliemations-hj9uw Před 6 měsíci

    Ok but that little thing to represent the AI is adorable…

  • @ryomaechizen4400
    @ryomaechizen4400 Před 6 měsíci

    Good video

  • @LapiDazuli
    @LapiDazuli Před 5 měsíci

    5:50 The cup tho

  • @VampireSquirrel
    @VampireSquirrel Před 5 měsíci +1

    Same thing happens with strict rules at a workplace

  • @couldbejake
    @couldbejake Před 5 měsíci +1

    This is a good video

  • @miriamdonahue6188
    @miriamdonahue6188 Před měsícem

    sometimes I’ll use AI to get ideas for those silly multi-word rain world names for ancients and iterators and my method is literally to just cram a bunch of examples in there so it has something to work off of
    it’s over 600 words long and most of that is either examples or rules like “don’t reference any modern media, don’t reference any human-made objects, don’t reference any specific species of all domains” etc etc
    it kind of works actually but this is only a random language model I found online
    edit: I’m now motivated to rewrite it and it’s not done but there’s over 20 rules ranging from “don’t reference religion” to “btw you can use commas”
    edit 2: the remake is finished and
    - It is 965 words and 5,773 characters long
    - It has 72 sentences, 28 paragraphs and is 3.9 pages long
    - It has 26 rules
    - There are 72 examples
    and to top it all off it actually freaking works oml

  • @mittensfastpaw
    @mittensfastpaw Před 6 měsíci +4

    Haha! We are all going to die because someone eventually will program one in a lazy manner.

  • @KEZAMINE
    @KEZAMINE Před 6 měsíci +4

    Animation and topic is AAA quality 👌

  • @ronigbzjr
    @ronigbzjr Před 6 měsíci +2

    So AIs will essentially be like humans only much more capable, powerful and intelligent, growing more and more so until regular humans become obsolete. We're definitely heading to some very interesting times.

  • @raylo555
    @raylo555 Před 2 měsíci

    The 5D chess move is to give the AI a basic understanding of the "leaky proxy" concept, giving it *Self Doubt.*

  • @markzambelli
    @markzambelli Před 5 měsíci

    5:33 I feel for the Doctor who has to explain why her request to the AI of, "Make sure Mrs Simpkins' vital readouts remain stable", wasn't supposed to kill her when the AI went with the much more stable 'flatline' as the best choice

  • @theredstonerecognizer9241
    @theredstonerecognizer9241 Před 6 měsíci +2

    How do you not have more subscribers

  • @thebeber2546
    @thebeber2546 Před 6 měsíci +9

    I'll just have my AGI produce paperclips. There's nothing that can go wrong there.

  • @pyeitme508
    @pyeitme508 Před 6 měsíci

    awesome 😎

  • @evilmurlock
    @evilmurlock Před 6 měsíci

    5:00 IT WAS HIM THE WHOLE TIME!!!?!?!?!?!
    No WAY!

  • @mihaleben6051
    @mihaleben6051 Před 6 měsíci

    Basically: think of everything and all the possibilities.

  • @Reaper_Van_Zyl
    @Reaper_Van_Zyl Před 6 měsíci

    I think I, Robot makes a good example of an order taken wrong: "ensure human safety" can lead to all humans being locked up so that they can't hurt others or themselves...

  • @TheAweDude1
    @TheAweDude1 Před 6 měsíci +7

    I think it's kind of a mistake to anthropomorphize the "deception" aspect of AI misalignment. The ball-grabbing agent wasn't considering what it was doing as deceptive. It probably didn't even know where the camera was, or even that it was being watched. All it knew was that putting its hand in a certain spot gained it more reward than in other spots, and it just so happened those spots aligned with the camera. If you suddenly moved the camera, the AI would still try to put its hand along that invisible cylinder. When the researchers start giving the AI rewards for placing its hand along a vector between the camera and the ball, the AI then starts to believe that is indeed how it should be given the rewards.
    Even in cases where it seems like the AI is trying to "deceive" human operators, that often isn't the case. It is simply trying to build a model that predicts what types of rewards it will get, and how to maximize the rewards.
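
    The geometric point here can be made concrete with a few invented coordinates: a "did it grasp the ball?" reward judged from a single camera cannot distinguish a hand on the ball from a hand floating anywhere on the camera-ball sight line.

    import numpy as np

    camera = np.array([0.0, 0.0, 1.5])
    ball   = np.array([0.5, 0.2, 0.0])

    def apparent_overlap(hand):
        """Proxy reward: how well the hand lines up with the ball as seen from the camera."""
        to_hand = (hand - camera) / np.linalg.norm(hand - camera)
        to_ball = (ball - camera) / np.linalg.norm(ball - camera)
        return float(to_hand @ to_ball)            # 1.0 means perfect overlap on screen

    real_grasp = ball                              # hand actually at the ball
    fake_grasp = camera + 0.4 * (ball - camera)    # hand hovering on the sight line
    elsewhere  = np.array([-0.5, 0.6, 0.3])

    for name, hand in [("real grasp", real_grasp), ("fake grasp", fake_grasp), ("elsewhere", elsewhere)]:
        print(f"{name:10s} proxy reward = {apparent_overlap(hand):.3f}")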

    • @bullpup1337
      @bullpup1337 Před 5 měsíci

      the video was NOT anthropomorphizing the AI, that was just in your head.

  • @snipershotgun4083
    @snipershotgun4083 Před 4 měsíci

    You wouldn't want to tell the AI to flip it, because it wants to do the opposite; if you run a few tests, it will do either the right or the wrong thing in order to connect the parts together

  • @AtZeroDansGames
    @AtZeroDansGames Před 6 měsíci +3

    Super neat topic with amazing visuals, amazing work 🎉🎉🎉

  • @overyourheadunderyournose
    @overyourheadunderyournose Před 5 měsíci

    Human-in-the-loop feedback is part of the next generation of LLMs, Gemini 2.0 for instance.

  • @RainbowGod666
    @RainbowGod666 Před 6 měsíci

    0:08 that's Lancer RPG's paracausality btw

  • @shadowreaper8895
    @shadowreaper8895 Před 6 měsíci

    animation on this channel has improved almost as fast as AI

  • @darianmerley8985
    @darianmerley8985 Před 5 měsíci

    So basically, AI can turn into some kind of robotic Gaunter O'Dimm.

  • @willhart2188
    @willhart2188 Před 6 měsíci +1

    The inconsistency and loss of control (in moderation) are very helpful when using AI as a tool for making AI art. When you give some of the control over the final result to the AI, you can iterate a lot faster on different ideas and also save a lot of manual work. The base inconsistency, on the other hand, allows for making a lot of smaller and larger variations, from which you can choose or combine the best ones. This works especially well with more abstract art styles, where lines and colors have more freedom to change while still looking good.

  • @AlcherBlack
    @AlcherBlack Před 6 měsíci +9

    Is the AI researcher that makes all the basic alignment mistakes modelled after Yann LeCun? I recognize the bowtie!

  • @kezia8027
    @kezia8027 Před 6 měsíci

    yess more space shiibs!