The Hidden Complexity of Wishes

  • Added 26 Sep 2023
  • This video is about AI Alignment. At the moment, humanity has no idea how to make AIs follow complex goals that track human values. This video introduces a series focused on what is sometimes called "the outer alignment problem". In future videos, we'll explore how this problem affects machine learning systems today and how it could lead to catastrophic outcomes for humanity.
    The text of this video has been slightly adapted from an original article written by Eliezer Yudkowsky. You can read the original article here: www.readthesequences.com/The-Hidden-Complexity-Of-Wishes
    If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact at aisafetyfundamentals.com
    You can find three courses: AI Alignment, AI Governance, and AI Alignment 201.
    You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, by contrast, presupposes that you have completed the AI Alignment course first, and that you have knowledge equivalent to university-level courses on deep learning and reinforcement learning.
    The courses consist of a selection of readings curated by experts in AI safety. They are available to all, so you can simply read them if you can’t formally enroll in the courses.
    If you want to participate in the courses instead of just going through the readings by yourself, BlueDot Impact runs live courses which you can apply to. The courses are remote and free of charge. They consist of a few hours of effort per week to go through the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.
    BlueDot Impact receives more applications than they can take, so if you’d still like to follow the courses alongside other people you can go to the #study-buddy channel in the AI Alignment Slack. You can join by clicking on the first entry on aisafety.community
    You could also join Rational Animations’ Discord server at discord.gg/rationalanimations, and see if anyone is up to be your partner in learning.
    ▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    🟠 Patreon: patreon.com/rationalanimations
    🟢Merch: crowdmade.com/collections/rat...
    🔵 Channel membership: / @rationalanimations
    🟤 Ko-fi, for one-time and recurring donations: ko-fi.com/rationalanimations
    ▀▀▀▀▀▀▀▀▀SOCIAL & DISCORD▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Discord: discord.gg/rationalanimations
    Reddit: reddit.com/r/rationalanimations
    Twitter: twitter.com/rationalanimat1
    ▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Shrimant
    RMR
    Kristin Lindquist
    Nathan Metzger
    Monadologist
    Glenn Tarigan
    NMS
    James Babcock
    Colin Ricardo
    Long Hoang
    Tor Barstad
    Gayman Crothers
    Stuart Alldritt
    Ville Ikäläinen
    Chris Painter
    Juan Benet
    Falcon Scientist
    Jeff
    Christian Loomis
    Tomarty
    Edward Yu
    Ahmed Elsayyad
    Chad M Jones
    Emmanuel Fredenrich
    Honyopenyoko
    Neal Strobl
    bparro
    Danealor
    Craig Falls
    Aaron Camacho
    Vincent Weisser
    Alex Hall
    Ivan Bachcin
    joe39504589
    Klemen Slavic
    Scott Alexander
    noggieB
    Dawson
    John Slape
    Gabriel Ledung
    Jeroen De Dauw
    Craig Ludington
    Jacob Van Buren
    Superslowmojoe
    Nicholas Kees Dupuis
    Michael Zimmermann
    Nathan Fish
    Ryouta Takehiko
    Bleys Goodson
    Ducky
    Bryan Egan
    Matt Parlmer
    Tim Duffy
    rictic
    Mark Gongloff
    marverati
    Luke Freeman
    Dan Wahl
    Rey Carroll
    Alcher Black
    Harold Godsoe
    William Clelland
    ronvil
    AWyattLife
    codeadict
    Lazy Scholar
    Torstein Haldorsen
    Supreme Reader
    Michał Zieliński
    ▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
    Animation director: Hannah Levingstone
    Writer: Eliezer Yudkowsky
    Editor and producer: :3
    Line Producer and production manager:
    Kristy Steffens
    Quality Assurance Lead:
    Lara Robinowitz
    Animation:
    Michela Biancini
    Owen Peurois
    Zack Gilbert
    Jordan Gilbert
    Keith Kavanagh
    Damon Edgson
    Neda Lay
    Colors Giraldo
    Renan Kogut
    Background Art:
    Hané Harnett
    Zoe Martin-Parkinson
    Olivia Wang
    Compositing:
    Renan Kogut
    Patrick O'Callaghan
    Ira Klages
    Voices:
    Robert Miles - Narrator
    VO Editing:
    Tony Di Piazza
    Sound Design and Music:
    Epic Mountain
  • Science & Technology

Comments • 1.4K

  • @RationalAnimations · 7 months ago · +1144


    • @rat_king- · 7 months ago

      *Kissu*

    • @FloridanMan · 7 months ago · +7

      What happens if two wishes contradict each other?

    • @tmmroy · 7 months ago · +13

      I think the best alignment we could hope for may be one that will make us truly uncomfortable: an ally maximizer paired with a parasite minimizer. If the machine wanted you to be an ally, it would know that saving your mother is likely to make you an ally; you won't have to ask for its help. But allies both give and receive, and our wish for an aligned AI is largely a wish to be parasites: we want to increase our control over a complex system without giving anything at all. The advantage of an ally maximizer and parasite minimizer is that the concepts generalize to enough games that the AI agents could be trained in a sandboxed environment that includes humans as players, to check for the organic ability of human and AI agents to act as allies to one another. The greatest risk would largely be that the AI allies itself to humanity by domesticating us, but there's an argument to be made that we largely do this to ourselves already. It's not necessarily a terrible outcome compared to alternative methods of alignment.
      Just my thoughts.

    • @lawrencefrost9063 · 7 months ago · +1

      awesome!

    • @XOPOIIIO · 7 months ago · +3

      Thank you for the episode. But personally I find the concept too obvious to warrant such a long explanation.

  • @RazorbackPT · 7 months ago · +560

    I wonder what the conversation was like when they realised they would have to animate a family dog in this world where everyone is already a dog.

    • @ultimaxkom8728 · 5 months ago · +7

      Or family dog as in an M dog or S's dog.
      Or the abolished s-word.
      Or... furry? Hmm how would that even work?
      Cosplaying as your ancestors?

    • @soupcangaming662 · 3 months ago · +2

      A cat.

    • @arandom_bwplayeralt · 3 months ago · +2

      a human

    • @Zodaxa_zdx · 3 months ago · +3

      I was so not prepared for "family dog" when they were all dogs. To see a little creature in a gerbil ball: yup, that's the dog.

    • @AlexReynard · 11 days ago

      I do not understand why this idea freaks some people out. Have you never seen a human with a pet monkey?

  • @BaronB.BlazikenBaronOfBarons · 7 months ago · +2150

    I’m reminded of SCP-738, which, boiled down, is essentially a genie.
    One of the tests performed on it was a lawyer attempting to make a wish on it. A wish was never made. 41 hours passed, all of which were spent forming a 900+ page contract, before the lawyer passed out from exhaustion.
    The last thing the lawyer was trying to do before blacking out was quote “negotiating a precise technical definition of the word ‘shall’” unquote.

    • @user-qi6pv9jh7o · 7 months ago · +314

      A lawyer was used because 738 always asks for a decent sacrifice (and doesn't account for the unhappiness caused by the granted wish)

    • @bestaround3323 · 7 months ago · +261

      The Lawyer actually greatly enjoyed the process along with the devil.

    • @Jellyjam14blas · 7 months ago · +135

      XD exactly. You/your grandma would be dead before you'd finished listing all the ways you don't want to be taken out of the building. I would just wish for something like "Please safely bring my (as healthy as possible) grandma out of that building"

    • @Mahlak_Mriuani_Anatman · 7 months ago · +37

      @@Jellyjam14blas Same thoughts. How about following what your mind wants, 100%?

    • @rhysbaker2595 · 7 months ago

      @@Jellyjam14blas The issue with that is that the probability maximiser doesn't understand English. How would you define "safely" and "as healthy as possible"? And as the video mentioned towards the end, what side effects are you not taking into consideration?

  • @lucas56sdd · 7 months ago · +1486

    "There is no safe wish smaller than an entire human morality"
    I have plenty of problems with Eliezer, but he offers such a useful perspective on so many of these previously unthinkable questions. Incredibly well said.

    • @justaguy3518 · 7 months ago · +18

      what are some of your problems with him?

    • @Frommerman · 7 months ago · +129

      Sophie From Mars, a woman whose content I have a lot of respect for, recently did a video which included the line, "Eliezer Yudkowsky is a man who is interesting, but not for any of the reasons he thinks he is."
      I agree with this judgment. Eliezer is a pompous, well-off white man (for all definitions of white other than that of white supremacists, whose definitions of anything should never be considered) who has only ever experienced a single major injustice as far as I can tell: the untimely death of his brother. He doesn't get that none of his dreams of a transhuman future are possible in a world where all the people with the power to make AI agents are telling them to maximize bank accounts instead of human values. He blithely handwaves away the fact that most current global injustices are directly caused by systems, with the unjustifiable claim that technologies entirely controlled by the people who benefit from those systems will solve the injustices they benefit from. He refuses to consider the possibility that humanity has already produced a misaligned artificial agent which is currently destroying us all, which we call capitalism.
      But for all that, for all that he's desperately wrong about a lot of very important things, I don't think he's wrong about this. Most of the stuff he thinks about is essentially useless in the short and medium term, but that's not the way he thinks. For all that we need far more people thinking about how we are to survive the coming century, I'm glad there's someone thinking about how to survive all subsequent ones without sacrificing the technologies which got us here. The world can afford to have a few people thinking about what happens in the time after the revolution.

    • @justaguy3518 · 7 months ago · +13

      @@Frommerman thank you

    • @silentobserver3433 · 7 months ago · +171

      @@Frommerman Didn't he literally write a book (Inadequate Equilibria) about how capitalism is a misaligned artificial agent and how most current problems are caused by lack of cooperation? I'm pretty sure he understands all of the injustices and problems even without experiencing much of them himself. He just thinks that "not being killed by AI" is a higher priority than "solving the world's injustices". Nothing else matters much if we are facing an extinction event.

    • @SticksTheFox · 7 months ago · +4

      And the thing more difficult than that is that we each have our own boundaries and morality that define us. My morality is possibly very different from yours.

  • @AndrewBrownK · 7 months ago · +2180

    A major problem with alignment is that humans themselves are not aligned, so how can we pretend there is headway to be made on aligning AI if we can't even agree with ourselves first?

    • @JH-cp8wf · 7 months ago · +264

      I think this is actually a very important point often missed.
      I think we should seriously consider the possibility that alignment work itself could be very dangerous- there are plenty of people who could cause extreme damage /by/ successfully aligning an AI with their values.

    • @sshkatula · 7 months ago · +84

      Between many races, religions and cultures there are different human moralities. And if people start to align different AIs with different moralities, it could end in an AI war. Maybe we should try to evolve a wise AI, so it could align us instead?

    • @thugpug4392 · 7 months ago · +99

      @@sshkatula I am never going to let an algorithm prescribe morals to me. I don't believe there is an objective morality. What you're talking about is hardly any different than any number of religions we already have. Instead of a holy book, it's a holy bot. No thanks.

    • @AkkarisFox · 7 months ago · +40

      @@sshkatula Do we want to be "aligned"? Doesn't the concept of aligning leave out the question of who is being aligned to whom?

    • @AkkarisFox · 7 months ago · +27

      How do you reconcile two diametrically opposed value judgments without intrinsically changing those value judgments, and thus manipulating the agent of consciousness in question?

  • @supersmily5811 · 7 months ago · +721

    I know this is about A.I., but I'm absolutely field testing this the next time I get a Wish in D&D.

    • @AndrewBrownK · 7 months ago · +60

      Since it rests on the premise that the wish fulfiller is aligned with you, it might work better on a Cleric's Divine Intervention.

    • @dervis621 · 7 months ago · +16

      I just waited for a D&D comment, thanks! :D

    • @DeruwynArchmage · 7 months ago · +27

      Is your DM aligned with you? Does your DM believe that the wish granter or granting mechanism is aligned with you? Has your DM been on lesswrong or watched any content like this?
      If your answers are yes, yes, and no; then your wish is probably safe.

    • @supersmily5811 · 7 months ago · +5

      @@DeruwynArchmage Oh, I doubt all of that. I just know it'll mess with 'em and anything I can do to crash my DM's OS is worth trying.

    • @Julzaa · 7 months ago · +5

      The video title made me think immediately of hags in D&D

  • @4dragons632 · 7 months ago · +786

    My absolute favourite part of this story is that if the outcome pump didn't have a regret button, then the person saving their mother wouldn't have died. Any time the outcome pump does something which would cause someone to push the regret button and assign negative value to that path through time, they _can't_ have pushed the button, because the pump wouldn't pick that future. The only way the pump can do something so bad the regret button is pressed is if it kills the user before they can press it. The regret button is a death button.
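
    To make the "death button" mechanic concrete, here is a minimal toy sketch (my own framing, with invented probabilities and field names): model the pump as rejection sampling over random futures, discarding any future in which the regret button ends up pressed.

      import random

      def random_future(rng):
          # One hypothetical future, with three independent toy variables.
          return {
              "mother_out": rng.random() < 0.5,             # the wish predicate
              "user_alive": rng.random() < 0.9,             # able to press the button
              "outcome_horrifies_user": rng.random() < 0.7,
          }

      def button_pressed(future):
          # The regret button gets pressed iff the user hates the outcome
          # AND is still able to press it.
          return future["outcome_horrifies_user"] and future["user_alive"]

      def outcome_pump(rng):
          # Re-roll the timeline until the wish holds and the button is
          # never pressed.
          while True:
              future = random_future(rng)
              if future["mother_out"] and not button_pressed(future):
                  return future

      rng = random.Random(0)
      kept = [outcome_pump(rng) for _ in range(10_000)]
      horrifying = [f for f in kept if f["outcome_horrifies_user"]]
      # Every horrifying future that survives is one where the user can't
      # press the button: the filter selects for exactly those futures.
      print(len(horrifying), all(not f["user_alive"] for f in horrifying))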

    • @facedeer · 7 months ago · +80

      Amusingly, if the many-worlds interpretation of quantum mechanics is true, then the death button should work just fine. You'll only end up existing in worldlines where things went to your liking.

    • @CalebTerryRED · 7 months ago · +72

      @@facedeer In a many-worlds universe the machine wouldn't work at all, since every failed universe is just as real as the success universe, and you're more likely to be in one of those. The story kind of requires it to be set in a different kind of universe, one where inconsistent timelines that lead to a reset never existed in the first place. In that universe, the button can never actually be pressed, but being willing to press it changes what timelines can happen. So we're left with a strange conundrum: you need to be willing to press it in any negative timeline for it to work, but actually pressing it in the current timeline is a death sentence, since the machine won't let it actually be pressed.

    • @oasntet · 7 months ago · +51

      It does represent an unexplored loophole, though. "and I remain alive and capable of pressing the regret button" appended to the 'wish' turns it into more of a mechanism by which a near-infinite number of copies of you experience every possible outcome and use your own moral judgement about the result. Presumably that avenue was left unexplored because it doesn't really relate to AI, because an AI, no matter how intelligent, is not a time machine or even perfectly capable of predicting the future.

    • @silentobserver3433 · 7 months ago · +4

      @@CalebTerryRED *annoying nerd voice* Well, actually, it *does* work in the many worlds universe, because the universes are not "equally real", they are weighed by probabilities assigned to them. So if the outcome pump can multiply the probability of a timeline by a very small number *without splitting the timeline further*, it can do that *from the future*, because MWI is self-consistent exactly in the described way.

    • @silentobserver3433 · 7 months ago · +17

      @@oasntet 1) Not that easy, you could still be brain-dead and not willing to press the button in any scenario, or you could be *technically* capable of doing that, but it'd require you to perform something really hard (that you will obviously fail to do because of the regret button)
      2) It is indeed a loophole, I saw a technical research post on the alignment forum about something like this. The gist is that you don't ask your future self if you liked the solution or not, you simulate your past self's utility function through some counterfactual questioning ability. Very complicated and almost definitely sci-fi, but still

  • @pendlera2959 · 7 months ago · +262

    This explains why educating a child has to include more than just facts; you have to teach them morals as well.

    • @ShankarSivarajan · 7 months ago · +28

      _Technically_ true, but that sounds much harder than it actually is, since humans have evolved an innate moral system.

    • @pokemonfanmario7694 · 7 months ago · +40

      @@ShankarSivarajan Humans have a good *self-alignment* system pre-packaged, but our mess of values can easily derail it without a good foundation to support us through development.

    • @ShankarSivarajan · 7 months ago · +17

      ​@@pokemonfanmario7694 Sticking with analogies, I think of it as more similar to language development than learning to walk: unlike the latter, it takes _some_ teaching, but it's so easy that it takes extreme circumstances to screw up badly.

    • @Willsmiff1985 · 7 months ago · +10

      @@ShankarSivarajan I’d hesitate to call it innate.
      Look at individuals who were hard isolated from other people until later in life; children who grow up this way are EXTREMELY socially deficient while devoid of any direct abusive contact with others.
      I’d hesitate to say anything innate is bubbling up from them; social morality as a concept isn’t even a THING as they’ve developed no understanding of social structure.
      Without that understanding, what moral rules are there to break???

    • @ShankarSivarajan · 7 months ago · +7

      @@Willsmiff1985 As I said, it's as innate as language acquisition. Sure, it is possible to cripple, but only under extreme circumstances.

  • @pwnmeisterage · 7 months ago · +185

    I am reminded of my ancient AD&D gaming days.
    You got a wish? The most powerful spell in the game? Congrats!
    House rule: it must be written so there's no backsies and so (in theory) there's fewer arguments over the exact wording.
    But this was gaming in the days of Gygaxian-era antagonistic, confrontational DMs. The "evil genies" of this story. Inspired to twist and ruin the wish any way they can, determined to somehow find a way to deliberately pervert the wish into something the player did not desire. It's amazing how stubbornly bad the outcome of every wish can be if the DM insists on treating the spell as if it were a powerful curse.
    And such was also the common expectation. So players wrote their wishes as complex, comprehensive essays full of legalese conditions, parameters, detailed branching specifications. It is amazing how lengthy and convoluted "a single spoken sentence" can become when it's ultimately motivated by greed. And it's equally amazing how players will keep trying over and over again to get the thing they wished for after repeated horrible failures.

    • @AtticusKarpenter · 7 months ago · +27

      He-he
      And in most cases they know that the GM can still turn their wish into a nightmare; he just has to think longer when so many failsafes are included in the wish. So they hope the GM will get bored of this before generating a properly bad result.

    • @nonya1366 · 7 months ago · +32

      "The wish has to be a single spoken sentence."
      >Writes up entire legal document.

    • @vakusdrake3224 · 7 months ago

      The fact they wrote up such long documents makes me think they missed the obvious hack that lets you exploit wishes that are based on English: just include a clause about the wish being granted according to how you envisioned it being fulfilled at a specified time X prior to making the wish. Also, if time travel is a possibility, that requires extra caveats to avoid it traveling back in time to mind-control you in the past.

    • @Feuerhamster · 7 months ago · +41

      >The wish has to be a single spoken sentence
      It's good that I'm a bard and I am beginning to feel like a rap god.

    • @magnus6801 · 7 months ago · +1

      And now, as I understand it, you think you should show the DM this video and then make your wish using those exact words from the ending?
      If Yudkowsky considers that a way out, it makes sense to consider it a way out yourself.

  • @AlcherBlack · 7 months ago · +87

    This should be required material when onboarding at any AI lab these days.

    • @danitho · 7 months ago · +4

      I think the problem is not that those working on AI don't know better. It's that they want to do it anyway. That's always been a downside of humanity. There will always be those who know what is right and choose wrong anyway.

  • @EverythingTheorist · 7 months ago · +93

    6:49 I'm so glad that you said this part out loud, instead of just leaving us with a vague "be careful what you wish for". We want our mother to be alive and safe, but we're constrained by our own imagination to believe that her getting out of the burning building is the only way to do that. What if she manages to hide in a room that doesn't burn or collapse? Then she could survive without gaining any distance at all.
    Almost all humans already value human life very highly, so telling a human "Get my mother out!" already implies "alive and safe". The outcome pump makes no such assumptions. Like any computer code, it does what it's told, not what you want.
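
    As a minimal sketch of that last point (a hypothetical example with invented numbers, not anything from the video): an optimizer handed only the literal objective "maximize mother's distance from the building" is free to prefer an explosion, while the intended objective silently includes her survival.

      candidate_outcomes = [
          # (description, distance_from_building_m, mother_alive)
          ("mother walks out the front door",             10, True),
          ("firefighters carry her out",                  30, True),
          ("gas main explodes, scattering the building", 300, False),
      ]

      def literal_objective(outcome):
          _, distance, _ = outcome
          return distance                     # the stated wish, nothing else

      def intended_objective(outcome):
          _, distance, alive = outcome
          return distance if alive else float("-inf")  # the unstated human values

      print(max(candidate_outcomes, key=literal_objective)[0])   # the explosion
      print(max(candidate_outcomes, key=intended_objective)[0])  # the firefighters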

  • @DeusExRequiem · 7 months ago · +210

    If the AI runs through an entire future before deciding whether to go back and try again with a different random outcome, and you are part of that future, then relying on your future self to make the choice would seem like the right response. But it's possible that something happens in one future to alter your mental state and make you decide not to change a bad outcome, so you can't even trust yourself. The best outcome might end with you hating it.

    • @conmin25 · 7 months ago · +34

      The video already addressed this, in a way: in the first scenario of blowing up the building, you reach for the button to tell the machine to go back and try again, but you get killed before you hit it. Reset button not hit = acceptable outcome. You could program the machine to not let that happen, but there are other scenarios in which you might intend to hit the button but can't. There is also the issue of time itself. How far forward can the machine see? Hours? Days? What if you don't realize the consequences of the wish until a month later? Would the button still work then?

    • @patrickrannou1278 · 7 months ago · +5

      You just have to not put in "must not happen" specific conditions, but "always must be" extremely generic conditions that don't rely on the effects of the wish itself.
      "I wish for my mother to come out of the building to stand near me within one minute, both of us safe and sound physically, emotionally and mentally, in such a way that if I, as I am right now, before the wish actually takes effect, could know in detail all the resulting effects of the actual wish, I would still fully approve of these results, without having needed to actually learn those details myself; also, the wish should not do any form of time travel in any of its effects."
      This prevents any form of mental tampering with your current AND future self, or ANY other bad result happening, like: OK, she gets out, but then gets hit by a car "only because" you made that wish.
      Most probably, what would then happen:
      - Flames break a few windows, but no glass hurts your mother.
      - Pushed by the draft, flames seem to randomly avoid your mother in such a way as to "open a path" for her to simply walk out.
      - She might hear a voice encouraging her along. Heck, she might get a rush of adrenaline and find the strength to move out despite having bad legs.
      Or:
      - Flames break something.
      - That makes a fit neighbour decide to leave his house and come rushing to help.

    • @tiqosc1809 · 7 months ago · +1

      @@patrickrannou1278 The machine doesn't accept English.

    • @conmin25 · 7 months ago · +3

      @@patrickrannou1278 But remember, the machine is not magic; it is still restricted by physical laws. There may not be a possible outcome where "my mother comes out of the building to stand near me within one minute, both of us safe and sound physically, emotionally and mentally." What if every path of escape leads to some sort of injury: she burns her hand on a doorknob, hits her head on a wood table, or inhales a large amount of smoke? Which of these options is preferred? That needs to be defined.
      There is also consequence. Say the neighbor comes to help and rescues your mother unscathed but gets severely burned in the process. What if there is an option where your mother is minorly burned but the neighbor also only receives a minor injury? Would the second option be preferred? That also needs to be defined.

    • @drifter2391 · 7 months ago · +7

      @@conmin25 The thing is, it's physically impossible for every path to lead to injury, because there are an infinite number of them. It's only possible for that to be extremely unlikely. But because it's only "extremely unlikely", the probability can just be manipulated back up to guarantee the mother gets out safe and sound. You only need to understand the information given properly.

  • @vakusdrake3224 · 7 months ago · +175

    The fact that you have to include basically your entire moral system within the wish for it to be foolproof is also why you can actually game most wishes that accept English.
    For most wishes you can just include a clause that says the wish is done according to how you were envisioning it just before making the wish (this gets more complicated with time travel).
    Though of course, with certain complex wishes, just doing it how you envisioned it will be too limited by your imagination, and having the wish granted according to your current conception is liable to lead to the wish-granting entity just manipulating you (thus why you specify a past version of yourself as the reference).

    • @vakusdrake3224 · 7 months ago · +20

      This strategy does sort of extend a bit to AI alignment as well: Since with AI it similarly may be a less dangerous idea to use the AI's prediction of one's preferences at some point in the past. In order to ensure the AI doesn't just mind control you, since it's very hard to specify what is and isn't mind control when you get into it.

    • @SupLuiKir · 7 months ago · +10

      @@vakusdrake3224 What's the practical difference between Heartbreaker and Contessa when it comes to convincing you to do something?

    • @adamrak7560 · 7 months ago · +7

      This amounts to befriending the genie (like true alignment). This is exactly what happens in Disney's Aladdin: he even makes a wish while drowning and unconscious, which is what the video describes at the end.

    • @chilldogs1881 · 7 months ago · +1

      That was what I was thinking; probably the best way to actually get what you wished for is to ask for what you are actually thinking of.

    • @RandomDucc-sj8pd · 7 months ago · +1

      I have a proposition for a solution: include a clause with each wish such that if you do not explicitly say "Keep Reality" within a certain timeframe, it will reset the timeline to before you made the wish and assign an extreme negative value to that timeline. This ensures the genie does not kill you, or make you mute, or do something bad, and that way you can be 100% sure all future wishes are safe so long as you include that clause, as any future yous that were unhappy with the result would not say "Keep Reality" and therefore would not occur. You could set this timeframe to an appropriate amount of time: say, if you wanted a die to roll your way, you would set the timeframe to 10 seconds, but with your mother it could be 1 day, as you need to make sure she won't die from her injuries, etc.
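
      A toy sketch of what this flipped default buys you (the future pairs and predicates are invented for illustration): compare which futures survive a regret-button filter against the proposed "Keep Reality" filter.

        # Futures are (user_alive, user_happy) pairs; assume the wish itself
        # was granted in all of them.
        futures = [(alive, happy) for alive in (True, False) for happy in (True, False)]

        def passes_regret_button(alive, happy):
            # Timeline kept unless the button is pressed; pressing requires
            # an alive, unhappy user -- so a dead unhappy user slips through.
            return not (alive and not happy)

        def passes_keep_reality(alive, happy):
            # Timeline kept only if "Keep Reality" is said in time, which
            # requires an alive, happy user. Silence of any kind is a veto.
            return alive and happy

        print([f for f in futures if passes_regret_button(*f)])
        # [(True, True), (False, True), (False, False)]  <- dead futures survive
        print([f for f in futures if passes_keep_reality(*f)])
        # [(True, True)]  <- only the future you would actually endorse

      One remaining failure mode would be anything that manipulates the future you into saying the phrase: the clause turns silence into a veto, but it cannot distinguish genuine approval from tampered approval.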

  • @macleanhawley1742 · 7 months ago · +282

    The animation quality of this one was absolutely phenomenal! And honestly the storytelling was so good that I had an "aha" moment halfway through. It's crazy to think that maybe the only effective AI we can make would have some neuromorphic or implied human morality encoded! These just keep getting better and better; thanks for making them!

    • @AtticusKarpenter · 7 months ago · +7

      I fear that no single human can contain the morality of all of humanity, or even of his own society. So even if the person who built the AI (and put his entire moral system in) is satisfied with the results, many others will not be. And many moral problems just don't have a "right" answer (like pro-life vs pro-choice: an AI can make many very powerful arguments in defense of one of the sides, but that still doesn't completely remove the dissatisfaction of the other side), so a good, effective AI may need to understand human morality even better than we humans do.

    • @tassiloneubauer5867 · 7 months ago · +1

      As with self-driving cars, I think this is not an insurmountable problem, because we are setting the bar low. Of course, given the scope, such a scenario should be treated with utmost care (I think most scenarios that actually come to pass will look too hasty to me).

  • @certifiedroastbeefmaniac · 7 months ago · +25

    The Monogatari series (yes, I know, ugh, anime) has a very smart quote loosely related to this: "Why do you think we don't say a wish when we want it to come true? Because the moment we try to put it into words, it starts to deviate from what we actually wanted in the first place."
    Now, my analogy is that wishes are like fractals: we can zoom in more and more, and define more and more boundaries, but there will always be more details, so it's just better to squint and look at the whole thing at once.

    • @secretagentpasta4830 · 7 months ago · +3

      Ohhh, that's a very succinct way to sum up this whole video! Really, really nice lil quote 😊

  • @jackdoyle5108 · 7 months ago · +41

    "You have no idea how much difficulty we go through trying to understand your human values."
    /人 ◕ ‿‿ ◕ 人\

    • @erikburzinski8248 · 6 months ago

      Hello Kyubey, I wish for the ability to grant anyone the ability to choose their physical age; when they choose, they will become that age over a period of 3 months through semi-natural processes, completely safe and unharmed, with their body exactly the same as it was at the selected age. (How does it go wrong?)

    • @gsilva220 · 1 month ago · +1

      @@erikburzinski8248 It might go wrong if people lose memories, or if the "semi-natural processes" turn out not to be so natural...

    • @axelinedgelord4459 · 10 days ago

      It's actually funny in retrospect, because Kyubey grants wishes exactly as the contractee requests; he just doesn't tell them that they become the Incubators' livestock, undergoing cruelties without bound.

  • @XOPOIIIO · 7 months ago · +55

    ChatGPT seemingly shares the ethics of some part of humanity, but it's an illusion; in reality it only values the successful prediction of the next word.

    • @frimi8593 · 7 months ago · +5

      Well, some of that is artificial/external, as it has preprogrammed blocks that prevent it from saying particularly disagreeable things. However, the other thing to consider is that ChatGPT can also be convinced to reach ethical conclusions most people would flatly reject. This is because ChatGPT effectively takes any ought statement you make as a first principle.

    • @XOPOIIIO · 7 months ago

      @@frimi8593 It's because it sees where you're going and tries to go along, just to make sure it has more chances to predict the next word.

    • @Kycilak · 7 months ago

      But are we sure that ethics (or indeed any part of the mind) can't be deconstructed the same way?

    • @XOPOIIIO · 7 months ago

      @@Kycilak What do you mean?

    • @Kycilak · 7 months ago

      @@XOPOIIIO With enough knowledge about an organism (a human), you may be able to formulate its values such that they seem as absurd as "the successful prediction of the next word".

  • @TRquiet · 7 months ago · +19

    This is absolutely marvelous. Not only did you provide an understandable, step-by-step breakdown of wish logic (which provides context for real-life moral philosophy), but you did it with an adorable dog animation. Amazing.

  • @DavidJohnsonFromSeattle · 7 months ago · +16

    Literally everything you just said applies equally to the act of conveying a message accurately to another person. You aren't really talking about wishes or magic powers, but about communication. If the communication is perfect, the wish will be too. Which incidentally solves this genie problem: you don't need a genie that knows your wish before you make it and so grants it automatically. You just need another person with enough of a shared context that you can communicate with them fairly effectively.

  • @lolishocks8097 · 7 months ago · +35

    I was actually thinking about a story for an episode with a device exactly like this, and it just went absolutely bonkers. With just the right understanding of reality, someone with a device like this could quickly attain godly powers. Also, it ended with the biggest prank in the universe. There are a lot of things you could do relatively safely; a lot more safely than living through them yourself.

    • @frimi8593 · 7 months ago · +9

      It reminds me a lot of the concept of "temporal reverse engineering" from The Hitchhiker's Guide to the Galaxy, wherein, in addition to there being three spatial dimensions and a temporal dimension, there is also an axis of probability which some devices can observe through and traverse. Temporal reverse engineering essentially involves the user making a wish, at which point a machine that can perfectly observe the entire universe on all 5 axes goes back in time and shifts the timeline along the probability axis at various key points, so that the wished-for event already occurred. (The machine in question is the new Guide, which was developed to sell the same copy of the Hitchhiker's Guide to the Galaxy to the same family in infinite probable universes, thus generating infinite income at the cost of only one book.)
      The new Guide is observed to act like the safe genie, in that it already knows what the user wants/needs and has already made it happen, such that the current user never experiences misfortune... until they do, and the Guide is taken by a new user. In fact, each time it helps its current user, it's actually playing out a longer scheme which involves trading itself from hand to hand to fulfill the task originally set out for it: to destroy the Earth in all realities. The destruction of the Earth is a highly uncertain event that happened in the main timeline we follow throughout the series, but not in every timeline. Because it's a highly uncertain event, looking down the probability axis shows a series of timelines alternating between whether the Earth is there or not. Each time the new Guide swapped hands and helped its new user, it was simply ensuring that that user would end up in the right place at the right time later down the line for there to be absolutely no trace of the Earth left in any timeline.

    • @cewla3348 · 3 months ago

      @@frimi8593 amazing book series!

  • @NagKai_G · 7 months ago · +21

    The phrase "I wish you to do what I should wish for", for as many flaws and technicalities as it may hold, really sounds like one of the best wishes a person could make

    • @Prisal1 · 7 months ago · +4

      Is it up to the thing to decide what you should wish for?

    • @bitrr3482 · 7 months ago · +2

      @@Prisal1 And to find out what you should wish for, it reads your mind and what you want. It now contains all of your morality, and knows what to wish.

    • @BayesianBeing · 7 months ago · +4

      @@Prisal1 That's the thing: a good genie is only good when its goals and values are fully aligned with yours, so it knows exactly what you will wish for.

    • @NexosPlace · 7 months ago · +1

      If you ask for that, the machine will pick a wish you could have made from rand(1^infinite), because you never defined a scope.
      You will most likely get your mom out safe and sound, but who knows what else might happen.

    • @cewla3348 · 3 months ago

      Add a clause that says "that gets my mother out with minimal harm done to anything whatsoever", just to be sure. You now rule out all possibilities that end in death.

  • @pendlera2959 · 7 months ago · +83

    A few points to keep in mind when coming up with solutions here:
    1. If the solution violates the laws of physics, the machine just gives an error code (1:19).
    2. If your only measure of success is your mother's safety/health, then potentially anyone or anything else might be harmed (8:00-8:50).
    3. The machine picks the first "answer" that fulfills your wish based on random chance, so the more probable an answer, the more likely it is to be picked first; that's why you have to rule out anything you don't want (see the sketch after this list). A dam breaking and putting out the fire while killing your mother might be more probable than the firefighters getting there sooner.
    4. It's not super clear, but I think the machine only works from that point on. It can only change the future, not the past: you can't wish for the fire to not have started once it has.
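
    Here is the sketch promised in point 3, a minimal toy model with invented outcomes and probabilities: the pump effectively samples futures according to their natural likelihoods and keeps the first one that satisfies the wish predicate, so probable-but-bad futures crowd out improbable-but-good ones.

      import random

      # Natural probabilities of the futures in which the mother leaves the
      # building at all; every other future fails the wish predicate.
      prior = {
          "building explodes, her body lands outside":    0.010,
          "water main bursts, she drowns on the way out": 0.005,
          "firefighters arrive early and carry her out":  0.001,
      }

      def sample_future(rng):
          # Draw one future from the prior; None means she never gets out.
          r = rng.random()
          for outcome, p in prior.items():
              r -= p
              if r < 0:
                  return outcome
          return None

      def outcome_pump(rng):
          # Reset the timeline until the wish predicate holds.
          while True:
              future = sample_future(rng)
              if future is not None:
                  return future

      rng = random.Random(1)
      results = [outcome_pump(rng) for _ in range(1_000)]
      # The explosion wins roughly 10/16 of the time: not out of malice,
      # but because it is the most probable qualifying future.
      print(max(set(results), key=results.count))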

    • @zotaninoron3548 · 7 months ago · +13

      My 'wish' would be to preserve my capacity to hit the reset button: if I lost control of the device or became harmed in any way, it would reset, with a band of time in which I could make assessments. Then I could reset, by my own judgement, any result that passed the automatic reset criteria. And the virtual-time versions of me that resulted would veto the more unfavorable outcomes.

    • @Vidiri · 7 months ago · +6

      @@zotaninoron3548 So the entirety of a human morality, in other words?

    • @CaedmonOS · 7 months ago · +7

      @@zotaninoron3548 Which, hilariously enough, would mean you wouldn't even need to make a wish.

    • @alittlefella985 · 7 months ago · +3

      But what if you wished for the health and safety of every human and mammal in the vicinity?

    • @CaedmonOS · 7 months ago · +1

      @@alittlefella985 Just by random chance, because of quantum jiggling, everything in the area gets cryo-frozen.

  • @zotaninoron3548 · 7 months ago · +20

    My instinctive reaction, about a third of the way through the video, is to ignore the mother and focus on guaranteeing my capacity to use the reset button. It would automatically reset if I lost that capacity, and I could then reset any outcome which wasn't aligned with my interests.

    • @4dragons632 · 7 months ago · +2

      The outcome pump will kill you any time you reach for the reset button, because futures where you press it are the worst possible futures for the pump, so it will do anything to pick a future where you don't press it.

    • @Vidiri · 7 months ago · +10

      @@4dragons632 They mean making their wish something like "I wish I retained full power to push the regret button" so that the pump is forced to pick a future in which the maker of the wish would not want to press the regret button, despite still being fully able to.
      This would ensure any future where you physically could not push the regret button was avoided, as well as futures bad enough to make you press it. It's essentially the only wish you could make that would ensure the outcome would align with the entirety of your morality (at least as far as your perspective is concerned)

    • @4dragons632 · 7 months ago · +4

      @@Vidiri It doesn't accept English inputs, though; you'd need to somehow get the 3D scanner to include information about you being able to press the regret button and still not pressing it. Still, you would hope that would be possible and built into the next model of the pump.

    • @zotaninoron3548 · 7 months ago · +7

      @@4dragons632 The video includes examples of addressing a multitude of contingencies that you could try to import in a futile attempt to address all possible wrong outcomes, including the physical state of the mother. I would assume defining yourself as unharmed, unrestrained and capable of performing a specific gesture on the side of the device prior to a time limit or a reset occurs automatically would be possible.
      This is just me thinking about it offhand, I am curious what holes people could punch into this solution. Because I'm more inclined to think I'm missing something than that I've found a complete solution to the analogy given.

    • @4dragons632 · 7 months ago · +7

      @@zotaninoron3548 In that case maybe the pump is smashed flat by the falling beam instead of you. Or you suffer a stroke that puts you in a permanently happy hallucination. Whatever it takes to not have the button get pushed.

  • @yaafl817 · 7 months ago · +30

    To be fair, as a programmer, I'm pretty sure a simple enough algorithm could still give you a good enough result, or at least narrow the set of possible results down enough for you to pick one. Yes, the results are always infinite, but you can sample them by outcome distance. If you like one, or like some particular property of one, you could extract those and go through an iterative process to find a solution you're happy with.
    Basically, a wish search algorithm.
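
    A minimal sketch of that idea (the outcome dictionaries, names, and distance metric are all invented stand-ins): sample candidate outcomes, rank them by distance from the stated target, show the user the best few, and fold their feedback back in as hard constraints on the next round.

      import random

      def sample_outcome(rng):
          # A candidate outcome with a few toy boolean properties.
          return {
              "mother_out": rng.random() < 0.5,
              "mother_unharmed": rng.random() < 0.4,
              "house_intact": rng.random() < 0.3,
          }

      def outcome_distance(outcome, target):
          # Toy metric: the number of desired properties the outcome misses.
          return sum(1 for key, val in target.items() if outcome.get(key) != val)

      def wish_search(target, constraints, rng, n=10_000, k=3):
          candidates = [sample_outcome(rng) for _ in range(n)]
          # Feedback from earlier rounds is applied as hard filters...
          candidates = [c for c in candidates
                        if all(c.get(key) == val for key, val in constraints.items())]
          # ...then the survivors are ranked by closeness to the stated target.
          return sorted(candidates, key=lambda c: outcome_distance(c, target))[:k]

      rng = random.Random(2)
      target = {"mother_out": True, "mother_unharmed": True}
      round_one = wish_search(target, {}, rng)
      # The user inspects round one, dislikes losing the house, and adds
      # that as an explicit constraint for the next round.
      round_two = wish_search(target, {"house_intact": True}, rng)
      print(round_one, round_two, sep="\n")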

    • @Zippyser · 7 months ago · +3

      That, my friend, is thinking with space rocks. Well done, sir. One often forgets about such elegant solutions.

    • @sophialaird6388 · 5 months ago

      E.g., "Keep my mother alive for as long as possible"?

    • @ultimaxkom8728 · 5 months ago

      @@sophialaird6388 With the original concept: your mother would then have quantum immortality, since _"for as long as possible"_ points to infinity. Also, what is _"alive"_ anyway?

    • @sophialaird6388 · 5 months ago

      @@ultimaxkom8728 that could be true, but it’s a lot easier to make “live for as long as possible” something you can live with than “get your mother as far away from the building as possible”. The original goal in the video is misaligned.

    • @RobbiePT · 3 months ago

      Exactly; there's a spectrum between horrible outcomes and a perfect outcome that is full of pretty decent outcomes. Like an 80/20 rule of wishes: get 80% of the utility of a perfect wish for 20% of the effort. Really, it's probably more like a 0.01/99.99 (or even more extreme) rule in this case, considering the difficulty of encoding or learning "an entire human morality".

  • @ethanstine426 · 7 months ago · +32

    I kinda feel bad for laughing through a not insignificant portion of the video.

  • @Yitzh6k · 7 months ago · +9

    Imagine instead of an emergency reset button you were to have a "continue" button with a preset timer. If you haven't pressed continue after the time has elapsed, all is reset. This uses your own brain as the judgement system, so it is "safe"

    • @terdragontra8900 · 7 months ago · +6

      Something else could push the button. If it has to be your finger, your finger could be ripped off; if your health can't be harmed, you could press it by accident; and if you forbid such accidents, I'm really impressed you programmed it to be able to tell what an accident is.

    • @oasntet · 7 months ago · +6

      More importantly, this is just equivalent to an AI that has to check every decision with a human. The probability pump is a rough analogy for AI, but the reset button makes it a human-in-the-loop system, which an AI cannot be and remain useful.

    • @cewla3348 · 3 months ago

      @@oasntet It remembers previously denied things and avoids stuff like that. Are you really calling the GPTs not AI?

    • @cewla3348 · 3 months ago

      @@terdragontra8900 If you do not think about pressing the continue button, it restarts. If you lose thought, it restarts. If you die, it restarts.

    • @terdragontra8900 · 3 months ago

      @@cewla3348 At the moment, we can't reliably scan someone's brain and measure whether they've thought about something (even though there's interesting brain-scan research that can kind of do it). Also, even if we solve that, it doesn't prevent cases where you are manipulated into thinking everything is fine even though it absolutely isn't, in the sense that you would object if you were thinking straight and had all the information (such as a horrible thing happening that is completely hidden from you, or you being drugged in some way, or it feeding you "propaganda" and changing your mind).

  • @matthewgamer1294 · 7 months ago · +5

    There's a Simpsons Treehouse of Horror episode where Homer asks for a turkey sandwich in detail, so it is a "wish that can't possibly go wrong", and then the meat is dry. No wish is safe.

  • @onedova2298 · 7 months ago · +11

    We play D&D, and we learned that wishes always have a catch if you don't choose your words wisely.

    • @zacharyhawley1693 · 7 months ago · +1

      In D&D, Wishes are best used to replicate other spells, especially ones with long casting times or other annoyances. The monkey's-paw thing was supposed to be optional.

    • @onedova2298 · 7 months ago · +2

      @@zacharyhawley1693 I didn't really think about that. I guess we used the teleportation spell more than anything else without knowing.

    • @zacharyhawley1693 · 7 months ago

      @@onedova2298 You were using it right, RAW. The monkey's-paw thing is only supposed to happen if you try to exceed what a 9th-level spell can reasonably do.

  • @namename1302 · 7 months ago · +8

    I know this video is about AI alignment, but I think it introduces the basis for a problem that applies to other humans as well (and, in doing so, reflects back on the entire concept of AI alignment).
    The outcome pump obviously doesn't 'get' your human wishes the way another human would. If you asked a HUMAN to 'get my mother out of that burning building', they would almost certainly come up with a solution that adheres at least somewhat to your set of preferences. I think it's pretty obvious that this is because the outcome pump lacks any cultural context. Most people share a pretty large subset of general guidelines with most other people, including 'I would prefer if my parents lived longer rather than shorter, all else being equal', among many, many other guidelines, which are intuitively grasped in order to realize the real wish: some nebulously-defined idea of 'rescue'.
    However, the argument put forward in this video remains valid. There IS no safe wish short of explaining your entire morality and value structure. This applies even to requests with no particular guarantee of success - as with AI, and as with other people. Asking for help from another person is, in theory, exactly as poorly defined as asking for help from an AI- there's just more cultural context to clue fellow humans in.
    Ultimately, this reflects on the AI alignment issue- Yes, it's infeasible to comprehensively explain to an AI exactly what moral choices you want it to make every single time. But, it's at least equally infeasible to explain the same to another human. In the video, you note that an outcome pump which IS somehow perfectly aligned to you, would need no instruction at all. Putting aside the possibility of a human failure in reasoning - which would hardly be a point in the humans' favor anyway - the same is true of a human being who has somehow been convinced to agree with you on literally every single issue of ethics and motivation - which is arguably an even more absurd concept.
    To be clear, I don't personally trust AI very much (as a non-expert). But I think the suspicion people reasonably give it is revealing, given that human beings are equally incomprehensible, while also being more prone to logical mistakes and conflicts of interest.

  • @zygfrydmierzwinski6041 · 7 months ago · +11

    Animation quality grows exponentially from video to video, and I love it.

  • @bennemann · 3 months ago · +2

    Eliezer Yudkowsky (the author of the text of this video) wrote an incredible 133-chapter-long fanfic called "Harry Potter and the Methods of Rationality", set in an alternative universe where Harry has the I.Q. of a gifted genius and solves many of the wizarding world's issues with logic rather than magic. I cannot recommend it enough; it is probably the best derivative work of Harry Potter in existence! I read it a couple of years ago and I still think about it frequently.

  • @t_c5266 · 7 months ago · +2

    My first wish would be something along the lines of "I wish for the intention of my wishes to be explicitly understood, and for my wishes to be fulfilled as I intend."
    There you go. Wishes are now fixed.

  • @joz6683 · 7 months ago · +25

    This channel never ceases to amaze me. The depth and breadth of the videos are phenomenal. The videos cover subjects that I did not know that I needed. Thanks to everyone involved for your tireless work.

  • @parz1val205 · 7 months ago · +24

    Genuinely a work of art. The animations and writing are top-tier, and the entire premise is really what I think the world needs to be thinking about right now, given current events :|
    Thanks for all your great work

  • @isaaclinn2954 · 7 months ago · +2

    One of the reasons I loved HPMOR was that Harry immediately tried to use the Time-Turner to factorize the product of two large primes, the failure of which gave us a reason why he can't find the solution to any problem whose solution is verifiable and whose search space can be ordered. Eliezer is an excellent author.

  • @theeggtimertictic1136 · 7 months ago · +4

    This animation gets the point across very clearly and deals with what could be a heavy subject in a light-hearted and entertaining manner... well done 👏

  • @jonhmm160 · 7 months ago · +16

    This shows very well the challenges of alignment from an individual perspective, but for the human race as a whole it's even worse/harder. I don't think there is a single person in the world I would be OK with giving superintelligence-like powers. Even though he would still be aligned with himself, it's a big gamble whether that would create a great society for everyone else. So in essence we need the superintelligence to have some sort of super-morality which is aligned with the entire world, if such a thing even exists.

    • @Woodledude · 7 months ago · +1

      That, or just create a diverse array of superintelligent entities using human minds as bases for each one. That way we're not picking *just one person,* but hopefully representing a good breadth of humanity.

    • @conmin25 · 7 months ago · +7

      @@Woodledude But then we would have the same problem humans have: that we don't agree and sometimes don't get along. Even the best-intentioned humans can spark conflict with their differing beliefs and opinions. If we just put a variety of the best human moralities (if such a thing can be judged) into these AIs, then they would also argue and spark conflicts.

    • @Woodledude · 7 months ago · +5

      @@conmin25 That's much better than there being no argument about an objectively terrible direction for humanity. No argument with enough power behind it to matter, anyway.
      It doesn't really stop humanity going in a terrible direction, but it does at least make it less likely. At least, given the constraint that we MUST construct at least one powerful AGI.
      Having the same problems we do today, but at a greater scale of intelligence, is better than having an entirely novel problem on top of all the other ones: an effectively omnipotent dictator.
      And if we're actually careful about our selections, MAYBE we'll actually get a group of human-based AGIs that are actually trying, and succeeding, at doing good in the world.
      AGI research is basically a field of landmines, where the goal is to find one that's actually a weight-activated chocolate fountain that turns off all the other landmines.
      It's, uh... Not pretty.
      The only real option is proceeding with incredible caution, and being certain of everything we do before we do it.

    • @supernukey419
      @supernukey419 7 months ago

      There is a proposal called coherent extrapolated volition that is essentially a supermorality

    • @terdragontra8900
      @terdragontra8900 7 months ago

      sometimes humans in "conflict", in the broad sense, "fight" in a way that doesn't involve, you know, death and other things we'd definitely like to avoid. It's not necessarily bad if the AIs compete with each other, wrestle for influence, etc., if there's a system where AIs are more likely to "win" if we like them more. But I have no idea if that's a feasible type of system; it may not be. @@conmin25

  • @ianyoder2537
    @ianyoder2537 7 months ago +6

    In my own personal stories, genies, like all other magical creatures and phenomena, still have rules they must follow. In the genie's case it's the law of conservation: matter, energy, and now ability and ideals cannot be created or destroyed, only transferred from one form to another.
    So hypothetically you say, "I wish I had a beautiful, kind, loving girlfriend." Well, the genie can't simply create another person, so the genie must find a woman who's beautiful and kind to love you. However, the genie cannot create the feelings of love, so it takes the feelings of love out of someone else, modifies said feelings to apply to you, and implants them into said woman. Well, where did this stolen love come from? The genie will take the path of least resistance and find the closest relationship to draw from.
    So in essence, in order for you to have a relationship of your own, the genie ended a relationship of someone close to you.

  • @Sparrow_Bloodhunter
    @Sparrow_Bloodhunter 7 months ago +1

    "I wish that you would do what I should wish for." is such an incredible genie lifehack.

  • @Dawn-Shade
    @Dawn-Shade 7 months ago +1

    I love how the thumbnail's reflection is different in each lens of the glasses; it actually creates a 3D effect when viewed cross-eyed!

  • @ChaiJung
    @ChaiJung 7 months ago +5

    The biggest problem with all of these Monkey's Paw-type scenarios is the assumption that the Djinn or wish granter only understands literalisms and is a butthole. If I go to a carpenter and want to buy a chair, I'm going to get a chair, and it'll be within the general understanding of a chair and NOT some bizarre addition or concept outside of what's understood to be a chair. If I'm interacting with a powerful wish granter (and how would it not already have the ability to understand normal language?), I'd likely get my wish

  • @Egg-Thor
    @Egg-Thor 7 months ago +4

    This is one of your best videos yet! I'm so happy I subscribed to you back when I did

  • @alexwolfeboy
    @alexwolfeboy 7 months ago +2

    Oh my Dog, I adore the animation on the video. I know it was all talking about your grandma dying... but the little paw was too adorable to be sad!

  • @Julzaa
    @Julzaa 7 months ago +2

    Your production quality is phenomenal; you are among the few creators on YouTube I really wish had 10 or 20x more subscribers! And the team behind this is huge, can't say I'm surprised. Props to all of you 👏

  • @imangellau
    @imangellau 7 months ago +3

    Absolutely love the production of this video, including the music and sound effects!!✨

  • @vanderkarl3927
    @vanderkarl3927 7 months ago +12

    Are we even sure that a genie with an entire human morality would be safe? Whose?
    If not, are human moralities coherent enough to take a weighted average or union or what have you? I imagine we'd all get along a lot better if that were true.

  • @callen8908
    @callen8908 1 month ago

    Newly discovered your productions. You excite my brain, and inspire me beyond words. I cannot thank you enough

  • @mathpuppy314
    @mathpuppy314 7 months ago +1

    Wow. This is extremely well made. One of the best videos I've seen on the platform.

  • @enjoy_life_88
    @enjoy_life_88 7 months ago +4

    Wow! I wish you millions of subscribers, you deserve them!

    • @granienasniadanie8322
      @granienasniadanie8322 7 months ago

      A random glitch in YouTube's algorithm gives them a million subscribers, but the glitch is quickly detected and the channel is taken down by YouTube.

  • @tornyu
    @tornyu 7 months ago +9

    Honest question: could you make successively better outcome pumps by starting with a weak one (can reset time n times per wish), then using it to wish for a new outcome pump that is 1. more moral and 2. more powerful (can reset time n+1 times), and repeating?

    • @conmin25
      @conmin25 7 months ago +7

      You would still have to define for it what "more moral" means, so that it knows whether it is getting closer to that goal. And if you can define all of morality in machine language, you're already done.

    • @cewla3348
      @cewla3348 3 months ago

      @@conmin25 you decide if it's moral or not?
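
To make @tornyu's bootstrapping idea concrete, here is a minimal sketch in Python. Everything here is invented for illustration (there is no real `OutcomePump` API), and @conmin25's objection is baked in as a comment: the "more moral" part still has to be specified somehow, which is the original problem.

```python
from dataclasses import dataclass

@dataclass
class OutcomePump:
    """Hypothetical outcome pump: a reset budget plus acceptance constraints."""
    resets_per_wish: int    # how many times it can rewind per wish
    constraints: list[str]  # conditions every granted future must satisfy

def bootstrap(pump: OutcomePump, extra_constraint: str) -> OutcomePump:
    # Use the current pump to wish for a successor with one more reset and
    # one more constraint. The catch (per @conmin25): "more moral" must
    # itself be written down, and writing it down fully is alignment itself.
    return OutcomePump(
        resets_per_wish=pump.resets_per_wish + 1,
        constraints=pump.constraints + [extra_constraint],
    )

pump = OutcomePump(1, ["the regret button is never pressed"])
for c in ["no one is physically harmed", "the wisher's mind is not altered"]:
    pump = bootstrap(pump, c)
print(pump)  # OutcomePump(resets_per_wish=3, constraints=[...])
```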

  • @veritius340
    @veritius340 7 months ago +1

    The Outcome Pump not checking to see if the user is incapacitated and unable to press the Regret Button is a pretty big design oversight.

  • @Jellyjam14blas
    @Jellyjam14blas 7 months ago +2

    Holy moly! The animation is so amazing! And the discussion about wishes is really well thought out and nicely presented :D

  • @newhonk
    @newhonk 7 months ago +37

    Extremely underrated channel, keep it up! ❤

  • @morteza1024
    @morteza1024 7 months ago +3

    If the device is complete, your brain can be its function.
    At least, three things are enough, though it could be optimized further:
    It needs to store a specific time in order to reset to that time. This number can be manually adjusted.
    A reset button, so if I don't like the outcome I press the button.
    An auto-reset 100 years after the specified time on the device, so if I somehow died or was unable to press the button or change the time, it will reset automatically.

    • @minhkhangtran6948
      @minhkhangtran6948 7 months ago +1

      Wouldn't that just trap you in a loop of however many years you live plus 100 years, more or less? That sounds like hell

    • @gamingforfun8662
      @gamingforfun8662 7 months ago

      You would need to add a way to prevent the loop

    • @morteza1024
      @morteza1024 7 months ago

      @@gamingforfun8662 Move the reset time forward.

    • @morteza1024
      @morteza1024 7 months ago

      @@minhkhangtran6948 You won't remember any of them, because a reset is the same as it never having happened, and you can move the reset time forward.

    • @gamingforfun8662
      @gamingforfun8662 7 months ago +1

      @@morteza1024 Stopping the flow of time just to live multiple lives I don't even remember doesn't sound so good
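
@morteza1024's three-part design above is concrete enough to sketch. This is a toy Python rendering under loudly invented assumptions: `reset_to` stands in for a physics primitive that does not exist, and the 100-year dead-man switch and the movable reset point (gamingforfun8662's loop fix) are taken straight from the thread.

```python
import time

def reset_to(timestamp: float) -> None:
    """Hypothetical primitive: rewind the world to `timestamp`."""
    raise NotImplementedError

class ResetDevice:
    AUTO_RESET_AFTER = 100 * 365.25 * 24 * 3600  # 100 years, in seconds

    def __init__(self, reset_point: float):
        self.reset_point = reset_point  # the time the device rewinds to

    def move_reset_point_forward(self, new_point: float) -> None:
        # Loop prevention: advancing the stored time is the only way to
        # escape the live-die-rewind cycle the replies worry about.
        self.reset_point = max(self.reset_point, new_point)

    def press_reset_button(self) -> None:
        reset_to(self.reset_point)  # manual "I don't like this outcome"

    def tick(self, now: float) -> None:
        # Dead-man switch: if the owner is dead or incapacitated and never
        # intervenes, rewind automatically after 100 years.
        if now - self.reset_point > self.AUTO_RESET_AFTER:
            reset_to(self.reset_point)

device = ResetDevice(reset_point=time.time())
device.move_reset_point_forward(time.time() + 60)  # lock in a good minute
```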

  • @dodiswatchbobobo
    @dodiswatchbobobo 7 months ago +1

    “I wish to gain the thing I am imagining at this moment exactly as I believe I desire it.”

  • @hiteshadari4790
    @hiteshadari4790 7 months ago +3

    What the hell, that was brilliant animation and great narration, you're so underrated.

  • @smitchered
    @smitchered 7 months ago +10

    I like how you guys, and Eliezer, and the general LW community are taking the hard route to convincing people of AGI's dangers. Not the easy route of, e.g., invoking a Terminator-style apocalypse, or saying we should regulate globally because China or something. I get that this makes sense to divert as much attention as possible to alignment, true, technical alignment, but I imagine this is also the natural consequence of raising oneself to be loyal to good epistemics, instead of beating the other tribe at politics or something. You point out the real problems, which are hard to understand, inferentially far away, and weird, outside the Overton window. Good job, as always!

  • @vanitythenolife
    @vanitythenolife 3 months ago

    Never thought I'd watch an 11-minute video going in depth on human morality and genies, but here I am

  • @ambrosia777
    @ambrosia777 7 months ago +1

    Outstanding episode. From animation to story, you've done amazingly

  • @SatanRomps
    @SatanRomps 7 months ago +3

    This was wonderful to watch

  • @celestialowl8865
    @celestialowl8865 7 months ago +8

    An outcome that is "too unlikely" somehow resulting in an error implies a solution to the halting problem!

    • @kluevo
      @kluevo 7 months ago

      Alternatively, running through the scenarios of something 'too unlikely' causes the outcome processor to overheat and crash. The program isn't halting; it just fried the computer.
      Perhaps another processor sees that the outcome processor has crashed/is non-responsive and sends an error code?

    • @celestialowl8865
      @celestialowl8865 7 months ago

      @@kluevo Maybe, but then you're always at risk of frying the computer because you can never know which wishes are infinitely unlikely lol

    • @tornyu
      @tornyu 7 months ago

      I interpreted it as: the outcome pump can reset time a finite number of times per request. More powerful pumps can reset more times.
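
@tornyu's reading above (a finite reset budget per wish) also answers @celestialowl8865's halting-problem worry: "too unlikely" need not mean the pump solved an undecidable problem, only that it exhausted its budget. A minimal sketch, with every name invented:

```python
import random

def outcome_pump(is_acceptable, sample_future, max_resets=10_000):
    # Rewind-and-resample up to max_resets times; report an error if the
    # wished-for outcome never occurs. No halting oracle is needed:
    # "too unlikely" just means "not found within the reset budget".
    for _ in range(max_resets):
        future = sample_future()
        if is_acceptable(future):
            return future
    raise RuntimeError("Wish error: outcome too unlikely for this budget")

try:  # toy usage: a ~1-in-10,000 event with only 100 resets usually errors
    outcome_pump(lambda f: f > 0.9999, random.random, max_resets=100)
except RuntimeError as err:
    print(err)
```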

  • @ErenYeager-xk3cy
    @ErenYeager-xk3cy 7 months ago +1

    What a freaking god damn amazing video. The soundtrack, the narration, the animations, the script, the editing.
    Abso-fucking-lutely perfect!!!

  • @cuppajoe2
    @cuppajoe2 7 months ago +1

    Another great video from you guys. Keep it up!

  • @fluffycat679
    @fluffycat679 7 months ago +7

    Now, I know the Outcome Pump has no ill intentions. It can't be, and isn't, actively trying to upset me; it's simply a matter of how I use it, and it's illogical to hold against it the unsatisfactory outcomes that result from my misuse of its power. But, with all that being said... it blew up my house. So no, we are not friends. It killed my mother.

  • @jansustar4565
    @jansustar4565 7 months ago +6

    (As mentioned in another comment) use yourself as the evaluation function.
    Option 1:
    After N years at most, determine the satisfaction of myself (and maybe other people I care for) with the outcome of the scenario.
    The only problem with this is if the insides of your brain are modified to adjust the evaluation function, which isn't all that nice, but you can get away with adding a test of how close your mentality is to your mentality from before. This still has some problems, but is way better than the alternatives.
    Option 2:
    On first activation: change events in such a way that the second time I activate the machine, I will choose the evaluation function I would be happiest with in my current state of mind. Not activating it a second time (within a timeframe?) is an automatic reset.
    With this, you bypass the entire problem of "there is no safe wish smaller than an entire human morality" by encoding the entire human morality inside the eval function.

    • @vakusdrake3224
      @vakusdrake3224 7 months ago +4

      Given this scenario I'm not really sure how this avoids it just granting wishes in ways that lead to the button being pressed twice without your involvement. The point the video made about it ensuring you don't press the regret button generalizes to most other similar sorts of measures.

    • @jfb-
      @jfb- 7 months ago

      your brain is now in a jar being constantly flooded with dopamine.

    • @celestialowl8865
      @celestialowl8865 7 months ago +2

      Who's to say you understand your own mind well enough to know that the outcome produced to maximize your own happiness will be the outcome you desire in the instantaneous moment?
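
Option 1 above amounts to using your own (drift-checked) satisfaction as the pump's acceptance test. A minimal sketch of just that test, with hypothetical functions throughout; note that the replies' objection survives intact, since the pump optimizes whatever these functions literally measure:

```python
def option_1_acceptance(future, satisfaction, mentality_drift,
                        threshold=0.8, max_drift=0.05):
    # @jansustar4565's Option 1: accept a future only if (a) my future self
    # reports being satisfied and (b) my mind hasn't drifted far from its
    # pre-wish state. A brain in a jar flooded with dopamine (per @jfb-)
    # may still score high on (a) and game whatever (b) literally measures.
    return satisfaction(future) >= threshold and mentality_drift(future) <= max_drift
```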

  • @nikkibrowning4546
    @nikkibrowning4546 7 months ago

    This is why I like the phrase, "Without otherwise changing the state of (person) or any other being, do thing."

  • @cefcephatus
    @cefcephatus 7 months ago

    This is phenomenal. The phrase "I wish for you to do what I should wish for" is powerful. And what about that unsafe genie we're talking about? Yes, it's just us.

  • @smileyp4535
    @smileyp4535 7 months ago +4

    I always thought the best wish was "perfect knowledge and the ability to make and fulfill the best possible wish or wishes from my perspective across all time and outcomes", or "the ability to do anything", and essentially become god
    I'm not sure if those actually are the best wishes, but I've put a loooot of thought into it 😅

    • @minhkhangtran6948
      @minhkhangtran6948 7 months ago

      Hopefully what’s best for you isn’t accidentally apocalyptic to everything else including your mother then

    • @CaedmonOS
      @CaedmonOS 7 months ago

      @@minhkhangtran6948 Unlikely, as I assume he probably doesn't want a reality that would harm his mother or cause an apocalypse

    • @michaeltullis8636
      @michaeltullis8636 7 months ago +4

      Philip K. Dick said that "For each person there is a sentence - a series of words - which has the power to destroy him." There almost certainly exists an argument which would persuade you to become a libertarian, or a communist, or a Catholic, or an atheist, or a mass murderer, or a suicide. If you gained "perfect knowledge and the ability to make and fulfill the best possible wish or wishes from your perspective across all time and outcomes", your perspective would change. What would it change to? I figure it must depend on the order in which you hear all the mysteries of the universe. And if the values you have as a god depend on the details of your ascension, a hostile genie could just turn you into the god they want around (or a god that unmakes itself).

  • @gabrote42
    @gabrote42 7 months ago +3

    So many of these are great, and while I miss Robert Miles' standalone content, this is not too bad a substitute

  • @theallmemeingeye5927
    @theallmemeingeye5927 7 months ago +1

    I'm so glad you made this, it's one of my favourite stories by EY

  • @ApprendreSansNecessite
    @ApprendreSansNecessite 7 months ago +1

    This is so well written. Bravo!

  • @Flint_the_uhhh
    @Flint_the_uhhh 7 months ago +4

    This reminds me of Fate/Zero.
    ⚠️⚠️SPOILER!!!! ⚠️⚠️
    The main character was a contract killer who has seen the worst sides of humanity: wars, famine, etc.
    At the conclusion of the story, he obtains a wish-granting device and makes a wish to save all of humanity from these problems.
    He doesn't know how to save humanity, but his train of thought is that since it is a wish-granting device, it will surely know of a way to accomplish this goal.
    However, since he himself cannot fathom a way to save humanity and was simply hoping the device would perform a miracle, the device tells him that it will grant his wish through methods he can understand.
    The device then decides to destroy humanity, since that's technically a way to save humans from all our problems, and also a solution that he can fathom.

  • @SQUID_KID102
    @SQUID_KID102 7 months ago +3

    this channel is better than that one with ducks

  • @sentzeu
    @sentzeu 4 months ago

    One of the good Eliezer parables.

  • @HansLemurson
    @HansLemurson 7 months ago +2

    I _WISH_ that this video becomes famous.

  • @aleksythehorse5984
    @aleksythehorse5984 7 months ago +3

    I love how the firefighter was so tall that he had to duck under the door frame.
    Hot <3

  • @guillermoratou
    @guillermoratou 7 months ago +4

    This is mind boggling but also very simple to understand 🤯

  • @vev
    @vev 7 months ago

    The Outcome Pump analogy is intriguing

  • @AlfiePT
    @AlfiePT 7 months ago +2

    Just wanted to say the animation in this episode is amazing!

  • @Cqlti
    @Cqlti 3 months ago +7

    bro should have just wished he could walk

    • @NickTaylorRickPowers
      @NickTaylorRickPowers 1 month ago +1

      Didn't specify he couldn't walk using only his hands.
      Now they're both fkd

    • @BrunoPadilhaBlog
      @BrunoPadilhaBlog 5 days ago

      Didn't specify how fast.
      Now he can walk at 1 meter per hour.

  • @a_puntato29
    @a_puntato29 7 months ago +5

    This entire video I was just thinking, "get my mother out of the building alive without any bodily or mental harm to her or any other object or being".
    Kinda frustrating, but a really well made video regardless!!

    • @lucas56sdd
      @lucas56sdd 7 months ago +3

      Define "Harm"

    • @a_puntato29
      @a_puntato29 7 months ago +1

      @lucas56sdd I mean, you could say the same thing about literally any word used in the video itself. What does the house exploding mean? I'm pretty sure we're assuming that whatever cosmic deity or wish machine we're using can understand basic words.
      I mean, the creator does say otherwise at the beginning, but I can't think of any logical way you'd specify everything that followed in the video without the words used, so... man, idk, my train of thought is entirely gone

    • @minhkhangtran6948
      @minhkhangtran6948 7 months ago +3

      Granted, now she is a living unfeeling statue made out of diamond, so she felt no bodily or mental harm

    • @ShankarSivarajan
      @ShankarSivarajan 7 months ago +5

      @@a_puntato29 The point is that _your_ "basic word" smuggles in your entire morality. So you're not really disagreeing with the point made here.

    • @pendlera2959
      @pendlera2959 7 months ago

      I think you'd get an error code. Simply having the house on fire means that many objects will be harmed whether your mother survives or not.

  • @clockworkjirachi6437
    @clockworkjirachi6437 6 months ago

    Author of Clockwork Jirachi here. I can say from experience that this is pretty much how it works. Don't want scrutiny? Just insist on scrutiny not being a factor in the proxy you should choose. Simple.

  • @error-eb3mc
    @error-eb3mc 7 months ago +1

    great vid. AI misalignment is definitely a big issue, and this analogy is wonderful ;) keep up the amazing work RA! also hi Robert, I love your work

  • @atomicflea4360
    @atomicflea4360 7 months ago +4

    I understood everything and am totally not going to have to research theoretical physics

  • @beowulf2772
    @beowulf2772 7 months ago +3

    hii

  • @IllusionisticOrtus
    @IllusionisticOrtus 2 months ago

    Great video with a perfect way to explain it with good examples

  • @AltDelete
    @AltDelete 7 months ago +2

    THANK YOU. AI is whatever, what I'm trying to do is be ready with the right wish parameters for a potential genie scenario, and this is a good angle. Maybe the best angle I've heard. Thank you.

  • @Kankan_Mahadi
    @Kankan_Mahadi 7 months ago +2

    Augh~!! My brain~!! Too much complexity~!! It hurts~!! But I absolutely love the animations & art style - so adorable.

  • @MAKiTHappen
    @MAKiTHappen 7 months ago

    Amazing video, amazing explanation, amazing animation. Some rather smart genie out there really made it happen

  • @mungelomwaangasikateyo376
    @mungelomwaangasikateyo376 5 months ago

    I love how Mom is so calm

  • @breadwatcher3908
    @breadwatcher3908 7 months ago

    The first couple of minutes of this helped me better understand Outer Wilds

  • @otktoxoid1873
    @otktoxoid1873 7 months ago

    love the thumbnail, each reflection has its perspective, and does create a 3d effect

  • @carsont1635
    @carsont1635 6 months ago

    And this is what we're walking (maybe even running) towards. In the real world. Right now. I'm trapped between existential horror and the tiniest sliver of hope. Godspeed Paul Christiano and all the wonderful AI alignment researchers.

  • @alpacaofthemountain8760
    @alpacaofthemountain8760 1 month ago

    Great video! Love the ways it makes me change my mind

  • @StrayVagabond
    @StrayVagabond 3 months ago +1

    On the other hand, "i wish for you to grant my wishes as i intend them, not as you interpret them, causing the least amount of pain and suffering required to fulfill them"

  • @alterego3734
    @alterego3734 3 months ago +1

    There is a relatively safe wish way smaller than an entire human morality: save the mother while minimizing the number of futures explored (an upper bound could also be set). For example, only allow the device to pick one out of a thousand possibilities. In the worst case, it picks the worst of the thousand, which is not that bad, as it could have happened anyway with probability 1/1000.
    A slightly more complicated but better way is to define a local distance function, and then minimize the distance from a typical future within the vicinity of the desired change. While a meaningful distance function is non-trivial to define, it does _not_ require "all of human morality". A relatively simple AI that understands which scenario is close to another is enough.
    In fact, this is how natural language works. When someone says "I want my mother to be saved", the listener doesn't need "human morality" to understand the statement. Implicitly, there is an "all else being equal" appended.
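
A toy Python sketch of the two proposals above, with all names invented: cap the number of futures explored, and among the qualifying ones prefer the future closest to "typical" (the implicit "all else being equal"). As the comment concedes, the real work hides in the distance function:

```python
import random

def bounded_pump(sample_future, satisfies_wish, distance_to_typical, budget=1000):
    # (1) Explore at most `budget` futures, so the worst pick is no worse
    #     than a 1-in-`budget` outcome that could have happened anyway.
    # (2) Among futures that satisfy the wish, return the most "typical" one.
    candidates = [sample_future() for _ in range(budget)]
    acceptable = [f for f in candidates if satisfies_wish(f)]
    if not acceptable:
        raise RuntimeError("No qualifying future within the exploration budget")
    return min(acceptable, key=distance_to_typical)

# Toy usage: futures are numbers, the wish is "value above 0.9",
# and "typical" means close to the mean of 0.5.
best = bounded_pump(random.random, lambda f: f > 0.9, lambda f: abs(f - 0.5))
print(best)
```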

  • @ballom29
    @ballom29 7 months ago +1

    Thinking about it, there is a very easy fix to make this wish-granting machine much safer and more predictable.
    The whole problem is that we ask for an output, and the machine gives us a valid output that might not align with our expectations.
    But it's actually TWO outputs we want, and the whole problem happens because we are trying to specify every element of the set that encompasses the second output.
    What are the two outputs?
    --> "I want my mother out of the house, and to be happy with the outcome"
    Specifying "happy with the outcome" means any future that would make you unhappy (mother dead, harmed, traumatised, great damage created to save her, etc.) will not be considered, since you wouldn't be happy in those futures.
    Or, if we want to avoid semantic games around the vagueness of the term "happy":
    --> "I want my mother out of the house, and to not regret the outcome, and to still be alive and unharmed"
    This effectively makes the regret button useless, eliminating any future where you do have regret while also avoiding futures where you lack regret only because you are dead/unconscious.
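
The "two outputs" idea reduces to a conjunctive acceptance test. A sketch with hypothetical predicates: the stated wish, a liveness guard so death or unconsciousness cannot satisfy the no-regret clause, and the no-regret clause itself. (The video's deeper point still applies: a strong enough optimizer could satisfy the last clause by altering your judgment.)

```python
def accept(future, mother_out_of_house, wisher_alive_and_unharmed,
           wisher_regrets_outcome):
    # @ballom29's conjunctive wish: instead of enumerating every bad future,
    # require the literal outcome AND the wisher's (guarded) endorsement.
    return (mother_out_of_house(future)
            and wisher_alive_and_unharmed(future)
            and not wisher_regrets_outcome(future))
```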

  • @nattol432
    @nattol432 7 months ago

    Excellent as always!

  • @X-SPONGED
    @X-SPONGED 7 months ago

    A general backup that has been proposed for making wishes is adding "as per my expectations" or "as how I imagine it" as a clause after stating your wish. For example, "I wish to have unlimited power as per my expectations": now, instead of being blasted by a heat ray supplied with infinite power, you have unlimited power as per your expectations. However, it is paramount to understand that the genie can simply decline, saying it cannot know what your expectations are; if that happens, you're back to hiring a team of lawyers for at least 2 years to language-check your 500+ page wish contract, which could be faster or slower with consultation from the genie itself.

  • @rossrobots5160
    @rossrobots5160 7 months ago +1

    The thought cannon from Adventure Time is an example of how a safer class of genie could be unsafe: if it just reads your mind, the wish could be an erratic thought