how a simple programming mistake ended 6 lives

  • Added 17 May 2024
  • I've been told the worst thing that can happen to a developer is their code crashing in production. Well... what happens if that production environment is in a hospital? This video tells the story of one of the Therac-25 incidents, and how Ray Cox ultimately died because of a programming error in a safety-critical system.
    Therac-25 Paper: web.stanford.edu/class/cs240/...
    Therac-25 User Interface: web.mit.edu/6.033/2007/wwwdoc...
  • Science & Technology

Comments • 1.1K

  • @LowLevelLearning
    @LowLevelLearning  8 months ago +88

    🏫 COURSES 🏫 Check out my new courses and find the SECRET discount code at lowlevel.academy

    • @sidhuplayz4281
      @sidhuplayz4281 7 months ago

      hi

    • @gipugly
      @gipugly 6 months ago +1

      no

    • @nicholaschaves716
      @nicholaschaves716 5 months ago

      yes@@gipugly

    • @julesviolin
      @julesviolin 5 months ago

      That looks similar to the Varian 600 I used to work on in 2003 in the UK.
      Training was completed in Milpitas.
      I assume this couldn't happen in our machines?

    • @___Zack___
      @___Zack___ 4 months ago

      Great video, new subscriber here. Just pointing out something constructive: it's "interface" and "interlocks", not "innerlocks", etc. They are literal opposites. Thanks for the effort.

  • @HardcoreGamers115
    @HardcoreGamers115 8 months ago +6760

    "never tested until it arrived at the hospital..." That's got to be the worst case of testing in production ever recorded.

    • @me0101001000
      @me0101001000 8 months ago +391

      Speaking of which, wanna go to the Titanic in a submersible?

    • @SoftAsFur
      @SoftAsFur 8 months ago +116

      only to be topped by the OceanGate Titan sub.

    • @boy_deploy
      @boy_deploy 8 months ago +51

      Maybe the software had only been tested on a hardware emulator, i.e. another piece of software sending/receiving data, which may not have simulated the actual hardware delays.

    • @SandburgNounouRs
      @SandburgNounouRs 8 months ago +18

      Tested that it COULD work, or that it CAN work. The first is the dev POV: "set it like that, it works". The second is a UX POV, where stories and personas are taken into account.

    • @Namrec_Molai
      @Namrec_Molai 8 months ago +6

      Man, when some minor employee screws up a document they fire him and his job is done; he can retire and stay at home with no chance to screw up anything else ever again.
      I just dunno who can fire this...
      Self-made, self-tested, self-guaranteed.
      If you can't do it, don't start in the first place, you murderer.

  • @MindlessTurtle
    @MindlessTurtle 8 months ago +3715

    You'd think the medical industry would get better at preventing this kind of thing. Instead, they've hired better lawyers to deny this kind of thing.

    • @NithinJune
      @NithinJune 8 months ago +71

      I hope the family got a large payout from the company and the hospital.

    • @Acetyl53
      @Acetyl53 8 months ago

      Their entire industry is a scam. House of cards built on lies used to prop up another house of cards. This is standard dark triad operating procedure. 100% expected, in fact, you should just assume it and look for the exceptions.

    • @blacklistnr1
      @blacklistnr1 8 months ago +164

      There is actually an ISO with very strict guidelines on how to develop such critical software, one thing which particularly stood out to me is static memory i.e. you know all the necessary resources beforehand and prepare for the worst case.
      There's also a tracing level for the documentation, where the max level is being able to trace all use cases through requirements to every line of code responsible for them.
      So there definitely are methods for prevention.
      NASA also has some interesting design processes, if you're interested in reading. Using an old, specific JavaScript engine in space is one of the consequences I find quite funny.

    • @charlieking7600
      @charlieking7600 8 months ago +18

      @@blacklistnr1 Which one? Could you share the number of the ISO standard?

    • @Namrec_Molai
      @Namrec_Molai 8 months ago +14

      Who allows this industry to function like this?
      If there is something wrong with them, why don't they make them stop?
      This is not a kindergarten for playing with laser guns or playing doctor.

  • @nclanceman
    @nclanceman 8 months ago +1929

    My first year CS professor started the entire class with a lecture on how bad code can kill people and how we should take bugs seriously. It's always chilling hearing stories about this.

    • @jordixboy
      @jordixboy 8 months ago +37

      I'm a self-taught engineer and this is just common sense, like bruh wtf

    • @r4ych836
      @r4ych836 8 months ago +6

      Can you share any other significant cases?

    • @erikkonstas
      @erikkonstas 8 months ago +7

      @@r4ych836 I'm not sure if bad code has ever killed people other than via the Therac-25... I'm also not sure I'd want for examples to exist, for obvious reasons...

    • @kphaxx
      @kphaxx 8 months ago +38

      @@erikkonstas Teslas lmaoo

    • @ollantayscocos8709
      @ollantayscocos8709 8 months ago +35

      @@r4ych836 My last lecture in an AI course was solely dedicated to ethics. Our professor shared some examples; one was an AI built to set insurance rates that started giving racist rates. Because it was trained on data going back to the 40s or so, the AI was biased toward giving certain minorities worse rates than they should have been given.

  • @capsey_
    @capsey_ 8 months ago +920

    The more you learn about engineering, not only software but hardware, mechanical, etc., the more you learn that the world around you is held by duct tape and prayers.

    • @GuyFromJupiter
      @GuyFromJupiter 8 months ago +58

      As someone who works in industrial automation, this is way too true! Our ability to put duct tape on a system without shutting it down, even the code, is pretty spectacular though!

    • @contactdi8426
      @contactdi8426 8 months ago +34

      Hahaha, what a perfect way to describe “held by duct tape and PRAYERS”

    • @youkofoxy
      @youkofoxy 8 months ago +18

      Praise the Omnissiah.

    • @1kvolt1978
      @1kvolt1978 8 months ago +11

      @@youkofoxy As a former electrician I tell you: Yeah, you better do, He is your only hope and salvation!

    • @bartudundar3193
      @bartudundar3193 5 months ago +8

      As an engineer: not entirely true. I am in the biomedical industry and maybe it's just us, but our products get tested. Like, a lot. Some of the tests sometimes seem ridiculous, but you are frequently reminded that regulations are written in blood.

  • @MarcelSchr
    @MarcelSchr 8 months ago +1655

    I wish the entire source code were online, but for those interested, excerpts of it can be viewed in an investigative report online.

    • @Name-gl6lf
      @Name-gl6lf 8 months ago +82

      could you link? i can't seem to find any code

    • @gcl2783
      @gcl2783 8 months ago +18

      AECL never released the source code

    • @erikkonstas
      @erikkonstas 8 months ago +106

      I suspect the reason they never released it is that there *might* have been way more horrendous bugs than that one (yes, exactly what I said)...

    • @PanoptesDreams
      @PanoptesDreams 8 months ago +74

      @@erikkonstas it's proprietary code, they're not just going to give it away for free. That same code is based on older IP from the previous machine. And I imagine future iterations will also have common code. Just like how Windows still has code from 20+ years ago in it.

    • @yurikadzz
      @yurikadzz 8 months ago +74

      Proprietary code is a crime against humanity

  • @kcnl2522
    @kcnl2522 8 months ago +401

    One person, hobbyist, alone, in assembly. Recipe for disaster.

    • @ThePC007
      @ThePC007 8 months ago +36

      Most importantly, there were no unit tests. Though admittedly, I don’t think any unit testing frameworks even existed at that time.

    • @Novusod
      @Novusod 8 months ago +31

      Most coders back then were hobbyists. There wasn't a huge computer industry in 1985 with companies specializing in niche software.

    • @threeMetreJim
      @threeMetreJim 8 months ago +14

      It's more likely that the operation of the code was poorly specified and that testing was completely lacking.
      A long time ago I was a hobby programmer in microcontroller assembly and ended up programming for a living for a while. I had the task of not only programming, but adding features, and debugging poorly documented code from previous programmers. I've not long corrected (and enhanced) some buggy micro code that was downloaded from the internet with zero documentation (had to disassemble, re-code and re-assemble), so it is quite possible for a 'hobbyist' programmer.
      As processors can crash unexpectedly (power glitches, static discharge, radio interference), having no hardware safety features on a safety critical device is not something I'd be happy with, and is probably illegal these days (if it wasn't at the time).

    • @danieI.strohman
      @danieI.strohman 7 months ago

      @@threeMetreJim unfortunately, there is a trend of more and more elevators having fewer hardware safety features and relying more and more on software to keep everyone safe. the youtuber beno has done some videos on some of these.
      czcams.com/video/K7GHNpZlK7A/video.htmlsi=rImsxbn9PZsUBZ4N
      czcams.com/video/-gg0RTd5A4U/video.htmlsi=BmdbyH3DcPRXhw1j&t=137
      czcams.com/video/NBUeeJauf5A/video.htmlsi=i7GAT79KlpkFroiT

    • @gabry96colo
      @gabry96colo 7 months ago +4

      Having a big red button that stops the machine even if the software doesn't want to is a requirement in most industrial applications (coded assembly-like until 10-15 years ago). Hardware safety is a must IMHO; the last thing I would do is remove safety features to make the machine cheaper.

  • @shimadabr
    @shimadabr 8 months ago +1397

    That error message though... "Radiation is either too high or too low" WTF 😂.
    "50/50, let's do this!" - Medical staff

    • @darylphuah
      @darylphuah 8 months ago +57

      It just means radiation is out of permitted range.

    • @shimadabr
      @shimadabr 8 months ago +196

      @@darylphuah Yeah, but the error message is kind of useless. The doctor can't know if the patient is about to get some sweet cancer or not haha.

    • @TurtleKwitty
      @TurtleKwitty 8 months ago +148

      @@shimadabr The doctor also has no clue that that is the intended message, all they got was error 54 not the actual error reason

    • @ricardodegenova
      @ricardodegenova 8 months ago +70

      The message didn't say that, the message said 'dosage input 2' error. The 'dosage too high or too low' message was associated later on. The interface also said the user had only received 6 rads out of 202. The operator was totally correct in unpausing it. If the process only just started, there's no problem unpausing it and continuing to the end of the procedure

    • @nick066hu
      @nick066hu 8 months ago +4

      Reminds me of those 'scientists' measuring radiation in the Chernobyl disaster who found it to be exactly the maximum value the device could show on its scale. Was quite plausible. 😀
      I still don't get it: did nobody blame that 50/50-thinking medic? I am not allowed to do things that have a 0.000000001% chance of harming someone.

  • @Alex-qf1pm
    @Alex-qf1pm 8 months ago +530

    Writing async code in assembly. Of course it had bugs.

    • @user-yv1qs7sy9d
      @user-yv1qs7sy9d 8 months ago +83

      More like: Writing async code. Of course it had bugs.
      Most languages do not prevent data races, and I have yet to hear of a language that would help in this specific occasion without support from the hardware itself, i.e.: in this case the magnets and filter.

    • @HappySlappyFace
      @HappySlappyFace 8 months ago +58

      More like: writing async code without knowing that it is async so it is written as non async

    • @309electronics5
      @309electronics5 8 months ago +28

      It had been written by a hobbyist student, of course it would be dangerous. A hobbyist never has the mindset to think about every critical safety system that has to be implemented to make software safe.

    • @rivershen8199
      @rivershen8199 8 months ago +43

      I feel like the bigger problem was the primary design choice of the programmer to have an interface where you can freely write any kind of data and have a confirm command at the bottom.
      Every user would assume that nothing happens to the machine until the confirm command is sent. Why would he make the machine read certain values instantly long before the confirm command is read?

    • @khatdubell
      @khatdubell 8 months ago +4

      @@rivershen8199 Probably why you should hire a professional, not a hobbyist.
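
One way to get the behaviour this thread expects, where nothing reaches the machine until the confirm command, is to keep operator edits in a draft and hand over a frozen copy only on confirm. A minimal Python sketch of that idea follows. The `PrescriptionForm` class and its field names are hypothetical and are not taken from the Therac-25 software, whose source was never released.

```python
from types import MappingProxyType


class PrescriptionForm:
    """Hypothetical data-entry form: edits stay in a draft until confirmed."""

    def __init__(self):
        self._draft = {}

    def edit(self, field, value):
        # Editing only changes the draft; nothing is sent to the hardware.
        self._draft[field] = value

    def confirm(self):
        # Copy the draft and wrap it read-only, so later edits cannot
        # alter what the hardware was told to do.
        return MappingProxyType(dict(self._draft))


form = PrescriptionForm()
form.edit("mode", "xray")
confirmed = form.confirm()
form.edit("mode", "electron")  # a later edit needs a new confirm to take effect
print(confirmed["mode"])       # the confirmed snapshot is unaffected
```

The design choice here is that hardware setup only ever sees the object returned by `confirm()`, so a half-edited screen can never be read by accident.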

  • @Yotanido
    @Yotanido 8 months ago +2341

    Race conditions are notoriously hard to debug. Because you only have a couple milliseconds for the exact right conditions to occur to trigger the bug.
    This is a race condition with an EIGHT. SECOND. WINDOW.
    Had they tested it properly, they would have been almost guaranteed to find this. This is not just negligence, this is recklessness.

    • @skellious
      @skellious 8 months ago +323

      this is also one of the best examples of why you need to pay people to try and break your software. you will by default always enter things correctly, you wrote the software, you wrote the procedure manual, you know what you are doing. the user DOES NOT KNOW WHAT THEY ARE DOING, and therefore is not restricted by your assumptions. this allows them the freedom to screw things up in ways you never imagined. This is why it is preferable to pay highly trained chimpanzees (also called QA testers) to find these issues first (no offense to QA people, you are literal life-savers)

    • @d00dEEE
      @d00dEEE 8 months ago +193

      @@skellious I worked on FDA regulated software, and we'd recruit complete noobs to test it, maybe a project manager or someone who has no knowledge of software or the subject area. We called them "monkey testers" and they'd misunderstand just about every instruction, thus flushing out all sorts of bugs that knowledgeable users would navigate around.

    • @bailey125
      @bailey125 8 months ago +96

      @@d00dEEE We also do the exact same thing where I work in the NHS. I build an application and test it thoroughly. We then send it off to doctors, clinicians and ward staff to test before actually going live. Because they do everything so "wrong" they are able to produce errors my team and I wouldn't have thought of ourselves, and then we're able to fix them before making the application accessible to the whole organisation.

    • @bob450v4
      @bob450v4 8 months ago +6

      Rust

    • @HHJoshHH
      @HHJoshHH 8 months ago +3

      @@bob450v4 🤣
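
The race described in this thread can be modelled without real threads. The sketch below is a deliberately simplified toy, not AECL's code: `ToyMachine`, its fields and the tick counts are all assumptions made for illustration. Setup works from a snapshot of the screen, an edit arrives during the long setup window, and the hardware ends up configured for something the screen no longer shows. Re-checking right before firing is one possible guard.

```python
from dataclasses import dataclass
from typing import Dict, Optional

SETUP_TICKS = 8  # stands in for the roughly eight-second setup window


@dataclass
class ToyMachine:
    screen_mode: str = "xray"             # what the operator sees
    hardware_mode: Optional[str] = None   # what the hardware is set up for

    def run_setup(self, edits: Dict[int, str]) -> None:
        """Set up hardware from a snapshot taken at tick 0.

        `edits` maps a tick number to a mode the operator types at that tick.
        """
        snapshot = self.screen_mode
        for tick in range(SETUP_TICKS):
            if tick in edits:
                self.screen_mode = edits[tick]  # edit lands mid-setup
        self.hardware_mode = snapshot           # setup ends with stale data

    def fire_unchecked(self) -> str:
        # Buggy path: trusts that setup still matches the screen.
        return f"fired: screen={self.screen_mode}, hardware={self.hardware_mode}"

    def fire_checked(self) -> str:
        # Safer path: re-verify immediately before firing.
        if self.hardware_mode != self.screen_mode:
            raise RuntimeError("setup is stale; redo setup before firing")
        return self.fire_unchecked()


machine = ToyMachine()
machine.run_setup({3: "electron"})   # operator edits at tick 3 of 8
print(machine.fire_unchecked())      # fired: screen=electron, hardware=xray
```

Any edit at any of the eight ticks triggers the mismatch, which is why a window this wide should have been easy to hit in testing.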

  • @yondaime500
    @yondaime500 8 months ago +484

    Reminds me of when I went to the dentist to get an X-ray, and saw that the machine was running Windows Vista. I felt like I was in a Final Destination movie.

    • @erikkonstas
      @erikkonstas 8 months ago +49

      Uh, sorry to say, but you practically were...
      *[CONTENT WARNING] Careful before clicking "Read More"*
      Amongst other shit, very early versions of Vista were notorious for just up and crashing out of nowhere, or even not booting at all for no reason (the then infamous Red Screen of Death); if it crashed, who knows what would happen to the ray emission???

    • @stanleybochenek1862
      @stanleybochenek1862 8 months ago +21

      @@erikkonstas that’s more terrifying to think about

    • @onetwothree2617
      @onetwothree2617 7 months ago +48

      Luckily, for dental the X-ray emitter is usually its own device and doesn't use an external PC. The 'film' or receiver is what is connected to the PC. Unless you are getting a panoramic, then it's up to the manufacturer lol. Hopefully the engineers put hardware interlocks on everything now.

    • @spungbopscarepans
      @spungbopscarepans 7 months ago +4

      hell nah the military also uses windows xp america’s screwed

    • @kikihun9726
      @kikihun9726 7 months ago +3

      Just to remind, most of the metro systems are running from a floppy.

  • @hvfd5956
    @hvfd5956 4 months ago +25

    Been there. In the mid-1980s I was treated in a Therac-25. Fortunately, mine didn't fail. I did hear about the failure and it took all I had to walk in there every weekday for 30 treatment days, knowing that it COULD fail. I worked in software and had a subroutine called SNO, for Should Not Occur, that printed an error message and exited. I was amazed at how many times I hit that routine. I was very glad it was there. FYI - I am now up to 5 rounds with old man cancer and I am still here. The average for Mom and the three kids stands at 7, so I get to look forward to two more. Yea me!

  • @LuisEn20005
    @LuisEn20005 8 months ago +37

    Even the name "Malfunction 54" sounds scary for a simple bug, and then is MORE scary when you see "Malfunction 54 (12777 rads delivered)"

  • @joaopedrorocha4790
    @joaopedrorocha4790 8 months ago +166

    It would be a public service to have a series on this topic: "Code that kills". Are there many cases like this, in which code that runs essential infrastructure ends up costing lives?
    Thanks for sharing this!

    • @erikkonstas
      @erikkonstas 8 months ago +3

      I sure fucking hope not, at least I'm not aware of any...

    • @nua1234
      @nua1234 8 months ago +10

      Considering how many other failings there were, most of the blame isn't on the code.
      The decision to remove the hardware interlock, and just reuse the previous software (which was designed with the assumption the interlock was there), without extensive testing and examination was the biggest failing.

    • @SneedFeedAndSeed
      @SneedFeedAndSeed 6 months ago +2

      Code that kills
      Would it write it for me?
      With your hand so still, it makes me believe
      In the software's sins
      Let me compile now and never die
      I'm alive

  • @TailRecursion
    @TailRecursion 8 months ago +247

    I'm a full-time software engineer and part-time nuclear/radiation nerd. I've heard this story on other channels and read about it online, but nobody else goes into detail about the software aspects, which I find the most interesting. Great stuff LLL!

    • @erikkonstas
      @erikkonstas 8 months ago +3

      Well, too much detail is impossible, the code isn't out there at all.

    • @pro-socialsociopath769
      @pro-socialsociopath769 7 months ago +2

      They probably don't want it public cause it's recycled code that still gets used today 💀

    • @danmurad8080
      @danmurad8080 4 months ago

      I’m writing a book for junior developers, what in your opinion was the root cause? Failure to test?

    • @TailRecursion
      @TailRecursion 4 months ago

      @@danmurad8080 I think the testing aspect is very important and would've absolutely reduced the severity of the failure, but in this case when hardware is involved, relying too much on software to assume the hardware state without any way to verify it is begging for disaster. Even something as simple as a 3D printer would be a lot more hazardous without sensors to ensure the hardware is in the right state.

    • @TimsCabana
      @TimsCabana 3 months ago +1

      I have been writing assembly language code for over 30 years. I was cringing... watching this video.

  • @mushroomcrepes4780
    @mushroomcrepes4780 8 months ago +60

    the virgin pre-production testing vs the chad testing in production

  • @okaylord
    @okaylord 8 months ago +394

    Sounds like a chain of failure in all regards, from the machine to the hospital. The machine manufacturer did not care and the hospital also did not care too.

    • @shimadabr
      @shimadabr 8 months ago +49

      I can understand the mistake by the hospital administrators. They are paying top dollar for cutting edge equipment, so they kind of expect it to be made with high standards. But the main fault is at the company, it's a chain of negligence.

    • @Rin-qj7zt
      @Rin-qj7zt 8 months ago +12

      Not caring should be a crime for things like this

    • @ne0nmancer
      @ne0nmancer 8 months ago +21

      @@Rin-qj7zt It's called negligence

    • @liegon
      @liegon 8 months ago +15

      There were six incidents though in different hospitals. The machine did something it was not supposed to do, and the user interface lied about it.

    • @31redorange08
      @31redorange08 8 months ago +1

      *either, not too.

  • @pacifico4999
    @pacifico4999 8 months ago +98

    I hate this "too high or too low" type of error. It's like searching for an email on Lotus Notes: "Your search returned no results or too many results". Please be specific with error messages

    • @sg39g
      @sg39g 5 months ago +5

      ERROR 418 I'm a teapot.

    • @DROGOC0P
      @DROGOC0P 4 months ago +4

      ERROR 88: something's wrong but I won't tell you what
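
The complaint in this thread, that "too high or too low" tells the operator nothing, is cheap to address in code. A small Python sketch of a range check that names the violated bound and includes the measured value and limit is below. The function name, parameters and numbers are illustrative assumptions, not values from any real machine.

```python
def check_dose(measured_rads, low_rads, high_rads):
    """Raise ValueError naming the violated bound; return the value if in range."""
    if measured_rads < low_rads:
        raise ValueError(
            f"dose too low: measured {measured_rads} rad, minimum {low_rads} rad"
        )
    if measured_rads > high_rads:
        raise ValueError(
            f"dose too high: measured {measured_rads} rad, maximum {high_rads} rad"
        )
    return measured_rads


try:
    check_dose(250, low_rads=150, high_rads=210)  # illustrative numbers only
except ValueError as err:
    print(err)  # dose too high: measured 250 rad, maximum 210 rad
```

Splitting the check into two branches costs one extra comparison and turns a coin flip into an actionable message.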

  • @ChrisM541
    @ChrisM541 8 months ago +183

    Something like this, where someone's health is at stake, should have had a team of programmers agreeing on, and reviewing, each other's code. The root cause wasn't the lone programmer - it was all those above him who signed off on that lone programmer. Disgusting working practice, and yes, that lone programmer should also have recognised the danger immediately.

    • @khatdubell
      @khatdubell 8 months ago +13

      Basically the plot to Jurassic park

    • @nelsonahlvik6650
      @nelsonahlvik6650 8 months ago +34

      To me the worst part of this was the removal of hardware interlocks. Software can NEVER be relied on 100%, even if it has been extensively tested. Physical switches and relays should ALWAYS be in place for safety critical applications. If there were hardware interlocks in place in the Therac-25 this would never have happened. Sure, the bug would still have been there, but the machine couldn't have hurt anybody as the emitter PHYSICALLY would not have been able to activate without the magnets in place.

    • @darylphuah
      @darylphuah 8 months ago +16

      He was a lone programmer, working in assembly on a rather complicated machine. He may have been a "hobbyist", but I reckon he is more skilled than many current software engineers.
      Mistakes like these happen; logical errors and race conditions are incredibly common when working on any complex system. That he "should have" caught it is not a realistic expectation. In fact, current software engineering practice expects programmers to make mistakes like these, which is why, as you said, we have pair programming, code reviews, unit testing, etc.
      In critical systems like this, break testing should have been done to identify potential failure points.

    • @erikkonstas
      @erikkonstas 8 months ago +6

      @@nelsonahlvik6650 Not just that, what if some hardware filter breaks off the machine and BOOM, EVERYTHING within a 5km radius is exposed??? Yes, the software would be bug-free, but the plastic broke physically so the radiation core was out there, not controlled by the software anymore...

    • @QueenofTNT
      @QueenofTNT 8 months ago

      @@erikkonstas The Therac-25 (and its older sibling, the Therac-20) used a double-pass accelerator that did not use a radiation source (such as Cobalt or Cesium) like older machines. The double-pass system uses a magnetron to create a beam, which only activates upon operator input to start a treatment. So, thankfully, if you're not in the same room as it, you're probably fine. This was probably the ONE good thing the Therac-25 had going for it.
      As a sidenote though, incidents of exposure via radiation sources from old radiotherapy and X-ray machines have happened before, and it is not pretty. I would imagine most radiotherapy machines nowadays use a magnetron instead of a radiation source as it's much safer, easier to maintain, and easier to decommission. No deadly radiation sources, all you need to do is disconnect the power and it's powerless.
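
The interlock argument in this thread comes down to one rule: the beam needs the software's request AND an independent physical signal. Software can only model that logic, since a real interlock is a circuit and not a function call, so the Python sketch below is purely an illustration. The function and contact names are hypothetical.

```python
def beam_permitted(software_request, interlock_contacts):
    """Return True only if software asks for the beam AND every contact is closed.

    `interlock_contacts` models switches wired in series: one open switch
    breaks the circuit no matter what the software requests.
    """
    return bool(software_request) and all(interlock_contacts)


# Software (wrongly) requests the beam while the turntable switch is open.
contacts = {"door_closed": True, "turntable_in_position": False}
print(beam_permitted(True, contacts.values()))  # False
```

The point of the series wiring is that a software bug can only ever fail to turn the beam on; it cannot force it on against an open switch.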

  • @myanrueller91
    @myanrueller91 8 months ago +272

    I remember the Kyle Hill video on this, and he glossed over the software bugs part of the Therac tragedy. This shines a different light on the importance of software safety, especially in mission critical or life saving tools. Kyle's focused on the tragedies and their relationship to the ongoing nuclear age.
    Very different and interesting perspectives.
    As a software engineer, it is always chilling to recall this story.

    • @dixztube
      @dixztube 8 months ago +11

      I don’t know why but he annoys me lol

    • @vanjazed7021
      @vanjazed7021 8 months ago +16

      @@dixztube Kyle? Yeah, his videos, even about very serious topics, started to feel like History Channel talking about aliens.

    • @inconnu4961
      @inconnu4961 8 months ago +2

      @@vanjazed7021 I thought his target audience were kids/young adults.

    • @aldeywahyuputra5719
      @aldeywahyuputra5719 8 months ago +4

      I agree, I think both videos shine on their own different (but valid) intended context and thus, their own perspectives.
      While Kyle's video focuses more on the whole incidents as their target audience is for the broader masses, this video focuses more on the software itself. Nevertheless, I also think both videos do succeed in bringing the negligence and recklessness of AECL and hopefully can add more to the topics on code safety as a cautionary tale.

    • @compu85
      @compu85 7 months ago +6

      One of the most amazing things about this ordeal is at one point, AECL issued a bulletin telling the hospitals to use a screwdriver to pry the up arrow key off the VT100 keyboard, and to glue the key switch in place so it couldn’t be activated. The FDA was not amused by this “fix”.

  • @robottwrecks5236
    @robottwrecks5236 8 months ago +138

    Software never got tested until it was shipped? Sounds like all the AAA games coming out

    • @godnyx117
      @godnyx117 8 months ago +6

      *Modern AAA games.

    • @liegon
      @liegon 8 months ago +7

      The difference being of course, that AAA games are extensively tested, despite their many bugs, and that they are not safety critical systems.

    • @khatdubell
      @khatdubell 8 months ago +3

      Yeah, but john, when the pirates of the Caribbean game breaks, the pirates don't eat the user.

    • @robottwrecks5236
      @robottwrecks5236 8 months ago +4

      @@khatdubell Scribbles down game idea

    • @stanleybochenek1862
      @stanleybochenek1862 8 months ago

      Yea

  • @josephdvorak9241
    @josephdvorak9241 8 months ago +43

    Sobering and sad. A reminder that clean, thoroughly tested code is crucial, together with the assumption that there still may be bugs no matter how many edge cases are accounted for in the tests.

    • @nelsonahlvik6650
      @nelsonahlvik6650 8 months ago +4

      and hardware locks are the most important, they make sure that even if the software goes wrong nobody gets hurt

    • @erikkonstas
      @erikkonstas 8 months ago +2

      @@nelsonahlvik6650 Or if vulnerable parts of the hardware go wrong, the locks protect the entire vicinity (e.g. if the locks worked correctly, Chernobyl wouldn't have exploded).

  • @forbiddenera
    @forbiddenera 8 months ago +20

    This is why I won't ever code anything where human life is at risk

    • @309electronics5
      @309electronics5 8 months ago +3

      If i was the company i would hire a software engineer who is verified and thinks about everything, not a hobbyist coding student

    • @jamesm4957
      @jamesm4957 8 months ago

      @@309electronics5 It's not about the developer; the company did not fully test the system. The right thing is to employ separate QA to handle these kinds of edge cases.

    • @edgeeffect
      @edgeeffect 8 months ago +5

      Except that verifications are usually a matter of paying the fee and sitting through the course and have next to nothing to do with competence.

    • @LSHV
      @LSHV 6 months ago

      @@309electronics5 They want money, sweet sweeeeet money, that's all. Sadly.

    • @fragileomniscience7647
      @fragileomniscience7647 2 months ago

      Hoare calculus and verification

  • @maxcryer8654
    @maxcryer8654 8 months ago +183

    This style of video is really interesting, it would be pretty cool if you could produce more videos with stories like this

    • @celticwinter
      @celticwinter 8 months ago +2

      Examples that immediately come to mind are the assembler bug in the moon landing (could be fixed) and entering imperial values into a metric control system by NASA, I think (crash and burn).

    • @lodgin
      @lodgin 8 months ago

      Yeah, I really like these kinds of videos. Kevin Fang has been doing these kinds of videos for a little while now.

    • @erikkonstas
      @erikkonstas 8 months ago

      @@lodgin Woah, somebody else knows that name! His channel is severely underrated, I may say...

    • @erikkonstas
      @erikkonstas 8 months ago

      I agree, except for the part where there are casualties... Ariane 5 (wrong direction which led to a crash due to a FP error) comes to mind.

    • @XeL__
      @XeL__ 8 months ago

      Yeah, simple, well explained and demonstrated. More, MORE!
      I'm a cancer survivor. I had chemo, radiotherapy and pills, so I was curious.
      The nuking machine is intense for sure, when the nurse wears a 2-inch lead vest: "oh, it's just to protect me from being nuked alive by your treatment".
      They literally told me "to kill cancer cells we kill you and the cancer, and hope you survive while the cancer dies".
      O_O ok let's try lol

  • @Varian_t
    @Varian_t 8 months ago +105

    I started my career at a dental X-ray design and manufacturing company as a junior embedded systems R&D engineer.
    There the hardware team always has the master role; they criticize software and trust it less 😂. I remember how rigorous the regression tests were before a product launch.

    • @nick066hu
      @nick066hu 8 months ago +2

      You are my man... if I may ask you: I noticed I am not always given those lead-filled radiation-protective ponchos (I don't know the exact name) any more when a dental X-ray is taken. Am I right in thinking it is because the newer (cone-beam) machines produce a lower radiation dose, and also less stray radiation? Or is it just negligence and I should ask for one?

  • @shinyrayquaza9
    @shinyrayquaza9 8 months ago +5

    bro wtf why does it start before you even hit the start button, bro why doesn't it double check the conditions with something that dangerous and change if it notices new values!?!?

  • @cherubin7th
    @cherubin7th 8 months ago +18

    The problem with testing is that you test what you think you should test. If they had never had the idea to change the mode afterwards, this bug might have gone unnoticed despite testing.

    • @clamhammer2463
      @clamhammer2463 Před 8 měsíci +3

      That's only if you do basic software testing. In reality, there are like 15 levels/variations of types of testing, and one of those is throwing random inputs at it repeatedly to see what fails.
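The "throw random inputs at it" idea from the reply above can be sketched roughly as follows. This is only a toy under stated assumptions: `ToyConsole`, `SETUP_TICKS`, and the deliberately planted bug are all hypothetical and are not taken from the real Therac-25 software. The point is that random operator sequences can stumble onto an edit-while-busy case that a hand-written test plan might never include.

```python
import random

SETUP_TICKS = 8  # hypothetical time the toy hardware needs to reposition


class ToyConsole:
    """Deliberately buggy toy model; not the real Therac-25 logic."""

    def __init__(self):
        self.displayed_mode = "X"  # what the operator sees
        self.hardware_mode = "X"   # what the machine is actually set up for
        self.now = 0
        self.busy_until = 0

    def set_mode(self, mode):
        self.displayed_mode = mode
        if self.now >= self.busy_until:
            self.hardware_mode = mode
            self.busy_until = self.now + SETUP_TICKS
        # Deliberate bug: an edit made while the hardware is still moving
        # updates the screen but never reaches the hardware.

    def wait(self, ticks):
        self.now += ticks


def random_actions(rng, length=6):
    """Build a random sequence of operator actions."""
    actions = []
    for _ in range(length):
        if rng.random() < 0.5:
            actions.append(("set_mode", rng.choice("XE")))
        else:
            actions.append(("wait", rng.randint(1, 10)))
    return actions


def find_failure(trials=1000, seed=0):
    """Return the first action sequence that leaves screen and hardware out of sync."""
    rng = random.Random(seed)
    for _ in range(trials):
        console = ToyConsole()
        actions = random_actions(rng)
        for name, arg in actions:
            getattr(console, name)(arg)
        if console.displayed_mode != console.hardware_mode:
            return actions
    return None
```

Calling `find_failure()` returns a short list of actions that reproduces the mismatch, which can then be turned into a fixed regression test. Random testing of this kind only samples the input space; it does not prove the absence of other defects.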

  • @RandomGuy37
    @RandomGuy37 Před 8 měsíci +17

    I'm studying programming in university right now and this was one of the examples my professor used to demonstrate how a mistake in code could have massive and sometimes even fatal consequences. He also pointed out that with more testing and a better graphical user interface this all could've been avoided.

    • @hagiasmos314
      @hagiasmos314 Před 5 měsíci

      Testing can NEVER demonstrate the absence of defects.

  • @kizhissery
    @kizhissery Před 8 měsíci +16

    Legends test in production: OceanGate

  • @AntonioZL
    @AntonioZL Před 8 měsíci +20

    Writing software is a weird experience. It doesn't matter how many scenarios you've simulated and prepared for, there's always something that WILL go wrong.

    • @adissentingopinion848
      @adissentingopinion848 Před 8 měsíci

      If you go into military/FAA spec hardware verification, it reaches a point where EVERY bit of every variable MUST be toggled. The most advanced testing methods either spam your inputs with every possible combination of data, or they use Mathematical proof software (!) that verifies that no failures are physically possible. The airplane control software CANNOT fail, and you must prove it as such.
      One guy. Assembly. No testing. I might not sleep tonight...

  • @unity3dconcepts434
    @unity3dconcepts434 Před 8 měsíci +30

    I'm working for an organisation that creates training checklists for operators working with and operating machines in the manufacturing sector. This video is an eye opener for me as to why I must be more focused when writing my code. People's lives depend on what I write.

  • @Ramonatho
    @Ramonatho Před 8 měsíci +56

    I'm still confused (other than for profit motive) why the same machine for 180 rad would be used for 12.5k rad dosages.

    • @shimadabr
      @shimadabr Před 8 měsíci +28

      Just a guess (don't understand this subject), but I think it's because the x-ray mode projects a strong beam that is then "regulated". The problem was that the "regulator" was not in position.

    • @DevinBaillie
      @DevinBaillie Před 8 měsíci +17

      ​@@shimadabryou're pretty close.
      To produce X-rays, the machine accelerates electrons and then crashes them into a tungsten target. The target stops the electrons and X-rays are produced. The dose rate from the X-rays is less than 1% of the dose rate from the electrons - most of the energy is lost in the target as heat.
      To produce electron treatments, electrons are accelerated with no target in place and deposit their energy in the patient directly.
      So for the same electron beam current, the X-ray dose is orders of magnitude less than the electron dose. Or, put another way, to get the same dose, the beam current must be orders of magnitude higher in X-ray mode than in electron mode.

    • @Jeanbose
      @Jeanbose Před 7 měsíci +2

      I would say, because the software doesn't check the dose before sending it (aka: the dosage doesn't have a prefixed limit for each mode in the software) it got sent anyways

    • @Jeanbose
      @Jeanbose Před 7 měsíci +1

      Let's say you want to send 25000 electrons, but you put it on X rays, it will do it because it doesn't have a safeguard on it, that tells the system not to do it since it doesn't have any hardware safeguards either
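The per-mode limit idea from the two replies above could look something like the sketch below. The names `MAX_DOSE_RAD` and `check_dose` and the ceiling values are hypothetical placeholders, not real clinical limits and not anything from the actual machine.

```python
# Hypothetical per-mode ceilings in rad, for illustration only.
MAX_DOSE_RAD = {"electron": 300.0, "xray": 300.0}


def check_dose(mode, dose_rad):
    """Return the dose unchanged if it is plausible for the mode, else raise."""
    if mode not in MAX_DOSE_RAD:
        raise ValueError(f"unknown mode: {mode!r}")
    limit = MAX_DOSE_RAD[mode]
    if not 0 < dose_rad <= limit:
        raise ValueError(
            f"dose {dose_rad} rad is outside the allowed range (0, {limit}] for {mode}"
        )
    return dose_rad
```

A check like this only guards the number that was typed in. Going by other comments in this thread, the displayed settings were correct and the problem was the hardware configuration not matching them, so an input limit on its own would probably not have been enough.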

    • @xxxxxxxxxx6903
      @xxxxxxxxxx6903 Před 2 měsíci

      ​@@DevinBaillie- Thank you for that info. I've always wondered why these machines could produce lethal radiation doses? The explanation of the software glitch made perfect sense, especially given the vintage of the equipment. But the magnitudes of higher overdose of radiation never made sense to me. I'm betting most reading these comments after watching these Therac videos still don't get it either. The now known software glitch would cause the unit to enter X-Ray mode, without enabling the electromagnetic beam deflector to hit the Tungsten target (instead of the patient being the target of 10-20,000RADs). Poor victims of these machines, ☢️ probably one of the longest most agonizing ways to go! 😱

  • @Schytheron
    @Schytheron Před 8 měsíci +45

    When I graduated I was offered a job as a software engineer at a biomedical company that sold medical hardware to hospitals. I didn't read too much into the details but it was a machine that was built to automatically feed (on a timed interval or when certain conditions are met etc.) doses of medicine/substances via IV to patients. They also sold heart rate monitors etc.
    The pay was good and the job was very enticing but I could not bear to accept it, precisely because of things like this, that were shown in this video. I could not handle the stress. Constantly having to worry if my spaghetti code is going to end up costing someone their life (accidental overdose). Fuck that! I know there are engineers out there that write better code than me that would be better suited. I have no problem admitting that. I don't need this level of worry in my life. I am good.
    Something like this happening and me being responsible has to be one of my biggest nightmares as a software engineer.

    • @MikhailBorisovTheOne
      @MikhailBorisovTheOne Před 6 měsíci +6

      You are also self conscious enough to anticipate these things happening. Which alone makes you more qualified than most. Had these managers more of that, those deaths could be avoided. But greed clouds judgement.

    • @AlwaysOnForever
      @AlwaysOnForever Před 5 měsíci +1

      I am with you haha, I don't want to feel guilty for the rest of my life

    • @franklofarojr.2969
      @franklofarojr.2969 Před 5 měsíci

      So work on Windows Update. Where failure is not an option, it is a certainty! :)

    • @DatTeilchen
      @DatTeilchen Před 4 měsíci

      by writing what you wrote, you are more qualified for that position than 99% working in the medical field.
      I used to have your attitude, but then I found a lot of programmers in medicine just have a better "fuck it" attitude than me.
      "Who could have known"?
      You, you dummy, if you did your homework.

  • @LoganAcer_20
    @LoganAcer_20 Před 8 měsíci +7

    New fear unlocked: going into a surgery and the machine just bluescreen mid-surgery

  • @joeymurphy2464
    @joeymurphy2464 Před 8 měsíci +33

    I took a safety class that presented an interesting perspective on the question of "can software fail". You seem to say yes. In the class, they claimed that software does not fail, because it always does what you tell it to. Whether what you told it was what you wanted, that's where you get problems. But that's not the software failing, that's you failing.

    • @radiosification
      @radiosification Před 8 měsíci +20

      "can software fail?" Can it fail at what? Can it fail to do what we expect? Absolutely. Can it fail to do what it should do according to its instructions? Also yes, because rarely you can get a random error like a flipped bit in RAM, or even an error in the design of the CPU. So I would say software can fail either way you look at it.

    • @WilcoVerhoef
      @WilcoVerhoef Před 8 měsíci +4

      Those are hardware failures though.

    • @n1ppe
      @n1ppe Před 8 měsíci +4

      @@WilcoVerhoef Cosmic rays can also cause bit flips.

    • @WilcoVerhoef
      @WilcoVerhoef Před 8 měsíci +3

      @@n1ppe Yup, that's the hardware failing

    • @n1ppe
      @n1ppe Před 8 měsíci +4

      @@WilcoVerhoef Which in turn could cause the software to fail or do something it wasn't supposed to do, which could've been prevented if you had made it better. So software can fail and have glitches, so I don't understand your point.

  • @zeez7777
    @zeez7777 Před 8 měsíci +6

    Ngl the 'hobbyist programmer' isnt to blame for ALL of this and is probably leagues above anyones skill from the comments that are talking bad about him.
    This is on the company who hired him and didnt perform any testing or specify exactly how they want it to work.

  • @gold4963
    @gold4963 Před 8 měsíci +36

    I first heard of and learned about the Therac-25 in a college technological ethics class.
    But I never knew what exactly happened in the code! So interesting and tragic!

  • @GuyFromJupiter
    @GuyFromJupiter Před 8 měsíci +110

    Having a hobbyist write the program honestly isn't a huge error in my eyes, as I'm sure he was plenty skilled. What blows my mind is that it was never properly tested to ensure this type of thing was impossible. It doesn't matter how skilled you are at programming, you will make mistakes. We rely on others to help us catch them and correct them.

    • @stephenfazekas5054
      @stephenfazekas5054 Před 8 měsíci +10

      Hey Boeing outsourced the mcas coding to india for only $4 an hour they save a ton of money

    • @dreadedenterprise51
      @dreadedenterprise51 Před 7 měsíci +1

      In this case, the programmer was programming in assembly. Assembly is an extremely difficult low level language that hobbyists should not be using to make medical devices with

    • @kwiky5643
      @kwiky5643 Před 7 měsíci

      True but i dont think it was an hobbyist, unless he wanted to suffer. Cause Assembly ...

    • @EperkeGMD
      @EperkeGMD Před 6 měsíci +5

      @@kwiky5643 well back then there wasnt anything else

    • @shadowchasernql
      @shadowchasernql Před 6 měsíci +1

      @@EperkeGMD total lie. As a devout FOCAL programmer, you disgust me.

  • @sioux4358
    @sioux4358 Před 8 měsíci +15

    I'm putting exactly 0% on the developer.
    The company that contracted them didn't do their due diligence, and you can't expect a solo dev to account for EVERY single edge case.
    They chose to test in prod.

  • @mpiloz8016
    @mpiloz8016 Před 8 měsíci +33

    as a programmer, I've always been afraid of going into something as serious as the medical field. I'm not always 100% confident about the code that goes out as I don't have testers, my code gets tested in a live environment. I can't have blood on my hands.

    • @GuyFromJupiter
      @GuyFromJupiter Před 8 měsíci +4

      I work in industrial automation, and there is a similar issue for us when something is critical for human safety. Safety critical aspects of our programs must always be tested and validated by a third party, and I wouldn't have it any other way. That said, usually safety stuff is pretty simplistic and pretty much guaranteed not to fail even before testing. It's a whole other world working on a medical device like this.

    • @stanleybochenek1862
      @stanleybochenek1862 Před 8 měsíci +1

      And idk if you go to jail for it too yea

    • @1kvolt1978
      @1kvolt1978 Před 8 měsíci

      Everyone will die in the end...

  • @cern1999sb
    @cern1999sb Před 8 měsíci +9

    As a Software Engineer, I can say the statement "Software Will Fail" is very true. The only real way around this is redundancy, and in software, that typically means multiple independently developed systems which must all agree on an answer for it to be executed.
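A minimal sketch of that "all must agree" idea, assuming a made-up task (counting how many fixed-size pulses cover a dose) and hypothetical function names. Two implementations compute the same answer in different ways, and nothing is returned unless they match.

```python
def pulses_negated_floor(dose_rad, rad_per_pulse):
    """Ceiling division written via negated floor division."""
    return -(-dose_rad // rad_per_pulse)


def pulses_add_then_floor(dose_rad, rad_per_pulse):
    """Ceiling division written by adding (divisor - 1) first."""
    return (dose_rad + rad_per_pulse - 1) // rad_per_pulse


def agreed_value(implementations, *args):
    """Run every implementation and refuse to answer unless all results match."""
    results = [impl(*args) for impl in implementations]
    if any(result != results[0] for result in results[1:]):
        raise RuntimeError(f"implementations disagree: {results}")
    return results[0]
```

If a third, buggy version such as `lambda d, p: d // p` is added to the list, `agreed_value` raises instead of silently picking one of the answers. Agreement between versions does not guarantee correctness, since independent teams can share the same misunderstanding of the requirements.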

  • @ThePandaAgenda
    @ThePandaAgenda Před 8 měsíci +93

    As bad as the code might have been written, you gotta give it to the guy to actually put in an error message that informs the operator that they are at fault and should rethink their settings.

    • @khatdubell
      @khatdubell Před 8 měsíci +18

      It sounded like, from the video, he had gotten a lethal dose before the error message.

    • @erikkonstas
      @erikkonstas Před 8 měsíci +3

      ​@@khatdubellNot what they were saying at all...

    • @khatdubell
      @khatdubell Před 8 měsíci

      @@erikkonstas Sure about that?
      czcams.com/users/clipUgkxNT7FBU-YOqzPFtTk6qApltMuSbuyyFMB?si=78CImXtDNe6S-y9-

    • @Gamebuster
      @Gamebuster Před 8 měsíci +7

      Everything the operator entered was correct by the time the operator actually initiated the procedure.

    • @anthonyseichter
      @anthonyseichter Před 4 měsíci +2

      I feel bad for this dude. He might not even know they defeated the safety device and might not have been rad qualified or given a source to really check other than timing and position, and the year... Imagine him giving the hospital a hundred miles of docs and procedures to test and they just, went to production... Imagine the operator not even reading the manual for a device like that or watching it work with the covers off to understand it's operation.
      Still though, get a code review at least. That said if I was this dudes associate and he said hey look at this, I wouldn't admit my eyes hit it unless I told him it was champion or bug free, and even then was the 8 second race condition in the docs? The reviewer might assume a quick click and done sub half second race. What a tragedy all around.

  • @DogeOfWar
    @DogeOfWar Před 8 měsíci +64

    I feel the enjoyment you put into this one, thanks for the great content LLL!

    • @LowLevelLearning
      @LowLevelLearning  Před 8 měsíci +14

      I had a great time with this one :) Thanks for watching!

  • @twiggs823
    @twiggs823 Před 8 měsíci +5

    a finger chopping machine sent me here

  • @voila5751
    @voila5751 Před 7 měsíci +9

    As a programmer myself, this was like watching a horror movie. Like nowadays even the internet form that you use to order socks has more automatic tests and testing process than that machine that x-rayed those people to death. Really I cannot imagine the despair of being one of those who got that killing dosage :(.
    Like ... every programmer I know uses more or less defensive coding strategies, I just cannot imagine I would even allow the machine to emit that dosage in too short a timeframe. Just, I am shocked.

  • @larryd9577
    @larryd9577 Před 8 měsíci +9

    Well, they also made an error when naming that thing. Therac-6 plus Therac-20 would be Therac-26. Classical off-by-one-error.

    • @Alex-hy7nx
      @Alex-hy7nx Před 8 měsíci +1

      "Therac-6 plus Therac-20 would be Therac-26. Classical off-by-one-error."
      Good one

  • @peterfitzpatrick7032
    @peterfitzpatrick7032 Před 8 měsíci +5

    Why would the machine be able to physically give such a lethal dose in the first place, regardless of the software...I mean NO ONE is going to be prescribed such a high dose... ever !! 🙄

  • @ethanphelps5308
    @ethanphelps5308 Před 8 měsíci +9

    One of my Comp Sci professors, Clark Turner, was part of the investigation into the Therac-25 incident and I remember him telling us the story about how he and another person found the race condition that led to these people's demise. They wrote a paper about the investigation. Crazy stuff

  • @kx4532
    @kx4532 Před 8 měsíci +2

    The Therac 25 a mandatory study for all engineering students.

  • @OsmosisHD
    @OsmosisHD Před 7 měsíci +4

    Unbelievable that they were so careless about such a critical piece of code.

  • @JordanBeagle
    @JordanBeagle Před 8 měsíci +7

    8:35 Hardware interlocks and oversights should always be included
    "To make things cheaper"
    Ah, another money over lives situation

  • @insertoyouroemail
    @insertoyouroemail Před 8 měsíci +2

    As a developer this makes my blood boil.

  • @0rigina1_0fficia1
    @0rigina1_0fficia1 Před 7 měsíci +2

    "Idk man it worked on my machine"

  • @Tartarus144
    @Tartarus144 Před 8 měsíci +27

    that's so screwed up wtf

    • @ishark7822
      @ishark7822 Před 8 měsíci +1

      Why? that was a mistake... no one did it on purpose.

    • @Tartarus144
      @Tartarus144 Před 8 měsíci +12

      @@ishark7822 i phrased that wrong, i just think it's insane that bad code can literally kill in some cases

    • @siddarthspillai1499
      @siddarthspillai1499 Před 8 měsíci +5

      @Tartarus144 it's insane that these corporations will do anything for profits, even if it puts someone's life at risk. I'm certain that some employee might have asked for better testing and got his concerns ignored. I see that even today

    • @Tartarus144
      @Tartarus144 Před 8 měsíci +2

      @@siddarthspillai1499 agreed

  • @FredoCorleone
    @FredoCorleone Před 5 měsíci +3

    This is exactly why I fight so hard as a programmer to employ good testing strategies, a testing plan is always better than a good lawyer in my opinion.
    I'd not want people to die for my mistakes, I'd dedicate heart and soul to good software engineering.

  • @leescott8278
    @leescott8278 Před 8 měsíci +6

    Shoutout Kyle Hill for covering this 2 years ago, his Half-life Histories series is phenomenal!

  • @FueledbyJohn
    @FueledbyJohn Před 8 měsíci +5

    One example I have from my father's experience: he had assembled and installed robotic arms and the PLCs he'd designed at a car plant, and the programmer came in to do the software setup and calibration. My father had made him aware that the safety isolation switches hadn't been completed, and he was like "no, it's fine". So he proceeded to send inputs to the robotic arms, which also had in-production car bodies on them. As you can guess, the arm slammed through the roofs of thirty cars as my father had to attempt to stop the incident. The following day those cars had "scrap" marked on them.

  • @kev1830
    @kev1830 Před 8 měsíci +6

    We had a whole unit on this in undergrad comp. Ethics. Crazy how this went on for so long

  • @_modiX
    @_modiX Před 8 měsíci +73

    Sorry, but ignoring an error and just unpausing is risking the life of the patient in this field.

    • @tambow44
      @tambow44 Před 8 měsíci +41

      In today's world, 100%, but don't forget this was the 1980s and things were a lot different back then, as explained in the video.
      We don't know what training the operator was given surrounding that error, or the Beam Mode race condition. As well, the screen reported only 6 rads being applied. Again, this operator had more than 5 years experience with these machines so it's safe to assume they'd seen this issue before, or were at least told it was safe to proceed depending on what the screen reported in conjunction (that only 6 rads were reported).
      What is an absolute risk is producing a medical device that is capable of emitting 12k rads. Considering more than 1000 is guaranteed to kill you...

    • @ME0WMERE
      @ME0WMERE Před 8 měsíci +33

      From a short summary of this, I gathered that the users continually got errors like 'malfunction X', which often meant nothing. So, they eventually just ignored them. Maybe the staff should have a little blame, but the majority should be on the obfuscated errors and poor software design.

    • @DMSBrian24
      @DMSBrian24 Před 8 měsíci

      yes, but not if the person wasn't at all educated about this and if there wasn't any procedure they were taught to follow

    • @_modiX
      @_modiX Před 8 měsíci

      @@ME0WMERE Oh wow that should just be reported and investigated. Errors to ignore, but yes, I guess these were different times ...

    • @liegon
      @liegon Před 8 měsíci +6

      The user interface clearly lied to the operator and gave no indication that any settings were wrong. Even if they had known that Error 54 was indicating too low / too high radiation, since the displayed settings were correct, it is very conceivable that they would assume everything was fine and unpause.
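To make the point about vague errors concrete, here is a rough sketch of an error that says which way the dose was off and by how much, instead of a bare code. `DoseMismatch`, `verify_dose`, and the tolerance value are hypothetical names and numbers chosen for illustration.

```python
class DoseMismatch(Exception):
    """Raised when the measured dose does not match the prescription."""

    def __init__(self, expected_rad, measured_rad):
        direction = "higher" if measured_rad > expected_rad else "lower"
        super().__init__(
            f"Measured dose {measured_rad} rad is {direction} than the "
            f"prescribed {expected_rad} rad. Treatment halted; do not resume "
            f"until the machine has been inspected."
        )
        self.expected_rad = expected_rad
        self.measured_rad = measured_rad


def verify_dose(expected_rad, measured_rad, tolerance_rad=1.0):
    """Raise a descriptive error if the measurement is outside the tolerance."""
    if abs(measured_rad - expected_rad) > tolerance_rad:
        raise DoseMismatch(expected_rad, measured_rad)
```

With the 6 rad reading mentioned earlier in this thread against a 180 rad prescription, `verify_dose(180, 6)` would produce a message stating that the reading is lower than prescribed. A clear message still depends on the measurement itself being trustworthy.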

  • @andrewhooper7603
    @andrewhooper7603 Před 8 měsíci +2

    "This error says the radiation delivered was either too high or too low..."
    "I would greatly appreciate more specificity."

  • @lopiklop
    @lopiklop Před 5 měsíci +1

    "Entirely software controlled in 1986." Is a scary sentence.

  • @CodingWithLewis
    @CodingWithLewis Před 8 měsíci +10

    Absolutely insane. Incredible video.

  • @artey6671
    @artey6671 Před 7 měsíci +4

    Yesterday I took an exam for a computer science / electrical engineering course, and race conditions were also part of that course. Now I feel a little guilty for having somewhat skipped over that part.

  • @binary_ironclad
    @binary_ironclad Před 8 měsíci +1

    This was a great video. I’m a fan of your stuff in general, but this was awesome. I’m hopeful you turn this into a regular thing - the horrifying ramifications of software/tech things gone wrong.
    Great job!

  • @JustPyroYT
    @JustPyroYT Před 8 měsíci +15

    Great Video!
    I can also recommend Kyle Hills video about this software bug :)

  • @glitchy_weasel
    @glitchy_weasel Před 8 měsíci +3

    What a great video! It's definitely wild that people once thought that software was invincible.

  • @wisteela
    @wisteela Před 8 měsíci +3

    Always good to see more about this.

  • @HedroomMax
    @HedroomMax Před 3 měsíci +1

    This situation sent a chill down my spine. It reminded me of the time when I was designing security and door opening systems and the fears I had of software bugs or electronic design flaws. The extended weeks I would leave one system working alone 24/7 while another system monitored it.
    I can't believe that an industry that ships a machine with the lethal potential of this one would not test for it or even be tempted to eliminate fail-safe mechanical systems.

  • @daedalus_00
    @daedalus_00 Před 8 měsíci +1

    "Dose too High or too Low" is a terrible design choice. Let that serve as a lesson to write more meaningful error messages.

  • @SierraSierraFoxtrot
    @SierraSierraFoxtrot Před 8 měsíci +58

    I heard of this accident, I did not know it happened SIX TIMES!
    EDIT:
    I think I heard of another completely unrelated radiation overdose... terrifying.

    • @LowLevelLearning
      @LowLevelLearning  Před 8 měsíci +21

      For the sake of brevity there were actually two more bugs in the code as well. Not a great system :)

  • @alexnezhynsky9707
    @alexnezhynsky9707 Před 8 měsíci +7

    Please tell me the family sued the manufacturer for negligence

  • @r.pizzamonkey7379
    @r.pizzamonkey7379 Před 3 měsíci +1

    "Software can't fail" has got to be the single most terrifying thing to hear someone say when they're creating medical equipment.
    I mean, even in hardware you don't rely on a single point of failure, why would you do any different for software?

  • @pa6370
    @pa6370 Před 8 měsíci +1

    I was a surgical lighting service technician who spent 20 years and 6 months on the road and in the workshop repairing, designing, developing, and modifying imported equipment to meet local standards with occasional type testing. I learnt to no longer be surprised at how manufacturers used inappropriate materials, components, and mechanical and/or electrical designs that were sometimes fundamentally unsafe. Often, the worst features of a product would be forgotten and repeated a few equipment generations later, with each new model being fundamentally more complex, less reliable, more costly to own and with an ever shorter lifespan.
    I'm so glad to have left the industry and hopefully all of my trailing liability behind.

  • @newmonengineering
    @newmonengineering Před 8 měsíci +4

    The problem with bugs is you have to test every condition you don't plan for. It's always some obscure condition that no one thought about that happens and causes the issue. There is no way to test every user accident in freak cases many times. You can test code for function but you can't test user situations. There will always end up some strange case where an operator did something you had not planned for.

    • @hagiasmos314
      @hagiasmos314 Před 5 měsíci +2

      Well said. In other words, reliance on testing can never deliver defect-free software. Instead, it's necessary to somehow _prevent_ errors in the first place.

  • @nelsonahlvik6650
    @nelsonahlvik6650 Před 8 měsíci +4

    To me the worst part of this was the removal of hardware interlocks. Software can NEVER be relied on 100%, even if it has been extensively tested. Physical switches and relays should ALWAYS be in place for safety critical applications. If there were hardware interlocks in place in the Therac-25 this would never have happened. Sure, the bug would still have been there, but the machine couldn't have hurt anybody as the beam PHYSICALLY would not have been able to activate without the magnets in place.

    • @Axel_Andersen
      @Axel_Andersen Před 8 měsíci

      For many systems today "hardware interlocks" are not feasible. It is not possible to implement, say, antilocking brakes or fly-by-wire systems with hardware performing the safety role. Or say a medication calculation software or patient medical record system. A wrong dose or a missing re-call for a patient that has cancer or the wrong patient's data shown to a doctor can all kill.

  • @SumanRoy.official
    @SumanRoy.official Před 8 měsíci +1

    Please release more stories like these :)

  • @thomasrichards8055
    @thomasrichards8055 Před 8 měsíci +2

    Even in the ‘90s there was still that attitude of software doesn’t fail. Take the despatch software that the London Ambulance Service started using in the ‘90s (LASCAD).

  • @theinquisitor18
    @theinquisitor18 Před 8 měsíci +5

    I know hindsight's 20/20, but I don't know why there wasn't an event handler, even a basic one, so that nothing would happen without operator input. I understand this was one person, and I truly admire that they built this by themselves. They had to be under a shit ton of stress because someone that talented should be able to foresee the issues with reading inputs prematurely.

    • @khatdubell
      @khatdubell Před 8 měsíci

      Everything happened with operator input.

    • @theinquisitor18
      @theinquisitor18 Před 8 měsíci +3

      ​​​@@khatdubellI mean an explicit event handler, such as a button. The machine was doing stuff while she was still entering/correcting data. Nothing should have occurred until she was done entering the information, and she hit a commit button.

    • @khatdubell
      @khatdubell Před 8 měsíci

      @@theinquisitor18 I see.
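The "nothing happens until the operator commits" idea from this thread can be sketched as below. `Prescription`, `Console`, and the `hardware_setup` callback are hypothetical names; the sketch only shows edits staying in a draft until an explicit commit hands the hardware one immutable snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prescription:
    """Immutable snapshot handed to the hardware at commit time."""
    mode: str
    dose_rad: float


class Console:
    def __init__(self, hardware_setup):
        self._draft = {}
        self._hardware_setup = hardware_setup  # callable that configures the machine

    def edit(self, field, value):
        """Edits only touch the draft; the hardware is not told anything yet."""
        self._draft[field] = value

    def commit(self):
        """Freeze the draft and pass the complete prescription to the hardware once."""
        prescription = Prescription(
            mode=self._draft["mode"],
            dose_rad=self._draft["dose_rad"],
        )
        self._hardware_setup(prescription)
        return prescription
```

Because the hardware only ever sees the committed snapshot, correcting the mode several times before committing cannot leave it configured for an earlier, half-entered value.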

  • @RedPlayer_1
    @RedPlayer_1 Před 8 měsíci +5

    Already saw Kyle Hill's video on the topic but its cool to see a more programming oriented approach

  • @dkzv12
    @dkzv12 Před 6 měsíci +1

    For me as a programmer for machinery it is the typical "blame everything on the programmer" thing. It is normal, that code has bugs and you will never find all of them. Therefore mechanical and electrical safety locks have to be implemented to prevent such malfunctions. In this case the software didn't do much wrong. It even gave an error message. The main software problem was, that it was possible to skip the message and continue.
    The main problems in this case were the hardware lock removed in this newer model of the machine for cost reduction, and the decision of the management to let the customers continue using these devices, even after more than one accident was reported with this machine type.

  • @peters8758
    @peters8758 Před 5 měsíci +1

    Don't forget, in the 1970's even 16 kilobytes was a lot of DRAM. Devoting a few KB's to error codes or safety redundancies would have been a huge deal.

  • @godnyx117
    @godnyx117 Před 8 měsíci +5

    As someone who knew and (like EVERYONE) was "bored" to do testing on my software, I have now done a complete 180 degree turn and testing is ALWAYS in my mind!
    Test your software people! Write A LOT of SIMPLE and easy to debug tests (because remember, tests are code as well and they may have bugs)! And try to think about edge cases!

  • @AediusFilmania
    @AediusFilmania Před 8 měsíci +5

    The one person that wrote all the assembly and didn't test it managed to kill only 6 people!
    Honestly, I'm very impressed

    • @woosix7735
      @woosix7735 Před 8 měsíci +1

      haha. I mean back in the day everything was written in assembly because programing languages weren't a thing yet.

    • @AediusFilmania
      @AediusFilmania Před 8 měsíci

      @@woosix7735 agreed !
      But just thinking about all that could go wrong I wouldn't have had the shoulder for the job xD

    • @woosix7735
      @woosix7735 Před 8 měsíci

      absolutly true

    • @godnyx117
      @godnyx117 Před 8 měsíci +4

      @@woosix7735 Programming languages were a thing since the 40s, what are you talking about????

    • @309electronics5
      @309electronics5 Před 8 měsíci

      It was a student who was a hobbyist what ya expect? A real engineer would have thought about everything

  • @JordanBeagle
    @JordanBeagle Před 8 měsíci +1

    5:10 "Single hobbyist programmer alone in assembly" was all we needed to know

  • @xxxxxxxxxx6903
    @xxxxxxxxxx6903 Před 2 měsíci +1

    The bigger takeaway from this story isn't that ancient code lacked logic and user input safeguards. Rather, that Therac's upper management made unethical design choices to lower the cost of production. Coupled with minimal pre-shipment testing of said units. "It was decided to remove the physical (electro/mechanical) safeguards and rely entirely on software to lower costs"!

  • @korbinadkins2610
    @korbinadkins2610 Před 8 měsíci +3

    please do more of these videos.

  • @Saturate0806
    @Saturate0806 Před 8 měsíci +12

    The problem is that the treatment mode (the state) is saved in the UI. However, the UI should not have saved the state but rather queried the hardware instead. Once you create copies of data, you create extra work for yourself keeping them in sync. Just query the source of the data instead.

    • @benhetland576
      @benhetland576 Před 8 měsíci +5

      You're definitely on to something there. I've seen many times similar problems arising in multithreaded code (non-life threatening, luckily...) over the years just because the same information is stored at more than one location within the program. Often it is stored in one or more UI-related places plus in the program's core "business model" code (whatever that does). One can avoid race conditions all the way to your heart's liking, but still fail to keep _all_ values in sync.
      Sometimes one user input may depend on another input, which might even be hidden somewhere _else_ in the UI (eg, different tabs in a Windows dialog box), but the code that validates the dependent input retrieved (a copy of!) the other controlling input just at the wrong time. That could appear similar to reading the E vs X entry before validating the dose entry in this case.
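The difference between copying state into the UI and querying it from the source can be shown with a small sketch. `Hardware`, `CachedDisplay`, and `LiveDisplay` are hypothetical classes invented for illustration, not a description of how the real console worked.

```python
class Hardware:
    """Toy stand-in for the machine; the single source of truth for the mode."""

    def __init__(self):
        self._mode = "xray"

    @property
    def mode(self):
        return self._mode

    def request_mode(self, mode):
        self._mode = mode


class CachedDisplay:
    """Copies the mode once, so it can silently go stale."""

    def __init__(self, hardware):
        self.mode = hardware.mode

    def render(self):
        return f"MODE: {self.mode}"


class LiveDisplay:
    """Asks the hardware every time it renders, so there is no copy to sync."""

    def __init__(self, hardware):
        self._hardware = hardware

    def render(self):
        return f"MODE: {self._hardware.mode}"
```

After the hardware changes mode, `CachedDisplay.render()` keeps showing the old value while `LiveDisplay.render()` follows the hardware. In multithreaded code the query itself still needs proper synchronization, which this sketch leaves out.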

    • @mieszkogulinski168
      @mieszkogulinski168 Před 8 měsíci

      Sounds like a task for React

    • @benhetland576
      @benhetland576 Před 8 měsíci

      @@mieszkogulinski168 You should not make it even worse than it already is by introducing the messiness and unpredictability of Javascript into the mess...

  • @fencedfruit940
    @fencedfruit940 Před 7 měsíci

    Heard about this from my Software Engineering class, but never really looked into it. Great video!

  • @usxquantum
    @usxquantum Před 8 měsíci +1

    0 accountability, 0 repercussions for actions, = 0 desire to fix anything

  • @spookdot
    @spookdot Před 5 měsíci +4

    I've heard many scary stories, but as a programmer, this is true horror. Not just because they did almost nothing to minimize errors like this, nothing that we would do today was done there. But it almost sounds like someone seriously programmed a deadly device with speed over safety in mind. I don't know why but that thought horrifies me

    • @franklofarojr.2969
      @franklofarojr.2969 Před 5 měsíci

      Like a Tesla

    • @spookdot
      @spookdot Před 5 měsíci

      @@franklofarojr.2969 I never considered that, but now that I'm thinking about it and considering what I've heard about them. Yeah, that is probably how Tesla programs its vehicles

  • @philipc4272
    @philipc4272 8 months ago +10

    Your explanation of the actual failure is not correct, as far as I can tell from reading a report on the incident. Firstly, you can't shape X-Rays (or any photon beam) with magnets. These magnets are used to shape the electron beam when in electron mode.
    Secondly, the machine produces an electron beam as its native source of radiation, and needs a target in front of the beam to produce X-Rays instead, as well as a flattening filter which attenuates a lot of the beam energy. Producing X-Rays requires the maximum electron beam power, a power around 100 times greater than that used in electron mode. Therefore, the patient did NOT "receive X-Ray radiation at electron radiation doses", they actually received electron radiation at X-Ray radiation doses, since neither the target nor the flattener is placed in the beam (7:43)

  • @SilverRingss
    @SilverRingss 7 months ago +1

    It’s insane that the people who ignore flaws like this aren’t held criminally responsible. I get it, fewer people would risk developing revolutionary things if that were the case, but at some point negligence needs to outweigh intention.

  • @aldairacosta4393
    @aldairacosta4393 8 months ago +2

    Real men test in production

  • @ZarHakkar
    @ZarHakkar 8 months ago +3

    2:20
    It still said OP. MODE: TREAT X-RAY in the bottom left. Unfortunate that the operator didn't see or recognize that.

    • @bubbacat9940
      @bubbacat9940 1 month ago

      I am guessing that the video was a recreation and wasn't accurate to what was actually displayed on the machine, but I could be wrong.

  • @dom1437
    @dom1437 8 months ago +5

    The epitome of breaking prod

  • @HelieNerb
    @HelieNerb 8 months ago +1

    Bro, imagine being that programmer and realizing your program caused the death of 6 people 💀 that’s actually so sad

  • @kurtcpi5670
    @kurtcpi5670 6 months ago +2

    This highlights how even simple syntax errors can compile and run, but not work as intended. There's an old joke that only people who code will get, but it's hilarious because everyone who codes in multiple languages has had to contend with the differences in syntax:
    if (GoNuclear = 1) {
        launch_nukes();
    }
    else {
        remain_chill();
    }

  • @01001000010101000100
    @01001000010101000100 8 months ago +6

    I wonder what some software bugs from old times could do in software controlling the nukes. I mean those systems are pretty old...

    • @Uerdue
      @Uerdue 8 months ago

      Yeah, forget about "nuclear deterrence" - the only reason humanity has managed to refrain from starting a nuclear war so far is the collective fear of some off-by-one error in an old piece of COBOL code causing the missile to detonate right at the start. How embarrassing would that be!

    • @benhetland576
      @benhetland576 8 months ago

      But that can apply to software from the "new times" just as well...

  • @ErazerPT
    @ErazerPT 8 months ago +3

    IMHO the most egregious part was not the race condition, but that there wasn't a "check before commit" in the "activation phase". If it had an "Execute" option that signaled a task to start execution AND that task then copied the input values present AND checked them for sanity AND presented the check results to the operator AND then executed WITH operator confirmation, this wouldn't have happened so easily. If I had €1 for every piece of s**t I've had to deal with just because the powers that be are adamant not to inconvenience the users with proper confirmations...
    P.S. I was the "paranoid guy" who wrote checks and bounds in the front-facing UI, the back-end server code, the SQL stored procedures and the DB schema. Sure, you waste resources, but yeah, I'd rather have something more solid than something that "fails spectacularly real fast".
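
The "check before commit" sequence described in the comment above (copy the inputs, sanity-check the copy, show the results to the operator, execute only with confirmation) could look something like this minimal sketch. Everything here is an assumption made for illustration: the `Settings` type, the `LIMITS` table and its numbers, and the `confirm` and `actuate` callbacks are hypothetical and not taken from any real device.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Settings:
    """Immutable snapshot of what the operator asked for."""
    mode: str
    dose: int


# Hypothetical per-mode limits, invented for illustration only.
LIMITS = {"E": 200, "X": 25000}


def check(s: Settings) -> List[str]:
    """Return a list of problems; an empty list means the snapshot is sane."""
    if s.mode not in LIMITS:
        return [f"unknown mode {s.mode!r}"]
    if not 0 < s.dose <= LIMITS[s.mode]:
        return [f"dose {s.dose} out of range for mode {s.mode}"]
    return []


def execute(form: dict,
            confirm: Callable[[Settings, List[str]], bool],
            actuate: Callable[[Settings], None]) -> bool:
    # 1. Copy the inputs once; later edits to the form cannot change the copy.
    snapshot = Settings(mode=form["mode"], dose=form["dose"])
    # 2. Sanity-check the copy, not the live form.
    problems = check(snapshot)
    # 3. Show the snapshot and the check results to the operator.
    confirmed = confirm(snapshot, problems)
    # 4. Act only on a clean, confirmed snapshot.
    if problems or not confirmed:
        return False
    actuate(snapshot)
    return True


delivered = []
form = {"mode": "E", "dose": 100}
ok = execute(form, lambda s, p: True, delivered.append)
print(ok, delivered)
```

The frozen dataclass is a design choice rather than a requirement: it makes it hard for any later step to act on values other than the ones that were checked and shown to the operator.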

    • @khatdubell
      @khatdubell 8 months ago

      Yes, this was my takeaway.
      He tried to save time by executing the operation in real time as you're typing it into the terminal, but trying to save 8 seconds cost 6 people their lives.
      Premature Optimization _is_ the root of all evil.

  • @rpeetz
    @rpeetz 8 months ago +2

    Yes, no matter the field, the end user will always find a way to break software.