Real men test in production… The truth about the CrowdStrike disaster

  • added on 6 Sep 2024
  • Try Brilliant free for 30 days brilliant.org/... You’ll also get 20% off an annual premium subscription
    An analysis of CrowdStrike's official explanation of the code that resulted in the largest IT disaster in history that crashed 8.5 million Windows computers in a single day.
    #programming #crowdstrike #thecodereport #windows
    💬 Chat with Me on Discord
    / discord
    🔗 Resources
    CrowdStrike Official Explanation: www.crowdstrik...
    Analysis from Twitter x.com/taviso/s...
    CrowdStrike Part 1 • Some bad code just bro...
    It works on my machine • 7 Things No Programmer...
    🔥 Get More Content - Upgrade to PRO
    Upgrade at fireship.io/pro
    Use code YT25 for 25% off PRO access
    🎨 My Editor Settings
    - Atom One Dark
    - vscode-icons
    - Fira Code Font
    🔖 Topics Covered
    - Analysis of CrowdStrike disaster
    - Why did windows machines crash?
    - What causes the blue screen of death
    - How do windows kernel drivers work

Comments • 2.3K

  • @richdobbs6595
    @richdobbs6595 Před měsícem +9257

    I once was an employee at Carbon Black, a competitor to CrowdStrike, working in automated testing. It was competitive with the worst software development practices of any organization I've ever been exposed to. The devs were fairly smart, but the assumption was that the purpose of testing was to bless the code that they had written. I agreed to step up and manually test one dev's code, and I reported back that every time that I tried to run the code it killed the process without leaving any diagnostics. The dev said, how can I troubleshoot the problem without any good data? I looked at his code and identified he was not checking for null pointers but just dereferencing them anyway. This was an important step in getting myself terminated as not being a team player.

    • @GrizikYugno-ku2zs
      @GrizikYugno-ku2zs Před měsícem +333

      "Say you're a Karen without saying you're a Karen" - robot woman's voice

    • @MrDragonorp
      @MrDragonorp Před měsícem +1144

      Bro you weren't a team player. Didn't you hear, real men only fix after production

    • @JoseMonteverde
      @JoseMonteverde Před měsícem +389

      Classic Señor dev

    • @XeenimChoorch-nx8wx
      @XeenimChoorch-nx8wx Před měsícem +69

      ZII. Zero is initialization. Controlled failure vs uncontrolled failure

    • @cybervigilante
      @cybervigilante Před měsícem +177

      @@MrDragonorp TAF - Test After Failure

  • @rosgoncharuk2403
    @rosgoncharuk2403 Před měsícem +2763

    Anyone who's working in IT should only be surprised that this doesn't happen every month.

    • @XIIchiron78
      @XIIchiron78 Před měsícem +219

      Only through the tireless efforts of countless engineers correcting other people's stupidity (and sometimes their own) does the world make it through another day without disaster.

    • @millanferende6723
      @millanferende6723 Před měsícem +42

      If only things were about you know quality, and truth... instead of "who can get the most attention the fastest."

    • @dickduquesne
      @dickduquesne Před měsícem

      that's exactly what I was thinking, indeed

    • @HD-fc4ds
      @HD-fc4ds Před měsícem +18

      working in IT and working in multibillion security company is different.

    • @tokopiki
      @tokopiki Před měsícem +54

      I'm working in IT and I'm everyday amazed how far we've made it without the modern civilization folding on itself. Being part of that clusterf%ck in the belly of the beast is equally awesome and terrifying.

  • @h3w45
    @h3w45 Před měsícem +5345

    It's very self-assuring to know that I and a programmer at one of the most advanced tech companies have the same practices

    • @PhilLesh69
      @PhilLesh69 Před měsícem +60

      I always copy the entire path of the existing code into a -bak or -date directory, then run the new code in place to test it on one production level server, before deploying it to the rest of the servers. That way I can use scp to copy the old working copy back if things really go belly up.
      But I guess when they rely on automation and all kinds of layers of abstraction between them and the code, they cannot do it that simply and easily.
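A minimal sketch of the backup-then-deploy routine described above, assuming everything lives on one local filesystem (the comment uses scp between servers, which is swapped here for `shutil` to keep the example self-contained). The function names, the backup naming scheme, and the `health_check` callback are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def backup_dir(live: Path, tag: str) -> Path:
    """Copy the live directory to a sibling '<name>-bak-<tag>' directory."""
    backup = live.with_name(f"{live.name}-bak-{tag}")
    shutil.copytree(live, backup)
    return backup


def deploy(live: Path, new_release: Path) -> None:
    """Replace the live directory's contents with the new release."""
    shutil.rmtree(live)
    shutil.copytree(new_release, live)


def rollback(live: Path, backup: Path) -> None:
    """Restore the live directory from the backup copy."""
    shutil.rmtree(live)
    shutil.copytree(backup, live)


def deploy_with_rollback(live: Path, new_release: Path, tag: str, health_check) -> bool:
    """Back up, deploy, and roll back automatically if the health check fails."""
    backup = backup_dir(live, tag)
    deploy(live, new_release)
    if health_check(live):
        return True
    rollback(live, backup)
    return False


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    live, release = root / "app", root / "release"
    for folder, version in ((live, "v1"), (release, "v2")):
        folder.mkdir()
        (folder / "version.txt").write_text(version)
    # A failing health check triggers the rollback, so v1 stays live.
    ok = deploy_with_rollback(live, release, "2024-07-19", lambda p: False)
    print(ok, (live / "version.txt").read_text())
```

The point of the design is that the rollback path is exercised by the same code that deploys, so it is not something that has to be improvised during an outage.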

    • @ForcefighterX2
      @ForcefighterX2 Před měsícem +85

      Of course they cannot (simply) test their code in production level environments. Corporations have made un-maintainability into an art form, where a single deployment-step is so automated, but requires so many manual steps as well, that no single person can ever deploy anything easily.
      And when you are new to the organization, and learned for the first time how insanely convoluted their deployment process is, you undoubtedly asked "why!?". But as always the answer is "has grown historically" (legacy). And by the time you entered the organization, it would take weeks or months to re-implement this insane architecture into something which can actually be deployed in a sane manner.
      But we all know to never touch a running system. Even if it's a running nuclear bomb close to detonation.

    • @maleldil1
      @maleldil1 Před měsícem

      @@PhilLesh69 have you heard about Git?

    • @TheTexasTyrant323
      @TheTexasTyrant323 Před měsícem +3

      So what’s up with the whole Diddy thing?

    • @scottgillespie2690
      @scottgillespie2690 Před měsícem +17

      I don’t see many engineers on LinkedIn with more than a year or two of experience at a company before they move on. It took me a few months to understand our codebase and with all of the reorgs and compounding layers of rotating management it made it difficult for anyone to sit and focus on much of anything for very long.

  • @alkebabish
    @alkebabish Před měsícem +663

    As a web developer, I can confirm testing in production is the best way to go: the added pressure focuses you, and it saves having to push things to production. The way I like to do it is over ftp with notepad. Or if I'm on the toilet I'll use my phone and edit the files directly using the cpanel file manager. If my software was running on all the most vital computers in the world, I imagine that pressure would make me sharp as a knife and I'd never make a mistake.

    • @awrjkf
      @awrjkf Před měsícem +46

      Chad

    • @ian-tumulak
      @ian-tumulak Před měsícem +21

      Real web devs fix in production.

    • @trueperson-o2z
      @trueperson-o2z Před měsícem +20

      Holy fuck I haven't laughed this hard at a comment in months

    • @azibekk
      @azibekk Před měsícem +9

      As a developer i like to watch ci/cd on the toilet too 😂 it really helps me to not sit too much in the toilet

    • @westevo
      @westevo Před měsícem +1

      Giga Chad!

  • @zirconium5849
    @zirconium5849 Před měsícem +351

    As a Rust programmer I can confirm that this was our plan to get Rust into production

    • @ashwithchandra2622
      @ashwithchandra2622 Před měsícem +18

      Even upgrading to c++23 will work instead of changing whole code to rust.

    • @lucascamelo3079
      @lucascamelo3079 Před měsícem

      ​@@ashwithchandra2622wait until c++60, safe memory edition.

    • @conquerorofindia
      @conquerorofindia Před měsícem

      ​@@ashwithchandra2622works! but fails in prod😢.

    • @5uryaprakashPi
      @5uryaprakashPi Před měsícem

      ​@@ashwithchandra2622, how? What is so special about CPP 23

    • @backslash68
      @backslash68 Před měsícem

      @@ashwithchandra2622 what feature of c++23 will make it immune to null pointer de-referencing?

  • @ray-mc-l
    @ray-mc-l Před měsícem +2137

    Hahaha damn that "Real men test in production" pic with the submarine guy killed me

    • @TheGunnarRoxen
      @TheGunnarRoxen Před měsícem +203

      it certainly killed the sub guy...

    • @ethereal2620
      @ethereal2620 Před měsícem +96

      Wait until you find out that his last name was *Rush.* 😮

    • @Kronabyss_
      @Kronabyss_ Před měsícem +7

      I bet he thought the same thing

    • @XDarkGreyX
      @XDarkGreyX Před měsícem +5

      Jeff made that joke in a vid right after that disaster already. If you enjoyed this, go back and watch that.

    • @alaskandonut
      @alaskandonut Před měsícem +10

      Stockton Rush

  • @smithwillnot
    @smithwillnot Před měsícem +950

    I just finished my first puzzle on Brilliant and got invitation for job interview in CrowdStrike. Wish me luck boys.

    • @stsm6192
      @stsm6192 Před měsícem +19

      Crack up, like it👍😆

    • @ViriKyla
      @ViriKyla Před měsícem +2

      Lololol

    • @stsm6192
      @stsm6192 Před měsícem +8

      @@smithwillnot good luck i know you will do a better job, test check, test check, test check, lol

    • @stsm6192
      @stsm6192 Před měsícem +5

      @@smithwillnot oh i forgot disable automatic updates on any OS before you send out the updated patches ha ha,

    • @generationm2059
      @generationm2059 Před měsícem +15

      Don't forget to test in production on a Friday and have another job on standby!

  • @huy1k995
    @huy1k995 Před měsícem +4979

    Real men test in production... ON A FRIDAY

    • @pingvingaming
      @pingvingaming Před měsícem +281

      The best code is the code written on Friday, 5 minutes before go-home time

    • @donaldobrien9171
      @donaldobrien9171 Před měsícem +63

      The competent folks are on summer vacation

    • @mesiroy1234
      @mesiroy1234 Před měsícem +14

      F yeah, I ain't even a code nerd
      Literally never coded in my life
      BUT I KNOW: DON'T UPDATE ON FRIDAY

    • @sanketjadhavar
      @sanketjadhavar Před měsícem +25

      At 5:30 PM😂😂😂

    • @weho_brian
      @weho_brian Před měsícem +63

      actually real men don't test their code at all, they just push their code and wait for a scream test

  • @weston8400
    @weston8400 Před měsícem +61

    In my experience here's how problem solving with code works.
    "I want to solve this problem with code. Here is my plan"
    "Let's write the code now."
    "Testing the code. Oh no, there are bugs."
    "I fixed the bugs."
    "Oh wait, what's this?"
    "This problem has to do with stuff I can't just fix, guess I'll work around it."
    "I hate my life, this is really hard."
    "This works, but it shouldn't. It looks ugly and I hate it."
    "Whatever, it's working."

    • @mukta4689
      @mukta4689 Před měsícem +6

      *pushes code in production*
      *crash*

    • @9tales9f
      @9tales9f Před měsícem +2

      "This works, but it shouldn't."
      and that's where you ask for your friend's device

  • @reedmclean3574
    @reedmclean3574 Před měsícem +42

    3:50 This felt like a personal attack, I'm literally writing a to-do list application right now as one of my first apps.

    • @GammaFn.
      @GammaFn. Před měsícem +6

      Don't feel attacked, writing your own to-do app is a rite of passage

  • @the_primal_instinct
    @the_primal_instinct Před měsícem +4199

    Commented faster than CrowdStrike devs push into production

  • @bryokyo
    @bryokyo Před měsícem +943

    "Most of us will be dead by then", that got me rolling

    • @ImperativeGames
      @ImperativeGames Před měsícem +29

      There is nothing funny about nuclear war.

    • @almaximus03
      @almaximus03 Před měsícem +7

      That got me rolling too😅

    • @gustavo9758
      @gustavo9758 Před měsícem +22

      I thought I got the joke... until I actually did and was like "wait a minute..."

    • @ArawnOfAnnwn
      @ArawnOfAnnwn Před měsícem +4

      ​@@ImperativeGames So you say. I find it hilarious! 😅😎

    • @trumpetpunk42
      @trumpetpunk42 Před měsícem

      In 2021 my previous employer invited a pair of supposed doctors to tell us with a straight face that we would literally all be dead in five years if we didn't get the experimental injection. 2026 confirmed!

  • @kwan8247
    @kwan8247 Před měsícem +2035

    1:13 what the hell is this stock video lmao

  • @StarLightDotPhotos
    @StarLightDotPhotos Před měsícem +21

    This was 100% a culture issue. I left Crowdstrike in March of 2024 specifically because of these types of quality issues. I never expected anything to blow up this bigly, but the culture that enables this type of thing is why I left.

  • @NotGarbageLoops
    @NotGarbageLoops Před měsícem +187

    Programmers are generally terrified about missing deadlines and will do whatever you command them to. It's up to the project manager to track delays and ensure the boss is notified in advance that deadlines will be missed. It's up to the boss to ensure they have good project managers and QA testing practices. Yes, this is indeed an organizational failure.

    • @WhiteSharks-wz6kn
      @WhiteSharks-wz6kn Před měsícem +2

      So are all devices that used Crowdstrike unusable now and need a fresh windows install?

    • @MatthewDeveloper
      @MatthewDeveloper Před měsícem +18

      ​@@WhiteSharks-wz6knNot really, just boot into safe mode and get rid of the borked driver.
      This is sure going to be annoying for the IT team if they need physical access to do it, and don't forget this must be done for EVERY DEVICE.

    • @akin242002
      @akin242002 Před měsícem +6

      ​@WhiteSharks-wz6kn No. Just need to delete the latest Crowdstrike driver. Usually 2 major steps.
      A) Either get the specific encryption key access to the company laptop/desktop first or go straight into safe mode.
      B) Go to the command prompt and delete the latest Crowdstrike driver file (c-00000291*.sys).
      FYI... I work in IT. Our team of 13 had to go through this process for 700+ employee laptops 💻 on Friday. Some old and some new. Interesting stories to tell at a bar or on Reddit.
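Step B above can be sketched as a small script, assuming the target directory is passed in by the operator (the comment does not name it, so none is hard-coded here). The helper names and the sample file names in the demo are hypothetical; only the `c-00000291*.sys` pattern comes from the comment. The sketch defaults to a dry run so that nothing is deleted by accident.

```python
import fnmatch
import tempfile
from pathlib import Path

PATTERN = "c-00000291*.sys"  # narrow pattern quoted in the comment above


def find_matching(directory: Path, pattern: str = PATTERN) -> list[Path]:
    """Return files in `directory` whose names match `pattern`, ignoring case."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and fnmatch.fnmatchcase(p.name.lower(), pattern.lower())
    )


def remove_matching(directory: Path, pattern: str = PATTERN, dry_run: bool = True) -> list[Path]:
    """List matching files; delete them only when dry_run is False."""
    matches = find_matching(directory, pattern)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    # Hypothetical file names, used only to exercise the pattern.
    for name in ("C-00000291-00000000-00000032.sys", "C-00000285-00000000-00000001.sys"):
        (folder / name).touch()
    print([p.name for p in remove_matching(folder)])  # dry run: nothing is deleted
```

Keeping the pattern as narrow as the one quoted matters: a broader glob such as `c-00*.sys` would also match files that have nothing to do with the faulty update.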

    • @klausstock8020
      @klausstock8020 Před měsícem

      @@akin242002 What everyone hears: "Delete the latest Crowdstrike driver file (c-00000291*.sys)."
      What every malware author hears: "Delete all CrowdStrike files (c-00*.sys).".

    • @andrewroberts7428
      @andrewroberts7428 Před měsícem +1

      the existence of project managers is often an organizational failure

  • @intp
    @intp Před měsícem +514

    The company I work for has under 200 employees, under 30 devs, and we devs are writing education software. But even we have 5 levels of test environments before any change hits production. That's besides automated tests written by the API devs, automated tests written by the front end devs, and automated end-to-end testing by the QA team. Then there are required peer reviews of all code, and the QA dev manual testing. It's scary if a software company with such a critical product is releasing code without at least these guard rails.
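The guard rails described above amount to a chain of gates that a build must clear in order. A minimal sketch, assuming each gate can be reduced to a function that returns pass or fail; the `promote` function and the gate names used in any call are hypothetical, not this commenter's actual pipeline.

```python
from typing import Callable, Sequence

# A gate is a (name, check) pair; the check receives the build identifier.
Gate = tuple[str, Callable[[str], bool]]


def promote(build: str, gates: Sequence[Gate]) -> tuple[bool, list[str]]:
    """Run a build through each gate in order, stopping at the first failure.

    Returns (reached_production, names_of_gates_passed).
    """
    passed: list[str] = []
    for name, check in gates:
        if not check(build):
            return False, passed  # later gates, and production, are never reached
        passed.append(name)
    return True, passed
```

For example, with gates for API tests, end-to-end tests, and peer review, a build that fails the end-to-end step returns `False` together with the list of gates it did clear, which makes it obvious where it stopped.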

    • @JxH
      @JxH Před měsícem +26

      "The company I work for has under 200 employees, under 30 devs, ..."
      FYI - The number zero (two places) is compatible with that sentence structure.

    • @darrennew8211
      @darrennew8211 Před měsícem +44

      I heard from one employee that there's no automated testing. Also, this update was flagged to pass all canary testing at individual companies and to deploy everywhere immediately. And the driver itself is flagged that if it fails during boot Windows shouldn't disable it and boot anyway. The file that caused the crash was all zeros content. This is either intentional and someone shorted a lot of stock, or it's criminally negligent.

    • @Kenionatus
      @Kenionatus Před měsícem +29

      ​@@JxHI'd be highly impressed if zero devs managed to pull off that much procedure.

    • @reverse_shell.asm.sh.exe1
      @reverse_shell.asm.sh.exe1 Před měsícem +23

      worked at a place that had 15 "QA" that all they did was click on the functionality, didn't even read the code, send it to the client to click around and then push to production, worst company I've ever worked at, fuck those guys - when I raised this issue I was fired within 2-3 weeks!

    • @Anda146
      @Anda146 Před měsícem +3

      ​@@reverse_shell.asm.sh.exe1 Dodged a bullet in my book. I hope you are well in another firm now. 😊

  • @Guru4hire
    @Guru4hire Před měsícem +1051

    The idea that a rust enthusiast would "prove a point" is the most believable thing in the world.

    • @reverse_shell.asm.sh.exe1
      @reverse_shell.asm.sh.exe1 Před měsícem +7

      idk I like the point of where this was done on purpose to practice for a real event

    • @rj7250a
      @rj7250a Před měsícem +28

      I mean, if the driver was written in Rust then it would crash anyway, since Rust by default crashes on memory unsafety.
      The c++ code already checked for null pointer as mentioned in the last twitter thread in the video.

    • @alexanderSydneyOz
      @alexanderSydneyOz Před měsícem +3

      "The idea that a rust enthusiast would "prove a point" is the most believable thing in the world."
      Well, other than what actually happened

    • @minerscale
      @minerscale Před měsícem +9

      ​@@rj7250a I feel like a kernel driver should probably have a panic handler that unloads or maybe restarts the driver with a count of number of retries. That way any unrecoverable errors (bar compiler bugs/unsafe block promises not being kept) will not bring down the system

    • @jfbeam
      @jfbeam Před měsícem +21

      @@minerscale I see you don't know much about writing kernel-mode stuff. Unlike a userspace application, nothing is tracked in kernel space, so there's no way to know how to "restart" or unload the offending driver... or anything that has been commingled with it. You have to run the driver's shutdown and exit code; once it's done anything "bad", none of its data structures can be trusted, and by extension, the entire kernel, as in ring 0 it could've messed with literally anything.

  • @treyquattro
    @treyquattro Před měsícem +604

    that "real men test in production" meme was sick!

    • @alxk3995
      @alxk3995 Před měsícem +21

      That got created right after the incident. But it's gold. 😂

    • @christopherg2347
      @christopherg2347 Před měsícem +29

      One could say it was...Titanic.

    • @seanburke424
      @seanburke424 Před měsícem +11

      "Everyone has a QA system, but not everyone has a production system"

    • @nicholasvinen
      @nicholasvinen Před měsícem +17

      I don't always test my code, but when I do, I do it in production...

    • @christopherg2347
      @christopherg2347 Před měsícem +1

      @@seanburke424 That saying doesn't make sense to me.

  • @GoogleDoesEvil
    @GoogleDoesEvil Před měsícem +60

    This isn't even the first time this quarter CrowdStrike caused a bunch of machines to kernel panic/bug check. In June, Falcon Sensor was causing RHEL 9.4 to kernel panic. In April, it caused Debian to kernel panic. In both of those cases though it was a Linux kernel bug.

    • @SaraMorgan-ym6ue
      @SaraMorgan-ym6ue Před měsícem +4

      CrowdStrike crashed Linux a few months back, crashed Microsoft now, next it's Mac's turn dun dun dun🤪🤪

  • @techgroveusa
    @techgroveusa Před měsícem +54

    Emphasizing quality assurance and the organization's responsibility underscores why continuous integration and proper testing are so crucial.

  • @cyfrowymuza
    @cyfrowymuza Před měsícem +637

    that's right - a classic null pointer dereference... nobody expects the spanish inquisition

    • @traveller23e
      @traveller23e Před měsícem +24

      it's such an insufficient explanation, a null pointer dereference is a symptom not the root cause.

    • @mesiroy1234
      @mesiroy1234 Před měsícem +3

      I ain't even a code nerd
      Literally never coded in my life
      BUT I KNOW: DON'T UPDATE ON FRIDAY😊

    • @TestyMcTestypants
      @TestyMcTestypants Před měsícem

      They must now sit in the comfy (gamer) chair.

    • @johnsmith1953x
      @johnsmith1953x Před měsícem +3

      LOL! This has been a problem since the late 1970s.
      and it STILL IS!!! OMG!!!!

    • @rj7250a
      @rj7250a Před měsícem +6

      Again, it was not a null pointer, there was a null check in the code.

  • @nova_supreme8390
    @nova_supreme8390 Před měsícem +36

    The prosecutor: Show me on this graph where CrowdStrike touched you.
    Windows: "points at the kernel and starts to sob"
    The prosecutor: I have no further questions, your honor.

  • @FireStormHR
    @FireStormHR Před měsícem +184

    WHY DIDNT I KNOW THAT 1:30 VIDEO EXISTS?

    • @wesleyrm
      @wesleyrm Před měsícem +21

      Pure GOLD

    • @csvscs
      @csvscs Před měsícem +10

      Please share a link to it!!!

    • @geeshta
      @geeshta Před měsícem +32

      It's called Making of WrestleMania: The Arcade Game it's on YT

    • @bsherman8236
      @bsherman8236 Před měsícem +1

      Some memes never get old

  • @ericwelsh4853
    @ericwelsh4853 Před měsícem +174

    I worked at a small Dot Com in the early 2000's. We had a QA process for pushing changes to the production web sites.
    After the QA department had tested a new release, the QA manager manually signed a form that was printed on a sheet of paper, then that sheet of paper was handed to the sysadmin responsible for deploying changes to production.
    Seems like a foolproof process?
    Nope.
    After working there a few months, the QA manager told me that the producers (product owners) were printing out those forms and forging the QA manager's signature.
    We had no idea we were pushing untested code to production, yet until we found out about this we were being blamed because the production web sites were unreliable.

    • @reverse_shell.asm.sh.exe1
      @reverse_shell.asm.sh.exe1 Před měsícem +28

      worked this year in a company that had their QA not look at the code at all, have them just test the functionality by clicking shit on the website, then send to the client (which doesn't understand code) to test it by also clicking shit around and if QA and the client said ok it was pushed to production.. Once I said code needed to be reviewed I lasted another 2-3 weeks before getting fired! fuck those guys, I hope that reporting their asses actually made something happen, but I doubt it

    • @Kreze202
      @Kreze202 Před měsícem +8

      Interned at a major national telecom company as a Security Business Partner, the company had quite a rigid pentesting system where every new system or update requires a form that requires 2 written signatures, one from the higher ups of the cybersec team that confirms that the new asset is good to go for prod and one from the dev team. Turns out some dev teams (the company had multiple dev teams for different projects) just pushed to prod anyway without ever having this signed form or even requesting the cybersec team for one.

    • @ericepperson8409
      @ericepperson8409 Před měsícem +4

      It's still a more robust system than most software companies employ these days. Somehow, in a lot of teams, Agile is thought to mean: if it compiles, it's good to go.

  • @RainingArtillery
    @RainingArtillery Před měsícem +21

    Let's also mention that not only does the driver run in kernel mode, but it's flagged as running on boot. That is why this outage was so bad: Bluescreen because of driver -> Reboot, ah, this driver is marked as an essential part of the system that we can't boot without -> Bluescreen. Meaning them rolling out a fix will not fix machines automatically, an IT tech has to go over to every single machine and manually reboot in safe mode to have the fix actually applied.
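The reboot loop described above is the kind of failure a boot-attempt counter is meant to break. What follows is a user-space illustration only, not how Windows or CrowdStrike handle boot-start drivers: it assumes a small state file that survives reboots, and the function names, the JSON layout, and the threshold of 3 are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

MAX_FAILED_BOOTS = 3  # hypothetical threshold, not a Windows setting


def should_load_component(state_file: Path, max_failed: int = MAX_FAILED_BOOTS) -> bool:
    """Record a boot attempt and report whether the component should load.

    The counter is incremented before loading; mark_boot_successful() resets it.
    If the machine crashes before that reset, the count survives the reboot.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {"pending": 0}
    if state["pending"] >= max_failed:
        return False  # too many consecutive crashes: start without the component
    state["pending"] += 1
    state_file.write_text(json.dumps(state))
    return True


def mark_boot_successful(state_file: Path) -> None:
    """Reset the counter once the system is up and healthy."""
    state_file.write_text(json.dumps({"pending": 0}))


if __name__ == "__main__":
    state = Path(tempfile.mkdtemp()) / "boot_state.json"
    # Simulate a crash loop: the reset is never reached between attempts.
    print([should_load_component(state) for _ in range(5)])
```

The trade-off is that a machine which skips the component is running without its security sensor, so whether this is acceptable is a policy decision, not only a technical one.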

  • @abg44
    @abg44 Před měsícem +446

    This goes to show that outsourcing to one single third party for Kernel intrusion detection isn't the best idea ever, lol

    • @pluto8404
      @pluto8404 Před měsícem +93

      or having universal automatic updates pushed to your machine.

    • @JxH
      @JxH Před měsícem +9

      So you want Norton, and McAfee, and Kaspersky, and CrowdStrike, and ... ALL installed at once ?

    • @lachlanmckinnie1406
      @lachlanmckinnie1406 Před měsícem +54

      @@JxH More like some companies use product A, other companies use product B, with no single company using all at once. To use an agricultural analogy, you want a security polyculture, as monoculture is vulnerable to disease.

    • @roganl
      @roganl Před měsícem +24

      @@lachlanmckinnie1406 The great clownstrike famine of `24.

    • @robertfiedor7559
      @robertfiedor7559 Před měsícem

      m try

  • @systematicpsychologic7321
    @systematicpsychologic7321 Před měsícem +1661

    Regarding option 3: just wait to see if in 2025 you start hearing "The new government requested data that unfortunately was irrevocably lost during the Crowdstrike debacle."

    • @elderman64
      @elderman64 Před měsícem +102

      Wouldn't be surprised to see that happening just in 2024 itself

    • @Uveryahi
      @Uveryahi Před měsícem +1

      😮! well not that 😮

    • @justsignmeup911
      @justsignmeup911 Před měsícem +42

      Funny how that only happens to government systems

    • @remigiuszbloch
      @remigiuszbloch Před měsícem +87

      or Secret Service internal communication history was lost during Crowdstrike situation... as they say: don't let crisis go to waste...

    • @flowerofash4439
      @flowerofash4439 Před měsícem +26

      don't tell me they are going to fly a plane straight to a server and blame the asians and their budhism...

  • @mhadi-dev
    @mhadi-dev Před měsícem +240

    "It's an organization failure" - A great programmer once said.

  • @PSP92262
    @PSP92262 Před měsícem +20

    The fact that QA doesn't seem to be a thing anymore is mind-boggling.

    • @backslash68
      @backslash68 Před měsícem +2

      what do you think? we are in the Agile era now. Fail fast, fail often, QA is not needed.

    • @Blur4strike
      @Blur4strike Před měsícem +1

      QA costs money and companies are loath to part with the money for proper QA. Better to let the customers/clients do the testing for them as it's cheaper.

  • @michaelogden5958
    @michaelogden5958 Před měsícem +11

    I'm a retired IT guy, part of a team that did global pushes quite regularly. While a flaw in one of our pushes might "only" take down our presence on the web, there were layers upon layers of pre-push testing, staged releases, and so forth. I remember the pucker factor each and every time we did a "for real" push. I empathize when I hear of D'oh!!! misadventures.
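The staged releases mentioned above can be sketched as a ring-by-ring rollout that halts as soon as a ring shows failures. This is a minimal sketch under stated assumptions: the `staged_rollout` function, the `deploy` callback, and the idea of a per-ring failure-rate threshold are hypothetical and are not taken from this commenter's process.

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> list[str]:
    """Deploy ring by ring; stop before the next ring if too many hosts fail.

    Returns the hosts that were updated successfully.
    """
    updated: list[str] = []
    for ring in rings:
        failures = 0
        for host in ring:
            if deploy(host):
                updated.append(host)
            else:
                failures += 1
        if ring and failures / len(ring) > max_failure_rate:
            break  # halt the rollout; later rings are never touched
    return updated
```

With a small canary ring first, a bad push costs a handful of machines instead of the whole fleet, which is the property the "pucker factor" is really about.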

  • @johnwilliams3075
    @johnwilliams3075 Před měsícem +159

    "Failing upwards" seems to equal "They ~sure~ look great in a suit, let's promote them!". I've seen this over, and over, and over, over the last 30+ years, and it never ends well. It usually goes one of two ways:
    1. The person in charge of a thing ends up being so bad or disinterested in their job that some really important thing ends up spectacularly failing even though they avoid blame (ie. today's example), and they stick around to screw up the next thing they're put in charge of. Occasionally they suffer the consequences of their ignorance, but by then the organizational and reputational damage is done.
    2. They muck around for a few years, cluelessly rising on the org chart until they shuffle off to some new employer who's even more impressed with their fashion sense, usually leaving behind a two-comma morass of overdue projects, impossible deadlines, expensive and inappropriate software subscriptions, disgruntled technical staff, and the like.

    • @planescaped
      @planescaped Před měsícem +35

      More that they know how to talk. The distance one can get simply by confidently bullshitting your way through life is incredible.

    • @davidtitanium22
      @davidtitanium22 Před měsícem

      I'm convinced that people need to be a certain level of psychopath to be "leaders" and it has nothing to do with their competence

    • @markmendez3939
      @markmendez3939 Před měsícem +5

      Does anyone remember on what basis Israel chose their first king?
      ... That guy would look good in a crown

    • @XIIchiron78
      @XIIchiron78 Před měsícem +18

      The thing to understand is that the C level doesn't work for the company or for the customers. They work for the shareholders. So CEOs who make obviously and openly stupid decisions outwardly are often just in effect cooking their books by sacrificing everything else to cut expenses and deliver a quarterly return. And then they bail with a great resume and a bunch of money before everything implodes. Or sometimes even after it implodes, because shareholders don't care and can easily move on to the next legacy brand with their gains. They know when to get out.
      This practice of corporate looting that pervades America started pretty much with Jack Welch who gutted GE while managing to earn an entire cult following for doing so.

    • @szilardfineascovasa6144
      @szilardfineascovasa6144 Před měsícem

      @@XIIchiron78Someone that gets it.

  • @BeepBoop2221
    @BeepBoop2221 Před měsícem +194

    It's boeing all over again, engineers and QA replaced with suits.

    • @csibesz07
      @csibesz07 Před měsícem +5

      Yeah. From reflex, I compared it to that disaster when explaining to others.

    • @BeepBoop2221
      @BeepBoop2221 Před měsícem

      @@csibesz07 crowdstrike is now blaming businesses for not having disaster recovery!

    • @gezenews
      @gezenews Před měsícem

      replaced with slaves.

    • @allangibson8494
      @allangibson8494 Před měsícem +12

      Also done in India in a “low cost engineering center”. Lunch time Friday roll out of updates…

    • @angkhoa1216
      @angkhoa1216 Před měsícem +5

      @@allangibson8494Hopefully after this fiasco and trump’s being president, the damn suits can stop outsourcing important shits

  • @b4ttlemast0r
    @b4ttlemast0r Před měsícem +112

    What's crazy is that the update didn't even change any executable file. A change to a data file should not be able to crash the entire program and even operating system.

    • @AIrtfical
      @AIrtfical Před měsícem +11

      Not true. Misconfigured config files (YAML, JSON, TOML) regularly cause parsing crashes. However, it's unacceptable that a tool like this isn't resilient enough to fail safely and gracefully. It's running as the Windows root, or in ring 0; perhaps it crashed because it detected itself or the OS as a threat? Unsure, but static config can definitely cause crashes. I'm unsure why the BSOD was happening, unless the OS runtime requires this service to be running or to fail this way, which would be weird.
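Failing safely on a bad data file can be sketched as "validate first, fall back to the last known good version otherwise". This sketch assumes a JSON payload with a `patterns` key purely for illustration; that format, the key name, and the `load_rules` function are hypothetical and say nothing about the actual layout of CrowdStrike's files.

```python
import json


def load_rules(raw: bytes, last_known_good: dict) -> dict:
    """Parse an update payload, falling back to the previous rules on any problem."""
    if not raw or not any(raw):  # empty or all-zero payload
        return last_known_good
    try:
        rules = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return last_known_good  # unreadable payload: keep running on the old rules
    if not isinstance(rules, dict) or "patterns" not in rules:
        return last_known_good  # well-formed JSON, but not the expected shape
    return rules
```

The design choice is that a malformed update degrades to "yesterday's rules" and never reaches the code that consumes them.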

    • @Roboprogs
      @Roboprogs Před měsícem +7

      One level’s data is another level’s code, sometimes.

    • @opposite342
      @opposite342 Před měsícem +4

      ​@@AIrtfical
      it's exactly what you said actually. The program forces itself as a requirement for windows to be functional

    • @BinToss._.
      @BinToss._. Před měsícem +13

      @@AIrtfical It's a boot-start driver. If any boot-start driver experiences an unhandled exception, the entire boot sequence fails. If Windows detects and disables a bad boot-start driver (I don't know if it can), the system would be running (yay), but it would violate company policy by running without a required software (uh-oh).

    • @obsolete959
      @obsolete959 Před měsícem +7

      Kernel-level operations have to crash the system when encountering an error, because not crashing can lead to far worse outcomes when dealing with direct memory access. It is by design, and smart design at that.
      Now you can argue that not being able to boot without the faulty driver instantly after is not the smartest design, but that's on Crowdstrike for flagging their drivers as boot-start drivers.

  • @NFSHeld
    @NFSHeld Před měsícem +8

    By the way, Friday 26th is "Admin appreciation day", where you can thank your system administrators who probably spend their weekend reading up on the issue and rebooting all the machines in safe-mode to remove the problematic config file.

  • @Tony-dp1rl
    @Tony-dp1rl Před měsícem +6

    We've started calling the practice of deploying to Production without testing ... CrowdStriking

  • @mursie100
    @mursie100 Před měsícem +122

    4:16 this Stockton Rush OceanGate meme is unhinged 💀

    • @ren3059
      @ren3059 Před měsícem +13

      darkest image☠

    • @kv4648
      @kv4648 Před měsícem +10

      "...willing to die on that hill" 💀

    • @beskamir5977
      @beskamir5977 Před měsícem +4

      @@kv4648 More like valley.

    • @Roboprogs
      @Roboprogs Před měsícem

      Thanks for context. I thought it was a nuke, rather than a sub. Too tired tonight, I guess.

  • @GSBarlev
    @GSBarlev Před měsícem +812

    So apparently CrowdStrike Falcon broke a Debian image about three months ago, but because Linux doesn't actually force software updates, it fucked the VMs of a few dozen nerds who reported the issue and rolled back to the previous image before the entire global ecosystem went down.
    Seems like there's a few lessons to be learned here.

    • @___Kevin
      @___Kevin Před měsícem +13

      Interesting

    • @AlexiosLair
      @AlexiosLair Před měsícem +109

      Classic small stick that holds the entire global infrastructure from collapse

    • @ren3059
      @ren3059 Před měsícem +4

      wait wtf

    • @rafazieba9982
      @rafazieba9982 Před měsícem +153

      This update was not "forced by Windows". It wasn't even done by Windows. CrowdStrike updated the rules itself.

    • @shroomer3867
      @shroomer3867 Před měsícem +49

      The only lesson you need to learn, is to shut up and update your windows system as soon as possible or else we'll do it for you!
      - Microsoft.

  • @SamBrockmann
    @SamBrockmann Před měsícem +226

    Hiring George Kurtz for your C suite seems to be a bad idea.

    • @libertybelllocks7476
      @libertybelllocks7476 Před měsícem +9

      He might as well retire after this.

    • @SamBrockmann
      @SamBrockmann Před měsícem +59

      @@libertybelllocks7476 , that's the problem: he probably will get hired as the CEO somewhere else if he wants to be. Give it a few years, and he'll be fine. Instead of, you know, being poor and unemployed, like he deserves.

    • @loggjohnable
      @loggjohnable Před měsícem +5

      He is the founder too

    • @SamBrockmann
      @SamBrockmann Před měsícem +17

      @@loggjohnable , which makes it even worse.

    • @JodyBruchon
      @JodyBruchon Před měsícem +15

      And yet they hired him for their C++ suite

  • @jackatk
    @jackatk Před měsícem +6

    4:53
    “Pre-planned in advance”
    Bruh

  • @duotronic6451
    @duotronic6451 Před měsícem +6

    When I was in IT, we would release security updates to IT computers & servers & volunteers a week before releasing to the rest of the company.

  • @homeboy_jay
    @homeboy_jay Před měsícem +23

    "Well, not so fast" @2:41 pie in the face ABSOLUTE FREAKIN GOLD 🤣🤣🤣

  • @rekire___
    @rekire___ Před měsícem +82

    First test, you must.
    Production testn't, you don't.
    -Yoda, coding of art

    • @alexandredevert4935
      @alexandredevert4935 Před měsícem +11

      Never a Friday you release

    • @backslash68
      @backslash68 Před měsícem

      if ( nullptr == ptr) thou shall write, the unintended equal operator in comparison to avoid (those are called "Yoda conditions" btw.)
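
A minimal sketch of the Yoda condition mentioned above, using a hypothetical helper named `is_missing` (not from the video). With the constant on the left, mistyping `==` as `=` turns into a compile error, because `nullptr` cannot be assigned to. GCC and Clang also warn about an assignment used as a condition when warnings are enabled, so this is a style choice rather than a requirement.

```cpp
// Hypothetical helper, for illustration only.
// Yoda condition: the constant sits on the left of the comparison.
// If `==` were mistyped as `=`, `nullptr = ptr` would not compile,
// whereas `ptr = nullptr` would compile and silently overwrite ptr.
bool is_missing(const int* ptr) {
    if (nullptr == ptr) {
        return true;
    }
    return false;
}
```

Calling `is_missing(nullptr)` returns `true`, and calling it with the address of a live `int` returns `false`.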

  • @k98killer
    @k98killer Před měsícem +52

    That stock footage of the smiling people all flipping off the camera is golden

    • @TheOneWhoMightBe
      @TheOneWhoMightBe Před měsícem

      I think it was personal for the blonde in the background. 😂👌

  • @obsolete959
    @obsolete959 Před měsícem +7

    What's worse is that Crowdstrike updates bypass staging policies. So even the smart companies that run critical software updates in their own test systems first to make sure they don't break anything before updating all computers still got the CS update forced upon them. So not only did they ignore their own staging and testing policies, they also ignored everyone else's staging and testing policies.

    • @ShawnFumo
      @ShawnFumo Před měsícem

      Yeah, the problem seems to be that those staging/testing policies apply to new versions of the sensor, but not to the data definition files. Which might be ok in theory if they were actually bulletproof against bad data files. But no matter what, they shouldn't have sent out the update to all their clients at the same time. Even if they sent it to a few thousand and waited an hour before sending the rest, it probably would have been enough to prevent this huge disaster. Just bad policies on top of bad policies
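
A rough sketch of the "send it to a few thousand first" idea above. The function `plan_waves` and its parameters are hypothetical; the sketch only computes wave sizes. A real rollout would pause between waves and check crash telemetry before continuing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical planner: splits a fleet into rollout waves that start with a
// small canary group and multiply in size by `growth` after each wave.
std::vector<std::size_t> plan_waves(std::size_t total_hosts,
                                    std::size_t canary_size,
                                    std::size_t growth) {
    std::vector<std::size_t> waves;
    if (total_hosts == 0 || canary_size == 0 || growth == 0) {
        return waves;  // refuse a nonsensical plan instead of guessing
    }
    std::size_t remaining = total_hosts;
    std::size_t next = canary_size;
    while (remaining > 0) {
        std::size_t wave = std::min(next, remaining);
        waves.push_back(wave);
        remaining -= wave;
        // Cap the next wave at the fleet size so the multiplication
        // below can never overflow.
        next = (next > total_hosts / growth) ? total_hosts : next * growth;
    }
    return waves;
}
```

For a fleet of 10000 hosts with a canary of 1000 and a growth factor of 2, this yields waves of 1000, 2000, 4000, and 3000 hosts, so a bad file caught after the first wave would reach a tenth of the fleet rather than all of it.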

  • @matthewthomasomeara
    @matthewthomasomeara Před měsícem +9

    The speaker casually rolls over "staggered roll out" as if it's just one of a laundry list of safeguards. But isn't this kinda the big one? Code errors happen. Stagger the roll out and you minimize the damage.

  • @mistersunday_
    @mistersunday_ Před měsícem +152

    I opt for multidimensional lizard overlords, because incompetence is scarier

    • @madmax43v3r
      @madmax43v3r Před měsícem +5

      It does make sense, they like to do test runs before the main event.

    • @KatR264
      @KatR264 Před měsícem +8

      This is probably why conspiracy theories have the following they do, in the face of the more likely reality of incompetence.

    • @ThePowerLover
      @ThePowerLover Před měsícem

      Why not both?

    • @christophkogler6220
      @christophkogler6220 Před měsícem +8

      @@KatR264 That's a significant part of the reason. People are distressed by chaos, so they look for patterns and signs to explain things away, and also enjoy feeling like they know more than others. Put those together and you get conspiracy theories that both explain chaos and strange events and let them feel superior for 'seeing the truth'.

  • @ilirlluka6789
    @ilirlluka6789 Před měsícem +39

    Love the skit with Bret "The Hitman" Hart lecturing the computer nerds about dereferencing a null pointer.

  • @iAmTaki
    @iAmTaki Před měsícem +28

    5:06 what do you mean "most of us"? LMAO

  • @nac.mac.feegle
    @nac.mac.feegle Před měsícem +6

    The school of "Hey it compiled, it must work." I've been coding for almost 40 years. Yeah, I'm old. It drives me nuts that we do not learn lessons. Company hiring a guy who thinks delivering and using software is testing should have the entire C-suite fired. What happened to the concept of continuous integration, automated testing? Bosses are always too cheap, arrogant, impatient, whatever to put money into testing. And clients, to be fair, are also disinclined to plan for and budget testing.

    • @0LoneTech
      @0LoneTech Před měsícem

      There are languages where that's far closer to truth. Of course some people complain bitterly when GHC says their program is incomplete rather than produce a broken executable. Ada in particular was designed with this goal, published in 1983, but it likely never will get the huge marketing campaigns Rust or Java enjoyed.

  • @chounoki
    @chounoki Před měsícem +3

    0:06 The blue ball is hilarious. Real or fake?

  • @lullullullul
    @lullullullul Před měsícem +77

    Sheesh I love opening CZcams to a fresh Fireship 🥺

  • @AntonAdelson
    @AntonAdelson Před měsícem +23

    4:15 that version of "real men test in production" is ... WOW!

  • @tommy516
    @tommy516 Před měsícem +109

    "Real Men Test in Production", such a great mem...er, process.

  • @jeraldbottcher1588
    @jeraldbottcher1588 Před měsícem +4

    This boggles my mind as an IT professional. I was part of a team that deployed patches and software for years. This included OS deployment, patch deployment, and software deployment, the whole thing, on both workstations and servers. We tested our patches extensively before pushing them out to the entire population of the environment. This first included a sandbox environment, then a select user/system environment, and then we would stage our patches out over several hours so that if something happened we could back out before catastrophe struck. And honestly, sometimes we would find problems with the patches, and we would be able to immediately stop, suspend, and even back out.
    Yes we would use 3rd party vendor solutions to help with this, and any time we changed ANYTHING we would follow our testing procedures and matrix, normal business. We would never shirk our procedures to test 1st, then deploy. To me this is a total failure of IT Governance and failure to maintain standards. (IT Governance is setting and maintaining standards and policies for the IT Infrastructure)
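
The sandbox, then pilot group, then staged deployment sequence described above can be sketched as a series of gates. The `Stage` type and `first_failing_stage` function are hypothetical names for illustration. Each stage reports whether it is healthy, and promotion stops at the first one that is not.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one rollout stage.
struct Stage {
    std::string name;
    std::function<bool()> healthy;  // returns true if the stage passed its checks
};

// Runs the stages in order and stops at the first unhealthy one.
// Returns that stage's name, or an empty string if every stage passed.
std::string first_failing_stage(const std::vector<Stage>& stages) {
    for (const Stage& stage : stages) {
        if (!stage.healthy()) {
            return stage.name;  // later stages are never started
        }
    }
    return "";
}
```

A non-empty result is the signal to stop, suspend, and back out before the patch reaches the remaining stages.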

    • @TheSacredDude
      @TheSacredDude Před měsícem

      Also an IT professional. You must be very lucky and VERY sheltered, because way too much of the industry works like this nowadays. It's the kind of thing that happens when you let the normals worm their way in. They immediately make a run on all the leadership positions that all the competent staff don't want to have to do anyway, and then they start getting rid of every policy, procedure, and precaution that could potentially stand in the way of their yearly bonus. Eventually, that shit metastasizes all the way up to the C-Suite, and that's when the seriously unethical and even illegal shit starts happening. I just got kicked off a project this month due to refusing to perform a task that the customer leadership made an extremely public show of ordering us not to touch. The PR hire who was told to take it over waited for months for their leadership to be sequestered in a multi-week meeting, then went psycho on my entire department until my management gave in. There was never any complaint that I was wrong, that I caused any problems, or that I crossed any lines. The official reason is that I was "seen butting heads" too many times. Meanwhile, this guy has almost completely destroyed one application, and is very likely going to tank an upgrade for another, MUCH more vital one by the end of the year.
      tl;dr....Stay where you are. NEVER leave that company.

    • @jeraldbottcher1588
      @jeraldbottcher1588 Před měsícem

      @@TheSacredDude Alas I retired from that job and no longer have to fight any of those battles

  • @misubi
    @misubi Před měsícem +5

    I worked in software QA for years. Insane that they literally didn't have a battery of various OS configurations set up to test their builds on, either in real or virtual form, before live updating. 😮

    • @klausstock8020
      @klausstock8020 Před měsícem

      It's also crazy that apparently a lot of companies bought and deployed the CrowdStrike software without having their penetration testers penetration test it first.
      "Nah, the marketing guy from CrowdStrike said that they did that test."
      "Did you also ask whether their test was successful?"
      "Yes, but suddenly there were free bottles of Champagne and free ladies everywhere..."

  • @YuNherd
    @YuNherd Před měsícem +48

    my hunch is that the juniors are left to fend themselves to make release

  • @axazexz1991
    @axazexz1991 Před měsícem +137

    Looks like Bjarne Stroustrup pulled all his hair out while creating the language.

    • @YuNherd
      @YuNherd Před měsícem +4

      he went malding?

    • @javabeanz8549
      @javabeanz8549 Před měsícem +7

      Are you sure he didn't get that from writing C code? So he wrote C++ while he still had some hair left.

    • @csibesz07
      @csibesz07 Před měsícem +9

      He put his hair into c++

    • @trumpetpunk42
      @trumpetpunk42 Před měsícem +1

      Very relatable

    • @Roboprogs
      @Roboprogs Před měsícem

      @@javabeanz8549 nah, I’m pretty sure it was the ++ that did it.

  • @alaskandonut
    @alaskandonut Před měsícem +12

    The image of Stockton Rush next to the OceanGate Titan sub with the caption “real men test in production” is one of my favorite images.

  • @douglasphillips1203
    @douglasphillips1203 Před měsícem +4

    Never attribute to malice that which can be attributed to incompetence. They simply never expected the file to contain null bytes so they never checked for it.

    • @0LoneTech
      @0LoneTech Před měsícem +2

      When their business model is entirely predicated on claiming they're less incompetent than the vendors of actually needed software on the same system, and will cover for those, this level of incompetence is gross negligence at best.

  • @saaofficial5415
    @saaofficial5415 Před měsícem +3

    Now I understand why null pointers are called the Billion Dollar Mistake 💀

  • @Xhadp
    @Xhadp Před měsícem +13

    I immediately knew this was a management/structural problem not a simple IT/QA "standard" miss. So not at all shocked by that being one key takeaway lesson from this.

  • @mrlunatic2022
    @mrlunatic2022 Před měsícem +179

    I use arch btw

  • @bdd2ccca96
    @bdd2ccca96 Před měsícem +6

    a huge part of the blame must go to the CTOs of the corporations. they are the ones who are "testing in production" by allowing auto updates to run on production servers, and without a working DR plan.
    it is gross negligence that any change to production is not run in a testing environment first.

  • @astrodysseus
    @astrodysseus Před měsícem +2

    4:10 well, it's both an employee and an organizational issue. So many "developers" write bad code (and that's such a gentle way to put it) and have zero professionalism about it. And it's organizational, of course, because knowing that, you have to create safeguards around deployments.

  • @ren3059
    @ren3059 Před měsícem +66

    Real men test in production… Insert OceanGate meme

  • @trevorkiwoi
    @trevorkiwoi Před měsícem +18

    I was watching this video when I remembered that I didn't check for a nullptr before attempting to dereference my variable. Thanks for the reminder

    • @Caellyan
      @Caellyan Před měsícem +1

      Now switch to rust and that won't happen 🤣

    • @FemboyCatGaming
      @FemboyCatGaming Před měsícem

      @@Caellyan Switch to rust and your program wont compile

    • @DankMemes-xq2xm
      @DankMemes-xq2xm Před měsícem +1

      @@FemboyCatGaming better not to compile, than to compile and have things break in unforeseeable ways

    • @FemboyCatGaming
      @FemboyCatGaming Před měsícem

      @@DankMemes-xq2xm Rust's borrowing and shadowing system is far more convoluted than C pointers

  • @Itsallfun3000
    @Itsallfun3000 Před měsícem +16

    "But I can't test outside prod my data doesn't exist!"😅

  • @DustinRodriguez1_0
    @DustinRodriguez1_0 Před měsícem +63

    Extra little detail: There wasn't so much a logic problem in the channel file... the channel file was null. Not zero size, but full of nothing but null bytes. And their kernel module apparently does ZERO checking for validity before trying to work with such files. Should be criminal negligence, but that is literally legally impossible since zero enforceable software standards of any kind exist.
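
A minimal sketch of the validity checking the comment above says was missing. Everything here is an assumption for illustration: the 4-byte signature is made up and is not the real channel-file format, and `check_file` is a hypothetical name. The point is only that a loader can reject an empty, all-zero, or unrecognized buffer before any parsing code touches it.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up signature (ASCII "CHNL"); NOT the real channel-file format.
constexpr std::array<std::uint8_t, 4> kMagic = {0x43, 0x48, 0x4E, 0x4C};

enum class Verdict { Ok, TooShort, AllZero, BadMagic };

// Hypothetical pre-parse check: classifies a raw buffer without
// interpreting any of its contents as offsets or pointers.
Verdict check_file(const std::vector<std::uint8_t>& data) {
    if (data.size() < kMagic.size()) {
        return Verdict::TooShort;
    }
    const bool all_zero = std::all_of(
        data.begin(), data.end(), [](std::uint8_t b) { return b == 0; });
    if (all_zero) {
        return Verdict::AllZero;  // reported separately for clearer diagnostics
    }
    if (!std::equal(kMagic.begin(), kMagic.end(), data.begin())) {
        return Verdict::BadMagic;
    }
    return Verdict::Ok;
}
```

A caller would keep using the last known-good file whenever the verdict is anything other than `Verdict::Ok`.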

    • @tetrahedrontri
      @tetrahedrontri Před měsícem +4

      I shudder at the day I let politicians describe how my code needs to be written. Yikes on that whole concept.

    • @NoConsequenc3
      @NoConsequenc3 Před měsícem +2

      @@tetrahedrontri thankfully in the USA they've decided that judges are more important than experts when it comes to this kind of thing. Wouldn't want people knowledgeable in a field to make decisions in it.

    • @prezentoappr1171
      @prezentoappr1171 Před měsícem +1

      ​@@NoConsequenc3
      This saddens me cuz most congress results are not consulted before with a task-force of experts.
      Also why license to a game sux than buying it steam vs Gog.
      Bruce Willis (lack of reference of an old article, prolly mads up but made headline anyway because of lack of cross checking vs ahoy physical games videos)

    • @prezentoappr1171
      @prezentoappr1171 Před měsícem

      Extra detail: most development on digital laws are from the states - that Bruce Willis iTune article lawyer called for making arguments on that article or any licence to a game instead of owning the game case.
      I think from Eurogamer, but I know it from a hyperlink rabbit hole from chrome suggestions

    • @jobicek
      @jobicek Před měsícem +1

      But it's not just their negligence, it's also on the heads of people running those systems. When you have a critical system, one of the things you control is updates. Because every update is a potential disaster.
      Back at university, we had a simple rule - if your code crashes, you're finished; zero points. Never assume anything. Just because a specification says that you'll receive two integers doesn't mean that you'll always receive two integers. Always fail gracefully. There should be no input that causes your program to crash.
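
The "two integers" rule above can be sketched as follows. The function `parse_two_ints` is a hypothetical name. It accepts exactly two whitespace-separated integers and reports failure for anything else, rather than crashing or silently guessing.

```cpp
#include <sstream>
#include <string>

// Hypothetical parser: succeeds only if `input` holds exactly two integers.
// On failure it returns false and leaves `a` and `b` untouched.
bool parse_two_ints(const std::string& input, int& a, int& b) {
    std::istringstream in(input);
    int x = 0;
    int y = 0;
    if (!(in >> x >> y)) {
        return false;  // a field is missing, non-numeric, or out of range
    }
    std::string extra;
    if (in >> extra) {
        return false;  // unexpected trailing data
    }
    a = x;
    b = y;
    return true;
}
```

Inputs such as `"3"`, `"3 four"`, and `"3 4 5"` all return `false`, while `"3 4"` returns `true` with `a == 3` and `b == 4`.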

  • @GRamerDim
    @GRamerDim Před měsícem +2

    0:11 pyrocynical jumpscare reference

  • @watt_the_border_collie
    @watt_the_border_collie Před měsícem +11

    I had to Google Bret Hart's clip about dereferencing a null pointer to find out whether it's AI-generated or not. It just looked so random, but I was relieved it was real footage

  • @BalvinderSingh-uh3my
    @BalvinderSingh-uh3my Před měsícem +11

    "Real men test in production… "I couldn't help myself but LOL, thumbs up for that alone.

  • @DanSoloha
    @DanSoloha Před měsícem +8

    1:05 is the best stock footage I’ve ever seen 😂

  • @HarrisonLuiEKYiss
    @HarrisonLuiEKYiss Před měsícem +4

    4:30 I think this is the same thing as “CrowdStrike didn’t check their code”? There is a CNA article which states that CrowdStrike “Skipped checks”. The article also mentions that the update “should have been pushed to a limited pool first”.

    • @akin242002
      @akin242002 Před měsícem +1

      Also, rolling it out on a Friday. Programmer sins check list completed.

  • @markh.6687
    @markh.6687 Před měsícem +3

    "Quality Assurance?? What's that??"

  • @mrhaftbar
    @mrhaftbar Před měsícem +15

    The printer in Ring 0 is killing me.
    because it is true

    • @roganl
      @roganl Před měsícem +3

      That's an artifact of the 90's and the bane of MSFT support - Gots to luv us some 3rd Party drivers - THEY SUCK.

  • @DagothDaddy
    @DagothDaddy Před měsícem +199

    What really happened explained below:
    Management doesn't actually read your PRs.

    • @ethanfreeman1106
      @ethanfreeman1106 Před měsícem +17

      it's worse than that. the entire organization is based around maximizing profit without putting in the work.

    • @leswine1582
      @leswine1582 Před měsícem

      😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅

    • @NN-sp9tu
      @NN-sp9tu Před měsícem +8

      You need to test the hell out of any changes to the codebase if a failure can wipe out millions of computers. Human eyes on a PR are not enough

    • @madmax43v3r
      @madmax43v3r Před měsícem +1

      It's probably a 20 year old pile of shit, with high turn-over, nobody wants to maintain old crap.

    • @keejj
      @keejj Před měsícem +3

      There probably was a problem report, but management couldn't read it because their account didn't have access because they reduced the number of licenses for the tool because it was too expensive.

  • @thegame4027
    @thegame4027 Před měsícem +15

    The memes and visuals have been next level this video

  • @kriscollinstunes
    @kriscollinstunes Před měsícem +2

    That transition to the ad read was, well, brilliant!

  • @avram202
    @avram202 Před měsícem +1

    That slap in the back about the null pointer is how my father taught me everything in life, it worked wonders, I'm definitely using it on all my children

  • @giantWario
    @giantWario Před měsícem +42

    I love how you just casually implied that most people in the world will be dead in the next two years.

    • @ImperativeGames
      @ImperativeGames Před měsícem +5

      Nuclear war.

    • @TheOneWhoMightBe
      @TheOneWhoMightBe Před měsícem +2

      The living will envy the dead.

    • @XIIchiron78
      @XIIchiron78 Před měsícem

      AI is here. Most or all of humanity is about to become obsolete. Hell, even worse - they're just competition.

    • @bramvanduijn8086
      @bramvanduijn8086 Před měsícem

      @@XIIchiron78 No it isn't, those are just next-word or next-pixel predictors. They don't understand anything and they're definitely not thinking. There's no "they" there to do the thinking. These are just glorified calculators. You could build one from vacuum tubes.

  • @petersuvara
    @petersuvara Před měsícem +9

    Dave from Dave's Garage has the best description of why this happened.

    • @JxH
      @JxH Před měsícem +1

      Dave discussed two "Rings", 0 and 1. Here it's four "Rings", 0 to 3.

    • @Knirin
      @Knirin Před měsícem +8

      @@JxH While technically x86 has four security rings, in practice most operating systems use just two. Ring 0 for the kernel and Ring 3 or occasionally Ring 1 for all user code.

  • @Hasty_Bahadin
    @Hasty_Bahadin Před měsícem +11

    wait, why would most of us be dead by 2026? 😰 5:00

    • @LuxuriantCarrot
      @LuxuriantCarrot Před měsícem +8

      I think he meant 2036, because there's a meme of "AUGUST 12 2036, HEAT DEATH OF THE UNIVERSE!"

    • @babymetalenjoyer
      @babymetalenjoyer Před měsícem +1

      ​@@LuxuriantCarrotAUGUST 1 2036, THE HEAT DEATH OF THE UNIVERSE!

  • @SeverityOne
    @SeverityOne Před měsícem +2

    'Never attribute to malice that which is adequately explained by stupidity.'
    Also, if I see any more Brilliant adverts today, I'm going to be sick.

  • @Chuck_vs._The_Comment_Section
    @Chuck_vs._The_Comment_Section Před měsícem +1

    As someone who has been interested in Security Nightmares for some time, I am not surprised that such a catastrophe has occurred. After all, it was foreseeable that it would happen sooner or later. But I can't help but wonder how it is that nobody is still demanding that software manufacturers are liable for damage caused by their junk software.
    Because as long as IT companies like CrowdStrike get away with a slap on the wrist, nothing will change. Then it's only a matter of time till the next security nightmare.

  • @JxH
    @JxH Před měsícem +5

    0:38 Once upon a time, I posted that picture of McAfee on FB and it was more-or-less immediately taken down.
    Sometimes I think that Anti-Virus / Anti-Malware companies were invented to make Microsoft look simply-excellent in comparison.

  • @vincenzusgaming
    @vincenzusgaming Před měsícem +17

    I feel like the dev who made the mistake won't be punished. The entire fault definitely goes to the QA team

    • @BTrain-is8ch
      @BTrain-is8ch Před měsícem +25

      The dev shouldn't be punished. This sort of failure is an institutional one not an individual one.

    • @reverse_shell.asm.sh.exe1
      @reverse_shell.asm.sh.exe1 Před měsícem +6

      the dev probably already got fired.. now, I have no idea if he'll get sued as well and if a judge would understand it

    • @ayylien3070
      @ayylien3070 Před měsícem +2

      Nah the higherups 100% threw him under the bus.

    • @akin242002
      @akin242002 Před měsícem +1

      When you push code to production on a Friday without peer review/QA process, you deserve to be fired.

  • @andrewwalsh2755
    @andrewwalsh2755 Před měsícem +5

    It probably was an organisational failure...
    Like Boeing, the manifestation is doors blowing off etc... but the real cause is unwise organisational changes... to boost profits, personal and corporate... at the expense of quality...
    I don't know if Crowdstrike has shareholders, but there will be pressure to increase profitability...
    ... outsource IT to India... employ under qualified, cheaper, staff etc... put pressure on managers to deliver... who put pressure on staff to deliver...
    ... and the manifestation is... global computer failure...

  • @Troy_Built
    @Troy_Built Před měsícem +3

    I've seen several places today that still have the computers messed up. They are running but something else is going on. Files that they try to retrieve are no longer there. Customer histories wiped out. One place said the computers came back up on Friday and this morning the server fried.

  • @Lord_Omni
    @Lord_Omni Před měsícem +2

    We had a bunch of bunches of autotests updates by our tester. And they were failing regularly, and there was a tiny code that was showing who to blame o) And we had full testing before going to production, and still had minor rare occurring bugs there.

  • @theApeShow
    @theApeShow Před měsícem +4

    1:30 THIS IS GOLD! Where do you even find this stuff?
    Amazing. I love the internet.

  • @me_12-vw1vi
    @me_12-vw1vi Před měsícem +5

    1:30 bro this made my day

  • @bide7603
    @bide7603 Před měsícem +4

    That wrestling programming meme is life

  • @rentabestfriend
    @rentabestfriend Před měsícem +1

    Damn the transition into the ad was super smooth i barely noticed it

  • @theultimate3753
    @theultimate3753 Před měsícem +2

    That one intern who accidentally committed and pushed his test code, and the senior devs who just accepted it

  • @Tixnou
    @Tixnou Před měsícem +9

    Imagine if this happened to one of the computers that run the Matrix we live in

    • @tinad8561
      @tinad8561 Před měsícem +1

      This is actually the big argument against simulation theory: an environment of that complexity running any length of time at all wouldn’t have glitches, it’d have bricked by now.

  • @AyoDamilareMichael
    @AyoDamilareMichael Před měsícem +5

    I knew it's gonna be hard not to mention rust.

  • @DingleMcMingle
    @DingleMcMingle Před měsícem +7

    Holy shit, I never knew 15 years ago Justin Bieber was only 16 years old.

  • @um8078
    @um8078 Před měsícem +1

    Saw that rust joke from a mile away

  • @Tenly2009
    @Tenly2009 Před měsícem +2

    Any IT organization worth its salt would not give a third party direct access to update their critical servers and workstations simultaneously, and on a completely unknown and ad-hoc schedule. They should have demanded pilots, staggered roll-outs and well-documented AUTOMATIC ROLLBACK and RECOVERY processes. These are not “new lessons”. Those paradigms were drilled into my head in 1998 and were probably around for at least a decade before that. An outside vendor having Ring 0 access to a company’s servers is also a huge security risk. Heads should roll - not just at CrowdStrike but also the IT director of any company that implemented CrowdStrike without control mechanisms in place.

  • @domelessanne6357
    @domelessanne6357 Před měsícem +4

    I already realized there were multiple lizard timelines, and I wasn't too bemused