IT WAS A REGEX?!? - Full CrowdStrike Report Released

  • Added 11 Sep 2024
  • Recorded live on twitch, GET IN
    Report Link
    www.crowdstrik...
    My Stream
    / theprimeagen

Comments • 1.3K

  • @styleisaweapon
    @styleisaweapon 1 month ago +1823

    "We solved it with regex!" .. "Which variant of regex?" .. "What do you mean which variant of regex?"

    • @-Kal-
      @-Kal- 1 month ago +110

      OH NO

    • @seniorchonkza997
      @seniorchonkza997 1 month ago +103

      There's different versions of regex? What's the point in that?

    • @NODGD
      @NODGD 1 month ago +17

      🤦

    • @crazycrazy7710
      @crazycrazy7710 1 month ago +164

      @@seniorchonkza997 I asked a Perl developer this question and he asked what the point of normal regex even is. Not a war I want to get into.

    • @theondono
      @theondono 1 month ago +74

      The unlicensed variant, of course

  • @phraun
    @phraun 1 month ago +728

    To clarify, CrowdStrike's testing and deployment process specifically for channel 291 was operating on what you'd call the "Hopes and Dreams" algorithm. Got it.

    • @RoninX33
      @RoninX33 1 month ago +5

      So well put!

    • @sirius4k
      @sirius4k 1 month ago

      But they put their hearts and souls into it!
      czcams.com/video/ttmsiU-GZlg/video.html

    • @multivariateperspective5137
      @multivariateperspective5137 1 month ago +6

      Actually, I think this variant is the “close your eyes and pray” aka, of course it’s been checked already

    • @styleisaweapon
      @styleisaweapon 1 month ago

      Educated Wish

    • @Jeez001
      @Jeez001 1 month ago +3

      The CEO was also responsible for the biggest cybersecurity outage before this one, at McAfee.

  • @EmperorShang
    @EmperorShang 1 month ago +879

    CrowdStrike: "We investigated ourselves and found it was actually a boo boo and not an owie. Please leave."

    • @thachester
      @thachester 1 month ago +10

      You would think if you knew how much of the critical computing market you were operating on, you would have basic protections to prevent something like this. Not to mention, there are 8,000 employees who work for CrowdStrike. Not one of them had a light bulb moment: "Maybe we should protect ourselves better."

    • @Kaenguruu
      @Kaenguruu 1 month ago +25

      @@thachester Honestly I suspect that it's a management thing because I can't imagine that nobody thought about better protections for this.

    • @ultimaxkom8728
      @ultimaxkom8728 1 month ago +7

      _"Please leave."_
      Lmao

    • @evocorporation6537
      @evocorporation6537 1 month ago +7

      @@thachester They probably did have a ''jeez this ought to be fixed'' and then were slammed because we only make features/perform bugfixes boi, not required refactors!

    • @thefallencat2080
      @thefallencat2080 1 month ago +9

      @@thachester I find that a common experience is to point out a problem only for management to go "nah it's fine"

  • @asmrddict
    @asmrddict 1 month ago +269

    Them: Everything broke because we didn't test for a 21st variable.
    Fix: Test for a 21st variable.
    Upcoming news: Everything broke because we didn't test for a 22nd variable.

    • @SirLightfire
      @SirLightfire 1 month ago +20

      Given their past history of taking down Linux servers, I wouldn't be surprised if this actually happens again

    • @lynx-titan
      @lynx-titan 1 month ago +7

      normalise formal methods

    • @tomorrow6
      @tomorrow6 11 days ago

      And with modern development practices, that test may not even run unless the feature is enabled to allow the test

  • @lfarrocodev
    @lfarrocodev 1 month ago +783

    A regex with over 20 parameters, what could go wrong

    • @alexeiboukirev8357
      @alexeiboukirev8357 1 month ago +21

      Expected blackjack but no luck.

    • @autohmae
      @autohmae 1 month ago +27

      it's worse, every parameter is a regexp input

    • @aldproductions2301
      @aldproductions2301 1 month ago +46

      The 21st, apparently.

    • @Spoonbringer
      @Spoonbringer 1 month ago +12

      I bet someone thought he was *really* clever.

    • @theairaccumulator7144
      @theairaccumulator7144 1 month ago +11

      I've written worse plenty of times and never had any issues. The hate boner towards regex is really insane today.

  • @LewisMoten
    @LewisMoten 1 month ago +594

    The “real” problem is that someone applied the wrong shirt size to a Jira ticket.

  • @neruneri
    @neruneri 1 month ago +873

    It's so unfair. When these nerds regex, they get a billion dollars. When I regex all over myself, the cops get called!

    • @Nick12_45
      @Nick12_45 1 month ago +11

      ?

    • @Squiggy545
      @Squiggy545 1 month ago +34

      Count of 3...We both say what regex means

    • @ismbks
      @ismbks 1 month ago +11

      @@Squiggy545 drawing swastikas

    • @hidoryy
      @hidoryy 1 month ago +28

      oh lord i'm gonna regex

    • @autohmae
      @autohmae 1 month ago +41

      Maybe you shouldn't regex on child processes !

  • @uumlau
    @uumlau 1 month ago +385

    That's a lot of text to say, "We didn't test it prior to deploying to millions of installations worldwide." They only needed to deploy it to an internal instance to prove that it would crash everywhere.
    In other words, this is why 100% unit test coverage gives managers a false sense of security.

    • @fltfathin
      @fltfathin 1 month ago +30

      if only just one person tried it in their machines first.

    • @EwanMarshall
      @EwanMarshall 1 month ago +37

      @@fltfathin or you have CI create a VM to deploy to and try to run with the update file just once

    • @Drazil100
      @Drazil100 1 month ago +19

      They need to word-salad the non-technical C-suite guys in charge of making purchasing decisions into thinking they know what they are doing so they don’t cancel their subscriptions. If they were straightforward and to the point they would lose all their customers immediately.
      Also maybe for legal reasons it may be bad to flat out say “we didn’t test it”. At least now they can say “we do have testing procedures but they didn’t catch this”.

    • @gsgregory2022
      @gsgregory2022 1 month ago +26

      @@Drazil100 what I got was "our testing procedures are a joke and didn't actually test anything". This feels worse than saying they didn't test it. They just fully admitted they built a brick wall and forgot to use mortar. Not testing would have just been a failure to do what you were supposed to. This is a failure that showcases you have absolutely no idea how the thing should be done.

    • @aldproductions2301
      @aldproductions2301 1 month ago +10

      More tests doesn't mean good tests.

  • @Oler-yx7xj
    @Oler-yx7xj 1 month ago +979

    Regex in kernel mode? That somehow sounds like the weirdest thing ever

    • @6lack5ushi
      @6lack5ushi 1 month ago +36

      better super know what the Duck you’re doing!!!

    • @ZingsVideos
      @ZingsVideos 1 month ago +20

      I give it a rating of super terrible.

    • @tttm99
      @tttm99 1 month ago +49

      We've all been there... Kernel mode. Total power over hardware. Let's just write ourselves a quick scripting DSL and use it internally... 🤯🤯🤯

    • @Nick12_45
      @Nick12_45 1 month ago +4

      because it is

    • @CommanderRiker0
      @CommanderRiker0 1 month ago +38

      There's no way you can use anything predictive with text without regex. This is like blaming a hammer because you hit a mirror instead of a nail. Proper sanity checks need to be in place (And testing)

  • @UberAffe1
    @UberAffe1 1 month ago +167

    12 pages to say "we now properly check the length of input"

    • @DoctorMandible
      @DoctorMandible 1 month ago +1

      Allegedly

    • @rikwisselink-bijker
      @rikwisselink-bijker 1 month ago +1

      It sounds to me as if they now test whether the 21st input is present. That suggests everything will break again when the next parameter is added.
      I don't know if this holds for the language they're using, but I always learned to use named inputs once you exceed 4 parameters (filling the rest with defaults). Apart from what was mentioned in the video and in a thousand comments, that is my concern here.
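
The named-inputs idea from the comment above can be sketched roughly as follows. This is only an illustration in Python, not how the sensor is written: the `Inputs` type and the field names `pipe_name`, `process_name` and `command_line` are hypothetical, and "fill the rest with defaults" is modelled by defaulting every field to a wildcard.

```python
from dataclasses import dataclass, fields

WILDCARD = "*"


@dataclass(frozen=True)
class Inputs:
    # Hypothetical field names; every field defaults to the wildcard,
    # so callers only name the inputs they actually care about.
    pipe_name: str = WILDCARD
    process_name: str = WILDCARD
    command_line: str = WILDCARD


def supplied_fields(inputs: Inputs) -> list[str]:
    """Return the names of fields set to something other than the wildcard."""
    return [f.name for f in fields(inputs) if getattr(inputs, f.name) != WILDCARD]


# Adding another field later does not shift the meaning of existing arguments,
# and callers that omit it simply get the default.
example = Inputs(command_line="powershell")
```

With positional inputs, a caller and a callee can silently disagree about how many values exist; with named inputs, a misspelled or unknown name fails loudly at the call site.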

  • @aenguswright7336
    @aenguswright7336 1 month ago +211

    I want to point out here at 4:50 and 8:10. They said they tested the "Template", they did not say they actually tested channel file 291... It really sounds like they did not, in fact, test 291... Unless this is covered later, it doesn't seem like this is a "it worked on my machine", it seems like "well we assumed it would work because we weren't 'changing code'". 21:00 and there we go..... 38:00 so basically yes. "Channel files" were not viewed as being dangerous to update by Crowdstrike because "they're not code", and so they had minimal or no testing. This is astoundingly negligent in my opinion.

    • @ThePrimeTimeagen
      @ThePrimeTimeagen 1 month ago +100

      We figured out later. They mock channel files

    • @Turalcar
      @Turalcar 1 month ago +33

      "Regexes are not code" could be a way out of some regulations (e.g. Sarbanes-Oxley for any code that processes financial transactions in the US) that require robust processes.
      A significant portion of outages are caused by "config changes". I work on servers that hardcode most configs (paths, hosts/ports of other services etc).
      Some could call it bad engineering practice but I like that every config change has to go through the same review and release pipeline.

    • @Knirin
      @Knirin 1 month ago +18

      @@Turalcar Regular expressions are code. Code that is interpreted at runtime by several dozen extra kilobytes of imported code with who knows how much testing.

    • @aldproductions2301
      @aldproductions2301 1 month ago +3

      @@Turalcar I hear ya. I feel much more comfortable when build, deploy, and configuration are all in code.
      I work more with server code and usually have a full dev, staging, and release server, but that's irrelevant.

    • @JeremyAndersonBoise
      @JeremyAndersonBoise 1 month ago +7

      Pro tip for CrowdStrike: Listen-up, dummies, all input that passes through a CPU is “real code.”
      I remain flummoxed. Do the ding dongs at Crowdstrike even understand computers?
      Also; Yes, it’s also a failed mock. As much as we mocked the regex the mock needed mocked, they are at equal fault here, as is their terrible process at large.

  • @yapdog
    @yapdog 1 month ago +142

    When you distill all of that down, it's a simple data interface error made worse by lapses in good testing procedures. They're deliberately making it sound complicated so that readers will glaze over saying: _"Well... I guess it was a really tough issue. Pretty understandable."_

    • @EwanMarshall
      @EwanMarshall 1 month ago +20

      yep, 2 errors, but I don't really care about the one in the validator, linters miss things, that is why you actually test on the actual parser that is going to parse the file.

    • @kelly4187
      @kelly4187 1 month ago +8

      “bullshit baffles brains”

    • @joshka7634
      @joshka7634 1 month ago +11

      @@yapdog likely different teams are responsible for each of the mentioned parts. So we get Conway's law applied to an RCA. I doubt it’s intentionally made to be complicated, but more a reflection of the internal complexity.

    • @yapdog
      @yapdog 1 month ago +3

      @@joshka7634 It's possible. But you wouldn't know it based on the text.

    • @JMurph2015
      @JMurph2015 1 month ago +3

      What I have put together is that rapid response files define X regexes, where X is a standardized number of inputs for its type. Then when the interpreter tries to run each regex, if the regex specified wildcard, it doesn't evaluate it at all. If it has a non-wildcard, then it actually evaluates the regex on that input. There were no versions of that 21-long template type that used anything other than wildcards on the last input, until there _was_!
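
The behaviour described in the comment above can be sketched in a few lines. This is a simplified illustration under assumptions, not CrowdStrike's code: the function names are made up, Python's `re` module stands in for whatever regex engine the sensor uses, and an `IndexError` plays the role of the out-of-bounds read.

```python
import re

WILDCARD = "*"


def evaluate(criteria, inputs):
    """Return True when every non-wildcard criterion matches its input.

    Wildcard criteria are skipped, so a missing input only matters once the
    criterion at that position is something other than the wildcard.
    """
    for index, pattern in enumerate(criteria):
        if pattern == WILDCARD:
            continue
        # Indexing past the end of `inputs` raises IndexError here.
        if re.search(pattern, inputs[index]) is None:
            return False
    return True


def evaluate_checked(criteria, inputs):
    """Same as evaluate, but refuse to run when the counts disagree."""
    if len(criteria) != len(inputs):
        raise ValueError(
            f"template expects {len(criteria)} inputs, got {len(inputs)}"
        )
    return evaluate(criteria, inputs)


inputs = ["value"] * 20                      # the caller supplies 20 inputs
old_instance = ["value"] * 20 + [WILDCARD]   # 21 criteria, last one a wildcard
new_instance = ["value"] * 20 + ["value"]    # 21 criteria, last one is real
```

`evaluate(old_instance, inputs)` succeeds because the 21st position is never touched, while `evaluate(new_instance, inputs)` fails on it. `evaluate_checked` rejects both, which is the kind of count check that would have flagged the mismatch before any instance exercised it.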

  • @lcarsos
    @lcarsos 1 month ago +140

    CrowdStrike: It was not a null dereference error! It was an off by one error!

    • @EwanMarshall
      @EwanMarshall 1 month ago +23

      of an array of pointers to structs...

    • @lcarsos
      @lcarsos 1 month ago +14

      @@EwanMarshall Hey. Hey. Who's the multibillion dollar company here?

    • @gsnyder2007
      @gsnyder2007 1 month ago +20

      A good reminder of the axiom there are only 3 major issues in computer science, cache invalidation and off-by-one errors 😂

    • @kebien6020
      @kebien6020 1 month ago +4

      @@gsnyder2007 You forgot cache invalidation and exactly once delivery

  • @JMurph2015
    @JMurph2015 1 month ago +128

    Btw, CrowdStrike just inadvertently advertised that they have an INSECURE regex parser running in kernel mode. That "latent out-of-bounds read" bug? I hardly think it's the only one in their regex engine. That thing is now where hackers go to heaven and not by dying, iykyk 😉.

    • @sorcdk2880
      @sorcdk2880 1 month ago

      "This new attack method invades through the immune system", HIV was there first, now for the computer virus version.

  • @funkdefied1
    @funkdefied1 1 month ago +109

    Cloudflare + CrowdStrike both got wrecked by regexes. I can’t wait for Kevin Fang’s video on this

    • @nosam1998
      @nosam1998 1 month ago

      Just more proof that ChatGPT is horrible. All it takes is one issue, and boom, this type of stuff happens.

    • @rnts08
      @rnts08 1 month ago +3

      I saw his community post, it's in the works. 😂😂😂

    • @r6scrubs126
      @r6scrubs126 1 month ago +2

      @@rnts08 no, that video wasn't about this. His video just came out today and it's about Google Cloud

    • @epb9000
      @epb9000 1 month ago +2

      I have also personally suffered from regex.

    • @outlawnation5160
      @outlawnation5160 1 month ago

      Which incident with Cloudflare?

  • @krss6256
    @krss6256 1 month ago +77

    All unit tests, 0 integration tests.

    • @astronemir
      @astronemir 1 month ago +5

      How can you even say there are unit tests when it misses most basic things like this? This is not even unit tests. They are tests written to boost coverage.

    • @sorcdk2880
      @sorcdk2880 1 month ago +3

      It does indicate that it was all unit tests, because this kind of behaviour is consistent with the kind of problems you run into with a heavy unit test focus.
      Generally one problem with tests is that you can only really test a tiny fraction of the possibilities, so what you have to do is try to find some testing scheme that gives you a good representation of all the possibilities. Realistically you are going to end up missing some corners of types of issues; it is just a question of how big and how many those corners are.
      One of the issues with unit tests is that you tend to need to write new tests for every new thing, which means that each time you add something you have to repeat that exercise of looking for all potential issues, and eventually you are going to make a worse mistake and skip some corners that were too important. If you instead have a bunch of integration tests and other such tests, then those tests would already cover a good deal of your corners in the first place, and the probability of missing some important corner at some point gets drastically lower.
      Basically proper unit test coverage requires that all the developers are super good at testing all the time, and I do not find safety in a process that requires everyone to be smart all the time --- at some point someone is going to make a mistake, and this is one such example.

  • @BudgiePanic
    @BudgiePanic 1 month ago +182

    I’ll just build my own regex parser that runs in the kernel, what’s the worst that could happen? 😀

    • @ZingsVideos
      @ZingsVideos 1 month ago +3

      The world ends.

    • @crazycrazy7710
      @crazycrazy7710 1 month ago +15

      MS windows will become MS DOS for some time

    • @DevelKutta
      @DevelKutta 1 month ago +6

      @@crazycrazy7710 Upgrade.

    • @tomwright9904
      @tomwright9904 1 month ago +4

      Windows handles fonts in the kernel don't you know

    • @autohmae
      @autohmae 1 month ago +4

      You think they built their own? I suspect they used a regex parser which has a BSD, Apache or MIT license.

  • @dragonridertechnologies
    @dragonridertechnologies 1 month ago +61

    Funny fact, they _have_ in fact done this before. 90% sure it was a Brodie Robertson analysis that mentioned it, but there are some bug reports from major enterprise Linux providers about CrowdStrike outages on their kernels on two prior occasions.

    • @EwanMarshall
      @EwanMarshall 1 month ago +20

      yep, mostly caught by administrators, which is why they were not big impact, it wasn't cases where clownstrike unilaterally pushed the update to all servers.

    • @slaapliedje
      @slaapliedje 1 month ago +10

      Yes, and in these cases it caused a kernel panic, which a reboot fixed. It wasn't quite as bad as this, but still one of the few things I have heard about outright crashing a Linux box without there being some hardware fault at work.

    • @progandy
      @progandy 1 month ago +4

      Linux itself has an interpreter built in for basically sandboxed kernel-level scripting (eBPF). In at least one of the crashes the bug was on the kernel side in that interpreter and not in CrowdStrike code. The problem here was CrowdStrike not checking the kernel version before loading its code or not knowing that the old kernel version had this bug. Ironically they also have an alternative kernel driver that uses direct kernel access and that worked fine.

    •  1 month ago

      @@progandy Does that mean the crowdstrike driver is a blob or just an out of tree driver?

    • @KnightRiderOfVoid
      @KnightRiderOfVoid 1 month ago

      out of tree module

  • @JMurph2015
    @JMurph2015 1 month ago +60

    Only watched two minutes so far, but you can tell they really did say "we'll keep it simple for you" and then proceeded to use as much corporate jargon as humanly possible to ensure that the average person did *not* understand just how idiotic they were.

  • @tc2241
    @tc2241 1 month ago +45

    The amount of jargon they’re using to try to obfuscate the root cause is Olympic level

  • @Cyanide300
    @Cyanide300 1 month ago +42

    There is no substitute for real-world testing. Automated tests are fine - good even - but there is no replacement for a real test environment, because even your automated tests can have bugs. They are also software after all.
    Also, this company is so marketing focused it makes me want to throw up. Like, the very first line of this "analysis" is a fucking sales pitch. Blegh!

    • @fltfathin
      @fltfathin 1 month ago +7

      There is: automated real-world testing. You can literally remote-control the keyboard and mouse and power-cycle a computer with sub-$100 electronics.

  • @EmperorShang
    @EmperorShang 1 month ago +172

    CrowdStrike: "GASLIGHT, TECH JARGON, AND DENY! WE GOING BANKRUPT!!!!"

    • @vitalyl1327
      @vitalyl1327 1 month ago +5

      Also CrowdStrike - try to shut down ClownStrike parody site with DMCA. Because reasons.

    • @kevinmcfarlane2752
      @kevinmcfarlane2752 1 month ago

      What have they denied? They haven't denied they messed up.

    • @EmperorShang
      @EmperorShang 1 month ago

      @@kevinmcfarlane2752 It WaSn't NuLl

    • @1DwtEaUn
      @1DwtEaUn 1 month ago

      @@kevinmcfarlane2752 I blame MS for allowing WHQL certification for a driver that loads dynamic and non-WHQL certified code patches from files

    • @marcotroster8247
      @marcotroster8247 1 month ago +1

      Dude the whole article could be summarized in a few sentences. But the clarity would make them look like fools 😂

  • @TinBane
    @TinBane 1 month ago +23

    So, they changed A, didn’t change B. Then they changed B, didn’t test it with A. B passed all B unit tests. A passed all A unit tests. No real regression testing or staged deployment with feedback was attempted.
    Thing is we can say skill issues, code quality, etc. But as someone who works in ops, you gotta mitigate it in ops so that juniors can owe everyone drinks for breaking the prod branch, but without causing the most costly IT outage in memory.
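
The "A passed all A tests, B passed all B tests" situation can be sketched like this. Everything here is hypothetical and heavily simplified: `validator_accepts`, `interpreter_run` and the dictionary layout of a channel file are invented for illustration, and the 20 versus 21 counts simply echo the numbers discussed in this thread.

```python
def validator_accepts(channel_file):
    """Hypothetical validator: only checks that every criterion is a string."""
    return all(isinstance(c, str) for c in channel_file["criteria"])


def interpreter_run(channel_file, inputs):
    """Hypothetical interpreter: pairs each criterion with the input at the same position."""
    return [(c, inputs[i]) for i, c in enumerate(channel_file["criteria"])]


def integration_check(channel_file, inputs):
    """Run the actual file through the actual interpreter and report the outcome."""
    if not validator_accepts(channel_file):
        return "rejected by validator"
    try:
        interpreter_run(channel_file, inputs)
    except IndexError:
        return "interpreter crashed"
    return "ok"


mock_file = {"criteria": ["a"] * 20}   # stand-in data used to test the interpreter
real_file = {"criteria": ["a"] * 21}   # the file that would actually ship
inputs = ["x"] * 20
```

The validator accepts `real_file`, and the interpreter handles `mock_file`, so each component passes in isolation. Only `integration_check(real_file, inputs)` puts the two together and reports the crash, which is the gap a staged or regression test is meant to close.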

  • @chepossofare
    @chepossofare 1 month ago +213

    "I'll use regex here"
    Now you have 291 problems.

    • @DerpMooseFish
      @DerpMooseFish 1 month ago +3

      my current philosophy on regex is that it should be used with CTRL + F and CTRL + F only

  • @JonnOSRS
    @JonnOSRS 1 month ago +21

    This is the biggest "Worked on my machine" I have ever seen

  • @Dylan_thebrand_slayer_Mulveiny
    @Dylan_thebrand_slayer_Mulveiny 1 month ago +35

    TLDR: "Our unit tests don't actually test the units".

  • @therealmccoy7221
    @therealmccoy7221 1 month ago +28

    Just in case you did not get that tiny little detail: That "interpreter" "interpreting" "channel files" using unlicenced "regExes" is running in kernel mode.

  • @Kane0123
    @Kane0123 1 month ago +260

    Canaries and staged rollouts - what a novel idea.

    • @nico-s29
      @nico-s29 1 month ago

      For an antivirus, being up to date as fast as possible is key

    • @jbutler8585
      @jbutler8585 1 month ago +37

      The 90 seconds it would have taken to deploy to test workstations while prepping for external distribution is simply too much of a delay. Uuuuugghhhhh we gotta wait for a reboot? F it, let's cowboy this and not even know if it offers the protection we're being paid for.
      Even without the bluescreen debacle, how many times did previous versions simply fail to do their job due to the lack of testing. "We tested the contents boss!" "Did you test that those contents are really getting loaded by the boot driver?" "Ehh." And if the goofball uploading it to the deployment servers simply picks the wrong file to send out? Shrug.

    • @101Mant
      @101Mant 1 month ago

      @@nico-s29 It's not antivirus, it's endpoint detection and response. It's looking for an attack controlled by a human, usually after someone clicked on a phishing email. New attack techniques usually show up slowly; it's not like a new virus rampaging around the internet copying itself. It's also not really looking for signatures like an AV, but for behaviours. I've worked on two competing products and both had a phased rollout to production for the new version. We never had to rush a deployment to get out new capabilities.

    • @vitalyl1327
      @vitalyl1327 1 month ago +5

      Not to mention immutable root fs and A/B boot, which all mature OSes do, but Windows somehow is incapable of...

    • @e1tep
      @e1tep 1 month ago

      Staged rollout is great, but how do you know the updated machines crashed because of your update to automatically stop the rollout from proceeding?
      I assume one rollout stage per day/week is too slow for these kinds of security updates. So you will probably have to figure it out within minutes.
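
One common answer to the question above is to treat silence as failure: a host that crashed never checks in again, so the rollout halts when too few hosts in a stage report back. The sketch below assumes that model. `staged_rollout`, its parameters and the 0.99 threshold are all hypothetical, and a real system would also need timeouts and a way to tell a crash from a host that is merely offline.

```python
def staged_rollout(stages, deploy, healthy_fraction, threshold=0.99):
    """Deploy stage by stage; stop as soon as too few hosts report back healthy.

    stages: list of host lists, smallest (canary) stage first.
    deploy: callable that pushes the update to a list of hosts.
    healthy_fraction: callable returning the share of those hosts that
        checked in after the update (a crashed host never checks in).
    Returns (completed_stages, failed_stage); failed_stage is None on success.
    """
    completed = []
    for hosts in stages:
        deploy(hosts)
        if healthy_fraction(hosts) < threshold:
            return completed, hosts
        completed.append(hosts)
    return completed, None


# Example wiring with stand-in callables.
stages = [["canary-1"], ["host-a", "host-b"], ["everyone-else"]]
deployed = []
completed, failed = staged_rollout(stages, deployed.extend, lambda hosts: 0.0)
```

In this example the canary never reports back, so only `canary-1` receives the update and the later stages are never touched. The wait between stages then only has to be long enough for a reboot and a check-in, which is minutes rather than days.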

  • @worldwarwitt2760
    @worldwarwitt2760 1 month ago +105

    Crowdstrike: "We tested it"
    Skeptical Tester: "According to the code coverage? Was it fuzzed?"

    • @EwanMarshall
      @EwanMarshall 1 month ago +41

      no, they say they validated it, with a validator, which said it was valid, they never ran it with the actual driver.
      Linter says all is fine, push to prod...

    • @InZiDes
      @InZiDes 1 month ago +5

      @@EwanMarshall And it makes some sense. A "simple" configuration file update, validated by a validator step. It's like pushing a validated JSON file to prod for a shop webpage. But come on, this is a fkn kernel driver with maximal or near-maximal priority.

    • @EwanMarshall
      @EwanMarshall 1 month ago +5

      @@InZiDes yes, because regexes aren't state machines or anything.... hell, with backtracking and such, they are fully Turing complete. Yeah, I count that as a code file too.
      And one company's web store going down vs all of them... And even then I stage and test that configuration change before applying it to live.

    • @LaundryFaerie
      @LaundryFaerie 1 month ago +1

      Sheeze. Copy editor here. I don't care what kind of job you do, GET ANOTHER PAIR OF EYES TO LOOK AT YOUR WORK. You will never regret taking the extra time to get it done properly.

    • @worldwarwitt2760
      @worldwarwitt2760 1 month ago +3

      @@EwanMarshall There is a difference between validation and verification. It sounds to me like they only had unit tests that make sure the code does what was expected under ideal conditions, without any effort put into exploratory testing, edge cases, security, detailed error handling, recovery, etc. Software that is responsible for more than a billion dollars of other people's commerce in a single day should have a legally required level of code coverage subject to inspection.

  • @SteveWielder
    @SteveWielder 1 month ago +53

    If I had a nickel for every time I saw a Regex bring down an entire tech ecosystem, I would have two nickels...
    Which isn't a lot, but it's weird it's happened twice...

  • @JayMaverick
    @JayMaverick 1 month ago +10

    Fascinating. I just started learning programming (with C) last week and we're now learning about array lengths and out of bounds being a dangerous error.
    If only Crowdstrike engineers also went through beginner C.

  • @RoninX33
    @RoninX33 1 month ago +20

    This is like the greatest self-own by a security company in software history. Future CS students will learn of this and marvel.

  • @318ishonk
    @318ishonk 1 month ago +8

    The interesting thing about this is that it's the ideal "kill switch" that many people expected from the Chinese for years:
    - It makes a large amount of systems unusable, including desktops
    - It gets executed automatically, i.e. the usual change control and slow-start installation you'd have for Windows Server OS patches don't apply here.
    If Microsoft ever makes the same mistake with Defender they'll create even more damage.

  • @01kaskasero
    @01kaskasero 1 month ago +36

    It's not the regex. It's the LACK OF TESTS.

    • @vitalyl1327
      @vitalyl1327 1 month ago +4

      Using regular expressions is a marker of incompetence, so the lack of tests is just the cherry on top, and is as expected from incompetent developers.

    • @TheRealXartaX
      @TheRealXartaX 1 month ago +8

      @@vitalyl1327 That's the biggest pile of hogwash and nonsense I've ever seen in written form.

    • @vitalyl1327
      @vitalyl1327 1 month ago

      @@TheRealXartaX found an uneducated dimwit who has no idea how to parse in the 21st century. Ever heard of PEG, dimwit?

    • @vitalyl1327
      @vitalyl1327 1 month ago

      @@TheRealXartaX found an ignorant regex enjoyer. You people are hilarious. So primitive, so dumb, yet so confident.

    • @vitalyl1327
      @vitalyl1327 1 month ago +1

      @@TheRealXartaX so, you have no idea what PEG or GLR is, yet you're soooo confident.

  • @cbaesemanai
    @cbaesemanai 1 month ago +26

    I am getting myself a t shirt that reads crowdstrike[21]

    • @starryk79
      @starryk79 1 month ago +8

      the real pro dev makes it crowdstrike[20] 🙂

    • @lunoxis8371
      @lunoxis8371 1 month ago +3

      @@starryk79 there's a few (weird) languages which start from one to be fair

    • @epb9000
      @epb9000 1 month ago +3

      @@starryk79 accessing the 22nd would also kill it, so still technically valid. 😂

  • @NicholasMa42
    @NicholasMa42 1 month ago +13

    They created a DSL, and treated it like it magically wasn't code. If you make a DSL, it is now code, and you have to test the actual file you use in production.

  • @Denominus
    @Denominus 1 month ago +41

    It sounds like they only unit test, because if they had actually applied the update to a machine, preferably as part of their automated testing, they would have caught the issue immediately.

    • @EwanMarshall
      @EwanMarshall 1 month ago +13

      they only "validate" with another program, not actually run the updates with the driver. Basically a linter and it missed something because it is different code.

    • @tma2001
      @tma2001 1 month ago +11

      this is what I don't understand - I would expect a multi-billion-dollar company to have a dedicated testing lab with rows of actual machines which every update passes through before going public. Never mind the devs just firing up a VM on their systems. I mean it was a deterministic error - there was no timing involved due to race conditions, thread scheduling in the kernel etc.
      So yeah, just turning any PC on was enough of a test. Bizarre!

    • @Denominus
      @Denominus 1 month ago +3

      @@tma2001 Having worked at quite a few large enterprises, I can say I have low expectations of what goes on behind the scenes at these companies. Sh*tshow is the norm, doing things well is the exception.

  • @silentdebugger
    @silentdebugger 1 month ago +4

    I remember Bing had a giant outage around ~2013 because of a single regex that started knocking out backend servers because it was so costly to evaluate

  • @Oler-yx7xj
    @Oler-yx7xj 1 month ago +40

    I just had a thought of Prime reading some Terms of Service on stream

  • @jhcato
    @jhcato 1 month ago +3

    A programmer had a problem. He thought, "Ahh, I'll use a Regex to solve this." Now he has two problems.

  • @robertlenders8755
    @robertlenders8755 1 month ago +13

    We bypassed complete driver certification by embedding a custom interpreter inside our drivers

  • @goshinbi44
    @goshinbi44 1 month ago +46

    Regex(?) and an OOB error in a program at this scale is crazy.

    • @cock_sauce8336
      @cock_sauce8336 1 month ago

      Call me stupid but isn't OOB error applicable only to machine learning?
      What do you mean by this ?

  • @nate_wil
    @nate_wil 1 month ago +22

    I think we figured out where Tom went after creating JDSL. He's clearly the solution architect at crowdstrike.

  • @BrazilMentionedHueHue
    @BrazilMentionedHueHue 1 month ago +8

    Tip of the day, if your opponent in blackjack is a Crowdstrike engineer, remind yourself that if you win by getting 21 he will crash.

  • @WiseWeeabo
    @WiseWeeabo 1 month ago +7

    1% is STILL DOWN? "8.5 million systems crashed".. so that's 85k servers and important infrastructure still down.. that's wild

    • @thomasdial8664
      @thomasdial8664 1 month ago +2

      Some probably are systems that don't really matter to anyone. Others can't be found because of careless company record keeping. Most of the rest are just normal variation in the number of running systems. And some, no doubt, are systems where owners have moved from CrowdStrike Falcon to another system, or none.

  • @szirsp
    @szirsp 1 month ago +4

    23:10 Security company you can trust:
    "We don't test what we actually send out"
    "...Well, we do. We test it in production :)"

  • @mantovani96
    @mantovani96 1 month ago +20

    This is wild! So, they write code to run at kernel level based on a Regex engine and they don’t even have an end-to-end test running the newer version on a real Windows machine.

    • @autohmae
      @autohmae 1 month ago +5

      They tested their code with mock config data, without testing their config data before sending it to many many machines without a staged rollout process.
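
      A staged rollout of the kind mentioned above can be sketched roughly as follows. This is only an illustration, not CrowdStrike's process: `staged_rollout`, `apply_update`, `is_healthy`, and the ring fractions are all hypothetical names and values.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the content file to one host
    is_healthy: Callable[[str], bool],     # hypothetical: e.g. "host rebooted and checked in"
    ring_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # assumed ring sizes
) -> list[str]:
    """Update hosts ring by ring; stop after the first ring with an unhealthy host.

    Returns the hosts that were updated before the rollout finished or halted.
    """
    updated: list[str] = []
    done = 0
    for fraction in ring_fractions:
        # Always take at least one more host so tiny fleets still get a canary.
        target = max(done + 1, int(len(hosts) * fraction))
        ring = hosts[done:target]
        for host in ring:
            apply_update(host)
            updated.append(host)
        done += len(ring)
        if not all(is_healthy(h) for h in ring):
            break  # halt: later rings never receive the bad update
    return updated
```

      With 1,000 hosts and a health check that always fails, only the first ring (10 hosts under these assumed fractions) ever receives the update.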

  • @DustinRodriguez1_0
    @DustinRodriguez1_0 1 month ago +32

    I love regex. I know they have a bad reputation. But they have just always made sense to me. This document is hilarious, and could be replaced with "They used a * where it should have been a +"

    • @CottidaeSEA
      @CottidaeSEA 1 month ago

      Regex is nice but easily abused and I find people aren't nearly as good at it as they think. I tested a colleague by giving him a basic regex challenge and he introduced a potentially extreme performance issue in the regex. A couple of minor changes and it was fine. However, had he worked at Cloudflare, the Internet would've gone down.
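
      The comment does not show the colleague's pattern, so the following is an assumed, textbook illustration of the kind of performance issue meant here. In a backtracking engine such as Python's `re`, a nested quantifier like `(a+)+` can split a run of characters in exponentially many ways before giving up on a non-matching input, while a flat `a+` accepts exactly the same strings with a single way to match.

```python
import re

NESTED = re.compile(r"^(a+)+$")  # nested quantifiers: many ways to split a run of 'a's
FLAT = re.compile(r"^a+$")       # same language, only one way to match


def same_verdict(text: str) -> bool:
    """Check that both patterns accept or reject the text identically."""
    return bool(NESTED.match(text)) == bool(FLAT.match(text))
```

      Keep the inputs short when trying this: on a string like `"a" * 30 + "b"` the nested pattern may take a very long time to reject.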

    • @bigdogdman1
      @bigdogdman1 1 month ago +2

      I love regexes too! Up to and including the fact that I built my own regex tester engine to allow me to test a multi-regex search & replace for a task I was assigned to extract address data out of a TEXTAREA field for thousands of records! Think about that for a second...
      Perfect application of regex. I got it up to about 98% accuracy. I was then fired for not completing it faster. Shoulda just said, "Good Enough" Weeks earlier...

    • @patrickjreid
      @patrickjreid 1 month ago +1

      ​@bigdogdman1 yeah, they don't care about perfect code. I got laid off for letting someone else break my code then not being able to fix it fast enough.

  • @timseguine2
    @timseguine2 1 month ago +4

    It is important to read between the lines here. If you read closely, they never actually tested the problematic channel file, they only ran it through a content validator. And they tested the template that processes it with fake data and apparently by the seat of their pants by releasing untested channel files into the field.
    I called it already before the report came out: even the lowest bar of testing would have caught this. All they had to do was to do a trial deployment on a single computer as a smoke test. That shit can even be automated, and it is dead simple. Smoke tests always get called out as an imperfect practice, and to be clear: it is not the only thing you should be doing. But it is such a ridiculously low bar for quality control that if you aren't even doing that, then what are you even testing?

  • @alangarde2928
    @alangarde2928 1 month ago +9

    The lack of control of consumers taking rapid response content was a shock to me. The sensor (which was the thing actually validated extensively by an external source by WHQL, etc) you could stagger on your internal deployments and so do your own automated deployment testing or canary deployments. The rapid response content files you had no control over. The argument was probably this is due to an emerging threat and you need to get it out as soon as possible and many companies wouldn't do their own even minimal testing.... but. Absolutely no control by the customer if they decided the risk of a new content update outweighed the risk of not being covered by an emerging threat whilst they spun up a few VMs. Horrifying case of we know better, trust us.

  • @MikkoRantalainen
    @MikkoRantalainen 1 month ago +6

    Bob Ross would have been proud of CrowdStrike software: it only worked as a result of happy accident. Ever.
    And it was a small miracle the whole system collapsed only now. With the engineering standards that they are demonstrating in this report, I'd have expected to see major issues years ago.

  • @saryakan
    @saryakan 1 month ago +7

    Oh my god, the content files are actually a form of DSL that is written via a UI tool and gets interpreted via regex in the kernel! 🤯

  • @altrag
    @altrag 1 month ago +5

    The problem with code reviews is they generally only review the code that changed - they don't review the code that didn't change but should have. The only way that typically gets reviewed is if the reviewers happen to have a lot of institutional knowledge and can just "know" that something's wrong.
    That runs straight in the face of the "engineers should be treated as replaceable cogs" mentality of management, and the resulting "change jobs every couple years because starting salaries generally keep up with or beat inflation better than raises" response of the engineers themselves. That leads to an industry-wide situation where basically no companies have more than a couple of engineers with domain knowledge that extends further than ~3 years, and those few engineers are usually both busy and also only sticking around because they're too unmotivated to follow the "change jobs a lot" trend, and that frequently translates to being too unmotivated to bother trying to do code reviews for dozens if not hundreds of job-hopping eternal noobs.

  • @nidavelliir
    @nidavelliir 1 month ago +196

    Regex is the root of all evil

    • @redchief94
      @redchief94 1 month ago +2

      I wouldn't go that far. I think that's still the eval(sic) command in bash. Now eval arguments sourced from regex on the other hand is a real candidate to take the title.

    • @debasishraychawdhuri
      @debasishraychawdhuri 1 month ago +3

      A lexer is all regex though

    • @gusic4529
      @gusic4529 1 month ago +3

      @@debasishraychawdhuri you can easily make a lexer without a single line of regex

    • @AryadevChavali
      @AryadevChavali 1 month ago +3

      @@debasishraychawdhuri along with @gusic4529's point, a lot of languages aren't regular, which means they CAN'T be described through regex.

    • @chrisdaman4179
      @chrisdaman4179 1 month ago

      Most string compare tools use regex under the hood. Just because you are too stupid to understand them, that doesn't make them evil. It's just pattern matching for strings. Learn the syntax and quit crying.

  • @PhrontDoor
    @PhrontDoor 1 month ago +24

    I don't mind the reg-ex.. either you will use a real reg-ex package or you will have to roll-your-own.
    My concern is the parameter 'count' mismatch failing to be detected. How do you not VERIFY parameters?
    And once you pass FOUR parameters, you are incompetent if you are assuming positionality instead of using named-parameters.
    Once you hit 10 absolutely-required parameters for something then you have failed.
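
    The count check asked about above might look like this in outline. It is a sketch under assumptions: `TemplateInstance`, `EXPECTED_FIELDS`, and `evaluate` are hypothetical names, and the 21-versus-20 numbers come from the discussion in this thread, not from CrowdStrike's code.

```python
from dataclasses import dataclass

EXPECTED_FIELDS = 21  # assumed: the template type defines 21 input fields


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical rule: one match criterion per input field."""
    criteria: tuple[str, ...]


def evaluate(instance: TemplateInstance, inputs: list[str]) -> bool:
    """Return True if every input satisfies its criterion ('*' matches anything)."""
    # Verify counts up front instead of trusting positions.
    if len(instance.criteria) != EXPECTED_FIELDS:
        raise ValueError(
            f"rule has {len(instance.criteria)} criteria, expected {EXPECTED_FIELDS}"
        )
    if len(inputs) != len(instance.criteria):
        raise ValueError(
            f"got {len(inputs)} inputs for {len(instance.criteria)} criteria"
        )
    return all(c == "*" or c == v for c, v in zip(instance.criteria, inputs))
```

    With a rule whose 21st criterion is not a wildcard, passing only 20 inputs raises a `ValueError` here instead of reading past the end of the input list.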

    • @tttm99
      @tttm99 1 month ago +1

      This!

    • @Drummerx04
      @Drummerx04 1 month ago +4

      Right, I might forgo some validation checks for my little whatever personal programs, but if I'm writing something that can brick computers, I'm skipping nothing.

    • @Knirin
      @Knirin 1 month ago +3

      First, I want an example of a 21 parameter regular expression. Second, I want to know why the person using said regular expression hasn’t just written a custom parser.

    • @ViewportPlaythrough
      @ViewportPlaythrough 1 month ago +3

      yap.. sounds to me like someone wanted to look smart and obfuscated their code so much that they themselves can't understand it anymore...

    • @bigdogdman1
      @bigdogdman1 1 month ago

      ​@@Knirin I have done this. Multiple times. More than 20. Not passing the params to a function call, mind you. I was at least smart enough not to screw *that* up...

  • @MikkoRantalainen
    @MikkoRantalainen 1 month ago +8

    21:30 This sounds like "we only do unit testing with mock data" - how about doing some integration tests, too, with real Windows installs, too, before distributing the files automatically to nearly 10 million systems? Like run the actual update on real hardware running Windows, restart the Windows system and then check if the booted system can detect the attacks you're trying to guard against? That kind of testing would have shown that "oops, the system didn't come up after restart".

  • @ContagiousRepublic
    @ContagiousRepublic 1 month ago +39

    1- When you try to solve a problem with regex, now you have 2 problems.
    2- When you make your own regex or change regex type, fork both problems whether you screw early users or not.
    3- When you call regex "AI", you have one more problem, plus a class action lawsuit coming as another problem.
    4- Don't strike the crowds, they strike back! Also they know regex better than you and EPIC HAX TROLLING is guaranteed.

    • @user-ai512
      @user-ai512 1 month ago +2

      "When you call regex "AI", you have one more problem, plus a class action lawsuit coming as another problem." When did they call regex "AI" as you say.... they never did. Regex was used for certain checks not even correlating to AI. Get your facts straight bud. 🤦‍♂🤦‍♂🤦‍♂🤦‍♂

  • @hbobenicio
    @hbobenicio 1 month ago +7

    Channel File definition: Remote Kernel-Space Code Execution Delivery Channel, by CrowdStrike. From people who don't know how to test arrays, inputs, perform functional testing and canary deployments. Security Company BTW. And somehow your business really thinks this makes your system more secure overall...

  • @MikkoRantalainen
    @MikkoRantalainen 1 month ago +5

    39:30 I read this as "If Microsoft had provided us an easy-to-use API for this stuff, we wouldn't have needed to create our own kernel driver". Yeah, but you did decide to write your own kernel driver and then did a half-assed job on it.

  • @RichardFarmbrough
    @RichardFarmbrough 1 month ago +16

    It's not that 1% are "still down", it's that 99% of those that were up a week ago are up now. This is, they say, normal.

    • @aenguswright7336
      @aenguswright7336 1 month ago +3

      That is what they said, but to be honest I am somewhat skeptical that the world-wide fleet could have recovered so fast

    • @betag24cn
      @betag24cn 1 month ago +2

      the part that made me upset is they used the term service down
      it is not a service, you fued millions of pcs, took weeks to recover some of those, IT people had the worst month ever
      they tried to make it look like it was a website that was down for an hour, the lack of respect for customers is criminal here

  • @klaussfreire
    @klaussfreire 1 month ago +10

    Funny thing is... they did perform fuzzing tests. That should have triggered the error. Something tells me that channel files had a CRC or something to protect them from fuzzing, and they never tested cases where the CRC actually was OK but the input was garbage. Takeaway: fuzzing a CRC-protected field is a waste of time - yes, CRC works. Pat on the back. Fuzz the actual input of actual inner routines, where meaningful parameter validation should be present instead. Fuzzing the input before CRC and other integrity checks only checks your ability to detect corruption, fuzzing the input past those checks checks your ability to detect erroneous input, which is not at all the same.
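
    The distinction drawn above, fuzzing before versus after the integrity check, can be shown with a toy example. Everything here is assumed for illustration: the file layout (a 4-byte CRC-32 prefix), the `|`-separated payload, and the deliberately buggy `parse_fields` are hypothetical, and whether real channel files carry a CRC is the commenter's guess, not an established fact.

```python
import random
import zlib


def parse_fields(payload: bytes) -> list[bytes]:
    """Hypothetical inner routine: split a payload into '|'-separated fields.

    Deliberately buggy for the demo: it assumes at least 3 fields.
    """
    fields = payload.split(b"|")
    _ = fields[2]  # IndexError when fewer than 3 fields are present
    return fields


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (4 bytes, big-endian)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def load(blob: bytes) -> list[bytes]:
    """Outer entry point: verify the CRC, then parse."""
    crc, payload = int.from_bytes(blob[:4], "big"), blob[4:]
    if zlib.crc32(payload) != crc:
        raise ValueError("corrupt file")
    return parse_fields(payload)


def fuzz(mutate_after_wrap: bool, rounds: int = 200, seed: int = 0) -> int:
    """Return how many fuzzed inputs reached the IndexError in parse_fields."""
    rng = random.Random(seed)
    crashes = 0
    for _ in range(rounds):
        payload = bytearray(b"a|b|c")
        if mutate_after_wrap:
            blob = bytearray(wrap(bytes(payload)))
            blob[rng.randrange(len(blob))] ^= rng.randrange(1, 256)  # corrupt the sealed file
        else:
            payload[rng.randrange(len(payload))] ^= rng.randrange(1, 256)  # corrupt the input
            blob = bytearray(wrap(bytes(payload)))  # then re-seal it with a valid CRC
        try:
            load(bytes(blob))
        except ValueError:
            pass  # rejected by the integrity check; the parser never ran
        except IndexError:
            crashes += 1
    return crashes
```

    Mutating the sealed file only ever exercises the CRC check, so `fuzz(mutate_after_wrap=True)` finds nothing, while mutating the payload and re-sealing it lets bad input reach `parse_fields` and expose the bug.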

  • @LegionInfanterie
    @LegionInfanterie 1 month ago +12

    When this shitstorm happened, I was on my first day of vacation. Our company was hit at 06:50 in the morning. My whole vacation ended because my employer called me back to work. We worked the entire weekend to restore more than 25,000 endpoints thanks to CrowdStrike. What a Fucked weekend.

    • @Tigrou7777
      @Tigrou7777 1 month ago +2

      I hope you've been well compensated for this (for example, a few extra days off for your next vacation).

    • @JaydedWun
      @JaydedWun 1 month ago +1

      that's illegal in my country

    • @LegionInfanterie
      @LegionInfanterie 1 month ago +3

      @@Tigrou7777 I have a clause in my contract stating that in the event of a critical incident, my employer can call me back to work from my vacation and must compensate me for the vacation days.

  • @crazycrazy7710
    @crazycrazy7710 1 month ago +11

    40:10 CrowdStrike uses BPF programs loaded from userspace to monitor these things on Linux. These are loaded at runtime by CS, which is safe because the kernel's BPF verifier checks each program first, and if there is any issue, the program will not load. So CS complaining about the lack of robust security features in Windows userspace is a valid argument. But CS is at fault for not doing the boot test.

    • @EwanMarshall
      @EwanMarshall 1 month ago +8

      And they did manage to kernel panic via BPF, sure, there was a bug in linux that allowed that, but still, it was clownstrike that sent an update out to administrators to install that caused that to be found (clownstrike didn't first report it, the administrators did when they ran it on test machines).

    • @tablettablete186
      @tablettablete186 1 month ago +4

      Just would like to say that Extended BPF (eBPF) isn't super safe, this is the reason you need root to load.
      Classic BPF (cBPF) is super safe and can be used through SECCOMP

    • @crazycrazy7710
      @crazycrazy7710 1 month ago

      @@EwanMarshall yup you are right. There are still bugs in the BPF implementation but at least there is a vetting process during loading. In Windows, you are at the mercy of whatever update comes in.

  • @Primalmoon
    @Primalmoon 1 month ago +4

    42:11 Primeagen says "inputArray[21]"
    Me: Congratulations! Since C is 0-indexed, you've actually managed to make an off-by-2 error.
    Crowdstrike: So, Prime, about that job you wanted... did you want 6 figures or 7?

  • @robstamm60
    @robstamm60 1 month ago +8

    Okay, slowly things start to make sense: they only did happy-path unit testing and apparently didn't try to break it, nor did they run any system-level integration tests on the channel files. In that case it's safe to assume that there are a LOT more hidden problems in that kernel driver and it's only a matter of time until someone finds a critical vulnerability in that driver 😮

    • @EwanMarshall
      @EwanMarshall 1 month ago +7

      given the channel files are not signed in any way shape or form... just finding a way to replace one is a critical vulnerability. just need to trick the code to accept one from me instead of clownstrike's servers.

  • @HollywoodCameraWork
    @HollywoodCameraWork 1 month ago +2

    I find it incredible that there's no tolerance or fail-safe built into this. This is a design failure at the cellular level.

  • @Mustardoable
    @Mustardoable 1 month ago +7

    I assume the kernel driver has to be signed by MS (therefore a long and slow process), so someone thought: I know... We'll put a regex engine in the kernel driver and have it read these "channel files", then we can release these files quicker and have them control the kernel driver behaviour. So much technobabble BS speak for something that just executes regexes lol

  • @supercompooper
    @supercompooper 1 month ago +15

    It was a null pointer and not checking inputs. Has nothing to do with a regex. It was an input validation (or lack thereof) problem.

  • @frankhaugen
    @frankhaugen 1 month ago +3

    Always test WITH production not IN production... It's not hard. That said, I have found instances where testing in production is the only solution, but then it's most likely a customer specific impact

  • @Cafuzzler
    @Cafuzzler 1 month ago +8

    Damn man. Billion dollar security companies are fucking up regex and not checking input length, meanwhile I'm struggling with imposter syndrome because my handrolled gif library isn't working 😭.

    • @sowercookie
      @sowercookie 1 month ago +4

      Never overestimate your billion dollar company and underestimate yourself.

    • @bigdogdman1
      @bigdogdman1 1 month ago +1

      Well, tbf, hand rolled gif libs *are* way more complicated than regexes...

  • @PseudoProphet
    @PseudoProphet 1 month ago +4

    While the immediate cause of the issue has been identified, it's worth considering whether there are underlying systemic problems that contributed to the incident.
    For example, was there sufficient time allocated for testing and code review?
    Were there adequate resources available for the development team?

  • @ianmcewan8851
    @ianmcewan8851 1 month ago +7

    This reads like marketing's idea of what a technical audit should look like.

  • @andrewshirley9240
    @andrewshirley9240 1 month ago +14

    Regex is fine. Try building a compiler for *any* language without regex, and what you end up with won't be very pretty, and even less maintainable than regex if you can believe it. It's a powerful tool that does a very specific thing. In compilers they're critical for the tokenizer step, and in security looking for known attack signatures seems well within the realm of a valid use case for it. It still ultimately comes down to a bad testing pattern that didn't catch even the most obvious of errors.

    • @an_imminence
      @an_imminence 1 month ago +4

      What? Have you heard of recursive descent? Branching character-inspection? There's no need for any regex anywhere in any compiler. In fact, regex alone couldn't do the job, since regular expressions only describe regular languages and programming-language syntax is not regular. Building a compiler with regex sounds like the most buggy, slow mess conceivable. Please don't!
      I like regex for search-and-replace or grep. But please don't build it into a shipping executable.

    • @NoodleFlame
      @NoodleFlame 1 month ago +1

      Oh dear, time for you to read up on compilers. Regex is not critical for a tokenizer; tokens are found using 1-2 character lookahead, it's so simple. Please never use regex for a compiler.

    • @__-nt2wh
      @__-nt2wh 1 month ago

      I'd definitely argue that building your own language with a clear set of grammar rules and implementing a recursive parser that way is MUCH more robust (and probably more efficient too) than trying to write a regex expression

    • @PassifloraCerulea
      @PassifloraCerulea 1 month ago +7

      WTF?! Yes, actually, lexing aka tokenizing (phase 1) is traditionally expressed with regular expressions, like with Flex or Ragel. Recursive descent (or generated pushdown automaton) is for the parser (phase 2) which is meant to take tokens, not characters, and turns those larger pieces into the parse tree in part because it's slower than regex-based lexing. Yes of course you can tokenize with small-N lookahead, but that's literally just ad-hoc informal regex (assuming you're writing that by hand)!
      Sounds like y'all need to read up on compilers yourselves.
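
      For readers following the argument, a regex-driven lexer of the kind described above can be sketched in a few lines. The token set is an assumed toy grammar, and the structure follows the well-known pattern of combining one named group per token kind. It only produces tokens; a parser (phase 2) would still be needed on top.

```python
import re

# Each token kind is a regular expression; the lexer is their alternation.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),  # anything else is an error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split source into (kind, text) tokens; raise on unexpected characters."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {match.group()!r} at {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens
```

      For example, `tokenize("x = 3 + y1")` yields an identifier, an operator, a number, an operator, and an identifier, in that order.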

  • @Ripcraze
    @Ripcraze 1 month ago +18

    Regex? So probably they didn't make tests for it and assumed all inputs would work just fine?

    • @chrisdaman4179
      @chrisdaman4179 1 month ago +8

      Seriously, all these rust people act like regex isn't just pattern matching for strings. The syntax isn't even that bad when you take a minute to Google it. Easy scapegoat for people who don't understand or care to Google even a little bit.

    • @axjkhl7699
      @axjkhl7699 1 month ago

      @@chrisdaman4179 sorry dude but you probably are schizophrenic to claim regex is a good idea in programming, just need to "google it"

    • @vitalyl1327
      @vitalyl1327 1 month ago

      @@chrisdaman4179 the problems start when code monkeys start using "pattern matching for strings" to parse strings. And dim code monkeys cannot think of any proper way to parse. Very rarely people use regular expressions to answer the question "does this string match the pattern, yes or no?", they use them to parse the string or at least split it in parts. Those who do it are ignorant, incompetent and should have never been allowed anywhere near any coding.

    • @ViewportPlaythrough
      @ViewportPlaythrough 1 month ago +3

      @@chrisdaman4179 yah.. i dont get the whole regex hating crowd in here..
      its also pretty evident that the use of the word 'regex' here is to use a jargon to confuse business people and make what they are saying seem too advanced for business people to understand.. its the same as using 'algorithm' back in the day to scare off non-devs..
      i guess some coders fell for it too

    • @TheRealXartaX
      @TheRealXartaX 1 month ago +3

      @@chrisdaman4179 Well, it's more advanced. Since it's not just a direct match, but a flexible match.
      But it's very easy to use if you have the mind to understand programming in the first place. The people who struggle with regex tend to be the people who don't naturally understand programming very easily in the first place.

  • @SirLino
    @SirLino 1 month ago +8

    This is dangerous code, add lots of checks before allowing code pushes. But we need to move fast! Let’s add config pushes, surely they don’t need all those checks. Huh, this is great - iterating on the configurable bits is so fast now! Let’s move more to the config.
    The end: you get this kind of outage.

  • @anewbimproves5622
    @anewbimproves5622 1 month ago +2

    It's absolutely wild that their testing procedures didn't (still don't?) include deploying the updated channel file to a known-good machine and seeing if it actually works in production.
    I deploy to a few hundred remote people, and even for the most basic updates I deploy a private release and install it on a production laptop to do some manual integration testing.

  • @catcatcatcatcatcatcatcatcatca
    @catcatcatcatcatcatcatcatcatca 1 month ago +14

    Template type definition file = the header file
    Template types = the cpp file
    Template instances = probably instances of a class defined in one pair of above
    AI = regex written by a human
    Rapid Response Content = we are only shipping a simple regex for a string a new exploit uses. It's like all the others. It practically changes nothing, the mock is validated by actual integration tests. Just ship it.
    They have to have some top tier engineers working on their product. That is the only plausible explanation how this hasn’t happened like twice a year at least.
    If I understood correctly, they had 12 mock-tests supposedly ”prevalidated” by their integration test. None of them caught calling the class/function with insufficient parameters. Very clearly they did not use their validation lab with wide range of systems, because the failure was practically universal.
    It is truly astonishing that the same organisation could both have such lackluster practices yet not make such a simple mistake before this one.

    • @autohmae
      @autohmae 1 month ago +2

      No, you are wrong: they have had similar mistakes in the past, even this year! They crashed a bunch of Linux systems not long before this.

  • @briankarcher8338
    @briankarcher8338 1 month ago +2

    The fact that they had zero Integration tests AND zero manual testing is astonishing.
    Even the worst run companies in the world have manual testing. The absolute bare minimum would have caught this problem.

  • @Baile_an_Locha
    @Baile_an_Locha 1 month ago +7

    A few points:
    1. Crowdstrike uses its own regex engine. It’s a DFA implementation. That’s why you don’t see any reference to them licensing a regex library.
    2. The regex patterns aren’t actually parsed on the endpoint. They are pre-compiled in the cloud into a dense binary parse tree.
    3. Any cybersecurity product I’ve ever worked on uses regexes (or regex-like matching) extensively.
    4. It was latent because, as the RCA states, the bug was actually introduced in Feb but it remained latent because the content wasn’t hitting that code path.

  • @ChrisCox-wv7oo
    @ChrisCox-wv7oo 1 month ago +2

    Cause writing your own parser will have fewer bugs than regex...

  • @tttm99
    @tttm99 1 month ago +4

    I love regular expressions and assembler and I'm not ashamed...
    Because I know how to use them and when not to use them. 😂
    And valuing unit test placation over understanding... This is where we go wrong.

  • @TJackson736
    @TJackson736 1 month ago +10

    6 days to add an if is what happens when you're agile.

    • @patrickjreid
      @patrickjreid 1 month ago +1

      @TJackson736 ... this is probably the actual source of all evil. Pushing dev for speed and not caring about anything else. It means they can't test properly without getting punished for being too slow.

  • @Novacification
    @Novacification 1 month ago +2

    They were so preoccupied with whether or not they could, they never stopped to think if they should.

  • @westwolf48
    @westwolf48 1 month ago +5

    Regex is the devil we need, otherwise we have a bunch of people reimplementing them badly with more bugs and no standard at all. The real WTF is needing to extract parameters from a string at boot time.
    Also, in addition to whatever unit tests they're using, they critically need at least one integration test where they update their software on a VM and then try to reboot the darn thing and see if it comes up. If not, fail the build and flag it instead of releasing it.

  • @mattymattffs
    @mattymattffs 1 month ago +6

    Regex is perfectly fine and acceptable. You do have to use it correctly and safely. I realize saying that, that they probably thought they were, but the reality is the only way to use regular expression safely is in non-critical applications. Like matching product codes or colors or something stupid like that.
    Honestly, not doing a simple balance check, isn't excusable. Not in our industry. The hardest problems in programming are naming things, off by one errors, and cache invalidation.

    • @chrisdaman4179
      @chrisdaman4179 1 month ago +6

      It's just a syntax for defining patterns for pattern matching strings. People don't even realize most of their string tools are regex under the hood. Blaming tech you don't understand has become very tiresome. It's extra annoying that prime has now tainted regex for a generation of devs who could use the tool. All because he didn't google the "scary" syntax for the 2 minutes it would take to understand.

    • @NoodleFlame
      @NoodleFlame 1 month ago +2

      @@chrisdaman4179 What string tools are you referring to? I have never used any that contained regex.

    • @IronicHavoc
      @IronicHavoc 1 month ago +1

      @@NoodleFlame Don't some compilers tokenize using regex?

    • @IronicHavoc
      @IronicHavoc 1 month ago +1

      @@NoodleFlame IIRC a lot of grammar parsers *use* regex, but they generally search for very simple token patterns and then perform the bulk of the processing/validation after that (rather than baking that contextual logic into the regex pattern itself)

    • @NoodleFlame
      @NoodleFlame 1 month ago

      @@IronicHavoc If you can remember the name of any I'd be very interested in taking a look. I have written many myself, read numerous books and looked at source code but I don't think I've ever come across an implementation that used regex in any form. I can't really see the benefits at the moment, I imagine character tracking gets more complicated.

  • @MunyuShizumi
    @MunyuShizumi 1 month ago +2

    I have casually handrolled regex one-liners for timestamp parsing with leap year and multi-format ISO8601 validation, YouTube URL parsing with optional parameters for any of the 50+ formats, etc. Each probably replaced 100+ lines of obscure code with a couple of named capture group null-checks to determine validity and access parsed data. Never failed in production.
    I would not dare use regex in a kernel module without _really_ robust error handling.
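
    The "named capture group plus null-check" pattern mentioned above looks roughly like this. It is a simplified sketch, not the commenter's one-liner: it assumes a single `YYYY-MM-DD` format, and it hands leap-year and range validation to `datetime.date` instead of encoding them in the pattern.

```python
import re
from datetime import date
from typing import Optional

# Assumed, simplified format: only calendar dates written as YYYY-MM-DD.
DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_date(text: str) -> Optional[date]:
    """Return a date for 'YYYY-MM-DD', or None if the text is not a valid date."""
    match = DATE_RE.fullmatch(text)
    if match is None:  # shape check failed: there are no groups to read
        return None
    try:
        # Range and leap-year validation is delegated to datetime.date.
        return date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        return None
```

    The caller gets either a usable value or `None`, so a malformed input can never be indexed into by mistake.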

  • @worldwarwitt2760
    @worldwarwitt2760 1 month ago +11

    "Latent" means the defect was present.

  • @VenomousCamel
    @VenomousCamel 1 month ago +4

    Zero indexing doesn't even help. It's not like someone was expecting 21 inputs and got array index 21 and thought nothing of it. They were expecting 20 inputs (index 19) and got index 20......
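
    The index arithmetic above can be spelled out with a small example. `read_field` is a hypothetical helper, and the 20 and 21 come from this discussion; Python raises an error on a bad index, whereas an unchecked C array read would simply go past the end of the buffer.

```python
def read_field(inputs: list[str], position: int) -> str:
    """Return the field at a 1-based position, checking bounds explicitly."""
    index = position - 1  # the 21st field lives at index 20
    if not 0 <= index < len(inputs):
        raise IndexError(
            f"field {position} requested but only {len(inputs)} supplied"
        )
    return inputs[index]


inputs = [f"value{i}" for i in range(1, 21)]  # 20 supplied inputs: indices 0..19
```

    Here `read_field(inputs, 20)` returns the last supplied value, and `read_field(inputs, 21)` raises `IndexError` because index 20 does not exist.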

  • @_Safety_Third_
    @_Safety_Third_ 1 month ago +4

    Who wants to bet a sprint deadline was a part of this?

  • @MikkoRantalainen
    @MikkoRantalainen 1 month ago +3

    8:40 I think this still doesn't explain why the update file was full of zeros. I would understand if it contained some actual data but didn't match the expected runtime parser.

    • @tma2001
      @tma2001 1 month ago +1

      that was a red herring as they explained in an earlier blog post ironically as a security feature of the kernel for new file allocations:
      Tech Analysis: Channel File May Contain Null Bytes, July 24th

  • @Tvalfager
    @Tvalfager 1 month ago +3

    Foiled yet again by the old "Have you tried turning it off and on again?"

  • @katzhunter4473
    @katzhunter4473 1 month ago +8

    Don’t blame Regex. It’s the tool who used it incorrectly …

  • @OfficialBeeswax
    @OfficialBeeswax 1 month ago +11

    I truly hope crowdstrike gets sued into the ground and made into an example. Shoddy practices like this are utterly unacceptable, and the industry needs to learn that.

    • @mort44444
      @mort44444 1 month ago +13

      would a $10 Uber eats gift card change your mind

  • @BrazilMentionedHueHue
    @BrazilMentionedHueHue 1 month ago +2

    What I read is, we fired the QA department and told the engineers to "build quality in" and that meant BAU when in fact it should have improved integration and e2e tests

  • @UnFiltered1776
    @UnFiltered1776 1 month ago +3

    Key takeaway: They didn't have enough meetings.

  • @andrew_ray
    @andrew_ray 1 month ago +2

    As a software quality professional, here's my issue. They tested the Template Type. Great, wonderful. Except they only tested it with Template Instances that ignored parameter 21. Not great, but nobody's perfect. Here's the kicker, though: Did they test the July 19 Template Instance? As it appears, no, they did not. So for all their talk of bugs "evading" testing, and a "confluence of issues," the real, singular, egregious mistake that caused the outage was **failing to test the Template Instance or the updated content file that included it.** This is addressed in item 5.
    Given they go on about "content files" not being code, I'm unsurprised they don't see fit to test them. Now, there's nothing wrong with mocking _in unit tests._ Where things fall apart is when you _only_ test against the mock. That's why we have integration testing, but that would require CrowdStrike to admit that content files are code that integrates with the Template Type.

  • @CommanderRiker0
    @CommanderRiker0 1 month ago +4

    People blaming regex as evil is pretty silly. It's a basic pattern matching tool that was derived from the simplest logic for pattern matching. Like everything, using it wrong without testing or sanity checks will have bad outcomes.

    • @bigdogdman1
      @bigdogdman1 1 month ago +1

      Huh. Sounds kinda like a lot of other tools... like _*every single one of them*_....

    • @CommanderRiker0
      @CommanderRiker0 1 month ago +1

      @@bigdogdman1 Yes, exactly.

  • @NostraDavid2
    @NostraDavid2 1 month ago +1

    Somebody didn't version their files. VERSION YOUR FILES, PEOPLE!

  • @thekwoka4707
    @thekwoka4707 1 month ago +3

    "we used the generalized language to be less technical" "massive jargon salad"

  • @AsgerJon
    @AsgerJon 1 month ago +4

    The only detail that matters is that smooth-brained executives allowed a vulnerability to yeet all of their computers because of one mistake.