How One Line of Code Almost Blew Up the Internet

  • Added 20 May 2024
  • Sources:
    blog.cloudflare.com/incident-...
    blog.cloudflare.com/quantifyi...
    bugs.chromium.org/p/project-z...
    asamborski.github.io/cs558_s1...
    www.colm.net/open-source/ragel/
    "[CloudFlare] A Day at the CloudFlare Office" • [CloudFlare] A Day at ...
    Assumptions:
    - The graph for "email obfuscation" vs. "bug occurrence" at 2:51. This was added to illustrate that the bug was being triggered by this feature. They did not have a convenient graph that told them when the bug was being triggered.
    - The "crossroads" mentioned at 3:55 probably did not happen. Just to add drama/plot.
    - The explanation of why fhold is called within the finishing action of script_consume_attr (7:50) is my best guess.
    - The history behind the empty last buffer was never explained. But I assume that some existing Module A would originally feed data to the Ragel parser. Module A still existed, and still continued to output this empty last buffer, but now cf-html can stand between Module A and the existing Ragel parser. Here, cf-html would consume Module A's data + the empty last buffer with no issues, but its output would no longer include the dummy buffer. This output can then be taken in by the Ragel parser.
    - Whether or not Cloudflare modified the compiled C code is unknown/never mentioned. There must be a reason that Ragel chooses to use == for the buffer end check rather than ≥, and semantically, == makes more sense if it checks for the buffer end with every iteration, which should make buffer overrunning impossible.
    - Technically in the strictest sense this is a "buffer over-read" as opposed to an "overflow" or "overrun" but the Wikipedia page for Cloudbleed says "overflow" so w/e
    - Whether or not this bug going unnoticed/discovered by hackers first would've "blown up the internet" is arguable
    Error corrections:
    - 13:13, the correct number is 0.06% (what is shown), but I say 0.6%
    - 13:28, the bug was possible since September (what is shown)
    Chapters:
    0:00 Exposition/useless story building stuff
    0:50 Explanation of Cloudflare and CDNs
    1:44 Implications of the bug
    2:40 Mitigation timeline
    4:46 Root cause
    10:43 Lessons learned
    12:41 Resolution
    Music by LEMMiNO:
    Nocturnal - • LEMMiNO - Nocturnal (BGM)
    Encounters - • LEMMiNO - Encounters (...
    Cipher - • LEMMiNO - Cipher (BGM)
  • Science & Technology

Comments • 1.4K

  • @CrimsonGamer99
    @CrimsonGamer99 1 year ago +4740

    I love how there's a team of literal wizards out there somewhere speaking ancient runic languages who have the power to delete the internet if they get a rune wrong.

    • @jeffbrownstain
      @jeffbrownstain 1 year ago +124

      Imagine the same but for all of reality 🤔

    • @CtrlAltSHIT
      @CtrlAltSHIT 11 months ago +279

      @jeffbrownstain guys... I think I have my next D&D campaign.

    • @VonKuro
      @VonKuro 11 months ago +64

      I will add a wizard hat sticker to my programming laptop in memory of this comment.

    • @girlswithgames
      @girlswithgames 11 months ago +50

      these "wizards" were parsing HTML with regex lol. don't be too impressed by them. don't be fooled by all these fancy terminologies. that's an absolute rookie mistake and big fat no no lol

    • @piotrkozbial8753
      @piotrkozbial8753 11 months ago +60

      @girlswithgames What a stupid comment. This is exactly the right thing to do. How do *you* parse HTML?

  • @ryan8742
    @ryan8742 1 year ago +7905

    Cloudflare deserves some more credit for how transparent they were about the issue

    • @guiorgy
      @guiorgy 1 year ago

      Considering that Google and all the other caching services were involved, it would've come to light eventually, hurting their trust even more, so they didn't have much of a choice

    • @exploitenterprise6515
      @exploitenterprise6515 1 year ago +511

      They knew google would snitch

    • @wowzerxx526
      @wowzerxx526 1 year ago +112

      They had to be

    • @txic.4818
      @txic.4818 1 year ago +266

      theyre an infosec company lmao ofc they were

    • @patryk4815
      @patryk4815 1 year ago

      look at the bug bounty, they only pay $3k max ;) So it is worth selling the bug on the black market

  • @recer_
    @recer_ 1 year ago +8276

    It sounds like an old engineer added the buffer to avoid a leak, but didn't leave notes on it

    • @kevinfaang
      @kevinfaang 1 year ago +5682

      Pro engineering move: don't actually fix the issue, implement an undocumented hacky workaround that itself seems like a bug so that when someone else fixes the workaround they get hit with the initial issue

    • @stoyanpetkov2488
      @stoyanpetkov2488 1 year ago +966

      @kevinfaang Now when you put it that way... that is most likely what happened lol :D

    • @kevinfaang
      @kevinfaang 1 year ago +1453

      To be fair though, I think it is unlikely they knew about the leak, given the rarity and impactfulness. I would chalk the empty buffer masking the issue up to coincidence.

    • @brdrnda3805
      @brdrnda3805 1 year ago +62

      @kevinfaang He was just placing an elephant in Cairo

    • @darylphuah
      @darylphuah 1 year ago +365

      @kevinfaang I think it's more likely the author of the code knew about it being a possibility, so just put it there just in case. Otherwise there's really no reason to have an empty buffer.
      The general rule when you see weird stuff like this is, "it's there for a reason".

  • @raffimolero64
    @raffimolero64 1 year ago +4410

    one thing my father always told me about iterators: "NEVER check index == end. ALWAYS check index >= end. You should never assume that your iterator is consistent and never skips over any values."

    • @thisismygascan4730
      @thisismygascan4730 1 year ago +542

      i was looking for this comment. it's not like == is any faster or more readable.

    • @MatthijsvanDuin
      @MatthijsvanDuin 1 year ago +295

      That advice taken at face value is bogus since many kinds of iterators only support equality-comparison, e.g. when iterating a linked list. But it's good advice if your iterator does support a total ordering that's consistent with iteration.

    • @MatthijsvanDuin
      @MatthijsvanDuin 1 year ago +162

      @thisismygascan4730 == can actually be faster in some cases, since it makes it easier for the compiler to reason about the loop count, but yeah it _usually_ makes little to no difference

    • @randomghost1080
      @randomghost1080 1 year ago +100

      @MatthijsvanDuin from my experience, nearly all instruction sets I've seen so far (AVR, MIPS, x86, ARM) take the same number of clock cycles for integral comparison (usually 1 or 2).

    • @MatthijsvanDuin
      @MatthijsvanDuin 1 year ago +146

      @randomghost1080 It's not about the time taken to do a literal == or >= comparison, it's about enabling the compiler to do loop transformations.
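
The == versus >= point in this thread can be sketched in C. This is a minimal illustration, assuming a cursor that may advance by more than one position per iteration (as a parser action might cause). The function names and the `cap` and `step` parameters are hypothetical and are not taken from Cloudflare's or Ragel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Exit test written as "!=": the loop stops only if i lands exactly on end.
 * cap is a demo-only guard so this sketch itself cannot run away;
 * a runaway loop in real code would have no such guard. */
size_t stop_index_eq(size_t end, size_t cap, size_t step)
{
    assert(step > 0);
    size_t i = 0;
    while (i != end) {
        if (i >= cap)
            break;
        i += step;
    }
    return i;
}

/* Exit test written as "<": the loop stops at end or just past it. */
size_t stop_index_lt(size_t end, size_t step)
{
    assert(step > 0);
    size_t i = 0;
    while (i < end)
        i += step;
    return i;
}
```

With end = 5 and step = 2, stop_index_eq(5, 16, 2) keeps going until the demo guard stops it at 16, while stop_index_lt(5, 2) stops at 6. With step = 1 both stop at 5, which is why the difference stays invisible until something skips a value.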

  • @caltissue141
    @caltissue141 1 year ago +678

    As a dev, I'm not sure if I'm comforted because this stuff can even happen to a company like CloudFlare, or horrified for exactly that reason.

    • @fulconandroadcone9488
      @fulconandroadcone9488 1 year ago +24

      That is why we invent safety things. Sure, you can operate this machine without such-and-such guard, but just know that for every smart one who can, there is someone dumb/drunk/exhausted/stressed who will get chewed up if this machine can operate with someone within eyesight.

    • @nickstegman8494
      @nickstegman8494 1 year ago +18

      @fulconandroadcone9488 Sometimes, the smart one is also the exhausted and stressed one (also, smart people can be drunk, though hopefully only during hobby projects). I mean, I don't know about you, but whenever I'm too sleep-deprived, the quality of my work generally declines.

    • @fulconandroadcone9488
      @fulconandroadcone9488 1 year ago +3

      @nickstegman8494 my guess is that if you disregard basic safety, which includes disabling safety mechanisms during normal operation, you might not be as smart as you think you are

    • @austint1151
      @austint1151 11 months ago +13

      @fulconandroadcone9488 you've never worked while severely exhausted. Not a programmer, but in any field exhaustion leads to dumb basic mistakes.
      Example: I was exhausted the other night after a 16hr shift, made some cereal for a quick dinner, and put the milk in the pantry and the cereal in the fridge.

    • @n1n1_37
      @n1n1_37 10 months ago +2

      We're only human, after all🗿

  • @kinghoopty9279
    @kinghoopty9279 1 year ago +532

    The more you learn of the Internet and its overall supporting structure, the more you'll realize how fragile it really is and how terrifyingly easy it is to make it crumble.

    • @TheTransitmtl
      @TheTransitmtl 11 months ago +82

      It's a literal jenga tower. We all use packages that use other packages that are all maintained by unpaid individuals.

    • @rajatsingh2956
      @rajatsingh2956 10 months ago +60

      As an electrical power engineer, I would implore you to take a look at our electric grid, and you will feel way better about the internet! 😅

    • @JoshSweetvale
      @JoshSweetvale 6 months ago +10

      Hardly.
      It's easy to knock down the modern HTML standard, but the basic address system and the physical network are very robust.

    • @CubeInspector
      @CubeInspector 5 months ago +1

      @JoshSweetvale until you learn about operation cyberpolygon being run by the same people that ran event 201 right before covid

    • @JoshSweetvale
      @JoshSweetvale 5 months ago +3

      @CubeInspector There's no-one at the steering wheel.

  • @Pence128
    @Pence128 1 year ago +1112

    There are three hard problems in computing: cache invalidation, naming things, bounds checking, and hunter2.

  • @ilgattoparddo
    @ilgattoparddo 1 year ago +457

    You managed to make an overly technical topic very interesting instead of boring! Never saw anything like that before.

  • @christopherg2347
    @christopherg2347 1 year ago +454

    So it was the time honored, decades old combination of:
    - working with naked pointers
    - not checking if you reached the end
    - foolishly trusting that input from a network source _isn't_ garbage
    Calling it "Cloudbleed" is fitting, as Heartbleed had all of those as well :)

    • @JorgetePanete
      @JorgetePanete 1 year ago +9

      I wish HTML in browsers was actually checked before any representation to force developers to simply make them good

    • @luisderivas6005
      @luisderivas6005 1 year ago +29

      Never trust input from the end user - period. Garbage in - Garbage out. Not too long ago I fixed non-ASCII chars bombing a piece of middleware, all because the commercial devs for the input system never thought users would cut and paste content from the Internet into a comment field. Nor did they properly set the XML declaration for the output file to specify the character set!

    • @LilacMonarch
      @LilacMonarch 1 year ago +10

      *not checking if you reached or overshot the end. the check should have been >= not ==

    • @williamdrum9899
      @williamdrum9899 1 year ago +1

      More like an off-by-one error

    • @christopherg2347
      @christopherg2347 1 year ago +4

      @williamdrum9899 This was off by a _lot_ more than one.
      And also you were off _with a pointer_ - which is why you should not work with naked pointers unless you really need to go low level.

  • @sahaakhiyat3703
    @sahaakhiyat3703 1 year ago +121

    The title was intriguing and video was so interesting that I didn't even notice that video has 54 views from such small channel with 157 subs. Good job :D

    • 1 year ago +2

      I didn't realize that either until I read this comment :D Really good content.

    • @Gabriel-wq4ln
      @Gabriel-wq4ln 1 year ago +7

      2 days later and the channel has 1270 subs

    • @Anonymous-XY
      @Anonymous-XY 1 year ago +3

      Now 1.46K subscribers
      Edit: Now 4.01K subscribers
      Edit2: 6.04K subscribers

    • @user-lj4lo7cx7m
      @user-lj4lo7cx7m 1 year ago

      Now 2.2k

    • @user-lj4lo7cx7m
      @user-lj4lo7cx7m 1 year ago

      Update: 3.2k

  • @CottidaeSEA
    @CottidaeSEA 1 year ago +718

    This also shows a different common issue with loops, where the exit clause is an equal value. Sure, you might expect the incremented value to eventually always reach the desired value, but the safer thing to do in this case is check if it's higher or equal. I would likely write it as less than, but depends a bit on what the surrounding code looks like.

    • @WackoMcGoose
      @WackoMcGoose 1 year ago +172

      Was gonna point out the same thing. Never assume an incrementer value will always eventually _exactly equal_ a target value, any number of things (race conditions, cosmic ray bitflips, floating-point fuckery, ancient mummy-curses, etc) could cause it to somehow "miss" the intended exit value.

    • @magnusculley6817
      @magnusculley6817 1 year ago +64

      @WackoMcGoose Ancient mummy-curses is my new excuse for issues in code 🤣🤣

    • @WackoMcGoose
      @WackoMcGoose 1 year ago +57

      @magnusculley6817 Well, once you've ruled out race conditions (because you're using a language that enforces thread safety, or your code isn't multithreaded to begin with), cosmic rays (because your server uses ECC memory), and floating-point fuckery (because you're using integer vars), at that point it's time to look into supernatural root causes.
      ...Honestly, I'm surprised there _aren't_ more COEs containing the phrase "Return the slaaaab..."

    • @cassinihuygens1288
      @cassinihuygens1288 1 year ago +22

      Checking for equality would be the normal case. If the index is beyond the upper limit, that should trigger some form of assertion so the root bug is fixed, not masked.

    • @CottidaeSEA
      @CottidaeSEA 1 year ago +35

      @cassinihuygens1288 Sure, but preventing hell from breaking loose is more important, you can always add logging after breaking out of the loop in case shit hit the fan.
      I'm not saying you shouldn't deal with the issue, I'm saying you should ensure the code runs as expected even when failing.
      A small addition: it would be better to check whether the value exceeded the expected value outside of the loop regardless, for quite massive performance reasons, since a loop like this will run so many times that a simple if-statement will slow it down significantly. It is also not necessary to check it until after the loop has been exited, as it couldn't have been above it prior to that. So regardless, this is how you'd do it.
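
The approach described in this thread (a tolerant exit test inside the loop, with a single overshoot check after it) might look roughly like this in C. It is only a sketch: `consume`, `step`, and `stopped_at` are hypothetical names, and how an overshoot gets reported (log, assertion, error code) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Advances a cursor by step until it reaches or passes end.
 * Returns true if the cursor landed exactly on end, false if it overshot.
 * The overshoot test runs once, after the loop, not on every iteration. */
bool consume(size_t end, size_t step, size_t *stopped_at)
{
    assert(step > 0 && stopped_at != NULL);
    size_t i = 0;
    while (i < end)        /* tolerant exit test: cannot run away */
        i += step;
    *stopped_at = i;
    return i == end;       /* false means "the root bug still exists" */
}
```

A caller can treat a false return as the signal to log or assert, so the loop stays safe without hiding the underlying defect: consume(6, 2, &where) returns true with where == 6, while consume(5, 2, &where) returns false with where == 6.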

  • @TheStiepen
    @TheStiepen 1 year ago +7

    This is a really great video. It explains the situation really well and easy to understand. I also massively appreciate how you put footnotes in the description.

  • @tc2241
    @tc2241 1 year ago +70

    Your presentation and use of case studies to make something as mundane as reviewing code somehow made for a downright entertaining watch. Subscribed!

  • @conkerconk3
    @conkerconk3 1 year ago +913

    I didn't really think a HTML/XML parser was _that_ hard to implement, but i never even thought that it basically is just a giant state machine, where different characters can change the entire state in many different ways, and managing that is nightmare inducing

    • @robrucki6695
      @robrucki6695 1 year ago +32

      you don't actually write the state machine, that code is generated

    • @arjix8738
      @arjix8738 1 year ago +78

      Any parser can be a state machine.
      The fact that it is a state machine is just an implementation detail.

    • @luck3949
      @luck3949 1 year ago +20

      A layer of abstraction is missing in the description. A finite state machine is indeed used in many modern approaches, but they are usually auto-generated from a more human-friendly parser description. So the finite state machine remains under the hood, invisible for the end-user (programmer). Do you know regexp? In classic implementations finite state machines are generated from regexp, and then used to do the parsing.

    • @MrMudbill
      @MrMudbill 1 year ago +75

      You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts.

    • @cameron7374
      @cameron7374 1 year ago +6

      And the HTML parser needs to be able to turn itself off temporarily when it finds a element.
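
To make the "giant state machine" idea in this thread concrete, here is a tiny hand-written state machine in C. It is a toy under stated assumptions, not an HTML parser and not what a generator such as Ragel would emit: it only counts tags and treats a '>' inside a quoted attribute value as data. The state names and `count_tags` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* The four states of the toy machine. */
enum state { TEXT, TAG, DQUOTE, SQUOTE };

/* Counts completed tags in a NUL-terminated string.
 * Each input character either keeps the current state or switches it. */
size_t count_tags(const char *s)
{
    assert(s != NULL);
    enum state st = TEXT;
    size_t tags = 0;

    for (; *s != '\0'; s++) {
        switch (st) {
        case TEXT:
            if (*s == '<') st = TAG;
            break;
        case TAG:
            if (*s == '"') st = DQUOTE;
            else if (*s == '\'') st = SQUOTE;
            else if (*s == '>') { tags++; st = TEXT; }
            break;
        case DQUOTE:                 /* inside "..." : '>' is just data */
            if (*s == '"') st = TAG;
            break;
        case SQUOTE:                 /* inside '...' : '>' is just data */
            if (*s == '\'') st = TAG;
            break;
        }
    }
    return tags;
}
```

Even with four states, the same character ('>') means different things depending on where the machine is, and an input that ends mid-tag (for example "<p") simply leaves the machine in a non-final state. Real parsers have far more states, which is why they are usually generated from a higher-level description.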

  • @noahkrause6607
    @noahkrause6607 1 year ago +20

    Well done, Kevin. This was extremely well made. Loved it!

  • @AT-rw3ou
    @AT-rw3ou 1 year ago +9

    Kudos for isolating and fixing the problem successfully in short order. Really like the video, well made and fun to watch.

  • @inventorxtreme
    @inventorxtreme 1 year ago +29

    I finished watching the video and thought that you would have at least 100k subs. Keep making great content and you will definitely go far!

  • @bazoo513
    @bazoo513 1 year ago +38

    Now, this was _very_ instructive, and, to this old hand who committed every imaginable blunder under the sun, very relatable. Thank you!

  • @hehe-te7ed
    @hehe-te7ed 1 year ago +9

    I've been looking for videos like this. Tech/software story telling. I loved your delivery, hope to see more from you. Subscribed!

  • @nocturne6320
    @nocturne6320 1 year ago +50

    This is the reason why I always do >= than just == comparison, you never know when an accidental off by one error might occur

  • @noceur9102
    @noceur9102 1 year ago +41

    This video and channel deserves so much more recognition. I'm almost halfway through the video and it's so well put together. Good wishes and I cannot wait to see you get the views you deserve for the effort put in. ♥♥♥♥

  • @jonwallace6204
    @jonwallace6204 1 year ago +235

    As a QA software dev, my job is to write the program that runs our automated tests as well as hunt down bugs in the code the regular QA guys find. Kudos for making an entertaining video out of a bug hunt.
    I have had some looooong hunts. Requiring a very specific type of bad input is really hard to figure out, I often end up stepping through the debugger trying to think of if any of the possible branches would fubar me.

    • @chupasaurus
      @chupasaurus 1 year ago +11

      I have a fun bug report for you.
      Devs on the project come up to me saying that Cassandra DB rejects a basic SELECT query on data of a certain user, and they can't see the entire stack trace because logs are masked in production. I try the query and see that the JSON parser returns "index out of bounds" while constructing the array of integer IDs from the result! By removing numbers from the end of it and looking up the source code of the parser used by the database, it turned out that the hacky ICPC-style code was failing on a combination of the first 3 numbers pushing the index into the thousands.

    • @hb1338
      @hb1338 1 year ago +11

      I became a much better software engineer the day I joined a small company where it was standard practice to stay at work until the bug that was blocking process was fixed. I wasn't there very long before I became very adept at identifying possible error conditions. As you suggest, a good debugger is an essential tool because it shows you all the things you didn't think about.

    • @AT-rw3ou
      @AT-rw3ou 1 year ago +1

      @chupasaurus Exhaustive scenario walk-through during code inspection could have weeded it out before it got into the final product, but it depends on the caliber of the code inspectors.

    • @bumbleguppy
      @bumbleguppy 1 year ago +7

      The QA community should self-publish a monthly magazine filled only with bug hunts from devs out in the trenches, like this video. Like veterans telling war stories at a bar, imagine all the little things you could pick up along the way over the years.

    • @effuah
      @effuah 1 year ago +4

      The evil part comes, when running the debugger changes how the program is executed, for example when the bug comes from a race condition and running the debugger naturally makes the code run slower.

  • @chrisnguyen180
    @chrisnguyen180 1 year ago +2

    New favorite channel; love the high BPM background music, visuals, concise yet detailed explanation, overall format/structure, the Michael Bay explosions every 5 seconds, etc.

  • @FlygisTheFlygis
    @FlygisTheFlygis 1 year ago +2

    Kevin! Just stumbled on this video and was watching until 5:00 before noticing that your channel is tiny. How are you this good at this humble size! Big ups to you my man. Thanks for the content

  • @24can22
    @24can22 1 year ago +4

    This form of content has great potential for your channel. good work

  • @deletedaxiom6057
    @deletedaxiom6057 1 year ago +6

    I just found your channel, must say I am impressed and like the content. I have a CS and Mathematics degree, and most channels don't give a deep enough dive or are just too cornflakes with water dull to pay attention. Thanks for the content

  • @unlisted9494
    @unlisted9494 1 year ago +2

    I've been on YouTube since it was Google video and I've never seen a video description like yours, good stuff, this video presentation goes tough as nails, I salute

  • @zavhytar9333
    @zavhytar9333 1 year ago +5

    I really appreciate how this guy has all his assumptions, corrections, etc, in the vid description.

  • @fehsilva4868
    @fehsilva4868 1 year ago +137

    This channel is so underrated omg I can't even believe that a small channel can produce so much quality content, keep going bro, you're going a long way ❤

    • @AndersHass
      @AndersHass 1 year ago +3

      Based on the voice I think it’s the same guy behind Fireship
      Edit: After listening to Fireship again I can hear a difference in the voice, and Fireship also shares a channel called Jeff Delaney which I would think is his personal channel.

    • @fehsilva4868
      @fehsilva4868 1 year ago +3

      @AndersHass I don't think so, it has some similarities but I don't think it is the same guy behind fireship

  • @austingoodrich2193
    @austingoodrich2193 1 year ago +3

    I got recommended this video, first time I've seen your channel and I gotta say I enjoyed this video. I'm now subscribed 👍

  • @Ali-in-the-goblin-cave
    @Ali-in-the-goblin-cave 1 year ago +1

    I was super surprised when I saw that you didn't have at least 100k subscribers because of how high quality this video is. Great job!

  • @superironbob
    @superironbob 1 year ago +37

    An empty buffer as a terminator of a buffer filled with null terminated strings is common enough pattern to be named: double null terminated buffers. But it's also weird enough that even when its used it's often not consistently produced or consumed.

    • @algorythm4354
      @algorythm4354 1 year ago +1

      Cool info! A lot of people here are assuming it was some design quirk
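
For readers unfamiliar with the double-null-terminated pattern named in this thread, here is a minimal C sketch of walking such a list: strings are packed back to back, each ending in '\0', and an empty string (a second '\0') marks the end. Whether Cloudflare's empty last buffer was related to this pattern is not stated in the source. `count_strings` and `cap` are hypothetical names chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the strings in a double-null-terminated list such as "ab\0cd\0\0".
 * cap is the size of the backing buffer; the walk never reads past it,
 * even if the terminating empty string is missing. */
size_t count_strings(const char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t n = 0;
    size_t i = 0;

    while (i < cap && buf[i] != '\0') {
        /* Find the terminator of the current string within the bounds. */
        const char *nul = memchr(buf + i, '\0', cap - i);
        if (nul == NULL)
            break;                      /* malformed: unterminated string */
        n++;
        i = (size_t)(nul - buf) + 1;    /* start of the next string */
    }
    return n;
}
```

The explicit `cap` bound is the design choice worth noting: the walk relies on the empty terminator in the normal case but does not trust the producer to always emit it, which is the inconsistency the comment above warns about.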

  • @brianreece2120
    @brianreece2120 1 year ago +6

    very entertaining and educational! your combination of humor and narration is great

  • @dooza
    @dooza 1 year ago +20

    Got this in my feed with only 20 views. Usually I always skip those, but I gave it a chance. Great quality on the video, don’t think you’ll not be big one day.

  • @edge_case
    @edge_case 1 year ago +1

    This channel is awesome, man. Props. You deserve far more subs.

  • @tomhekker
    @tomhekker 1 year ago +1

    You just popped up in my recommendations! Great video!

  • @Droftals
    @Droftals 1 year ago +5

    I work on systems that address network vulnerabilities for AWS, and you did a good job here.

  • @JGHFunRun
    @JGHFunRun 1 year ago +13

    I always feel uncomfortable seeing/doing something like p == pe to check if the end has been reached, interesting seeing those fears validated. I always do p >= pe, and add assert(p

  • @TheHTMLCode
    @TheHTMLCode 1 year ago +1

    Stumbled on this channel today, top quality content! Thanks 😄

  • @toshimichi
    @toshimichi 1 year ago +1

    I love your editing style. nice vid!

  • @kanbekan
    @kanbekan 1 year ago +8

    Such informative video! It's easy to follow and it taught me to value backwards compatibility more.
    I hope you get more views!
    Also kinda surprised to see Mr. Affable there

    • @oru
      @oru 1 year ago

      mr affable is everywhere

  • @keco185
    @keco185 1 year ago +5

    If you’re ever waiting for something to increment until it equals a value, it doesn’t hurt to have an “else > value” block that throws an error to let you know something went wrong

  • @originellerNickname
    @originellerNickname 1 year ago

    Got this in my algorithm, and really liked the video and the way it was narrated! :)

  • @pacigisto
    @pacigisto 1 year ago +29

    This video was super good and well-made! I don't know how to describe it, but it just felt good to watch. The visuals were just so satisfying, and I especially liked 6:51. I also really appreciate the sources, assumptions, and corrections in the description! Many big YouTubers don't cite anything and go by the philosophy of "well you shouldn't trust me entirely anyways so it's not my fault if I misinform you." Subscribed and liked, great video!

  • @SuperShadowP1ay
    @SuperShadowP1ay 1 year ago +3

    Great video! The silly sound effects were beautiful; they're what I imagine in my head when doing my own programming :)

  • @sankeethganeswaran3024
    @sankeethganeswaran3024 1 year ago +9

    wtf lol i clicked on this and watched it the whole way thru thinking it was from some big tech channel but it only has 984 views. lol good vid bro

  • @thepisewigeon
    @thepisewigeon 1 year ago +2

    How did I not know of this channel before? Great video dude

  • @SlickHF
    @SlickHF 1 year ago

    wonderful production quality from such a small channel
    keep it up man

  • @jennytalia8224
    @jennytalia8224 1 year ago +10

    TLDR: Buffer overrun in some legacy parser.

  • @ligamo2615
    @ligamo2615 1 year ago +4

    Absolutely incredible video. Thank you.

  • @conraddgg6800
    @conraddgg6800 1 year ago

    really enjoyed this video mate! keep it up! you can blow up to hundreds of thousands of subs with this quality

  • @NEOOOOOOON
    @NEOOOOOOON 1 year ago +1

    this style of video is great. you should do more of this

  • @thetuerk
    @thetuerk 1 year ago +15

    8:09 the humor in this little animation is simply sublime. Made me giggle while watching at 1 am

  • @draakisback
    @draakisback 1 year ago +17

    This is where documentation becomes important. If the developer who implemented the empty buffer had explained why they needed it, maybe this wouldn't have happened or at the very least they could have figured out a way to circumvent the problem when they were first rewriting the HTML parser.

    • @ferociousfeind8538
      @ferociousfeind8538 1 year ago +7

      But there's always the chance the buffer was perhaps entirely accidental (smells like an off-by-one error to me, instantiating one too many buffers for our purposes) rather than actually covering up this other off-by-one error in the other parser

    • @draakisback
      @draakisback 1 year ago +2

      @ferociousfeind8538 naw, I don't think it was an off-by-one. They had deliberate functionality in there where the system would chunk HTML docs and the final chunk in the buffer was always empty. That empty buffer chunk was like a carriage return in a way. The closing tag of the HTML indicated the end of the doc, but if the doc doesn't have a closing tag, the lack of characters in the following buffer would work as a flag that basically tells the parser that this is the end of the doc. It's obvious that this was done deliberately, but the reason why exactly they did it this way is a bit vague. Hence why documentation would have been important. When they migrated from the old parser, they obviously didn't take into account the edge case where the HTML is broken, and they removed the empty chunk without adding logic to handle that use case.

    • @pvic6959
      @pvic6959 1 year ago +3

      this is why i write my thought process for most things. especially if it's "clever". forget someone else. *I* need to remember what I did and why 2 months later when I look at it again lol

  • @EstebanGM245
    @EstebanGM245 1 year ago +1

    This video is amazing! You deserve much more popularity.

  • @TarekJellali
    @TarekJellali 1 year ago

    Awesome video. Looking forward for more content like this!
    Keep it up!

  • @bomono3973
    @bomono3973 1 year ago +4

    keep making videos like this, this video was amazing, absolutely loved it.

  • @robschn
    @robschn 1 year ago +4

    Dude I work in CDN and you explained everything perfectly, great work!

    • @raynjpg
      @raynjpg 3 months ago

      Where do you work? Is it a worthwhile job? I'm just getting started on my Comp Sci degree and still exploring and researching my options.

  • @trevise684
    @trevise684 1 year ago

    great sound quality and recording. already better than most other channels

  • @kranhat
    @kranhat 7 months ago +1

    Thank you for adding the credits, you are the best!

  • @MarkVrankovich
    @MarkVrankovich 1 year ago +8

    Wow. I can't imagine the time it must have taken to make this video. Well done.

  • @TheMorphium
    @TheMorphium 1 year ago +10

    Had some code break once after an update. It was the update that exposed an old bug that hadn't been caught. I can see how things like this happen.

    • @fulconandroadcone9488
      @fulconandroadcone9488 1 year ago +1

      Those can be fun; what is not fun is having an issue and purposefully ignoring it until it bites you in none of the pleasant areas.

  • @kfqfguoqf
    @kfqfguoqf 2 months ago

    High-quality video. Didn't know your channel, subscribed now

  • @ali_xD_
    @ali_xD_ 1 year ago

    holy shit bro this vid is so good i thought you had like 100k subs but only 6k? damn you deserve some recognition

  • @nothayley
    @nothayley 1 year ago +5

    This is a fantastic video. Wonderful explanation that I think even someone who never saw a line of code might understand

  • @davidmartensson273
    @davidmartensson273 1 year ago +7

    Which is why, when having range checks, you should not test for equality but for equal or greater/less; that way an off-by-one error can only catch one extra char. Yes, it will cost performance, but if that is unacceptable, you need to have a much better understanding. The extra empty buffer seems, in my opinion, like a big red flag: if it's explicitly added and not just the result of an off-by-one error, it probably served a purpose and should be thoroughly documented, or refactored/re-engineered into something less obfuscated.
    This is why I always encourage curiosity and ask all new devs to make sure to question any code they do not understand; if a more experienced dev cannot explain it in an understandable way, it's probably wrong, or at least bad code that should be rewritten :)

  • @latedriver9019
    @latedriver9019 1 year ago +1

    1.4k... I expected this channel to have at least 100k. Great video man, you're making it into the algorithm!

  • @pxlnghtt
    @pxlnghtt 1 year ago

    Congrats on 4k subs, great video!

  • @ViniciusNegrao_
    @ViniciusNegrao_ 1 year ago +10

    I've learned to add a >= instead of == even if you always expect the pointer to never get past the target, cuz you never know, right? This could've prevented this from happening as well

    • @Axodus
      @Axodus 1 year ago +1

      Yes, only add == when you truly only want to run your code when the variable is exactly that value, if the code can accept higher values there's no point not using >=

  • @Fudwinkle
    @Fudwinkle 1 year ago +6

    I had a stupid bug last month that heavily degraded performance; it also happened on a Friday night. I had added a new caching solution some years back (one I wrote), and after 1.5 years of flawless performance, I added more usage. This tipped the scales, and the caching storage was exhausted. I then remembered I had forgotten to add a flush of storage for this case, so it got full and all new requests failed. I quickly added a flush, only a few hours later, but this was only a band-aid; it'll fill, flush, fill. So after some quick sleep with a fever from the flu that I had, I realized the cause (the cache was keyed on something unique, causing every call to create a cache entry with no hits, at 1+ mil transactions per second), so I deleted the caching at this point and it flowed again, phew.
    Cause? A design change. The caching point WAS a good non-unique place for 6 months in development, but during bug testing, someone altered it, so it became unique. And I had already done the tests for performance at scale, so it just wasn't noticed 😬
    Luckily it was hardly noticed by anyone, but it could have been truly terrible. I work in a bank, and the finance engines were grinding to a halt. A process that runs some critical financing was still running after 16+ hours! After my hotfix, we terminated it and restarted, and it took 2 minutes (as it should) 😳
    If I wasn't sick with a fever, I could likely have reacted faster, but thinking in that condition was as slow as a caching bottleneck 😂
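
A rough sketch of the failure mode described in the comment above, under stated assumptions: the commenter's real cache is not shown, so the `struct cache` layout, the `CACHE_SLOTS` size, and the `cache_touch` function are all hypothetical. It illustrates why a cache keyed on something unique per request gets no hits and simply fills up.

```c
#include <stddef.h>

#define CACHE_SLOTS 8   /* hypothetical, deliberately tiny capacity */

struct cache {
    long   keys[CACHE_SLOTS];
    size_t used;       /* number of occupied slots */
    size_t hits;       /* lookups that found an existing key */
    size_t rejected;   /* inserts refused because the cache was full */
};

/* Looks up key; on a miss, inserts it if there is room.
 * Returns 1 on a hit and 0 on a miss. */
int cache_touch(struct cache *c, long key)
{
    for (size_t i = 0; i < c->used; i++) {
        if (c->keys[i] == key) {
            c->hits++;
            return 1;
        }
    }
    if (c->used < CACHE_SLOTS)
        c->keys[c->used++] = key;   /* miss: remember the key */
    else
        c->rejected++;              /* miss and no room left */
    return 0;
}
```

Feeding 20 requests that cycle through 4 shared keys gives 16 hits and uses 4 slots. Feeding 20 requests with 20 distinct keys gives 0 hits, fills all 8 slots, and rejects the remaining 12 inserts.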

  • @TelliePebble
    @TelliePebble 1 year ago

    Great video!! Your channel is gonna blow up, keep it up 👍👍

  • @nemeziz_prime
    @nemeziz_prime 1 year ago

    Awesome video! Great explanation. Would love to see more such videos xD

  • @mumk
    @mumk 1 year ago +21

    Very well made video. Can't fathom how many hours of hard work have been put into such a masterpiece! Appreciate and cheers

  • @NatiiixLP
    @NatiiixLP 1 year ago +2

    Damn, this channel is popping off like crazy! Congrats!
    Sometimes, these videos can be difficult to follow, but this was not the case, and I believe it should be clear enough for most viewers. Great job!

  • @brucethebruce2250
    @brucethebruce2250 7 months ago

    very cool, extremely concise and clear. A good watch. Thank you.

  • @OrangeC7
    @OrangeC7 1 year ago +3

    10:50 That's horrifying.

  • @_GhostMiner
    @_GhostMiner 1 year ago +3

    8:54 and that's why >= or <=

  • @jubayerabdullahjoy2582
    @jubayerabdullahjoy2582 1 year ago +1

    Awesome explainer mate, nicely done!

  • @joshuaward5004
    @joshuaward5004 1 year ago

    Really impressive response time from Cloudflare. Great video👍

  • @3zachm
    @3zachm 1 year ago +5

    This is kinda like a crime doc but for software engineering and I'd gladly watch much more

  • @deidara_8598
    @deidara_8598 1 year ago +6

    This is very common for how memory bugs occur in software. One programmer makes an assumption as to how the memory works, and writes their code accordingly, then some other programmer changes some other piece of code that makes it so those assumptions no longer hold, and voila, you have a bug.

  • @realpillboxer
    @realpillboxer 1 year ago +1

    6:30 nails it: so many libraries and components only exist within production code bases because it's always "get it done yesterday". So you find something that solves your very specific problem/use-case and run with it, and rarely is enough internal documentation written on *why* it was used.

  • @lazareric
    @lazareric 1 year ago +3

    The only problem I see with this is: how the f didn't they have 100% test coverage with something this important? If they had proper test cases, they would have instantly caught the issue with the new parser implementation. It's insane that these important digital companies don't follow the most basic coding practices; it's just mind-boggling to me.

    • @anujmchitale
      @anujmchitale 11 months ago +3

      No company can reach 100% test coverage. Seems you don't know how any important software is patched together in the real world. 😁
      It's all patchwork. No software can be defect-free.

  • @Dexter101x
    @Dexter101x 1 year ago +20

    I used to have a website, full of scripts that were poorly coded by me. It was full of unfinished tags i.e. those that weren't closed properly. It could have caused the whole internet to collapse if my website was visited by a lot of people

  • @SegNode
    @SegNode 1 year ago

    Great video, love your storytelling style.

  • @davidrose2899
    @davidrose2899 7 months ago

    Thank you for including the artists you used in this video.

  • @jerryy147
    @jerryy147 1 year ago +4

    2:31 oh no, not the taco tuesdays!

  • @luisderivas6005
    @luisderivas6005 1 year ago +4

    This one was a double-whammy: Code Optimization kills the safety hack the old engineer put in place (never documented; probably SOP to him). And a rookie mistake in implementing boundary check conditions. Back in my good old days of Delphi, the compiler had an option called Range Checking, which would guarantee this kind of bug would never see the light of day....however, it hindered performance and most devs never used it outside of debug builds.

  • @zackdrake8735
    @zackdrake8735 1 year ago

    gotta love that lemmino atmosphere! great content!

  • @SIMULATAN
    @SIMULATAN 1 year ago

    Great video, so informative yet super funny at the same time!

  • @benyseus6325
    @benyseus6325 1 year ago +5

    Pointers as always being the bane of coder’s existence.

    • @williamdrum9899
      @williamdrum9899 1 year ago +1

      Corporate needs you to find the difference between these pictures:
      Pointers, array indices, references
      Assembly programmers: "They're the same picture"

  • @InsideOfMyOwnMind
    @InsideOfMyOwnMind 1 year ago +3

    YouTube's algorithm must have a bug in it because I've never written a single line of code in my life yet here I am, left to conclude that anyone who has is not quite human.🤣😋

    • @Kenionatus
      @Kenionatus 1 year ago +7

      Programming is like giving instructions to a hyperintelligent four year old who is doing their best to misinterpret them. It's... an acquired taste.

  • @arabidllama
    @arabidllama 1 year ago +1

    The extra buffer at the end feels like a band-aid fix that never got documented

  • @MickGardner-vc4us
    @MickGardner-vc4us 1 year ago +1

    Mr/Dr Fang, this channel is awesome!

  • @Yutaro-Yoshii
    @Yutaro-Yoshii 1 year ago +3

    conclusion: writing parsers safely is confusing and difficult

    • @bytefu
      @bytefu 1 year ago +1

      Mostly if you're using C or C++.

    • @Yutaro-Yoshii
      @Yutaro-Yoshii 1 year ago +1

      ​@@bytefu Yeah. imho functional programming languages are the best kind of language to code a parser in. I've coded a json parser in scheme before and that was a breeze. All I had to do was to slice up the string and pass it down to offspring functions. Having no shared buffer and not having to think about pointers and memory life cycle puts a load off the mind.

    • @lhpl
      @lhpl 1 year ago

      Not at all. Doing it with C pointers is problematic though. (And I am not sure, is HTML even parseable at all - in a strict sense - these days?)

    • @Yutaro-Yoshii
      @Yutaro-Yoshii 1 year ago

      @@lhpl Do you mean "parsable" by being context sensitive? Or belonging to some parsing class like LL and LR?

  • @lebombjames3911
    @lebombjames3911 1 year ago +10

    Me, a hobbyist coder who doesn't know C, C++ or anything more difficult than JS: "Ha, early increment, what an idiot"

    • @redcrafterlppa303
      @redcrafterlppa303 1 year ago +1

      Early increment has its applications. Pre increment is not the cause of the bug. It's just that the code somewhere incremented again which caused the check to skip over the breaking condition. ++p >= pe would have been a safety net for unexpected behavior like this.

    • @txorimorea3869
      @txorimorea3869 1 year ago

      TBH it reeks of glowie code.

    • @orlandomoreno6168
      @orlandomoreno6168 1 year ago

      @@redcrafterlppa303 ++p = pe seems like it would fail on empty buffers

  • @volkruss
    @volkruss 1 year ago

    Your video is great! Keep up with the great work. One constructive criticism: imo, the music is too repetitive and a bit too loud, so it can be distracting.
    Besides that, amazing content.

  • @sill
    @sill 1 year ago +1

    damn you're a natural! kept my attention for 14 minutes with just the right mix of education and entertainment.

  • @NetherFX
    @NetherFX 1 year ago +18

    Not to be that guy, but this is probably one of the most interesting examples why memory-safe RAII is so important, and in Rust this couldn't have happened

    • @Kenionatus
      @Kenionatus 1 year ago +12

      Well, it probably wouldn't have happened if Cloudflare didn't go into the C code to manually optimise it. People can still shoot themselves in the foot while using Rust.

    • @NetherFX
      @NetherFX 1 year ago +1

      @@Kenionatus That's true, i guess it's not "couldn't have happened " but "was less likely to happen"

    • @good-frog
      @good-frog 1 year ago +9

      >not to be that guy
      **proceeds to be that guy**

      @mrjuxmunux778 1 year ago +4
      @mrjuxmunux778 Před rokem +4

      You are that guy

    • @quipyowert9933
      @quipyowert9933 1 year ago

      Rust programs can still crash but you would have to use the unsafe keyword.

  • @m4rt_
    @m4rt_ 1 year ago +5

    A good solution is to use a memory safe language like Rust, so you don't shoot yourself in the foot by doing memory stuff.

    • @wlockuz4467
      @wlockuz4467 1 year ago +1

      Rust is not memory leak proof lol

    • @xphreakyphilx
      @xphreakyphilx 1 year ago +3

      yeah, it isn't memory leak proof. But it is memory safe which would have prevented the Ragel bug if Ragel were written in Rust.

    • @m4rt_
      @m4rt_ 1 year ago

      @@wlockuz4467 it is memory safe, and that is all it needs to be to prevent this kind of problem.

    • @deanjohnson8233
      @deanjohnson8233 1 year ago

      @@xphreakyphilx the video suggests that the generated code was hand edited so this doesn’t necessarily have anything to do with Ragel.

    • @wlockuz4467
      @wlockuz4467 1 year ago

      @@xphreakyphilx Rust is memory safe, but if you watch the video you'll realise it's a logic bug and Rust (or any other language) couldn't have caught it by itself.
      The only way to catch this would've been to test it against the exact failing test case.