How Regex in C# can kill your app

  • Date added: 19. 06. 2024
  • Check out my courses: dometrain.com
    Become a Patreon and get source code access: / nickchapsas
    Hello everybody, I'm Nick, and in this video I will show you how a specific feature of Regex can be abused to attack your server and service. This feature is called backtracking, and the attack is commonly referred to as catastrophic backtracking. In this video I will explain what it is, how it works, and how to prevent it.
    Workshops
    NDC Minnesota | 15 - 18 Nov | bit.ly/ndcminnesota2022workshop
    NDC London | 23-27 January 2023 | bit.ly/ndclondon2023
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: bit.ly/ChapsasGitHub
    Follow me on Twitter: bit.ly/ChapsasTwitter
    Connect on LinkedIn: bit.ly/ChapsasLinkedIn
    Keep coding merch: keepcoding.shop
    #csharp #dotnet

Comments • 129

  • @nidustash6964
    @nidustash6964 1 year ago +81

    "Not everyone can do that, mainly because nobody knows how to write a Regex"
    TBH this caught me off guard!! I even choked on my own saliva for a brief moment. In the end, still very educational and fantastic content as usual, sir!

    • @FrederickMarcoux
      @FrederickMarcoux 1 year ago +9

      But it's so true. Nobody understands Regex.

    • @mandoschMUh
      @mandoschMUh 1 year ago

      That one is pure gold, I agree :D

    • @robertnull
      @robertnull 1 year ago +3

      I'd say that most people can write a regular expression, but nobody can read it after that, including the author. It's easier to rewrite it than to understand it ;)

    • @GufNZ
      @GufNZ 1 year ago +2

      It's not true tho - I am very fluent in Regex, in various dialects, and for everyone else there's RegexBuddy.

    • @GufNZ
      @GufNZ 1 year ago +1

      @FrederickMarcoux I do, very well.

  • @KevinInPhoenix
    @KevinInPhoenix 1 year ago +15

    There is an old saying: If you have a problem that requires Regex, then you now have two problems.

  • @frossen123
    @frossen123 1 year ago +23

    In cybersecurity, abusing regex like this is a category of DoS attack called ReDoS (regular expression denial of service).

    • @rapzid3536
      @rapzid3536 10 months ago +1

      Interesting, we call it the same thing outside the cybersecurity industry.

  • @pilotboba
    @pilotboba 1 year ago +39

    I know this video wasn't about email... but...
    I think MS or some other people have determined there is no way to really verify an email with a regex. I think even MS changed it so they basically look for a single @ in the string to call it valid email format.
    The way to validate it is to send an email with a confirmation link. :)

    • @Mario-cr1ik
      @Mario-cr1ik 1 year ago +5

      This approach is the recommended way mentioned somewhere in the ms docs

    • @rezataba6204
      @rezataba6204 1 year ago +1

      What about the login situation? It's not common to send verification emails for logins.

    • @pilotboba
      @pilotboba 1 year ago +3

      @rezataba6204 make the account pending until the email has been verified. Pending accounts get no access.

    • @albe8479
      @albe8479 1 year ago

      @rezataba6204 for login to an existing verified account it does not matter. If a user with that email as login exists, it's all OK. Maybe just do a length check.

  • @myroslavberlad4428
    @myroslavberlad4428 1 year ago +63

    If you have a problem and you have a solution via Regex - now you have two problems

    • @LCTesla
      @LCTesla 1 year ago +1

      Do people really believe that or is it about making a cute "zinger" for the uncritical masses

    • @myroslavberlad4428
      @myroslavberlad4428 1 year ago +3

      @LCTesla yes, they do. And there are reasons for that. Hard to master, hard to debug, hard to update without breaking existing cases. It is not that the tool is bad. Regexes are actually a powerful instrument and there are certainly good places to use them, but they are hard to master. That is why this saying was born.

    • @LCTesla
      @LCTesla 1 year ago +2

      @myroslavberlad4428 seems like just applying the KISS principle and restricting its use to appropriate use cases counters all that. The fact that a tool can be misused is a case against the user, not the tool.

    • @myroslavberlad4428
      @myroslavberlad4428 1 year ago

      @LCTesla I do agree

  • @Kommentierer
    @Kommentierer 1 year ago +12

    Everything I see on your channel is super interesting and special. I never knew about those issues, but it is nice to know how to fix them.
    Sharing this with my colleagues.

  • @HalasterBlackmantle
    @HalasterBlackmantle 1 year ago +9

    What's the downside to using NoBacktracking? Or rather, what would be a scenario where you would not want to use it?

  • @tmhchacham
    @tmhchacham 1 year ago +1

    Very nice, as usual. Keep it up!

  • @HadrielWonda
    @HadrielWonda 1 year ago

    Thanks for the insight, Nick

  • @RayanMADAO
    @RayanMADAO 1 year ago +2

    that regex visualization site is really cool

  • @Denominus
    @Denominus 1 year ago +4

    Excellent video and great advice. We've fallen prey to this twice in the past. First an attack directly against one of our APIs and then during Cloudflare's global outage due to a bad regex on their side (not our fault in this case, but still an outage).
    At the time we changed the regex, but there are only a handful of people who know how to do this confidently on a complex regex. I really like these "safety net" approaches.

  • @Hamza-Shreef
    @Hamza-Shreef 1 year ago

    this kinda thing has been really useful
    keep it up bro

  • @IvanRandomDude
    @IvanRandomDude 1 year ago +6

    Chapsas flexing with 32 cores on us mortals @4:52

    • @AcidNeko
      @AcidNeko 1 year ago +2

      and rtx 4090 and 128gb of ram :)
      it can run 100 instances of Rider, or 6 instances of Visual Studio 2022

  • @DuelingTreeMike
    @DuelingTreeMike 1 year ago

    Amazing find sir. I had no idea backtracking can be so dangerous. Thank you so much for creating this video.

  • @magashkinson
    @magashkinson 1 year ago

    Very useful video. Didn't know about this problem

  • @parkercrofts6210
    @parkercrofts6210 1 year ago

    Thank u for this ❤❤

  • @antonmartyniuk
    @antonmartyniuk 1 year ago

    nice call on the Regex problem!

  • @matthewsheeran
    @matthewsheeran 1 year ago

    Brilliant!

  • @a13w1
    @a13w1 1 year ago +1

    That timeout option is quite cool when you know how long a normal regex will take to pass even under load. I plan to use it the next time it makes sense when I write a regex.
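
A minimal sketch of the timeout safety net mentioned above, assuming the `Regex.IsMatch` overload that takes a `TimeSpan`. The helper name `TryIsMatch`, the pattern `^(a+)+$`, and the 200 ms budget are illustrative choices, not taken from the video.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns true/false for a completed match, or null if the engine hit the timeout.
// TryIsMatch is a hypothetical helper name used only for this sketch.
bool? TryIsMatch(string pattern, string input, TimeSpan timeout)
{
    try
    {
        return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
    }
    catch (RegexMatchTimeoutException)
    {
        // Treat a timeout as "could not validate" and let the caller reject the input.
        return null;
    }
}

// Nested quantifiers plus a trailing character that forces the match to fail:
// a backtracking engine is expected to try an exponential number of splits.
string hostile = new string('a', 40) + "!";
bool? result = TryIsMatch(@"^(a+)+$", hostile, TimeSpan.FromMilliseconds(200));
Console.WriteLine(result is null ? "timed out" : result.Value.ToString());
```

The budget should sit comfortably above what a normal input needs under load, which is the point the comment makes; 200 ms here is only a placeholder.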

  • @anon0
    @anon0 1 year ago

    ooh very cool i just started doing my phd on symbolic automata regex. glad to see it being relevant

  • @rbogdan8980
    @rbogdan8980 1 year ago

    Thanks!

  • @shingok
    @shingok 1 year ago +2

    I wonder if the Source Generator version was slower because it was compiled as debug. Maybe the dynamically compiled version generates optimized code regardless of compilation mode.

  • @FunWithBits
    @FunWithBits 1 year ago +11

    Regex is super powerful. I just wish people would format it a little bit more. Usually I see regex and it is just a line of characters. Regex code can be much easier to read when there is spacing, multiple lines, different indenting, added comments, etc. Programmers don't put C# code on a single line with no spaces or comments, but in regex this is accepted (and because it's hard to read, it's impossible to see any performance issues it might have).

    • @chriskruining
      @chriskruining 1 year ago +2

      could you give me an example of such formatted regex? Because I always assumed it had to be a line of chars because every space and newline used to format is part of the query as far as I am aware. So I am curious how you do this, because I love clearly formatted code :D

    • @robertnull
      @robertnull 1 year ago +4

      @chriskruining There is a (?x) regex modifier that enables free-spacing mode, i.e. you can put spaces and newlines in your regex and they will be ignored, so you can make your expression multi-line, with each line containing a part that captures something significant. What's more, in this mode you can even use # comments at the end of each line!

    • @PeterK6502
      @PeterK6502 1 year ago

      @robertnull True, but most input to be parsed is dependent on spaces, therefore this mode is useless in that situation (you could add comments however to increase readability).

    • @robertnull
      @robertnull 1 year ago +2

      @PeterK6502 Fret not, kind sir, for in free-spacing mode you just escape spaces with a backslash to make them part of the important expression and not part of the unimportant formatting :)

    • @PeterK6502
      @PeterK6502 1 year ago

      @robertnull I did not know that, thanks for the info.
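
The free-spacing mode discussed in this thread can be sketched in C# as follows. This assumes `RegexOptions.IgnorePatternWhitespace`, which is the .NET option corresponding to the inline `(?x)` modifier; the name-parsing pattern and the `ParseName` helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Free-spacing pattern: unescaped whitespace is ignored and '#' starts a comment.
const string Pattern = @"
    ^(?<first>[A-Za-z]+)   # first name
    \                      # one literal space (escaped, so it is kept)
    (?<last>[A-Za-z]+)$    # last name
";

// ParseName is a hypothetical helper used only for this sketch.
Match ParseName(string input) =>
    Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);

Match m = ParseName("Ada Lovelace");
Console.WriteLine($"{m.Groups["first"].Value} / {m.Groups["last"].Value}");
```

Named groups are used here so that each commented line of the pattern maps to something readable at the call site.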

  • @5hunt3r
    @5hunt3r 1 year ago +13

    just a note: don't try to validate emails. It's nearly impossible to check if a mail is valid because so many special cases exist where it looks invalid but still is valid.

    • @nickchapsas
      @nickchapsas 1 year ago +10

      The actual RFC regex is HUUUUGE

    • @humanesque
      @humanesque 1 year ago +2

      Pretty much this; about the furthest you can go is checking if the domain exists, short of asking the receiving server if it will accept it. Useless checks like these are worse than blindly copying code (which is what this regex is) and being surprised when it goes wrong.

    • @orterves
      @orterves 1 year ago +10

      My understanding is the best way to validate an email, is to send a verification email.

    • @nooftube2541
      @nooftube2541 1 year ago +2

      @nickchapsas the real RFC regex does not exist 😂 Because an email address, like a domain, cannot be parsed with a regex.
      Actually there are 2 normal solutions: either check for the @ sign and that there are characters before and after it, or check that the email is real. But the second option does not handle localhosts...

    • @EmptyGlass99
      @EmptyGlass99 1 year ago +2

      The only 100% guaranteed way to validate an email is to force the user to respond to an email sent to them i.e. sending a validation link or one-time validation code.

  • @brianviktor8212
    @brianviktor8212 1 year ago +1

    10 seconds to check if a given string is a valid e-mail? Sounds great! I mean I could do it with a little custom algorithm with ~0.001µs, but hey, it's regex!! We all love regex, don't we guys?!
    An E-Mail is setup like this: [text]@[domain].[ending] - Either split at the @ or get the index of it. If the result is !=2 elements in the array or -1 as index, you have either no @ or more than 1. Both should return "false" for the check. After that you get the last index of "." (apparently you can have multiple dots?). If it's -1, return false. Otherwise first part is the domain, the second part is the ending. Here you can verify if it's a valid e-mail address.
    It's really simple... I thought everybody would do this? Why even bother with Regex for this?

    • @brianviktor8212
      @brianviktor8212 1 year ago

      @billy65bob - Hmm yeah, that would require adjustments then. I've never seen those before though. In the worst case I'd have to loop through every char manually, but only once.
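
A rough sketch of the regex-free check described in this thread: exactly one `@`, then a last `.` somewhere after it. The function name `LooksLikeEmail` is hypothetical, and, as the replies point out, a check like this rejects some addresses the standard allows (quoted local parts containing `@`, for example), so it is only a cheap sanity filter before sending a confirmation email.

```csharp
using System;

// Hypothetical helper: a structural sanity check, not an RFC-compliant validator.
bool LooksLikeEmail(string input)
{
    if (string.IsNullOrWhiteSpace(input)) return false;

    // Exactly one '@', and it must not be the first character.
    int at = input.IndexOf('@');
    if (at <= 0 || at != input.LastIndexOf('@')) return false;

    // The last '.' must come after the '@' with at least one character
    // between them, and must not be the final character.
    int dot = input.LastIndexOf('.');
    return dot > at + 1 && dot < input.Length - 1;
}

Console.WriteLine(LooksLikeEmail("nick@example.com"));
Console.WriteLine(LooksLikeEmail("nick@@example.com"));
```

Each character is inspected a bounded number of times, so the cost grows linearly with the input length regardless of what the input contains.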

  • @carmineos
    @carmineos 1 year ago +2

    DataAnnotations should be safe as RegularExpressionAttribute has a default timeout of 2s (at least from .NET 5, idk before)

  • @tanglesites
    @tanglesites 1 year ago

    Excellent video as usual! I was wondering if anyone knows of any resources on how to scan assemblies. I'm trying to build a setup for a minimal API project I am working on. I would like to pull all the classes that implement a particular interface or interfaces and register them in the IoC container, so that it kind of auto-magically works. Nick, do you have any videos on this, or does anyone know of anywhere I can look? Everything I have found covers particular use cases. Sorry, I'm new to C#. I could figure it out, I'm sure, given enough time; I'm just looking to speed up development a little and make the code a little more organized. Again, great content. You have taught me more in the last month than I learned in a year, and it's more than beginner level. Loving it.

  • @peledzohar
    @peledzohar 1 year ago

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~ Jamie Zawinski

  • @Victor_Marius
    @Victor_Marius 1 year ago

    It once froze my browser tab while I was testing a regex for matching file paths (in JS). It wasn't because of the length of the input but more because of some spaces in the input. Why does it use backtracking? Can it be avoided with the format of the regex? If you use something as simple as /w0rd/, is it still going to use backtracking?

  • @deepakkulkarni5356
    @deepakkulkarni5356 1 year ago +1

    Hey Nick, does SQL validation time also increase exponentially with more records? Can you share a link to any documentation that shows this?

    • @klekaelly
      @klekaelly 1 year ago

      I thought the same thing, SQL validation uses Regex a lot

  • @casperhansen826
    @casperhansen826 1 year ago

    I use Regex for small strings with simple use cases.

  • @rumplin
    @rumplin 1 year ago +5

    What a subtle way to show us that you have an RTX 4090 :)

    • @nickchapsas
      @nickchapsas 1 year ago +8

      It’s the only reason I made the video

  • @cn-ml
    @cn-ml 1 year ago +2

    Thanks for the video, I already started using timeouts for regex wherever possible. However, I don't fully understand what the non-backtracking option does. Why does it change the performance of the regex, and what changes in the results?

    • @humanesque
      @humanesque 1 year ago +1

      Non-Backtracking is basically lazy evaluation for your regular expressions, and it's implementation dependent. Unless you're using it for a throwaway match (instead of parsing, which is what regex is for), it will introduce weird, platform specific bugs and grief.

    • @cn-ml
      @cn-ml 1 year ago

      @humanesque okay thanks, so it's basically unsafe but faster

  • @nickhubbard3671
    @nickhubbard3671 1 year ago +1

    The best way to avoid issues with Regex is to not use it; and to avoid people that do use it!🙃

  • @pmashurenko
    @pmashurenko 1 year ago

    Well, it is also worth reading RFC 2821, and it very quickly becomes clear that regular expressions are a bad tool for email validation: the variety of options for names and domain names is so huge that it makes almost no sense to check beyond the point that there's an "@" that isn't preceded by a "\" character somewhere in the middle.

    • @billy65bob
      @billy65bob 1 year ago

      Not even that is foolproof, as a @ inside quotes is also escaped. :)
      Granted, no one uses quotes in their email addresses, but it is allowed by the standard.

  • @TribalBoss
    @TribalBoss 1 year ago +5

    A few years ago I had to check if an HTML string contained any email addresses using Regex. Needless to say, I had to reboot the Azure server after pushing to production 😂

  • @IAmFeO2x
    @IAmFeO2x 1 year ago +7

    Great video as always! Personally I avoid Regex like the devil - it always takes so long to read and understand them in code.

    • @infeltk
      @infeltk 1 year ago

      I use Regex for simple things. Everything has its purpose and limitations. And the problem shown in this episode is described on the Microsoft Learn pages on .NET fundamentals - it is not secret information.

  • @zedmagdy
    @zedmagdy 1 year ago

    I've tried this regex with PHP's preg_match and it works fine. I don't know if it's C#-specific or what?

  • @PeterK6502
    @PeterK6502 1 year ago

    This kind of behaviour is frequently solved by using lazy capture instead of greedy capture, for example instead of using ()+ you should use ()+?
    I can see at least one greedy capture group in the shown expression.
    You should always try to avoid greedy captures, because of backtracking.
    Use ()*? or ()+? instead of ()* or ()+

  • @zxopink
    @zxopink 1 year ago +3

    What are the drawbacks of NonBacktracking?

    • @adassko6091
      @adassko6091 1 year ago +1

      The option can’t be used in conjunction with RegexOptions.RightToLeft or RegexOptions.ECMAScript, and it doesn’t allow for the following constructs in the pattern:
      Atomic groups
      Backreferences
      Balancing groups
      Conditional
      Lookarounds
      Start anchors (\G)
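
The restrictions listed above can be probed with a small sketch. It assumes .NET 7 or later, where `RegexOptions.NonBacktracking` was introduced, and relies on the constructor throwing `NotSupportedException` for patterns the engine cannot handle. The helper name `IsSupportedByNonBacktracking` and the sample patterns are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: reports whether a pattern can be constructed
// with the non-backtracking engine (requires .NET 7 or later).
bool IsSupportedByNonBacktracking(string pattern)
{
    try
    {
        _ = new Regex(pattern, RegexOptions.NonBacktracking);
        return true;
    }
    catch (NotSupportedException)
    {
        // Thrown at construction time for constructs such as
        // backreferences and lookarounds.
        return false;
    }
}

Console.WriteLine(IsSupportedByNonBacktracking(@"^\w+@\w+\.\w+$")); // plain pattern
Console.WriteLine(IsSupportedByNonBacktracking(@"(\w)\1"));         // backreference
Console.WriteLine(IsSupportedByNonBacktracking(@"\w+(?=@)"));       // lookahead
```

Because the failure happens when the `Regex` is constructed rather than when it is matched, a unit test over the application's patterns is one way to find incompatible ones early.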

  • @coced
    @coced 1 year ago

    6:36
    I felt it

  • @jerryjeremy4038
    @jerryjeremy4038 1 year ago

    Wow that's a monster computer! Too many cores

  • @masonwheeler6536
    @masonwheeler6536 1 year ago

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

  • @attribute-4677
    @attribute-4677 1 year ago

    Which version is the NonBacktracking enum value in? I'm targeting .NET Framework 4.8 and it can't seem to find it (VS2022 automatically selects the language version, but even when forced to C# 8 it fails to find it).

  • @gerakore8948
    @gerakore8948 1 year ago

    I've never decided to bother with regex. I see how it can be useful, but it's a cluttered mess. Debugging and code maintenance would be a nightmare. I've done a lot of parsing and I doubt regex would be able to handle some of the inputs I've dealt with. For instance, receipts with various formats, printed and cut off mid-receipt, with inconsistent headers/footers, scanned in low quality into an image format and placed into a PDF on which I would have to use OCR to extract the text. As you can imagine, all the text is scrambled: 5's turn into S's, 1's turn into I's, etc. Sometimes characters are missing and you can't really rely on identifiable tags.

  • @djupstaten2328
    @djupstaten2328 1 year ago

    These patterns overuse capturing groups. (x) should be (?:x) more often than not, i.e. non-capturing groups. It makes a ton of difference in regards to bloat and lag.
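
To make the `(x)` versus `(?:x)` distinction concrete, here is a small sketch that counts the groups a match exposes. It shows only the bookkeeping difference, not a performance measurement; the date pattern and the `CountGroups` helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: number of groups on the match, including group 0 (the whole match).
int CountGroups(string pattern, string input, RegexOptions options = RegexOptions.None) =>
    Regex.Match(input, pattern, options).Groups.Count;

const string Date = "2024-06-19";

// Three capturing groups plus group 0.
Console.WriteLine(CountGroups(@"(\d{4})-(\d{2})-(\d{2})", Date));   // 4

// The year is matched but not captured.
Console.WriteLine(CountGroups(@"(?:\d{4})-(\d{2})-(\d{2})", Date)); // 3

// ExplicitCapture turns every unnamed group into a non-capturing one.
Console.WriteLine(CountGroups(@"(\d{4})-(\d{2})-(\d{2})", Date, RegexOptions.ExplicitCapture)); // 1
```

`RegexOptions.ExplicitCapture` can be the less noisy choice when most groups exist only for grouping, since the pattern keeps plain parentheses.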

  • @pilotboba
    @pilotboba 1 year ago +2

    Developer has a problem.
    Developer uses RegEx
    Developer now has 2 problems.
    :)

  • @nickandrews1985
    @nickandrews1985 1 year ago +2

    My second biggest takeaway from this video is that Nick already has himself an RTX 4090 LOL

  • @FunWithBits
    @FunWithBits 1 year ago

    That's odd. I wrote a longer comment and saw it in the comments, but then it disappeared after a few minutes. Maybe the YouTube engine removed it after post-processing?

    • @nickchapsas
      @nickchapsas 1 year ago +1

      YouTube is notorious for auto-deleting comments, especially on programming content. I don’t delete any comments, so maybe try to repost it

    • @FunWithBits
      @FunWithBits 1 year ago

      @nickchapsas - I think that happened before on other channels also. I wish YouTube would be more careful about what they delete, as it had nothing negative/bad. I'll repost. Thank you for the awesome channel - I learn so much here. I also like how you consider performance a higher priority in most of your videos.

  • @speakoutloud7293
    @speakoutloud7293 1 year ago

    Soo you got the 4090, wondering what kind of games you are playing :P

  • @janneforsell525
    @janneforsell525 1 year ago

    Once again I've opened a PR during the video 😅

  • @KanashimiMusic
    @KanashimiMusic 1 year ago

    I find it funny that people keep saying "nobody knows how to write RegEx", because I don't find it TOO difficult. I mean it still takes me a while to do anything remotely complex, but like, it's manageable imo. Usually I will have RegExr open in another tab, since it contains a cheat sheet with the most important features, and it quickly lets me validate that my RegEx works the way it should

    • @KanashimiMusic
      @KanashimiMusic 1 year ago

      @karlfimm I really need to start using GitHub Copilot.

  • @mastermati773
    @mastermati773 1 year ago

    Validating emails is so ubiquitous that I wonder why tf Regex can't have a special symbol only for emails xD

  • @TonoNamnum
    @TonoNamnum 1 year ago +12

    Regex are not extremely hard lol. If you study them for about a week you should be able to create very powerful stuff. And also the secret to regexes in my opinion is to separate them in little chunks.
    When you study them you definitely learn what Nick is describing. I don't regret learning/using regexes.
    I also agree that they are not the most efficient option but if you understand what you are doing it saves a lot of time.

    • @ProtectedClassTest
      @ProtectedClassTest 1 year ago +7

      well, wait until you maintain other people's regex and come back here cryin hahaha

    • @RealMathewAdams
      @RealMathewAdams 1 year ago

      You aren't coding for yourself, you are coding for the future. Regex can be unmaintainable if the use-case is non-trivial.

    • @TonoNamnum
      @TonoNamnum 1 year ago

      @ProtectedClassTest the crying will be for people who do not understand them, like you 🤣

    • @TonoNamnum
      @TonoNamnum 1 year ago

      Also this video encourages you to use them czcams.com/video/R5BcHIMZMxc/video.html and that channel has a lot of subscribers.
      I guess the bottom line is you have to understand what you are doing just like everything else.

  • @nooftube2541
    @nooftube2541 1 year ago

    I love that regex for email... and it doesn't work, because email cannot be parsed by regex.

  • @nothingisreal6345
    @nothingisreal6345 1 year ago

    My rule of thumb is: if possible, avoid regex. Hard to write. Extremely hard to read for others. If you use properly typed data you will not need it. And no matter how much effort you put into testing and thinking about edge cases, there are still too many times it will fail. For many strings there are alternative ways to verify them: IP address, URI, file path… Very often the need for regex is based on a bad design or due to having to connect to legacy systems.

  • @dmytrk
    @dmytrk 1 year ago

    In some cases, I write my own algorithm to scan the string, so I can actually debug that.

    • @McNerdius
      @McNerdius 1 year ago +1

      This is why i love the new regex source generators, being able to view and step through the C# equivalent is a great learning aid for me. I comprehend the basics of regex but if a string + nontrivial regex combo doesn't pass a unit test or whatever and i can't figure out why... i can step through that particular scenario now, yay !

  • @billy65bob
    @billy65bob 1 year ago +1

    2:30 that is some very bad and inefficient code for that pattern.
    I'm guessing this tool is more to break down what the various regex implementations will do in an easy to understand manner, rather than to generate something actually worth using.
    I had looked at the specification of email addresses some time ago, I wanted to know what was valid, and how sub addressing was defined.
    Just the bits in common use are very complicated, and that's before you get to all the weird emails that no one sane would use, but are actually allowed by the standard,
    such as using quotes, escaping quotes inside the quotes, double @'s, non-ascii symbols, a % to set the route, sub addressing, etc.
    What the standard allows is insane, and trying to handle it via regex is a fool's errand.
    You're way better off writing a small program (or library) for the dedicated purpose of validating emails, by having it identify fragments, and validating them as defined.

  • @ToadieBog
    @ToadieBog 1 year ago

    To me, Regex has always had the smell of something confusing to use, that I never really cared for. I'm looking forward to a replacement that humans can actually read.

  • @ws_stelzi79
    @ws_stelzi79 1 year ago

    Well what is the saying "If you try to solve one problem with RegEx you have now two problems!"

  • @anonimxwz
    @anonimxwz 1 year ago +1

    Regex is very easy to do tbh. Does the NonBacktracking option affect the result?

  • @codeforme8860
    @codeforme8860 1 year ago +2

    Does anyone actually know how to use Regex?

    • @ryanzwe
      @ryanzwe 1 year ago +2

      Nope, I can't read or write it

    • @guiorgy
      @guiorgy 1 year ago +1

      I think I know what my Regex expressions do, but only for a few seconds. After that, only god knows what they are for and how I wrote them

    • @RougeEric
      @RougeEric 1 year ago +2

      I think it's fair to assume that anyone who's spent enough time with it can comfortably create some shorter regex and know what they're doing. But as soon as you start playing with complex nested systems and tons of lookahead stuff, even with significant practice, I have to test things extensively just to make sure they are doing what I think they're supposed to.

    • @geomorillo
      @geomorillo 1 year ago +1

      regwhat?

  • @jspesh
    @jspesh 1 year ago

    Nice RTX4090 & 128gb ram, bro!

  • @GaryJohnWalker1
    @GaryJohnWalker1 1 year ago

    Regex kills my brain so why not the computer too

  • @mirabilis
    @mirabilis 1 year ago

    No backtracking will break the regex.

  • @tarsala1995
    @tarsala1995 1 year ago

    Wut? You already have RTX 4090? 5:00

  • @gregcyrus2739
    @gregcyrus2739 1 year ago

    Hate regex! If you re-engineer foreign code you will never know what was intended to validate for. The LIKE operator is not that flexible but I could always validate everything (maybe with a sequence of LIKE-lines - and it was human readable)

  • @theMagos
    @theMagos 1 year ago +1

    128 GB RAM? Yikes...

    • @FunWithBits
      @FunWithBits 1 year ago

      Maybe for video editing?

    • @nickchapsas
      @nickchapsas 1 year ago +3

      I wish I had a good reason….but I don’t….

  • @StasAbrosimov
    @StasAbrosimov 1 year ago

    If you decide to solve the problem with regular expressions... You now have two problems: the original problem and the regular expression.
    It's an old joke....

  • @abhishekbagchi6052
    @abhishekbagchi6052 1 year ago +1

    Clicked so fast

  • @alirezanet
    @alirezanet 1 year ago

    Nick I know regex 😊 stop saying that if you don't man 😂
    PS. just kidding ... I just can write regex but after a while only god knows what it is doing 😂😅

  • @Max_Jacoby
    @Max_Jacoby 1 year ago

    nick@n.n.n.n.n.n.n.n.n.c should be a CPU benchmark.

  • @claudiufarcas
    @claudiufarcas 1 year ago

    Nice seeing you in person @dotnetdays.
    Keep doing great things!
    You're awesome!

  • @katerinaandrasko3755
    @katerinaandrasko3755 1 year ago

    how about - don't do regex?... i know crazy, but with emails check if there is "@" symbol if it is, cool, accept it. applications should try to send you that email to continue with whatever you want. want to register? cool - type in the verification code? want to recover your account? cool, click on the link in your email. at the end of the day that's what truly validates your email address - you get an email.

  • @asedtf
    @asedtf 1 year ago +4

    Brb gonna go try signing up to everything with nick@n.n.n.n.n.n.n.n.n.n.n.n.n.n.n.c

    • @jedimastermaniac
      @jedimastermaniac 1 year ago

      lol. we still have to take into account for every action that the end user is gonna end up being a notoriously stupid bastard :D :P