The New Best Way To Search for Values in .NET 8

Sdílet
Vložit
  • čas přidán 23. 11. 2023
  • Use code BLACKFRIDAY23 and get 40% off any course and 20% off any bundle on Dometrain: dometrain.com/courses?coupon_...
    Get the source code: mailchi.mp/dometrain/vi8f453djn
    Become a Patreon and get special perks: / nickchapsas
    Hello everybody, I'm Nick, and in this video, I will introduce you to a new type added in .NET 8 called SearchValues which makes searching for value matches extremely efficient and fast. In fact it is the fastest way to do this in .NET.
    Workshops: bit.ly/nickworkshops
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: github.com/Elfocrash
    Follow me on Twitter: / nickchapsas
    Connect on LinkedIn: / nick-chapsas
    Keep coding merch: keepcoding.shop
    #csharp #dotnet #dotnetconf

Komentáře • 144

  • @lordmetzgermeister
    @lordmetzgermeister Před 6 měsíci +246

    I'm adding the quote "It's small but extremely efficient" to my list of excuses in the bed.

    • @Delphi80SVK
      @Delphi80SVK Před 6 měsíci +6

      you killed it with this one :-D

    • @vytah
      @vytah Před 6 měsíci +2

      Does "efficient" mean it finishes very fast?

    • @handlez411
      @handlez411 Před 6 měsíci

      69

  • @zachemny
    @zachemny Před 6 měsíci +157

    Performance comparison with a naive HashSet approach would also be interesting.

    • @ardavaztterterian1211
      @ardavaztterterian1211 Před 6 měsíci +6

      Yes. That's what I was thinking

    • @Crozz22
      @Crozz22 Před 6 měsíci

      I did another benchmark with hashset (check foreach character in ExampleText whether it is in the hashset, return on first mismatch). And it was about the same speed as the ExampleText.AsSpan().IndexOfAnyExcept(Base64CharsArr) == -1; .. So SearchValues still insanely faster

    • @Assgier
      @Assgier Před 6 měsíci +11

      Also a solution using LINQ should be joined into the mix.

    • @lordmetzgermeister
      @lordmetzgermeister Před 6 měsíci

      @@Assgier If you deeply care about performance, you should not use LINQ.

    • @volan4ik.
      @volan4ik. Před 6 měsíci +6

      ​@@Assgier it would be less efficient than the simple loop sample

  • @keithprice3369
    @keithprice3369 Před 6 měsíci +32

    It would be helpful if, when you share cool tips like this, you included a handful of practical use cases. I watch your videos and about half the time I leave thinking, "That's cool, but when would I use this?"

    • @jannowak2413
      @jannowak2413 Před 6 měsíci +1

      But in this video he actually did? He presented how microsoft is using it. Which in my opinion is even more valuable then imaginary examples :D

    • @zirexpl6395
      @zirexpl6395 Před 6 měsíci +3

      I have the same feeling everytime i watch his videos of performance 😅

    • @jeroboam4486
      @jeroboam4486 Před 6 měsíci

      It just means you have no use for it at the moment. But in my case I can see how it could massively improve a project at work.

  • @der.Schtefan
    @der.Schtefan Před 6 měsíci +19

    What I find so crazy is that SearchValues has so many cases it handles, with so many classes, jumps, method calls, many years ago this would have been slow on .NET. But nowadays all of them are so aggressively inlined, and so JIT optimized, they use SSE3, packed instructions, SIMD, etc. that they still complete in an average of 6 CPU clock cycles (which of course is not true. it is only like this in the benchmark case, but due to hyperthreading and pipelining, speculative execution, etc. the throughput delay will be probably around 20 clock cycles).

    • @SirusStarTV
      @SirusStarTV Před 6 měsíci

      By integrating extended processor instructions that almost no one knows about behind the scene anyone gets the benefits that those instructions provide

  • @AlFasGD
    @AlFasGD Před 6 měsíci +30

    The performance improvements and tools they are adding in newer .NET versions are top-notch

  • @sam2943
    @sam2943 Před 3 měsíci

    Thanks for bringing this golden nugget to people's attention.

  • @tomchapman128
    @tomchapman128 Před 6 měsíci +1

    I love the example string you used for the valid base64 string in the benchmarks lol

  • @davestorm6718
    @davestorm6718 Před 6 měsíci

    My God! This has applications everywhere!

  • @nocturne6320
    @nocturne6320 Před 6 měsíci +13

    Missed a chance to compare it with a compiled regex, would like to see the comparison there.

    • @J_i_m_
      @J_i_m_ Před 6 měsíci +1

      Regex and performance.... it will be under the bottom

    • @nocturne6320
      @nocturne6320 Před 6 měsíci

      @@J_i_m_ Regexes can now be source generated into a function, even the Regex class has an option to compile the regex into a function at runtime, so it's never interpreted -> should be pretty fast

    • @_mihazupan
      @_mihazupan Před 6 měsíci +2

      Regex will use SearchValues internally now where it makes sense. For such a simple case you'll just pay a little bit of overhead to call through the Regex logic instead

  • @markharwood6794
    @markharwood6794 Před 6 měsíci +10

    I think the "hat" char is called a "Carat" :)

    • @denissorn
      @denissorn Před 6 měsíci

      Lol I thought it's 'carrot'?

    • @nick066hu
      @nick066hu Před 6 měsíci +4

      Caret

    • @alfflasymphonyx
      @alfflasymphonyx Před 6 měsíci

      in French, it is called Circonflex accent

    • @SteffenBauer
      @SteffenBauer Před 6 měsíci +1

      its called "Circumflex". Caret is a old name for it. en.wikipedia.org/wiki/Caret_(proofreading)

  • @MrCakeLP
    @MrCakeLP Před 6 měsíci +10

    Your benchmarks never gonna give us up.

  • @YuriBez2023
    @YuriBez2023 Před 6 měsíci +10

    I'd like to see the benchmarks for 1k, 10k, 100k, 1Mb size strings. I wonder if it scales.

    • @SteffenBauer
      @SteffenBauer Před 6 měsíci +4

      You just can download the source code and try it ;)

  • @ibrahimhussain3248
    @ibrahimhussain3248 Před 6 měsíci +1

    I do a lot of searches within List. Is there a similar approach we can use with List?

    • @gfx2
      @gfx2 Před 6 měsíci

      SearchValues only supports byte and char with .NET 8, and string with .NET 9. Getting the Span of your List and you can use SearchValues if your T is any of those 3 types

    • @AKW91
      @AKW91 Před 6 měsíci

      If it's ordered, List provides a BinarySearch function with complexity O(log N), else there is only a looping search with an average complexity O(N). If your searches on the list only requires a specific property, consider using a Dictionary instead, accessing an Element by its key has an insanely fast complexity of O(1)

  • @brandonpearman9218
    @brandonpearman9218 Před 6 měsíci +1

    When I need to save a few nano seconds, I will definitely use this.

  • @keyser456
    @keyser456 Před 6 měsíci +3

    I downloaded the sample (asking for email is kind of... ehh... but w/e) to do some comparisons to HashSet implementation (I see in the comments I'm not the only one who instantly thought of that). Rider is complaining (red squiggly error) about the initializer syntax on the character array, but dotnet builds it without errors. I have the latest .NET SDK. Are you using an EAP version of Rider maybe? I've been holding off hoping the new stuff makes it into the mainline soon.

  • @davepusey
    @davepusey Před 6 měsíci +6

    I'd like to see a benchmark of doing this with a regex as that is how I would have likely implemented that method.

    • @jongeduard
      @jongeduard Před 6 měsíci

      Maybe this story is even going to be opposite. It would not surprise me if they are using the SearchValues type inside the dotnet Regex engine as well.

  • @nvmuzrowrihk
    @nvmuzrowrihk Před 6 měsíci +12

    10:12 It's actually insane how many implementations they have depending on the characteristics of the data.

    • @kRySt4LGaMeR
      @kRySt4LGaMeR Před 6 měsíci +2

      you'd be surprised how the C++ stdlib looks like :P

    • @reubendeleon-ji6up
      @reubendeleon-ji6up Před 6 měsíci +6

      You need this because of how intrinsics work. Your cpu probably supports 128 and 256, maybe even 512. This means you need that many values preloaded into a vector (think span, but allocated for the cpu). The cpu then has a special instruction (single instruction) that will perform an operation on all 128-512 values in one tick. This is actually the craziest part. The cpu has to physically lay out each unit in logic gates somewhere physically in the cpu. So that logic (other than the optimizations for 1, 2 and 3) are just logic to create sets of those vectors efficiently. Most of intrinsic support got added in .net 6, and really utilized in .net 7. They’re just finding every last nook to stuff it in with 8.

  • @eramires
    @eramires Před 6 měsíci

    Hey Nick, any tips on "warm up" services after configuring them in DI ? I have a thread that needs to keep running on my singleton, but it is only started when a new instance of the service is created, which happens when I access a method in the controller, but I want it to happen on the API initialization process. 🤔 I saw some solutions on Stackoverflow, but I found them dirty and too coupled. Help, please. :D Also my project is .net 7 🙂

  • @cheebadigga4092
    @cheebadigga4092 Před 6 měsíci

    Thank you!! This is awesome!

  • @georgebelletty7861
    @georgebelletty7861 Před 6 měsíci

    Very cool

  • @klaxxon__
    @klaxxon__ Před 6 měsíci +5

    Do Regexes used this internally? I would expect at least a compiled regex would either use this internally, or build up something internally when checking for character ranges. Using a Regex would have been my first instinct when performing this check (not knowing about SearchValues before watching this video)

  • @erikgook598
    @erikgook598 Před 6 měsíci +8

    isn't those characters in a continuous range in ascii? You could just test for validity by casting them to ints and then do arithmetic instead of looping through an array on each character in the input.

    • @mk72v2oq
      @mk72v2oq Před 6 měsíci +7

      You don't even need to cast. "if (c >= 'A' && c

    • @erikgook598
      @erikgook598 Před 6 měsíci

      But perhaps it's more complicated if the text is not utf8 @@mk72v2oq

    • @mk72v2oq
      @mk72v2oq Před 6 měsíci +3

      @@erikgook598 it can't be UTF-8 in the first place. Dotnet strings are UTF-16 and 'char' size is fixed 2 bytes. Considering that in strings you enumerate over whole characters, not over raw bytes, this construct always works.

    • @ChyK24
      @ChyK24 Před 5 měsíci

      Actually the `SearchValues.Create` method takes this into account. Try to call `SearchValues.Create("ABCDEFGHIJKLM");` or `SearchValues.Create(@"UVWXYZabcdef[\]^_`abcd");` and see in debugger that the actual class that is instantiated is called "RangeCharSearchValues" while for `SearchValues.Create("abcABC");` it is `AsciCharSearchValues`.
      I guess the execution time of `SearchValues.Create` can be quite substantial. But this method is really meant to be called once to save the result in static field.

    • @tmnt9001
      @tmnt9001 Před 4 měsíci

      ​@@mk72v2oq actually it would still work but not for the reason you said. you can have characters that are bigger than 16 bits, which means they wouldn't fit in a single char. But that doesn't matter because all base64 characters fit in a single char and the comparison would fail in the expected situation (but this also means that it would work with 8 bit characters).

  • @williamliu8985
    @williamliu8985 Před 6 měsíci +16

    IMO, much better feature than primary constructor in net8...

    • @lordmetzgermeister
      @lordmetzgermeister Před 6 měsíci +9

      There are no primary constructors in .NET 8, that's a C# 12 feature. Note the difference between those two.

    • @TheMonk72
      @TheMonk72 Před 6 měsíci

      ​@@lordmetzgermeisterC#12 depends on and was released with dotnet 8. When you start a new project in dotnet 8 you're using C#12 by default. So they're basically the same thing as far as most people are concerned.

  • @oddikaro8236
    @oddikaro8236 Před 6 měsíci +1

    I guess the new search algorithm was part of a Microsoft interview for a junior position. Interesting way to improve .NET.

  • @tetheredsun950
    @tetheredsun950 Před 5 měsíci

    Am I missing something or is it just implemented for char and byte? It would be lovely to have it for ints.

  • @DucaTech
    @DucaTech Před 6 měsíci +1

    can we compare this to a regex?

  • @JohnTasler
    @JohnTasler Před 6 měsíci

    I would also like to see this benchmarked using AOT. Most of your performance videos, actually.

  • @fernandoluiz4593
    @fernandoluiz4593 Před 6 měsíci

    It would be nice if you also compared with regex match

  • @saniel2748
    @saniel2748 Před 6 měsíci +2

    Love seeing such a high level language taking performance probably more seriously than any of its competitors
    To me last thing to hope for would be *some* manual memory management, like Go's arena allocators which would be unsafe but also very useful for applications such as games
    Like having per frame arena that gets wiped for temporary data or arena for spawning a bunch of objects really quickly and then destroying all at once

  • @uncommonbg
    @uncommonbg Před 6 měsíci +3

    Honestly, I like this more because of the name they chose than the performance itself. The performance might come in handy, but more importantly I believe the new type will make the code read better.

  • @Agge_24
    @Agge_24 Před 6 měsíci

    Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤

  • @ChaimMosheStern
    @ChaimMosheStern Před 6 měsíci

    I'm surprised you didn't try hash set of strings

  • @martinprohn2433
    @martinprohn2433 Před 6 měsíci

    I really wonder, how a realistic approach would have performed. I mean, why would you iterate over 64 chars (having 64 comparisons), when a simple expression like "ch >= 'a' && ch

    • @_mihazupan
      @_mihazupan Před 6 měsíci

      SearchValues can process multiple characters at once. You can try to benchmark your idea and you'll see it's still much slower when looking through a longer span.

  • @KPAMCATEJlb
    @KPAMCATEJlb Před 6 měsíci +4

    XcQ... so familiar symbols :D

  • @winterboltgames
    @winterboltgames Před 6 měsíci

    Are search values similar to database indices?

    • @GumbootMan
      @GumbootMan Před 6 měsíci +1

      They're similar in that they are both trying to optimise the problem "Do values A, B, C exist in data set D?". But SearchValues optimises A, B, C while a database index optimises data set D. The closest thing to a database index in .NET is probably SortedSet.

    • @winterboltgames
      @winterboltgames Před 6 měsíci

      @GumbootMan makes sense. Thanks for the explanation!

  • @kevinbro518
    @kevinbro518 Před 6 měsíci +3

    Is it faster than a source generated regex?

    • @nickchapsas
      @nickchapsas  Před 6 měsíci +3

      Yeap

    • @Jellow2202
      @Jellow2202 Před 6 měsíci +1

      @@nickchapsas actually the regex source code generator uses SearchValues and if you have larger input data, the overhead of the regex stack gets less noticeable and their performance converge. I think regex is a good middle ground, with its additional flexibility if you expect larger inputs.
      Using your benchmark code I added a generated regex using "^[A-Za-z0-9+/]*={0,3}$" which also only allows up to 3 equals at the end, and tested with a 5k character string:
      IsBase64_SearchValues: 124 ns
      IsBase64_CharArray: 179 ns
      IsBase64_Naive: 51,231 ns
      IsBase64_RegexGenerated: 147 ns

    • @carldaniel6510
      @carldaniel6510 Před 6 měsíci +1

      In .NET 8 source generated Regex uses SearchValues automatically where it's beneficial!

  • @CharlesBurnsPrime
    @CharlesBurnsPrime Před 6 měsíci

    I can scarcely believe the performance difference.

  • @kondziossj3
    @kondziossj3 Před 6 měsíci +4

    I would like to have a project where I would need to care about such optimizations
    ehh... Maybe... someday

  • @ardavaztterterian1211
    @ardavaztterterian1211 Před 6 měsíci +10

    So, why don't you mention what kind of data structure is used behind the scenes?

    • @fusedqyou
      @fusedqyou Před 6 měsíci

      Not the point of the video

    • @metaltyphoon
      @metaltyphoon Před 6 měsíci

      Seriously… it’s probably the more interesting parts

    • @stefano_schmidt
      @stefano_schmidt Před 6 měsíci

      what kind of data structure is used behind the scenes?

    • @oddikaro8236
      @oddikaro8236 Před 6 měsíci

      The internal algo probably uses several algorithms for different scenarios and all of them probably come from state-of-art papers. So, yeah, Nick is going to read 10 papers for a 10min video.

    • @ardavaztterterian1211
      @ardavaztterterian1211 Před 6 měsíci

      So what's the point of the video? Just that there is a new type which is fast? You would already know that there is a new type if you are following the release drafts.@@fusedqyou

  • @ciorchinos
    @ciorchinos Před 6 měsíci +3

    this can be used for a web crawler to collect data for an AI learning algo. would be interesting to showcase a ML algo that learns something form a given document and cross reference to some online sites.

  • @antonmartyniuk
    @antonmartyniuk Před 6 měsíci

    I knew spans have something to do with this 😅

  • @NoName-1337
    @NoName-1337 Před 6 měsíci

    I think they use a Set in the background.

  • @cheezyskipper
    @cheezyskipper Před 6 měsíci +7

    He rick rolled us.

  • @neilbroomfield3080
    @neilbroomfield3080 Před 6 měsíci +3

    Shame you didn't compare to a HashSet 🤷‍♂️

  • @wlbmonizuka
    @wlbmonizuka Před 6 měsíci

    WTF why naive realisation is not a HashSet ??

  • @Baby4Ghost
    @Baby4Ghost Před 6 měsíci

    Audio synch issues?

    • @nickchapsas
      @nickchapsas  Před 6 měsíci +5

      None that I can spot

    • @Baby4Ghost
      @Baby4Ghost Před 6 měsíci +1

      @@nickchapsas might be local on my pc, LOVE the vid though! Will be using this when appropriate. That difference is HUGE!

  • @KingOfBlades27
    @KingOfBlades27 Před 6 měsíci

    I guess using this is more efficient than using regex when doing these kinds of comparisons? 🤔

    • @nickchapsas
      @nickchapsas  Před 6 měsíci +1

      Oh yeah regex is not even close

    • @KingOfBlades27
      @KingOfBlades27 Před 6 měsíci

      @@nickchapsas Damn. Good to know. Thanks for answering! 💪

    • @davidmartensson273
      @davidmartensson273 Před 6 měsíci +1

      @@KingOfBlades27 Even if, as mentioned in an other comment, regex will use this technique when applicable, since regex supports so many other things it will add overhead compared to the raw implementation and that is very often the case, high performance code is very often not elegant or easy to read so whenever MS can hide away some of that it helps everyone else build better code that is both fast and readable :)

  • @eejdownes
    @eejdownes Před 6 měsíci

    I hate that my brain can recognize the first example string as Base64 in O(1) time from internet survival optimizations

    • @davidmartensson273
      @davidmartensson273 Před 6 měsíci +1

      The brain is extremely good at pattern matching in parallel, its one of its main features :)

  • @K1llRay64
    @K1llRay64 Před 3 měsíci

    don't be junior, explain how it works

  • @TioPew
    @TioPew Před 6 měsíci

    It looks a example made by ChatGPT transcribed into a video.

  • @XmasApple
    @XmasApple Před 6 měsíci

    Lmao I was Rick rolled

  • @qwerty01453
    @qwerty01453 Před 6 měsíci

    Nick your videos are great but please speed them up! It took 5 minutes to get the interesting point! It’s too much!

  • @Kingside88
    @Kingside88 Před 6 měsíci +2

    Maybe useful for Libraries but not for daily business

    • @RakodilGena
      @RakodilGena Před 6 měsíci +1

      Useful in any case when you have some list of chars that should not appear in username or anywhere else. Build this once, check faster always

    • @davidmartensson273
      @davidmartensson273 Před 6 měsíci +1

      As mentioned, MS uses it within their methods, so even if you do not actively use it you can still benefit from is, like faster regex or HTTP methods, or serialization, deserialization, encryption and many many more examples.

    • @Kingside88
      @Kingside88 Před 6 měsíci +1

      @@davidmartensson273 I didn't think about that, but of course you're right

  • @wboumans
    @wboumans Před 6 měsíci +2

    another feature that 'simple' devs wont ever use. The compiler needs to be smart and transform stuff to this.

    • @krccmsitp2884
      @krccmsitp2884 Před 6 měsíci +2

      What is a "simple dev" anyways? I can imagine several uses cases for two of my current projects.

    • @Dhaiky
      @Dhaiky Před 6 měsíci

      @@krccmsitp2884 Can you tell me of some of these use cases? I'm just curious where this would be used for without being an replacement for some Regex

    • @davidmartensson273
      @davidmartensson273 Před 6 měsíci +3

      Since MS is adding this under the hood in .NET every dev will benefit to some degree even if they do not use it them self.
      And also the compiler already makes many rewrites of your code for better performance as has been shown using tools that can display the "lowered" code.
      Often convenient methods gets changed into more efficient before it really starts compiling to simplify the compiler and make it easier to reuse highly optimized solutions instead of re-implementing almost the same algorithm slightly different. "while" is often changed into a for loop, foreach can also be changed into a for loop or even to use if statement and goto ;)
      Its all about if its a common type of expression that warrants the extra time, and since many developers use examples from the web like from Stack overflow or other, some patterns will be very common and MS can then try to find such and rewrite them for better performance.

    • @TheMonk72
      @TheMonk72 Před 6 měsíci

      ​@@davidmartensson273opposite is true: for is implemented as while, decompilers just recognise for-like constructs.

    • @davidmartensson273
      @davidmartensson273 Před 6 měsíci

      @@TheMonk72 yes your right, I mixed them up :). But the concept still holds true :)

  • @JackBauerDev
    @JackBauerDev Před 6 měsíci

    You cant get paid to write that... Just spend free time contributing to the runtime... Did you see this in the allow whitespace PR? I wish runtime was hiring...

  • @kumailn7662
    @kumailn7662 Před 6 měsíci

    Sorry your courses are way more expensive even after blackfriday discount.

  • @AndreViannaRJ
    @AndreViannaRJ Před 6 měsíci +28

    So... Microsoft finally discovered binary search.

    • @asedtf
      @asedtf Před 6 měsíci +2

      You jest, but have a look what switch actually compiles down to

    • @der.Schtefan
      @der.Schtefan Před 6 měsíci +32

      Its not binary search, it is SIMD, SS3, packed instructions. Before writing, you should ED-UC-ATE-YOUR-SELF.

    • @yurkoflisk
      @yurkoflisk Před 6 měsíci +9

      Binary search is irrelevant here, what it boils down to in most cases is basically boolean table lookup in one form or another, e.g. maybe the table itself is shoved into some YMM register (it's exactly 32 bytes, so can hold 256-bit mask) if possible, though the easiest way (and what I would do manually) is to just create 256-element boolean array.

    • @Kavisexe
      @Kavisexe Před 6 měsíci

      😂 Max Microsoft Hater;

  • @youssefwael8
    @youssefwael8 Před 6 měsíci

    First

  • @Agge_24
    @Agge_24 Před 6 měsíci

    Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤