The New Best Way To Search for Values in .NET 8
Vložit
- čas přidán 23. 11. 2023
- Use code BLACKFRIDAY23 and get 40% off any course and 20% off any bundle on Dometrain: dometrain.com/courses?coupon_...
Get the source code: mailchi.mp/dometrain/vi8f453djn
Become a Patreon and get special perks: / nickchapsas
Hello everybody, I'm Nick, and in this video, I will introduce you to a new type added in .NET 8 called SearchValues which makes searching for value matches extremely efficient and fast. In fact it is the fastest way to do this in .NET.
Workshops: bit.ly/nickworkshops
Don't forget to comment, like and subscribe :)
Social Media:
Follow me on GitHub: github.com/Elfocrash
Follow me on Twitter: / nickchapsas
Connect on LinkedIn: / nick-chapsas
Keep coding merch: keepcoding.shop
#csharp #dotnet #dotnetconf
I'm adding the quote "It's small but extremely efficient" to my list of excuses in the bed.
you killed it with this one :-D
Does "efficient" mean it finishes very fast?
69
Performance comparison with a naive HashSet approach would also be interesting.
Yes. That's what I was thinking
I did another benchmark with hashset (check foreach character in ExampleText whether it is in the hashset, return on first mismatch). And it was about the same speed as the ExampleText.AsSpan().IndexOfAnyExcept(Base64CharsArr) == -1; .. So SearchValues still insanely faster
Also a solution using LINQ should be joined into the mix.
@@Assgier If you deeply care about performance, you should not use LINQ.
@@Assgier it would be less efficient than the simple loop sample
It would be helpful if, when you share cool tips like this, you included a handful of practical use cases. I watch your videos and about half the time I leave thinking, "That's cool, but when would I use this?"
But in this video he actually did? He presented how microsoft is using it. Which in my opinion is even more valuable then imaginary examples :D
I have the same feeling everytime i watch his videos of performance 😅
It just means you have no use for it at the moment. But in my case I can see how it could massively improve a project at work.
What I find so crazy is that SearchValues has so many cases it handles, with so many classes, jumps, method calls, many years ago this would have been slow on .NET. But nowadays all of them are so aggressively inlined, and so JIT optimized, they use SSE3, packed instructions, SIMD, etc. that they still complete in an average of 6 CPU clock cycles (which of course is not true. it is only like this in the benchmark case, but due to hyperthreading and pipelining, speculative execution, etc. the throughput delay will be probably around 20 clock cycles).
By integrating extended processor instructions that almost no one knows about behind the scene anyone gets the benefits that those instructions provide
The performance improvements and tools they are adding in newer .NET versions are top-notch
Thanks for bringing this golden nugget to people's attention.
I love the example string you used for the valid base64 string in the benchmarks lol
My God! This has applications everywhere!
Missed a chance to compare it with a compiled regex, would like to see the comparison there.
Regex and performance.... it will be under the bottom
@@J_i_m_ Regexes can now be source generated into a function, even the Regex class has an option to compile the regex into a function at runtime, so it's never interpreted -> should be pretty fast
Regex will use SearchValues internally now where it makes sense. For such a simple case you'll just pay a little bit of overhead to call through the Regex logic instead
I think the "hat" char is called a "Carat" :)
Lol I thought it's 'carrot'?
Caret
in French, it is called Circonflex accent
its called "Circumflex". Caret is a old name for it. en.wikipedia.org/wiki/Caret_(proofreading)
Your benchmarks never gonna give us up.
I'd like to see the benchmarks for 1k, 10k, 100k, 1Mb size strings. I wonder if it scales.
You just can download the source code and try it ;)
I do a lot of searches within List. Is there a similar approach we can use with List?
SearchValues only supports byte and char with .NET 8, and string with .NET 9. Getting the Span of your List and you can use SearchValues if your T is any of those 3 types
If it's ordered, List provides a BinarySearch function with complexity O(log N), else there is only a looping search with an average complexity O(N). If your searches on the list only requires a specific property, consider using a Dictionary instead, accessing an Element by its key has an insanely fast complexity of O(1)
When I need to save a few nano seconds, I will definitely use this.
I downloaded the sample (asking for email is kind of... ehh... but w/e) to do some comparisons to HashSet implementation (I see in the comments I'm not the only one who instantly thought of that). Rider is complaining (red squiggly error) about the initializer syntax on the character array, but dotnet builds it without errors. I have the latest .NET SDK. Are you using an EAP version of Rider maybe? I've been holding off hoping the new stuff makes it into the mainline soon.
I'd like to see a benchmark of doing this with a regex as that is how I would have likely implemented that method.
Maybe this story is even going to be opposite. It would not surprise me if they are using the SearchValues type inside the dotnet Regex engine as well.
10:12 It's actually insane how many implementations they have depending on the characteristics of the data.
you'd be surprised how the C++ stdlib looks like :P
You need this because of how intrinsics work. Your cpu probably supports 128 and 256, maybe even 512. This means you need that many values preloaded into a vector (think span, but allocated for the cpu). The cpu then has a special instruction (single instruction) that will perform an operation on all 128-512 values in one tick. This is actually the craziest part. The cpu has to physically lay out each unit in logic gates somewhere physically in the cpu. So that logic (other than the optimizations for 1, 2 and 3) are just logic to create sets of those vectors efficiently. Most of intrinsic support got added in .net 6, and really utilized in .net 7. They’re just finding every last nook to stuff it in with 8.
Hey Nick, any tips on "warm up" services after configuring them in DI ? I have a thread that needs to keep running on my singleton, but it is only started when a new instance of the service is created, which happens when I access a method in the controller, but I want it to happen on the API initialization process. 🤔 I saw some solutions on Stackoverflow, but I found them dirty and too coupled. Help, please. :D Also my project is .net 7 🙂
Thank you!! This is awesome!
Very cool
Do Regexes used this internally? I would expect at least a compiled regex would either use this internally, or build up something internally when checking for character ranges. Using a Regex would have been my first instinct when performing this check (not knowing about SearchValues before watching this video)
Yes
isn't those characters in a continuous range in ascii? You could just test for validity by casting them to ints and then do arithmetic instead of looping through an array on each character in the input.
You don't even need to cast. "if (c >= 'A' && c
But perhaps it's more complicated if the text is not utf8 @@mk72v2oq
@@erikgook598 it can't be UTF-8 in the first place. Dotnet strings are UTF-16 and 'char' size is fixed 2 bytes. Considering that in strings you enumerate over whole characters, not over raw bytes, this construct always works.
Actually the `SearchValues.Create` method takes this into account. Try to call `SearchValues.Create("ABCDEFGHIJKLM");` or `SearchValues.Create(@"UVWXYZabcdef[\]^_`abcd");` and see in debugger that the actual class that is instantiated is called "RangeCharSearchValues" while for `SearchValues.Create("abcABC");` it is `AsciCharSearchValues`.
I guess the execution time of `SearchValues.Create` can be quite substantial. But this method is really meant to be called once to save the result in static field.
@@mk72v2oq actually it would still work but not for the reason you said. you can have characters that are bigger than 16 bits, which means they wouldn't fit in a single char. But that doesn't matter because all base64 characters fit in a single char and the comparison would fail in the expected situation (but this also means that it would work with 8 bit characters).
IMO, much better feature than primary constructor in net8...
There are no primary constructors in .NET 8, that's a C# 12 feature. Note the difference between those two.
@@lordmetzgermeisterC#12 depends on and was released with dotnet 8. When you start a new project in dotnet 8 you're using C#12 by default. So they're basically the same thing as far as most people are concerned.
I guess the new search algorithm was part of a Microsoft interview for a junior position. Interesting way to improve .NET.
Am I missing something or is it just implemented for char and byte? It would be lovely to have it for ints.
can we compare this to a regex?
I would also like to see this benchmarked using AOT. Most of your performance videos, actually.
It would be nice if you also compared with regex match
Love seeing such a high level language taking performance probably more seriously than any of its competitors
To me last thing to hope for would be *some* manual memory management, like Go's arena allocators which would be unsafe but also very useful for applications such as games
Like having per frame arena that gets wiped for temporary data or arena for spawning a bunch of objects really quickly and then destroying all at once
Honestly, I like this more because of the name they chose than the performance itself. The performance might come in handy, but more importantly I believe the new type will make the code read better.
Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤
I'm surprised you didn't try hash set of strings
I really wonder, how a realistic approach would have performed. I mean, why would you iterate over 64 chars (having 64 comparisons), when a simple expression like "ch >= 'a' && ch
SearchValues can process multiple characters at once. You can try to benchmark your idea and you'll see it's still much slower when looking through a longer span.
XcQ... so familiar symbols :D
zdarova bratan
Are search values similar to database indices?
They're similar in that they are both trying to optimise the problem "Do values A, B, C exist in data set D?". But SearchValues optimises A, B, C while a database index optimises data set D. The closest thing to a database index in .NET is probably SortedSet.
@GumbootMan makes sense. Thanks for the explanation!
Is it faster than a source generated regex?
Yeap
@@nickchapsas actually the regex source code generator uses SearchValues and if you have larger input data, the overhead of the regex stack gets less noticeable and their performance converge. I think regex is a good middle ground, with its additional flexibility if you expect larger inputs.
Using your benchmark code I added a generated regex using "^[A-Za-z0-9+/]*={0,3}$" which also only allows up to 3 equals at the end, and tested with a 5k character string:
IsBase64_SearchValues: 124 ns
IsBase64_CharArray: 179 ns
IsBase64_Naive: 51,231 ns
IsBase64_RegexGenerated: 147 ns
In .NET 8 source generated Regex uses SearchValues automatically where it's beneficial!
I can scarcely believe the performance difference.
I would like to have a project where I would need to care about such optimizations
ehh... Maybe... someday
So, why don't you mention what kind of data structure is used behind the scenes?
Not the point of the video
Seriously… it’s probably the more interesting parts
what kind of data structure is used behind the scenes?
The internal algo probably uses several algorithms for different scenarios and all of them probably come from state-of-art papers. So, yeah, Nick is going to read 10 papers for a 10min video.
So what's the point of the video? Just that there is a new type which is fast? You would already know that there is a new type if you are following the release drafts.@@fusedqyou
this can be used for a web crawler to collect data for an AI learning algo. would be interesting to showcase a ML algo that learns something form a given document and cross reference to some online sites.
I knew spans have something to do with this 😅
I think they use a Set in the background.
He rick rolled us.
Shame you didn't compare to a HashSet 🤷♂️
WTF why naive realisation is not a HashSet ??
Audio synch issues?
None that I can spot
@@nickchapsas might be local on my pc, LOVE the vid though! Will be using this when appropriate. That difference is HUGE!
I guess using this is more efficient than using regex when doing these kinds of comparisons? 🤔
Oh yeah regex is not even close
@@nickchapsas Damn. Good to know. Thanks for answering! 💪
@@KingOfBlades27 Even if, as mentioned in an other comment, regex will use this technique when applicable, since regex supports so many other things it will add overhead compared to the raw implementation and that is very often the case, high performance code is very often not elegant or easy to read so whenever MS can hide away some of that it helps everyone else build better code that is both fast and readable :)
I hate that my brain can recognize the first example string as Base64 in O(1) time from internet survival optimizations
The brain is extremely good at pattern matching in parallel, its one of its main features :)
don't be junior, explain how it works
It looks a example made by ChatGPT transcribed into a video.
Lmao I was Rick rolled
Nick your videos are great but please speed them up! It took 5 minutes to get the interesting point! It’s too much!
Maybe useful for Libraries but not for daily business
Useful in any case when you have some list of chars that should not appear in username or anywhere else. Build this once, check faster always
As mentioned, MS uses it within their methods, so even if you do not actively use it you can still benefit from is, like faster regex or HTTP methods, or serialization, deserialization, encryption and many many more examples.
@@davidmartensson273 I didn't think about that, but of course you're right
another feature that 'simple' devs wont ever use. The compiler needs to be smart and transform stuff to this.
What is a "simple dev" anyways? I can imagine several uses cases for two of my current projects.
@@krccmsitp2884 Can you tell me of some of these use cases? I'm just curious where this would be used for without being an replacement for some Regex
Since MS is adding this under the hood in .NET every dev will benefit to some degree even if they do not use it them self.
And also the compiler already makes many rewrites of your code for better performance as has been shown using tools that can display the "lowered" code.
Often convenient methods gets changed into more efficient before it really starts compiling to simplify the compiler and make it easier to reuse highly optimized solutions instead of re-implementing almost the same algorithm slightly different. "while" is often changed into a for loop, foreach can also be changed into a for loop or even to use if statement and goto ;)
Its all about if its a common type of expression that warrants the extra time, and since many developers use examples from the web like from Stack overflow or other, some patterns will be very common and MS can then try to find such and rewrite them for better performance.
@@davidmartensson273opposite is true: for is implemented as while, decompilers just recognise for-like constructs.
@@TheMonk72 yes your right, I mixed them up :). But the concept still holds true :)
You cant get paid to write that... Just spend free time contributing to the runtime... Did you see this in the allow whitespace PR? I wish runtime was hiring...
Sorry your courses are way more expensive even after blackfriday discount.
So... Microsoft finally discovered binary search.
You jest, but have a look what switch actually compiles down to
Its not binary search, it is SIMD, SS3, packed instructions. Before writing, you should ED-UC-ATE-YOUR-SELF.
Binary search is irrelevant here, what it boils down to in most cases is basically boolean table lookup in one form or another, e.g. maybe the table itself is shoved into some YMM register (it's exactly 32 bytes, so can hold 256-bit mask) if possible, though the easiest way (and what I would do manually) is to just create 256-element boolean array.
😂 Max Microsoft Hater;
First
Γειαα σου είμαι η μαθητρια τις μαμάς σου ❤❤❤