Stop using LINQ to order your primitive collections in C#

Sdílet
Vložit
  • čas přidán 7. 09. 2022
  • Check out my courses: dometrain.com
    Become a Patreon and get source code access: / nickchapsas
    Hello everybody I'm Nick and in this video I will show you a couple of new LINQ methods introduced in .NET 7 that you probably shouldn't be using. There are some even faster and more memory efficient alternatives that you can use instead, depending on your usecase.
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: bit.ly/ChapsasGitHub
    Follow me on Twitter: bit.ly/ChapsasTwitter
    Connect on LinkedIn: bit.ly/ChapsasLinkedIn
    Keep coding merch: keepcoding.shop
    #csharp #dotnet

Komentáře • 105

  • @maksymkyian4920
    @maksymkyian4920 Před rokem +131

    Another important thing to mention is that Array.Sort() and List.Sort() perform unstable sort, so they not preserve the order of elements that have the same key, while LINQ .OrderBy performs stable sort, so never changes places of the equal keys. This could be useful (and even required) when you work with collections of non-primitive types (and this is probably the reason why OrderBy is using slower sorting algorithms).

    • @rafazieba9982
      @rafazieba9982 Před rokem +4

      This is the first thing that came to mind when I watched this video and the reason why i always use LINQ to sort lists.

    • @rusektor
      @rusektor Před rokem

      Thanks for clarification! 😎

    • @bilbobaggins8953
      @bilbobaggins8953 Před rokem

      I actually didn't know this. What happens if you have elements with equal keys?

    • @rafazieba9982
      @rafazieba9982 Před rokem

      @@bilbobaggins8953 With stable sort algorithms you have guarantee that their order will not change. With unstable you never know.

    • @IldarIsm
      @IldarIsm Před rokem

      They use different sort algorithms

  • @KoScosss
    @KoScosss Před rokem +29

    Glad you sorted it out

  • @kazepis
    @kazepis Před rokem +3

    Great video Nick! I was really hoping that you would run the benchmarks with the Params attribute to see what happens with 1_000, 10_000 and 1_000_000 sized collections. I would also really like to see the results from the sort using spans method. Really great video though as always! Keep them coming…

  • @tymekmajewski4718
    @tymekmajewski4718 Před rokem +4

    I think it's worth pointing out that `random.Next()` is a significant factor in the order(by) test cases and it *dominates* the sort test case.
    Without the hundred calls to `random.Next` the Sort() benchmark will be several times quicker for both the list and the array.

  • @jongeduard
    @jongeduard Před rokem +2

    I have been testing around a lot with those OrderBy and Sort methods (both Array and List indeed) in the past, and I also discovered that smart self-choosing sorting algorithm stuff on especially those Sort methods indeed. And if I am right a native code version of those algorithms is chosen as well when possibile.
    I noticed in certain situations, you're indeed better off with the Sort methods than with linq OrderBy, but in real life these cases are very limited, because very often there is the more important decision to make is at which moment you actually want to store your data in a memory buffer (like array / list) anyway, or if just enumerating once is enough.
    Personally I remember that I noticed this with file / directory enumeration if I remember it right.
    Choosing your sorting method is one problem to solve, but an actual program has a lot more factors, with filtering and buffering as well.

  • @SubVengeance
    @SubVengeance Před rokem +2

    Theos! You are my defacto no.1 resource for C# on CZcams. Enjoying the flow and straight to the point vids. I tip my hat for you sir :)

  • @pavfrang
    @pavfrang Před rokem

    Way to go Nick, thanks for the useful video!

  • @KnightSwordAG
    @KnightSwordAG Před 10 měsíci

    If you wanted to do a descending order, I believe you can chain a Reverse() to the List as well. It may be slightly slower, but given the improvements overall, it may be worth it.

  • @julkiewicz
    @julkiewicz Před rokem +2

    I believe that a static lambda like x => x with no closure will not allocate anything. It's compiled as a static method in the using class scope. It can be seen by decompiling the output managed DLL. Only delegates and lambdas that actually capture some state will be compiled to a separate class that then needs to be instantiated (which causes the allocation). No closure delegates will have their object set to null.

  • @parkercrofts6210
    @parkercrofts6210 Před rokem

    Still working as of today, ty!

  • @vamovdotnet
    @vamovdotnet Před rokem +6

    Hey Nick! I have suggestion for your next video! Since performance is your thing, why not talking about ObjectPool?
    Kind regards,
    Cristiano Dias

  • @flybyw
    @flybyw Před rokem

    Shouldn't you re-seed the Random for each Enumerable for the same values, especially with int.ToString()?

  • @tarekel-shahawy891
    @tarekel-shahawy891 Před rokem

    What aout sorting objects using a property inside them, I think that's the most important use case.
    Can we use List.Sort with IComparison In that case?

  • @RobinHood70
    @RobinHood70 Před rokem

    Behind the scenes, List just uses Array.Sort() on its internal array, so it's not surprising that they compare pretty much the same.

  • @HAMYLABS
    @HAMYLABS Před rokem +8

    In most cases, I'd say the readability of Linq sorts is more useful than the potential performance upsides of using in-place sorts. Would probably recommend starting with that then moving to in-place if performance becomes a bottleneck.

  • @namewastaken360
    @namewastaken360 Před rokem

    It seems like it would be a simple optimisation to have order call the underlying sort functions. I guess the extension method is on the interface, but it seems like it would be worth it to check the type of the collection.

    • @doorhammer
      @doorhammer Před rokem +1

      This crossed my mind as well and you can write an extension method that does just that. A problem you run into, though, is that the extension method would then mutate the underlying array, which would break the expectation that LINQ methods are pure

  • @Spirch
    @Spirch Před rokem +2

    (still watching) at 7:45 should you the random in the method too? each method wont reset the seed which mean the benchmark are ran with different set of numbers

    • @nickchapsas
      @nickchapsas  Před rokem +1

      It doesn't really matter as long as they are not sequential

  • @rely1020
    @rely1020 Před rokem

    thank you

  • @welltypedwitch
    @welltypedwitch Před rokem +11

    2:45 Wait, why does the lambda need to allocate? It doesn't capture any variables, so it seems like the compiler should be able to allocate it statically, right?

    • @nickchapsas
      @nickchapsas  Před rokem +10

      Nop. Delegates are referece types so they will always be allocated. To prevent that you can mark them as static or extract them into a static field/property and use that

    • @petrusion2827
      @petrusion2827 Před rokem +22

      @@nickchapsas I think you're wrong about this.
      - Non capturing lambdas are automatically cached in compiler generated static fields even without the static keyword (sharplab link at the end of this comment as proof).
      - The difference between allocations of OrderBy(x => x) and Order() in your tests at 5:20 is about 0.42 kB, that capture-less lambda can't possibly take up that much memory.
      - If you Benchmark OrderBy(x => x) against OrderBy(...) with a static cached delegate the memory allocation is equivalent.
      - My educated guess as to what is actually happening is that the IOrderedEnumerable, which is actually OrderedEnumerable (which implements IIListProvider.ToArray(), which Enumerable.ToArray() calls) is being smart and checking if it has been given an Identity Func (probably by comparing the reference to EnumerableSorter.IdentityFunc) and thus is able to optimize the sorting, as opposed to getting a Func that could be anything.
      Sharplab link: sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA+ABATARgLABQGADAAQY4oDchJ5OAdADICWAdgI40G0DM5WUgGFShAN6FSU8v3YAXANoBdUgGVocmABMAFBV4AeeQD5SMNgFcAtjCgBDYABsYASlEFppCR8/SMAdjNLG3snGAYAeSgtWwAhAE8dBFIAXlMEFwYAFQgAQSh7RJduTwBfQlKgA

  • @davidmaldonado5098
    @davidmaldonado5098 Před rokem

    This is the best free software Ive seen. Respect.

  • @Gallardo994
    @Gallardo994 Před rokem +5

    This comment is a friendly reminder for Nick to update Rider.

  • @NakaT88
    @NakaT88 Před 8 měsíci

    wich is your Visual Studio Theme nick?

  • @kyryllvlasiuk
    @kyryllvlasiuk Před rokem

    Do linq methods perform sort immediately, or can it be deferred or halted. Like say we order and take 100 out of a million, will the performance be the same as we array sort said million. I would benchmark it later, now I am from phone. But if someone beats me to it, you are welcome.

  • @cn-ml
    @cn-ml Před 9 měsíci

    You probably shoulnt initialize the list in the benchmark method. Also, for the seed to have an effect, shoulnt you rather set the seed and init the array in an IterationSetup method? That way every benchmark iteration operates on the same array?

  • @guiorgy
    @guiorgy Před rokem +1

    If you don't want to mutate your initial collection, how about cloning and then mutating the clone? Would that change the results in any meaningfull way?

    • @nickchapsas
      @nickchapsas  Před rokem

      It doesn't scale. You will always need to re-allocate the full collection which will get very costly

  • @jadenrogers3133
    @jadenrogers3133 Před rokem

    How are you on RC1? Site still says SDK 7.0.100-preview.7

  • @Kundrtcz
    @Kundrtcz Před rokem +1

    Order/OrderBy advantages:
    + purity (i.e. doesn't change underlying collection)
    + simplicity
    + readability
    + same for primitive/complex types
    + extension method of IEnumerable => you don't have to have a collection before and ToArray/ToList can be way down the chain of other methods
    + expression tree (allows for other optimizations etc.)
    Sort advantages:
    + faster (not always)
    + more memory efficient
    So it depends. I believe Order/OrderBy is better choice in most of the cases. Title of the video is a bit misleading.

    • @nickchapsas
      @nickchapsas  Před rokem +4

      For all cases the non-LINQ methods are objectively more perfromant. Even when they are slower for strings, the memory allocation offsets the speed in GC time so they are still faster. It's why LINQ is a huge no for any high performance software

    • @ferranis104
      @ferranis104 Před rokem

      @@nickchapsas Honest question, since I am not well versed in C#: Is high performance software usually written in C#?

    • @nickchapsas
      @nickchapsas  Před rokem +2

      @@ferranis104 Yes

    • @philosophy5561
      @philosophy5561 Před rokem

      If software really need high performance, C# is not right language. Some system level language would be better choice e.g Rust, c++, C , GO for example

  • @jdlessl
    @jdlessl Před rokem +6

    I got sick of writing IComparers every time I wanted to Sort a List, so I created a generic IComparer class that replicated LINQ's OrderBy/ThenBy/Descending chaining ability.
    //List list;
    list.Sort(new OrderBy(x => x.IntField).ThenByDescending(x => x.Otherfield);

  • @horwoodg
    @horwoodg Před rokem

    This might sound like a stupid question, but can any body tell me what IDE Nick is using? I'm currently using Visual Studio 2019. I like using it but Nick's seems more responsive than mine.

    • @horwoodg
      @horwoodg Před rokem

      @@ilya-zhidkov Thank you. I'll give it a try.

  • @xRalphy
    @xRalphy Před rokem

    Present!

  • @7th_CAV_Trooper
    @7th_CAV_Trooper Před 6 měsíci +1

    In most cases I'd rather pay the cost in time for the side-effect free operations.

  • @timothywestern6488
    @timothywestern6488 Před rokem

    Great job pointing out that Array.Sort mutates the existing array (as opposed to returning a new array, which might point item at.

  • @SteveGietz
    @SteveGietz Před rokem

    For descending, how about following the list.Sort with a list.Reverse?

    • @nickchapsas
      @nickchapsas  Před rokem +3

      Bad idea. Just use a static comparer

  • @tymekmajewski4718
    @tymekmajewski4718 Před rokem

    BTW. By *not* using a new Random(420) the arrays in are different in each benchmark.

  • @PsiHamster
    @PsiHamster Před rokem

    I don't think there is such big difference between qsort and introsort. Most likely such speed difference happens because OrderBy creates inner array of keys to sort, and then maps it into resulting IEnumerable.

    • @nickchapsas
      @nickchapsas  Před rokem +1

      There is a very noticable difference as you go on different sizes of collections, because the average of the algorithm changes

    • @PsiHamster
      @PsiHamster Před rokem

      ​@@nickchapsas ​ I think my comment were deleted because of link of github. So i will wrote without code link
      I made a benchmark that tests Array.Sort, OrderBy and simple QSort implementation
      And it has no such a big difference between Array.Sort and QSort. Array sort slightly faster than QSort on small array size, and almost same for biggest.
      But Linq sorting with OrderBy always more than twice slow and getting worse with bigger sizes.
      | Method | Size | Mean | Error | StdDev | Allocated |
      |---------- |------- |--------------:|------------:|------------:|----------:|
      | ArraySort | 100 | 2.943 us | 0.0292 us | 0.0273 us | 576 B |
      | LinqSort | 100 | 6.707 us | 0.0479 us | 0.0400 us | 2024 B |
      | QSorting | 100 | 4.114 us | 0.0640 us | 0.0568 us | 576 B |
      | ArraySort | 10000 | 488.011 us | 4.6788 us | 4.1477 us | 40176 B |
      | LinqSort | 10000 | 1,215.639 us | 10.1845 us | 9.5266 us | 160425 B |
      | QSorting | 10000 | 598.974 us | 2.3151 us | 1.9332 us | 40177 B |
      | ArraySort | 100000 | 5,830.275 us | 94.3979 us | 88.2999 us | 400220 B |
      | LinqSort | 100000 | 15,815.721 us | 307.9211 us | 354.6024 us | 1600596 B |
      | QSorting | 100000 | 6,993.457 us | 24.4451 us | 21.6700 us | 400220 B |

    • @PsiHamster
      @PsiHamster Před rokem

      QSort benchmark code
      [Benchmark()]
      public int[] QSorting()
      {
      var items = Enumerable.Range(0, Size).Select(_ => random.Next()).ToArray();
      SortArray(items, 0, items.Length - 1);
      return items;
      }
      public int[] SortArray(int[] array, int leftIndex, int rightIndex)
      {
      var i = leftIndex;
      var j = rightIndex;
      var pivot = array[leftIndex];
      while (i pivot)
      {
      j--;
      }
      if (i

  • @Bliss467
    @Bliss467 Před rokem +1

    Maybe worth pointing out by having three different arrays of different random numbers it can make for inconsistent results

    • @sgartner
      @sgartner Před rokem +1

      I was coming to mention the same thing. He made a point of that initially, then made changes that created three unique lists. However, I don't think that it invalidates his conclusions since the results are so stark.

    • @nickchapsas
      @nickchapsas  Před rokem +1

      It doesn’t make any difference for these scenarios. The only thing that really makes a difference is whether the arrays contain sequential numbers or not

  • @marcotroster8247
    @marcotroster8247 Před rokem +3

    The difference is not so significant when you're just benchmarking with 100 items. I've benchmarked it with 500k items and the difference between Array.Sort and OrderBy is a lot more significant. It's about 5ms vs. 60ms. So actually, Array.Sort is really nice, 12x faster for large loads 😄
    And I'm not sure why they even use comparison-based algorithms. Integers and strings can be sorted in linear time with radix / bucket sort. Can someone explain why they don't use those algorithms? Is there an implementation detail that I'm missing out on? 🤔

    • @mgerry1992
      @mgerry1992 Před rokem

      If I remember correctly, bucketsort uses 3 arrays of the same size: the original, another one for index-swapping, and a third one for the ordered result - which is a LOT of memory waste. But correct me if I'm wrong.

    • @marcotroster8247
      @marcotroster8247 Před rokem

      @@mgerry1992 So, you're arguing that the memory allocation of 2 additional arrays with n elements each is more costly than a factor of O(log n) more computation time? 🤔
      I mean that's log2(500k) = 19 as an additional factor. What's the factor of bucket sort or radix sort? Is it really worse than quick sort? 🤔

    • @patziwyruch
      @patziwyruch Před rokem

      @@marcotroster8247 "Is it really worse than quick sort" - Yes. I guess there is almost no usecase to sort 500k elements so therefore they dont use it. But if u have it then radix is the way to go.

    • @patziwyruch
      @patziwyruch Před rokem

      ​@@marcotroster8247 czcams.com/video/FNAUuYmkMPE/video.html here u have a funny video where sorting Algorithm are compared. (14:52 results)

    • @marcotroster8247
      @marcotroster8247 Před rokem +1

      @@patziwyruch I hope you're not being seriously. An API designer should never restrict the user in such a way. People are relying on an API to sort their data with optimal time complexity; that's the reason for using an API in the first place.
      And people do have legit reasons to do so. You're not allowed to judge as an API designer. Just do your damn job and put the radix sort in when somebody wants to sort lots of strings/ints. It's really not that hard.
      My CS theory prof once told me 25% of computation time is spend on sorting. So I'd say this really matters and people are killing our climate with such an ignorant attitude of intentionally making things worse than they needed to be.

  • @slimaneikourfaln3496
    @slimaneikourfaln3496 Před rokem +5

    You didn't run Benchmark with span example 😉 I was waiting for Benchmark Result of Span, but you end the video 😋

    • @nickchapsas
      @nickchapsas  Před rokem +1

      It is equally as fast as the array part and less memory efficient because it allocates the memory

    • @slimaneikourfaln3496
      @slimaneikourfaln3496 Před rokem

      @@nickchapsas Okey I see.
      Thank you Nick for all your hardwork

    • @autoplanet4833
      @autoplanet4833 Před rokem

      @@nickchapsas well, you could create a reference to the array, then create a span from it, sort the span, and the original reference to the array would be sorted and just return it without any additional allocations

    • @carmineos
      @carmineos Před rokem +2

      ​@@nickchapsas You are just declaring "items" as a Span but it has an implicit operator to convert the array returning from ToArray(), you could save the reference as array and then use AsSpan().Sort() to sort the array in place and then simply return "items" avoiding the ToArray().

  • @ronnyek4242
    @ronnyek4242 Před rokem

    What screen recorder do you use?

  • @pedrojose9135
    @pedrojose9135 Před rokem

    nice shirt in the thumbnail

  • @michaksiazek5905
    @michaksiazek5905 Před rokem

    to be clear - when you compare sorting methods you should compare this same data sets for each algorithm so instead of creating new collections of random items each time you should create this collection once and work on a copy of this collection

    • @nickchapsas
      @nickchapsas  Před rokem +7

      This wouldn’t work for the sort method because they mutate the collection. That’s why I’m using a random with a seed. To deterministicly create the same random data

    • @DJDoena
      @DJDoena Před rokem +1

      @@nickchapsas That's why OP said "work on a copy of this collection". You can create one randomized collection and then just create copies using the List constructor or the Linq .ToList(). Then you have different heap lists but the same content.

  • @dayv2005
    @dayv2005 Před rokem

    This comment is unrelated to this video. Hey Nick, I really hate ENUMs. Specifically when newer devs use them in service contracts and change somewhat regularly. Would you mind putting together a video on a more elegant way to handle this topic that doesn't break contacts? I see this specifically in an EntityType or EventType and the moment a new one is added everything breaks.

    • @JetsetDruid
      @JetsetDruid Před rokem

      Seems like you could solve that by versioning your interfaces and map newer enum entries to best fit from the older enum definitions or a default value for code not ready to use the latest enum definitions on the implementation of the interface.

    • @dayv2005
      @dayv2005 Před rokem

      @@JetsetDruid sure but I would argue that adding a new type to an entity could be done without it increasing the API version. A good example would be to build something like a value object that represents that type and exposing the primitive of it such as a string so you don't break a contract by introducing a new type.

    • @JetsetDruid
      @JetsetDruid Před rokem +1

      @@dayv2005 I'd rather do my logic based on an enum that a string, but that's just me I guess.
      I'd say you are on to something if the property you are representing can be changed at run time (say if we were modelling a car and you stored model as an enum) but if the change would require compile time changes anyway (say something like the fuel type) then I see no problems with enums

  • @mabakay
    @mabakay Před rokem

    .NET 7 is available and people are discovering that the methods available from .NET 1.1 can perform better :-D

  • @kyriakosch
    @kyriakosch Před rokem

    Irrelevant to the video, how do you make that popup at 8:43 appear ?

  • @chezchezchezchez
    @chezchezchezchez Před rokem

    My brain hurts

  • @stephenyork7318
    @stephenyork7318 Před rokem

    Of course this would not apply to Linq to entities.

  • @endofunk2174
    @endofunk2174 Před rokem

    Premature optimization the bane of every unproductive developer.

  • @ali.mazeed
    @ali.mazeed Před rokem

    but I guess I just have to deal with bluetooth, tNice tutorials is a big con.

  • @arjix8738
    @arjix8738 Před rokem

    I wonder what your first language is.
    I am actually confused, because ur last name sounds of Greek origin, but your accent does not match that.

    • @nickchapsas
      @nickchapsas  Před rokem +3

      I'm Greek

    • @arjix8738
      @arjix8738 Před rokem

      @@nickchapsas woah, your English accent is so nice!
      I hope my accent one day gets that good

  • @bob-xm7ny
    @bob-xm7ny Před měsícem

    If I found any of my devs spending this much time to save nanoseconds and Kb of memory we would have a discussion about "premature optimization." These videos are very fun to watch, but this isn't useful for production code. By the time you scale up to make a difference here, you've probably made some mistakes that lead you there. That's the place to focus your refactor.

  • @asteinerd
    @asteinerd Před rokem +1

    "Nick Chapsas - Champion of The Span Struct"
    No judgement, just poking fun at your regular recommendations in the use of Span. LOL
    ... and stop worrying about the pronunciation so much. English isn't your first language and the words with almost equal parts vowels as syllables are difficult for everyone.
    I'll endlessly make fun of Americans, Brits, etc., that were born into the language butchering it, but it's understandable for ESL folks to get tripped up sometimes. :D

  • @officebatman9411
    @officebatman9411 Před rokem

    This video feels like 12 minutes just to get talked down to and find out something I already knew and would have no problem finding on google if I didn’t…

    • @nickchapsas
      @nickchapsas  Před rokem +6

      Were you on gunpoint when you clicked the video?

    • @marklord7614
      @marklord7614 Před rokem +1

      You're just saying this video wasn't right for you. Based on the responses, people found it interesting. You do know that youtube allows you to skip or close the video all together right?

  • @TheMonk72
    @TheMonk72 Před rokem

    Lots of issues with this.
    When you re-write the code to generate the item list in every iteration without re-seeding the PRNG you're potentially operating on different lists of integers every time, which tends to invalidate the benchmark. Just make a copy of the array and pass it to the benchmark method using "[ArgumentSource(...)]" and a parameter. That or clone the array when needed in the "Array.Sort" benchmark and leave the original "_items" alone.
    "Array.Sort()" on strings is slow *if you don't provide a comparer.* If you pass in "StringComparer.Ordinal" (or whichever comparison you prefer) then it will run significantly faster. Without the comparer I found it performed a bit worse as indicated in your benchmarks, with the comparer it was much faster than OrderBy() or Order(). "Array.Sort()" with and without a supplied comparison on my machine was 13us (with) vs 83.17us (without). (Interestingly the StdDev was much lower for the "with" version.)
    Order() and OrderBy() also accept comparers, and perform *much* better when you use them. They still allocate more, but in terms of speed they're much faster when you explicitly provide the comparison. Moral of the story is to always specify the comparer when using any of the sorts. Not doing so will slow you down.
    Sorting spans is a tiny bit faster, and you can use "AsSpan()" to wrap an existing array without double-allocating. Something like this works just fine and is very slightly quicker (~1 SD worth in my tests) than "Array.Sort()":
    var data = _items.ToArray();
    data.AsSpan().Sort(StringComparer.Ordinal);
    return data;
    And finally, "Array.Reverse()" is blazing fast and never allocates, so "Array.Sort(...); Array.Reverse(...);" is still faster, even when you're making a copy of the array to leave the original intact.

  • @realsk1992
    @realsk1992 Před rokem

    Shouldn't it be ToList().OrderBy() to actually sort a list?

    • @nickchapsas
      @nickchapsas  Před rokem

      Oh no, because you won't be getting a list back. You'll be getting an enumerable and you'd have to re-enumerated