Don’t Use the Wrong LINQ Methods

  • Added 17 Dec 2023
  • Use code MICRO20 and get 20% off the brand new "Getting Started with Microservices Architecture" course on Dometrain: dometrain.com/course/getting-...
    Get the source code: mailchi.mp/dometrain/cpl-fuiefwu
    Become a Patreon and get special perks: / nickchapsas
    Hello, everybody, I'm Nick, and in this video, I will show you the difference between two LINQ-associated methods that look exactly the same but perform very differently.
    Workshops: bit.ly/nickworkshops
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: github.com/Elfocrash
    Follow me on Twitter: / nickchapsas
    Connect on LinkedIn: / nick-chapsas
    Keep coding merch: keepcoding.shop
    #csharp #dotnet

Comments • 115

  • @mrahhal 4 months ago +133

    Small correction at 8:03. The 40 bytes are not because the enumerator was allocated; in this case the enumerator a List<T> gives back is a struct. The 40 bytes are because the struct needs to be boxed into an IEnumerator<T> interface, because foreach operated on an IEnumerable<T>. A foreach on List<T> doesn't allocate and is faster than a foreach on (IEnumerable<T>)List<T>, which allocates, as List<T> does an explicit implementation of the interface's GetEnumerator (to hide it, because it returns an interface) and adds another GetEnumerator method that returns the struct enumerator directly, which foreach will use and which avoids boxing to an interface. It's also faster because calls are direct (static binding, because they are struct methods) instead of virtual (dynamic binding through a vtable because of the interface).

    • @infeltk 4 months ago

      And if I would go to check list, I would use more traditional check with param list. There is no need to push LINQ everywhere. If there is created list that means that should be exist real need for early list allocation.
      It is strange, what bad is engineered C#. And there is no big sense to optimize code unless programmer is writing high speed code. But in this case it would be assumed that he/she is high skilling programmer.

    • @Lammot 4 months ago +14

      @infeltk your comment does not compile.

    • @8BitsPerPlay 4 months ago +4

      Thank you for this explanation. I love seeing comments like this that help explain the behind scenes allocation process of certain features in C#.

    • @wokarol 4 months ago +6

      To add to this. For those confused how foreach can skip the boxing when enumerating over a list, the reason it works is due to foreach using "duck typing". That is, foreach does not look if the type is IEnumerable, but instead it looks if there is a GetEnumerator method.
      But that brings an interesting point, why is .All() not optimized for it already? .Count() checks if the target type is a collection (and if it is, avoids enumeration).
      In theory, can't .All() do the same? Check if the target has an indexer (I think IReadOnlyList interface would be fine) and use the for loop in that case.

    • @mrahhal 4 months ago +2

      @wokarol Yes, in theory it can, but LINQ methods weren't really designed for performance. The fact that all non-materializing LINQ methods allocate new objects already trumps any attempt at micro-optimization. LINQ is for readability and convenience, not for performance (it doesn't hurt to optimize though, since it's part of a standard lib, but gains will always be minimal after a certain point).
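The "duck typing" point from this thread can be illustrated with a minimal sketch. `Countdown` and `DuckTypingDemo` are hypothetical names invented for this example; the type implements no interface at all, yet `foreach` accepts it because it exposes a public `GetEnumerator()` whose result has `MoveNext()` and `Current`. This is the same pattern that lets `foreach` over a `List<T>` use the struct enumerator directly.

```csharp
using System;

// Hypothetical type for illustration: it implements no interface.
public readonly struct Countdown
{
    private readonly int _start;

    public Countdown(int start)
    {
        _start = start;
    }

    // foreach only needs a public GetEnumerator() whose return type
    // exposes MoveNext() and Current; IEnumerable is not required.
    public Enumerator GetEnumerator()
    {
        return new Enumerator(_start);
    }

    // A struct enumerator: no heap allocation and no interface dispatch
    // as long as foreach sees this concrete type.
    public struct Enumerator
    {
        private int _next;
        private int _current;

        public Enumerator(int start)
        {
            _next = start + 1;
            _current = 0;
        }

        public int Current
        {
            get { return _current; }
        }

        public bool MoveNext()
        {
            if (_next <= 1)
            {
                return false;
            }
            _current = --_next;
            return true;
        }
    }
}

public static class DuckTypingDemo
{
    // Sums start, start - 1, ..., 1 using a plain foreach.
    public static int Sum(Countdown countdown)
    {
        int total = 0;
        foreach (int value in countdown)
        {
            total += value;
        }
        return total;
    }
}
```

Calling `DuckTypingDemo.Sum(new Countdown(3))` iterates 3, 2, 1. Once a value like this is passed around as an interface type, the compiler can no longer bind to the struct enumerator, which is the situation described for `Enumerable.All`.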

  • @RadusGoticus 4 months ago +6

    You can try this List extension method approach as a simple fallback too since it's easier to incorporate into your code rather than changing the underlying List implementation:
    namespace System.Collections.Generic
    {
        public static class ListExtensions
        {
            // Forwards to List<T>.TrueForAll, which loops by index without an enumerator.
            public static bool All<T>(this List<T> list, Predicate<T> predicate)
            {
                return list.TrueForAll(predicate);
            }
        }
    }
    This can be extended to handle functions, such as Any => Exists, FirstOrDefault => Find and so on.

  • @onetoomany671 4 months ago +6

    I would defer to the principle of least surprise: if the performance is not a (measured!) issue, use the generic .All(), otherwise do whatever weird stuff you gotta do for performance.
    I wish collections had specialized implementations of certain LINQ methods. E.g. having to use the trifecta of .Length, .Count and .Count() depending on what kind of collection you're working with is annoying, and it feels like specialized implementations of .Count() could exist for Lists/Arrays/etc.
    I bet someone is going to do some nasty things about this with interceptors at some point...

  • @Krimog 4 months ago +17

    There are already many LINQ methods that have different behaviors depending on the real type of the source enumerable. Why don't they change the All method so that it uses TrueForAll when source is a List<T> or a T[]?

    • @micoberss5579 4 months ago +1

      I have the same question. Why use All when TrueForAll exists ?

    • @Krimog 4 months ago

      @micoberss5579 Because when you have an IEnumerable<T>, you don't know if TrueForAll exists or not. If they included the TrueForAll method in the All implementation (for types that have a TrueForAll method), basically, that video wouldn't have been needed. You would have the most performant algorithm automatically.

    • @TheTim466 4 months ago

      @micoberss5579 Checking each IEnumerable<T> to see whether it is a list will of course make all calls to "All" slower, but they do it in other cases...

    • @chris-pee 4 months ago

      My guess is that Any is simple and fast enough, that checking the underlying type would be unnecessarily expensive.

    • @Krimog 4 months ago +2

      @chris-pee The Any without predicate checks the underlying type. As for the Any with a predicate and All, I'm pretty sure checking for the underlying type would still be quicker than memory allocation.

  • @ecpcorran 4 months ago +3

    The enumerator for the List<T> will check the _version field of the list on the MoveNext calls. That field is updated every time the list is modified, so if the list is modified while you are iterating over it, you will get an InvalidOperationException informing you that the collection was modified. TrueForAll does not check the _version field and just directly indexes into the list. If All used TrueForAll under the covers, then that would be a behavior change. Also, is Func<T, bool> equivalent to Predicate<T>? I wasn't aware that the latter existed until watching this video.
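The behavior difference described in this comment can be sketched as follows. `VersionCheckDemo` is a hypothetical name. The sketch uses a plain `foreach` over the list rather than `Enumerable.All`, because whether `All` goes through the enumerator may depend on the runtime version; the expectation that `TrueForAll` does not throw rests on the `List<T>` source as I understand it and should be treated as an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckDemo
{
    // Mutates the list once while a foreach over it is in progress.
    public static bool ForeachThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int value in list)
            {
                if (value == 1)
                {
                    list.Add(4); // bumps the list's internal version
                }
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            // The next MoveNext() noticed that the collection was modified.
            return true;
        }
    }

    // Performs the same single mutation from inside a TrueForAll predicate.
    public static bool TrueForAllThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            list.TrueForAll(value =>
            {
                if (value == 1)
                {
                    list.Add(4);
                }
                return true;
            });
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

On the closing question: `Func<T, bool>` and `Predicate<T>` have the same shape (one `T` in, `bool` out), but they are distinct delegate types, so a variable of one cannot be assigned to the other without wrapping it in a new delegate. A lambda can be converted to either.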

  • @lordshoe 4 months ago +23

    I'm curious if this is generally true for other Linq methods with native counterparts.
    For example Exists vs Contains or Any / Where vs FindAll.
    Would the only difference be the generation of the Enumerator? I know some Linq methods do some magic underneath the hood but I always assume the native implementation to be better.

    • @jongeduard 4 months ago

      I have the same questions. And it's about both the methods of the List class as well as the static methods of the Array class.
      Though what I remember, although this was not with the latest DotNet version but a couple of years ago, is I tested things like Sort to be a lot faster than OrderBy.
      The code of Sort tries to use all kinds of optimized sorting algorithms when it can, depending on collection size and data type. Even an intrinsic native version that it imports from the dotnet runtime, which is written in C++.
      However LINQ methods are still really great when you have any kind of method that generically accepts any IEnumerable, which is actually a really good coding practice too. I still prefer that in most non performance critical code. This is still a lot more efficient than only accepting stored buffers like List and Array when they don't need to be created in the first place.
      But if you are writing a library, it can be a good practice as well to serve different overloads, also accepting Spans and things like that.

  • @kikinobi 4 months ago +12

    Maybe it could be possible to write a source generator that intercepts the LINQ method call and forces it to use the most optimized implementation according to the type

    • @saeedbarari2207 4 months ago +10

      at that point, since linq is built-in, why not just do compiler magic instead?

    • @kocot. 2 months ago

      static analysis catches a lot of those, so I really don't see why bother

  • @andreypiskov_legacy 4 months ago +3

    Nick, you provided the wrong explanation: allocations are due to interface as a parameter type.
    Just make two functions with foreach inside: one with list parameter, and the other with interface like IReadOnlyCollection; pass list into both and behold the allocations in the second method
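The experiment proposed in this comment could look like the sketch below. `ForeachAllocationDemo` and its method names are hypothetical. `GC.GetAllocatedBytesForCurrentThread()` is used here as a rough measuring tool; the exact byte counts depend on the runtime, and newer JITs may be able to remove the boxing in some cases, so no expected numbers are given.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachAllocationDemo
{
    // Static type is List<int>: foreach binds to the struct List<int>.Enumerator.
    public static long SumViaList(List<int> numbers)
    {
        long total = 0;
        foreach (int n in numbers)
        {
            total += n;
        }
        return total;
    }

    // Static type is an interface: foreach calls the interface's GetEnumerator(),
    // which returns the enumerator as IEnumerator<int> (a boxed struct for List<int>).
    public static long SumViaInterface(IReadOnlyCollection<int> numbers)
    {
        long total = 0;
        foreach (int n in numbers)
        {
            total += n;
        }
        return total;
    }

    // Bytes allocated on the current thread while the action runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

A comparison would pass the same list to both, for example `MeasureAllocatedBytes(() => SumViaList(list))` against `MeasureAllocatedBytes(() => SumViaInterface(list))`, ideally after a warm-up call to each. For trustworthy numbers a benchmarking harness with a memory diagnoser, as used in the video, is the better tool.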

  • @mightybobka 4 months ago +50

    TrueForAll is not a LINQ method. It is a List<T> method, just like Add or Remove. Don't mix them.

    • @chr_kress 4 months ago +4

      Nick never stated that TrueForAll is a LINQ method. On the contrary, he mentioned explicitly that TrueForAll is a method on List at 5:30.

    • @JSWarcrimes 4 months ago +10

      @chr_kress right, but the video is called "Don’t Use the Wrong LINQ Methods", so it kinda gives the wrong context

    • @chr_kress 4 months ago

      @JSWarcrimes, @mightybobka I agree, this is a misleading title. I missed that. Sorry!

    • @kocot. 2 months ago

      unfortunately, feels quite clickbaity

  • @Nate77HK 4 months ago

    Thinking of IEnumerable as a linked list - counting the elements is in itself an enumeration of the list. The implementation of All() therefore could not depend on the count of elements like TrueForAll does and can't avoid the enumerator.
    Using TrueForAll on an IEnumerable would require a ToList() or ToArray() call first, which also uses the enumerator unless the thing can be pattern matched to ICollection, in which case it does a memcpy.
    Even if the thing is ICollection and then TrueForAll() could be used, this is actually not the same behavior as All() because an Enumerator does more on each MoveNext than what TrueForAll is doing in its loop body, which another commenter has already pointed out.

  • @neralem 4 months ago +2

    So basically TrueForAll() >= All() ?
    I don't understand why MS did it this way. Let's say All() existed before TrueForAll(): why didn't MS just replace the implementation of All() with the more performant one? And if TrueForAll() existed before All(), why did they even add it in the first place?

  • @user-tk2jy8xr8b 4 months ago

    So the collections need an optimal fold implementation with early exit, right? Having that, one can express First(), Single(), Any(), Any(Func), All(Func), Select(Func), Aggregate(...), Skip(int), Where(Func), Take(int) and some others without need for IEnumerable. Something like `IFoldable { TAgg Fold(TAgg seed, Func f) }`. Now add a lazy `IFoldable Reverse()` that would exploit indexing and you get Last(), SkipLast(int), TakeLast(int)

  • @VeNoM0619 4 months ago +1

    The foreach issue strikes again.
    Wonder if C# could just have 2 paths when running any foreach. If it's a simple List: run the basic for loop with no enumerator.
    Then LINQ doesn't have to be responsible for checking for List/non-List switching.

  • @ryan-heath 4 months ago +32

    They need to have some improvements left for .NET 9 😅

    • @modernkennnern 4 months ago

      There might be some drastic improvements in LINQ for the read-only collections coming in .NET 9 if an experiment comes through.
      Specifically, they're trying to finally make `ICollection<T>` implement `IReadOnlyCollection<T>` (same for `IList<T>` & `IReadOnlyList<T>`, `IDictionary<TKey, TValue>` & `IReadOnlyDictionary<TKey, TValue>`, and `ISet<T>` & `IReadOnlySet<T>`)

    • @nothingisreal6345 4 months ago +2

      They are very busy fixing all the serious bugs in other areas

    • @WDGKuurama 4 months ago

      @nothingisreal6345 Where can we find those bugs, any examples? (Real question)

  • @kyjiv 4 months ago +3

    I wonder how ConvertAll vs Select, Find vs FirstOrDefault, FindAll vs Where, Exists vs Any perform, in both List and Array types.
    I expected them to be optimised like it's done for Count(), but in this video we see it's not a rule.

    • @TheTim466 4 months ago +1

      I guess they are doing a trade-off between slowing down all calls, even those where the object really only is an IEnumerable<T>, and gaining performance in other cases. With Count(), checking can dramatically improve performance (iterating through the whole thing vs. just reading the length), so it probably is worth it. On the other hand, in many cases the IEnumerable<T> will be a list or array...
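For readers unfamiliar with the pairs named in this thread, here is a small sketch that lines up each `List<T>` method with its LINQ counterpart and checks that they produce the same results. `ListCounterparts` is a hypothetical name, and the sketch says nothing about relative speed, which would need a benchmark on the target runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListCounterparts
{
    // Returns true when every List<T> method agrees with its LINQ counterpart.
    public static bool SameResults()
    {
        var numbers = new List<int> { 3, -1, 4, -1, 5 };

        // Exists vs Any
        bool existsMatches =
            numbers.Exists(n => n < 0) == numbers.Any(n => n < 0);

        // TrueForAll vs All
        bool allMatches =
            numbers.TrueForAll(n => n > 0) == numbers.All(n => n > 0);

        // Find vs FirstOrDefault (both return default(T) when nothing matches)
        bool findMatches =
            numbers.Find(n => n > 3) == numbers.FirstOrDefault(n => n > 3);

        // FindAll vs Where: FindAll builds a new List<T> eagerly, Where is lazy.
        bool findAllMatches =
            numbers.FindAll(n => n > 0).SequenceEqual(numbers.Where(n => n > 0));

        // ConvertAll vs Select: ConvertAll also builds a new List<T> eagerly.
        bool convertAllMatches =
            numbers.ConvertAll(n => n * 2).SequenceEqual(numbers.Select(n => n * 2));

        return existsMatches && allMatches && findMatches
            && findAllMatches && convertAllMatches;
    }
}
```

Arrays expose the same operations as static methods on `Array`, for example `Array.Exists(array, predicate)` and `Array.TrueForAll(array, predicate)`. Note that `FindAll` and `ConvertAll` always materialize a new list, so they are not drop-in replacements for `Where` and `Select` when lazy evaluation matters.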

  • @RealCheesyBread 3 months ago

    Someone made LinqAF. It's Linq, but implemented entirely using structs and is nearly allocation-free (hence the AF). Apparently it's a little slower than Linq, but its performance is more consistent for cases such as game development because of the immensely reduced allocations.

  • @1nterstellar_yt 4 months ago +4

    Why "x => x > 0" lambda didn't contribute to allocations? Is it some kind of C# compiler optimization?

    • @samsonho5537 4 months ago +1

      I guess the C# compiler actually takes the managed function pointer and emit calli instead of creating the delegate object.

    • @mk72v2oq 4 months ago +7

      Because it does not capture anything, so it compiles to a regular function.

    • @MRender32 4 months ago +6

      No variable captures

    • @lordmetzgermeister 4 months ago +2

      It's not a closure, so the lambda is just a delegate to a method generated behind the scenes. Otherwise the compiler would generate a class for the closure and the instance would be allocated on the heap.
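The answers in this thread can be checked with a small sketch. `LambdaCachingDemo` is a hypothetical name. The behavior shown is an implementation detail of the current C# compiler rather than a language guarantee: a lambda that captures nothing is typically cached in a static field after its first use, so repeated calls hand back the same delegate instance, while a capturing lambda needs a fresh closure object and delegate each time.

```csharp
using System;

public static class LambdaCachingDemo
{
    // Captures nothing: the compiler can create the delegate once and reuse it.
    public static Func<int, bool> NonCapturing()
    {
        return x => x > 0;
    }

    // Captures the parameter 'threshold', so each call needs its own closure.
    public static Func<int, bool> Capturing(int threshold)
    {
        return x => x > threshold;
    }

    // True when two calls return the very same delegate object.
    public static bool NonCapturingIsReused()
    {
        return ReferenceEquals(NonCapturing(), NonCapturing());
    }

    public static bool CapturingIsReused()
    {
        return ReferenceEquals(Capturing(0), Capturing(0));
    }
}
```

This would explain why `x => x > 0` adds nothing to the per-call allocation figure in a benchmark: the one-time delegate allocation happens during warm-up and is not repeated on later invocations.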

  • @zwatotem 4 months ago +13

    "...by using the right one, you're going to reduce the performance of said operation in half"
    No, thank you! 😂

    • @johnnyblue4799 4 months ago

      I stopped the video right there looking for a comment like this and ready to write it myself if none was to be found!

  • @zwatotem 4 months ago

    It hurts that C#'s extension methods are not really methods, in the sense that they are not dynamically dispatchable. If that were the case, List<T> could have had its own overload of All(p) and other LINQ methods with the applicable optimizations. If I'm speculating, I'd also add that they should of course be optimized to static dispatch by the compiler in all the applicable places.
    So in this example you would get exactly the same performance as TrueForAll, and in a case where you treat the list as IEnumerable<T> you would only get the overhead of dynamic dispatch, not the allocation.

  • @zORg_alex 4 months ago

    As a Unity developer, I always create extensions that copy LINQ methods optimised for array and list. That's my workaround: I'm just piggybacking on that heap of extensions or making them anew. I really use Select and ToArray a lot, so I make a SelectArray method for status, lists and enumerables and roll with it.

    • @zORg_alex 4 months ago

      Also, Unity lacks a lot of simple things, like Array.IndexOf and other things; they are for some reason under ArrayUtility. Heaps of extensions. If only I'd known a decade ago that I'd write so much code just for ease of use 😂.

  • @vyrp 4 months ago +31

    Do you really need your own `MyList` class? Wouldn't the `foreach` on the standard `List<T>` have worked the same?
    The problem with the `foreach` in `Enumerable.All()` is that the static type of the sequence is `IEnumerable<T>`. That causes boxing of the enumerator and virtual calls for `MoveNext()` and `Current`.
    If the static type were `List<T>`, the `foreach` would have been as performant as the `for`.

    • @colejohnson66 4 months ago +6

      Was gonna comment the same thing. It's not necessarily the foreach or 40 bytes that's the issue here, but the virtual calls through an interface (which brings with it the overhead of allocating and disposing). Using IEnumerator requires method calls to MoveNext() and get_Current() every iteration. So three calls per iteration (MoveNext(), get_Current(), and predicate()) instead of two (this[int] and predicate()). I'm curious how the JIT assembly compares between the two.

  • @akeemaweda1716 4 months ago

    As always, you never disappoint, always on point.
    Thanks Nick!

  • @MirrorBoySkr 4 months ago

    Does it make sense to use arrays instead of List<T> in such cases?

  • @reikooters 4 months ago +2

    I tend to not use higher order functions that often in programming besides maybe sorting, but wanted to mention that the SonarLint extension for Visual Studio will tell you in these cases when a better alternative (type specific) that should be used is available.

  • @BrunoBsso 4 months ago +5

    Personally I'd use .Any(x => x < 0) for that kind of evaluations, not .All() and I've never used TrueForAll(), it's like that one is not in my orbit at any time.
    Do you have (or can you make a new video about it) Any() benchkmark (pun intended)?

    • @clantz 4 months ago

      This

    • @onetoomany671 4 months ago +2

      .Any(predicate) vs .All(not(predicate)) is a question of code clarity, IMO. Use whichever best expresses the business rule or whatever. Hard enforcing the use of one over the other is a code smell IMO.

    • @daniellundqvist5012 4 months ago

      Except that any can stop iterating at first false, which can make a real difference

    • @chylex 4 months ago

      @daniellundqvist5012 so does All, it makes no difference whatsoever

    • @palecskoda 4 months ago

      Any() can stop at the first true and return true, All() can stop at the first false and return false. What is your point, @daniellundqvist5012?

  • @KodingKuma 20 days ago

    Very very very useful.

  • @Sander-Brilman 4 months ago +3

    and how does this method compare to the .Any method?

    • @onetoomany671 4 months ago +1

      .Any(predicate) == !.All(not(predicate)). I would be surprised if their performance differs in any way; they should both terminate at the exact same element.

  • @marna_li 4 months ago +2

    In other words, if you are dealing with a known collection type directly, use the specialized methods if you care about performance and memory allocation. That way you escape the enumerable, especially when not needed for a query.

  • @michalidzikowski5239 3 months ago

    IDE eating 1MB of RAM during idle times - great way to use memory for blinking the cursor :D

  • @HeathInHeath 4 months ago

    Thanks for a very useful video. I appreciate the level of detail that you provide.

  • @dev-on-bike 2 months ago

    I think that most business developers will not care about this so much, as the GC is ready, there is 64GB+ of RAM on the machine and the CPU has a lot of cores. This is probably one of many hidden gems that you can't easily find on MSDN. If I wanted to follow the optimization path I would rather craft my own function, maybe based on PLINQ, or I could introduce SIMD, so for me this is only a halfway solution. But I like this kind of useful gem. I only wonder why Microsoft doesn't obsolete the slower version.

  • @Subjective0 4 months ago

    I guess this is what an extension could fix with interceptors: simply go in and fix all the bad usage of LINQ by changing the methods around?

  • @dimitar.bogdanov 4 months ago +7

    Are there more methods like this?

    • @lingfar4134 4 months ago +2

      Any() when you have an IEnumerable<T>, and Exists() when you have a List<T> or an array

  • @Psykorr 4 months ago

    Yes! Very good analysis.

  • @optiksau1987 4 months ago +4

    Your content is generally great, but you’re getting into micro-optimisation territory, Nick. 40 bytes of allocation, or 6 µs extra, is typically *not going to matter*.
    I’ve worked with plenty of devs who worry about all of these micro-optimisations to the detriment of clean code.

    • @ghaf222 4 months ago +2

      If 1 billion C# programmers watch this video and take the advice, collectively 6 seconds of computing time will have been saved!
      I do enjoy the videos and find these things interesting to know; however, I’ve also worked with developers who constantly had to change code to use the latest "in" thing, which meant their code was never finished, as something else they wanted to use would always come along.

  • @kocot. 2 months ago

    TLDR; list implementations are better than generic IEnumerable methods, and btw one of the 2 methods discussed is not even LINQ o_O, so the title isn't very honest. Now, if you're interested why (foreach vs for), feel free to skip to ~ 5:00 (although reading comments is probably a better idea as the explanation from the video is arguable)

  • @kosteash 4 months ago +1

    What is with .Any ? Why do we need all or trueforall, if we have any 😁

    • @NickMaovich 4 months ago

      If you use Any, you need to invert the condition for it to function and short-circuit. But it doesn't solve the performance problem it can bring.
      If you use .All in frequently executed code, this will put huge stress on the GC.

    • @zabustifu 4 months ago

      Exists is the List-specific equivalent of Any

  • @lordicemaniac 4 months ago

    Is "All" still twice as slow on lists larger than 3 items?

    • @phizc 4 months ago

      The benchmark had 10000 items, not 3.

  • @MatinDevs 4 months ago

    You definitely have to record a video about Aspirate and k8s deployment via Aspire

  • @chrisusher5362 4 months ago

    Not using the new UI in the latest version of Rider??
    I like it, much cleaner hiding most of the menus I never use and focusing on one I use all the time

  • @isnotnull 4 months ago +2

    I don't remember I ever used this method. The faster way is to use reverse Any method. Instead of numbers.All(x => x > 0) use numbers.Any(x => x

    • @okmarshall 4 months ago +2

      Would it be way faster? They should both exit on the same enumeration, which is the first negative or 0.

    • @isnotnull 4 months ago

      @okmarshall My bad

  • @serbanmarin6373 4 months ago

    would be nice to get a `TrueForAny` method to get the performance gains and not have to invert the condition for existing code

  • @Dimencia 4 months ago +1

    If you're concerned about the performance of foreach vs a for loop, you probably shouldn't be using C# in the first place, if performance is that important to your application...
    And it's probably worth pointing out that even if it's "twice" as fast with 10k values, doesn't mean that it'd still be twice as fast with 100k values

    • @modernkennnern 4 months ago +1

      Performance aware code is important regardless of language.
      Why make something slower if the faster version costs you nothing?
      I agree with the idea that you shouldn't spend an exorbitant amount of time optimizing, but if simply choosing the correct data structure can have a drastic effect on performance, then you should be aware of it.
      Like, if you know a collection is 100 elements, specify that in the list constructor. Better performance, and it makes the code clearer.

    • @okmarshall 4 months ago +3

      Simply not true anymore. C# is very fast compared to a lot of languages, depending on your application. We can always think about optimisation, and yes, some do use C# for high-performance applications.

    • @Dimencia 4 months ago +2

      @modernkennnern it doesn't cost nothing, it costs readability. There's a reason people prefer to use and read foreach loops, and a reason people use c# instead of c++

    • @modernkennnern 4 months ago

      @Dimencia I definitely agree with that specific scenario. I never use for loops either, and despite what this video told you there's actually no perf difference between for and foreach loops anymore. Nick even has a video on this topic; I'm surprised he didn't think this through.
      All I'm saying is that I do not subscribe to the notion that "performance does not matter" that so many developers nowadays tout. It's not critical, but it _does_ matter.
      However, I'm also not advocating for using `ReadOnlySpan`s and `InlineArray`s everywhere in order to eke out that tiny bit of performance while making the code unreadable in the process. Hence the phrase "performance aware".
      You should be aware of what you're doing, and realize that a `List<T>` is not the only collection type available (contrary to practically all code I've ever seen at the company I work for).

  • @berzurkfury 4 months ago +4

    What a perfect example of over-optimization and "clever" code for highly limited improvements. MSFT constantly chases perf improvements... why bother learning this?

  • @andreyz3133 4 months ago +1

    allPositive must be refactored into anyNegative and use .Any() and filter if any < 0

  • @ryanzwe 4 months ago +1

    Zoomies

  • @zoiobnu 4 months ago

    hahahaha, 420 . I know what you mean

  • @unskeptable 4 months ago

    It is not a Linq method man

  • @mustafasabur 4 months ago

    This level of optimization is not meant for C# code. Use C or C++ if you are counting bytes.

  • @BlyZeHD 4 months ago

    I would have done list.Min() >= 0 xD

    • @lordmetzgermeister 4 months ago

      Use list.OrderBy(x => x).First() >= 0 to get on the sigma grindset, my dude.

    • @nothingisreal6345 4 months ago +2

      That is expensive

    • @phizc 4 months ago +1

      All and TrueForAll stop as soon as the predicate is false, which could be at the first item. Min has to find the smallest value, so it has to evaluate every item.
      E.g.
      If the first value is -1, and there are 1 million items, All/TrueForAll evaluate 1 item before returning false, while Min has to evaluate all 1 million items.
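The short-circuiting argument in this reply can be made concrete by counting how many elements each approach looks at. `ShortCircuitDemo` is a hypothetical name, and the counters are only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Number of elements All examines before it returns.
    public static int ElementsSeenByAll(List<int> numbers)
    {
        int seen = 0;
        numbers.All(n =>
        {
            seen++;
            return n >= 0;
        });
        return seen;
    }

    // Number of elements Min pulls through the pipeline.
    // Min throws on an empty sequence of int, so the list must not be empty.
    public static int ElementsSeenByMin(List<int> numbers)
    {
        int seen = 0;
        numbers.Select(n =>
        {
            seen++;
            return n;
        }).Min();
        return seen;
    }
}
```

For a list that starts with a negative number, such as `{ -1, 2, 3 }`, `All` stops after the first element while `Min` has to visit all three. With a million items the gap grows accordingly.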