How Fast is LINQ, Spans, and Everything

  • Published May 22, 2024
  • Become a sponsor to access source code ► / zoranhorvat
    Join Discord server with topics on C# ► codinghelmet.com/go/discord
    Enroll in the course Beginning Object-Oriented Programming with C# ► codinghelmet.com/go/beginning...
    How many times have you heard that LINQ is slow? How about using an array instead? Yeah, sure, but how much does it cost to obtain an array?
    There are so many questions to answer when estimating the performance of data-crunching operations in C#. If I told you that this video will test 27 different methods of passing through data and applying an arithmetic transform to them, then you will know it is a serious matter.
    Watch this video to learn how different iteration methods compare and which iterations are available in the first place, depending on the type of data your application is processing.
    From plain number crunching to processing a query in a complex business application, this video will reveal the secrets of LINQ, spans, foreach loops, and collections.
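    A few of the iteration styles in question can be sketched like this (an illustrative sketch; method names and signatures are hypothetical, not the video's exact benchmark code):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Four of the iteration styles the benchmarks compare, reduced to a sum.
    static class IterationSketch
    {
        public static int SumFor(int[] data)
        {
            int sum = 0;
            for (int i = 0; i < data.Length; i++) sum += data[i];
            return sum;
        }

        public static int SumForeach(IEnumerable<int> data)
        {
            int sum = 0;
            foreach (int x in data) sum += x; // may allocate an enumerator for non-array sources
            return sum;
        }

        public static int SumSpan(ReadOnlySpan<int> data)
        {
            int sum = 0;
            foreach (int x in data) sum += x; // spans enable bounds-check elimination
            return sum;
        }

        public static int SumLinq(IEnumerable<int> data) => data.Sum(); // LINQ's specialized path
    }
    ```

    All four return the same total; what differs is enumerator allocation, bounds-check elimination, and the specialized fast path inside Sum.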
    ⌚ 00:00 Intro
    ⌚ 00:42 Implementing number-crunching iterators
    ⌚ 03:42 Explaining the benchmark results
    ⌚ 07:12 Performance of iterators on a domain model
    ⌚ 09:50 Explaining the benchmark results
    ⌚ 12:00 Performance: the big picture
    Thank you so much for watching! Please like, comment & share this video as it helps me a ton!! Don't forget to subscribe to my channel for more amazing videos and make sure to hit the bell icon to never miss any updates.🔥❤️
    ✅🔔 Become a patron ► / zoranhorvat
    ✅🔔 Subscribe ► / @zoran-horvat
    ⭐ Learn more from video courses:
    Beginning Object-oriented Programming with C# ► codinghelmet.com/go/beginning...
    ⭐ Collections and Generics in C# ► codinghelmet.com/go/collectio...
    ⭐ Making Your C# Code More Object-oriented ► codinghelmet.com/go/making-yo...
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    ⭐ CONNECT WITH ME 📱👨
    🌐Become a patron ► / zoranhorvat
    🌐Buy me a Coffee ► ko-fi.com/zoranhorvat
    🗳 Pluralsight Courses ► codinghelmet.com/go/pluralsight
    📸 Udemy Courses ► codinghelmet.com/go/udemy
    📸 Join me on Twitter ► / zoranh75
    🌐 Read my Articles ► codinghelmet.com/articles
    📸 Join me on LinkedIn ► / zoran-horvat
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    👨 About Me 👨
    Hi, I’m Zoran. I have more than 20 years of experience as a software developer, architect, team lead, and more. I have been programming in C# since its inception in the early 2000s. In 2017 I started publishing professional video courses at Pluralsight and Udemy, and by this point there are over 100 hours of the highest-quality videos you can watch on those platforms. On my YouTube channel, you can find shorter video forms focused on clarifying practical issues in coding, design, and architecture of .NET applications.❤️
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    ⚡️COPYRIGHT NOTICE:
    The Copyright Laws of the United States recognize a “fair use” of copyrighted content. Section 107 of the U.S. Copyright Act states: “Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.” This video and our YouTube channel, in general, may contain certain copyrighted works that were not specifically authorized to be used by the copyright holder(s), but which we believe in good faith are protected by federal law and the fair use doctrine for one or more of the reasons noted above.
    #csharp #dotnet #benchmark
  • Science & Technology

Comments • 52

  • @RupertBruce
    @RupertBruce 3 months ago +4

    I wish my test cases were as clearly defined and had the same coverage as your benchmarks!

  • @anm3037
    @anm3037 4 months ago +16

    Feels good to see performance stuff on this channel

  • @protox4
    @protox4 4 months ago +8

    Depending on which LINQ queries are used, they can generate tons of waste. It's less of a concern with a modern moving GC, but it matters very much in Unity, which still uses the old non-moving Boehm GC. I recently stripped LINQ out of the project at work and replaced it with manual loops and pooled arrays, and fragmented memory dropped by 25% from that alone! Fragmented memory makes the rest of the code run slower, not just the LINQ queries themselves. This matters very much for frame rates and frame spikes from GC.
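    The pooled-array pattern described here can be sketched roughly like this (illustrative only, not the commenter's actual code; the buffer size is a made-up example):

    ```csharp
    using System;
    using System.Buffers;

    // Replace a Select(...).ToArray() allocation with a rented, reusable buffer.
    int[] buffer = ArrayPool<int>.Shared.Rent(1024); // note: may return a larger array than requested
    try
    {
        int count = 0;
        for (int i = 0; i < 1024; i++)
            buffer[count++] = i * 2; // the manual loop that replaces the LINQ query

        // ... consume buffer[0..count] here ...
    }
    finally
    {
        ArrayPool<int>.Shared.Return(buffer); // hand the array back instead of leaving it for the GC
    }
    ```

    Renting from a shared pool keeps allocations out of hot paths entirely, which is what reduces fragmentation under a non-moving GC.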

    • @zoran-horvat
      @zoran-horvat 4 months ago +7

      Well, LINQ is generally not suitable for any realtime applications. Its focus is on consuming dynamic sequences of objects without collecting them.
      But your technical explanation of how Unity operates is very useful. Thank you.

    • @jongeduard
      @jongeduard 4 months ago

      Unity is not regular .NET; it is always pretty far behind in versioning, and it has other concerns too, being a game development platform with all the specific additional machinery that requires.
      It's really only in very recent versions that the .NET team added lots of low-level, CPU-efficient code to the .NET code base.
      The Sum function is exactly such an example. Looking at it in the online Source Browser shows a TryGetSpan check followed by a call to a private implementation that does explicit vectorization. I also know this because Nick Chapsas has talked about it in his videos.
      Technically you can write such code yourself, but it clearly goes a bit further than your typical for loop.
      I know that many compilers also vectorize code in their optimization passes when producing a release build, but in .NET this has to be done by either the JIT or the NativeAOT compiler, not the normal C# compiler, which just generates IL. But apparently this still does not beat explicitly written vectorized code.
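      Explicit vectorization of a sum can be sketched with System.Numerics.Vector<T> (a simplified illustration in the same spirit as the BCL's fast path, not the actual Sum implementation):

      ```csharp
      using System;
      using System.Numerics;

      static int VectorizedSum(ReadOnlySpan<int> data)
      {
          var acc = Vector<int>.Zero;
          int i = 0;

          // Process the bulk of the data in SIMD-width chunks.
          for (; i <= data.Length - Vector<int>.Count; i += Vector<int>.Count)
              acc += new Vector<int>(data.Slice(i, Vector<int>.Count));

          int sum = Vector.Sum(acc); // horizontal add of the lanes (.NET 6+)

          // Scalar tail for the leftover elements.
          for (; i < data.Length; i++)
              sum += data[i];

          return sum;
      }
      ```

      The SIMD width (Vector<int>.Count) depends on the hardware, which is why the scalar tail loop is needed.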

  • @mitar1283
    @mitar1283 4 months ago +8

    Great video, for real. People often talk about performance disregarding the actual optimizations that were made to LINQ over the years. As we can see in this example, LINQ has really come a long way.

    • @zoran-horvat
      @zoran-horvat 4 months ago +1

      The second example, with strings, is indicative: SelectMany operates on recursive enumerators, unlike the solution with nested loops. One should be aware of such differences, which may incur a measurable delay, and even then only when the problem size is large enough that you can actually notice the delay.

  • @nickbarton3191
    @nickbarton3191 4 months ago +3

    Ah the video I've been waiting for, great.
    And after watching it, you didn't disappoint.
    Thanks Zoran.

    • @nickbarton3191
      @nickbarton3191 4 months ago

      Got a question: does Aggregate with a summing lambda have similar performance to list.Sum(), which is also LINQ?
      It's just a shortcut, right?
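      The two calls in question are equivalent in result but not in implementation: Sum has a specialized path (including the vectorized one discussed in the video), while Aggregate is a general fold that invokes the lambda once per element. A minimal sketch:

      ```csharp
      using System;
      using System.Collections.Generic;
      using System.Linq;

      var list = new List<int> { 1, 2, 3, 4 };

      int viaSum = list.Sum();                                   // specialized, can use the vectorized fast path
      int viaAggregate = list.Aggregate(0, (acc, x) => acc + x); // general fold, one delegate call per element

      Console.WriteLine(viaSum == viaAggregate); // both yield 10, so this prints True
      ```

      So the results match, but Aggregate cannot benefit from Sum's internal optimizations.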

  • @dzllz
    @dzllz 4 months ago +3

    Great video! Would have been interesting with a memory comparison on the last test also. But good points overall

  • @HOSTRASOKYRA
    @HOSTRASOKYRA 4 months ago +1

    Thank you very much!

  • @user-ic3vq1xi1p
    @user-ic3vq1xi1p 3 months ago +1

    Many Thanks Great Man👍👍👍

  • @David-id6jw
    @David-id6jw 4 months ago +1

    On the int test, you only showed the results for an array size of 10,000, so there is no feel for how things scale. With the string test, there were some interesting scaling effects.
    With N=100, there was only a slight penalty for using foreach vs span, but that grew with N=10000. Foreach went from 19% slower to 110% slower than the span solution. That's a fairly hefty penalty for a growing dataset, which hasn't even really gotten that large yet.
    With LINQ, the scaling penalty isn't nearly as bad. It went from 110% slower at N=100 to 160% slower at N=10000. It's still losing ground compared to the span implementation, but slowly enough that if the performance is adequate at low values of N, it should be relatively similar at high values of N. I suspect foreach will be slower than LINQ fairly soon if the dataset continues to grow.
    In both tests, you didn't show the memory allocation. I have suspicions about where things would go with memory, but suspicions aren't tests, and are notoriously unreliable. That's why we have tests in the first place.

    • @zoran-horvat
      @zoran-horvat 4 months ago

      Times scale linearly with N, so you can make a rough estimate. The situation changes somewhat (but not too much) for small N, like 10-100. But those times are well under 1us, so I ignored them altogether.

  • @evancombs5159
    @evancombs5159 4 months ago

    I will say, with cloud computing often being usage-based, optimizing for performance can have a real impact on the bottom line. That is changing the game when it comes to deciding when to optimize.

  • 4 months ago

    Good video; it shows that MS puts effort into LINQ. It would be more useful if the .NET version were mentioned.

  • @WillEhrendreich
    @WillEhrendreich 4 months ago +1

    Ok, what if you wrote a native code way of doing that logic, in Odin for instance, then calling it from dotnet? What is the perf difference?

  • @SirBenJamin_
    @SirBenJamin_ 4 months ago

    I have a real problem with performance when writing my code. I always write the most efficient method I can first, which, as you pointed out, has many side effects compared to a much simpler (yet slower) method. In most cases my faster implementation's advantage is negligible in the actual application it was designed for, and only shows speed benefits in my stress-test sandboxes. Whereas if I had just used LINQ in the first place, I would have gotten the job done quicker. However, having said that, you could argue that a more efficient solution in the first place means that your product is more robust. Again, it all comes back to knowing the requirements: "We need this feature which should on average compute in less than 100 ms, and given the worst-case scenario of N units, should take no longer than 1 second."

    • @logantcooper6
      @logantcooper6 4 months ago

      It depends on your definition of robust. Sounds to me like your code would not be robust against constantly changing business requirements. But again, depends on what you need.

    • @zoran-horvat
      @zoran-horvat 4 months ago +3

      In my experience on heavy-duty applications, you will know well in advance that performance requirements are changing.
      Typically, the development team would receive information that acquiring a large customer is expected, or the architect would confirm a growing trend in the volumes currently served. Anyway, my teams always knew the bad news weeks or months ahead of time, and we always had enough time to run the stress tests and improve the parts of the code that did not meet the future expectations.
      The bottom line is that it is usually sufficient to apply basic reasoning and avoid blunders, such as blatantly slow algorithms, and it should be alright. Try simple code first; worry where worrying is due.

  • @jongeduard
    @jongeduard 4 months ago

    It's really important to consider the complete story. The general practice is to buffer things when you need to iterate over them more than one time.
    When I need Count or Length and also need to loop, I do a ToArray or ToList. If you forget that and iterate twice or more over the same lazy code, you're always going to face the worst performance possible.
    If, however, you can stay within a single iteration, you should not buffer. You save not just CPU cycles, but especially memory as well. Being memory efficient is one of the main goals of LINQ, and of higher-order functions in general.
    Rust goes even further: there is no GC, and heap usage is also extremely limited. Its iterator combinators are actually zero-cost and are faster than manual loops most of the time.
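    The double-enumeration trap and the buffering fix can be illustrated like this (a minimal sketch; the Console.WriteLine is there only to make re-evaluation visible):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    IEnumerable<int> squares = Enumerable.Range(1, 5).Select(x =>
    {
        Console.WriteLine($"evaluating {x}"); // visible marker of (re-)evaluation
        return x * x;
    });

    // Two passes over the lazy query evaluate the whole pipeline twice:
    int count = squares.Count();
    int sum = squares.Sum(); // prints "evaluating ..." all over again

    // Buffer once when more than one pass is needed:
    List<int> buffered = squares.ToList();
    count = buffered.Count;  // O(1) property access, no re-evaluation
    sum = buffered.Sum();    // iterates the materialized list
    ```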

    • @zoran-horvat
      @zoran-horvat 4 months ago

      I have made that point clear in several other videos. In this one, the focus was on other aspects and so I said that if speed is all you want, then don't turn a sequence into a collection - it will be slower. That assumes no other requirements but the speed.

  • @ryanshaw45
    @ryanshaw45 4 months ago

    Enjoyed the video! FYI - It looks like your discord link is broken

    • @zoran-horvat
      @zoran-horvat 4 months ago

      I don't know what happens to those Discord links - they are supposed to never expire, and yet every now and then someone tells me it doesn't work anymore...

  • @EugeneS88-RU
    @EugeneS88-RU 2 months ago

    How about using pointers in a for loop (unmanaged, unsafe code)? I think it would give top performance.
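    An unsafe pointer variant would look roughly like this (a hypothetical sketch; it requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file). In practice the JIT already eliminates bounds checks for spans and simple for loops over arrays, so raw pointers rarely buy much on top of a span:

    ```csharp
    // Sums an int array via raw pointer arithmetic, bypassing bounds checks.
    static unsafe int SumUnsafe(int[] data)
    {
        int sum = 0;
        fixed (int* p = data) // pin the array so the GC cannot move it
        {
            int* cur = p;
            int* end = p + data.Length;
            while (cur < end)
                sum += *cur++; // raw pointer walk, no bounds checks
        }
        return sum;
    }
    ```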

  • @chudchadanstud
    @chudchadanstud 4 months ago

    Do you have a vid on your vs code setup? I always wanted to use vs code for c#

    • @zoran-horvat
      @zoran-horvat 4 months ago

      I don't have such a video. Actually, I only started using VS Code consistently on my second or third attempt - I was so hooked on VS. But now, after learning it well, I struggle when I must use VS again. VS is so bloated and slow compared to VS Code.

  • @yanivrubin7202
    @yanivrubin7202 4 months ago +2

    Great video, but from my experience, GC has a lot of effect on performance, and you didn't check how much memory each method wastes. In the case of LINQ I guess it will be higher. And when the LINQ code is called many times, the LINQ method might be the most wasteful and eventually slower (on smaller collections).

    • @zoran-horvat
      @zoran-horvat 4 months ago

      Why do you think that LINQ would "waste" memory? What objects does it create, and how many of them?
      Also, I doubt that your experience actually shows GC having a lot of effect on performance here, and I can tell you why. For GC to have a lot of effect, there must be a lot of objects in the first place, whatever kind of objects a method uses. That means GC has already been measured in the benchmarks, and it doesn't seem to me that any such negative effect has shown up.

    • @yanivrubin7202
      @yanivrubin7202 4 months ago

      @@zoran-horvat It doesn't create many objects, but it creates them each time the LINQ query is called. So in my case, we SADLY removed the LINQ code from hot functions (called several thousand times per second). The GC thanked us :-) And it improved the program (but it was a long time ago; I'd need to retest it to see how many bytes).
      Regarding which objects: you can dive into the LINQ code and see that it news up several objects, mostly for its internal management.

  • @C00l-Game-Dev
    @C00l-Game-Dev 2 months ago

    Alternative title for this video: How to make C# go brr

  • @egr0eg
    @egr0eg 4 months ago

    What if you need to iterate over the collection multiple times? Is that a case when calling ToList is acceptable?

    • @zoran-horvat
      @zoran-horvat 4 months ago +1

      Yes, it is. Actually, you should not be allowed to iterate an unknown IEnumerable multiple times, because some variants will fail if you do, and some will cost you evaluating the source twice.
      Unless you know precisely that the object at hand supports multiple iterations without adding prohibitive costs, you should collect the objects first.
      That opens another set of questions, such as how large the data set you are processing is. Sometimes it is impossible to load an entire data source into a list at all. That is where we turn to the theory of algorithms.

    • @matheosmattsson2811
      @matheosmattsson2811 4 months ago

      @@zoran-horvat This partially answers my question above. But this gives me a follow up question: Why is it that Intellisense/ReSharper often suggest that my methods should return an IEnumerable instead of a T[]? I mean I kind of get it but at the same time it makes intellisense itself more stupid, as when I use the method and DO iterate the result multiple times, it will complain. Though, I know that the result is an ARRAY and that it should be OK, so I guess calling .ToArray() in this case is basically a "down cast" but feels a bit dumb...

    • @zoran-horvat
      @zoran-horvat 4 months ago

      @@matheosmattsson2811 It is perfectly legal to return a collection from a method if that method's purpose is to construct it. Code tools may not understand the idea well, and they apply a general rule, which is also valid when there are no other constraints.

    • @matheosmattsson2811
      @matheosmattsson2811 4 months ago

      @@zoran-horvat Okay, yea I agree. But so you're saying that you see no point in stating the return type is IEnumerable when you know yourself you are always returning a T[]?
      I at least would find it very misleading if a Class Library on NuGet would return an IEnumerable which in fact always would be a collection. It kind of makes the consumer doubt a bit on how it should be used, if I understand the concept correctly

    • @zoran-horvat
      @zoran-horvat 4 months ago +1

      @@matheosmattsson2811 That is the classic omission in design. The correct decision is to return the least obligation, which is in this case IEnumerable. This conclusion comes from encapsulation: having a collection imposes costs and even limitations that may cause callers to fail, depending on the size of the data involved. Worse yet, the callers will lock the implementation into a decision to return that particular collection forever after. What if you discovered a more convenient, performant, efficient, or otherwise better collection? Well, sorry, you won't be able to reimplement your method, because the previous design has leaked out and now everybody depends on it.
      There is, then, the same special case, when the caller dictates a collection that makes it performant, or that is even required to make the caller's operation feasible. In that case, that collection becomes a requirement, and the method must return it. The method then becomes the factory of that collection. As you can see, that situation is quite different from the general case.

  • @Tymon0000
    @Tymon0000 4 months ago

    For most of my cases in unity speed of iteration doesn't matter. What matters is how much garbage is created.

    • @zoran-horvat
      @zoran-horvat 4 months ago

      When iterating many objects, most of the garbage is the objects themselves; the iteration method of choice makes little difference.

    • @defufna
      @defufna 4 months ago

      If you are iterating over a list/array of objects that is already created, a for loop produces no garbage; with foreach you have the enumerator object as garbage.

    • @zoran-horvat
      @zoran-horvat 4 months ago

      @@defufna First, the cost of the enumerator is amortized across multiple objects in the sequence. Second, if you use an array, then there is the array object itself to count in, and if it is a list, then there is the list on top of the underlying array, which counts as two objects. You cannot count the enumerator as garbage and then forget to count collections. Anyway, the collections still count as 1/N per object, when they contain N objects.

    • @defufna
      @defufna 4 months ago

      @@zoran-horvat It is amortized, but you can have a case where you need to iterate over plenty of small arrays. Arrays and Lists can be long-lived, this is especially the case in games (op mentioned Unity) where you usually load everything at the start and try to avoid any new allocations during the game.

    • @zoran-horvat
      @zoran-horvat 4 months ago

      @@defufna IEnumerable doesn't even apply to that case, so comparative analysis is not possible. IEnumerable does not warrant multiple iterations.