How IEnumerable can kill your performance in C#

Nick Chapsas

109,445 views

Published Aug 31, 2022
The first 100 of you can use code SCHOOL2022 for 20% off courses and bundles at dometrain.com
Become a Patreon and get source code access: / nickchapsas
Hello everybody, I'm Nick, and in this video I will show you how IEnumerable can harm your application's performance. I will explain why it happens, what you can do about it and how to deal with it in future scenarios.
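A minimal sketch of the multiple-enumeration problem the video describes, assuming a hypothetical GetCustomerNames iterator in place of the video's file-reading method: Count() and the foreach each run the iterator body again, while ToList() runs it only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int loads = 0; // how many times the "expensive" work has started

// Hypothetical stand-in for a method that reads customers from a file:
// the body runs again, from the top, every time the sequence is enumerated.
IEnumerable<string> GetCustomerNames()
{
    loads++;
    yield return "Nick";
    yield return "Anna";
    yield return "Marek";
}

IEnumerable<string> names = GetCustomerNames(); // nothing has run yet
Console.WriteLine($"Count: {names.Count()}");   // enumeration #1
foreach (var name in names)                     // enumeration #2
{
    Console.WriteLine(name);
}
int lazyLoads = loads;
Console.WriteLine($"Loads with IEnumerable: {lazyLoads}"); // 2

// Materialize once, then reuse the list as often as needed.
loads = 0;
List<string> customers = GetCustomerNames().ToList(); // the only enumeration
Console.WriteLine($"Count: {customers.Count}");       // List<T>.Count is a stored value
foreach (var name in customers)
{
    Console.WriteLine(name);
}
Console.WriteLine($"Loads with List: {loads}"); // 1
```

The loads counter only makes the repeated work visible; in real code that work would be file I/O or a database round trip.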
Don't forget to comment, like and subscribe :)
Social Media:
Follow me on GitHub: bit.ly/ChapsasGitHub
Follow me on Twitter: bit.ly/ChapsasTwitter
Connect on LinkedIn: bit.ly/ChapsasLinkedIn
Keep coding merch: keepcoding.shop
#csharp #dotnet

Comments • 233

@TheBreaded 1 year ago ⁺²⁰²
I swear this is one of those things ReSharper has taught me with its warnings; I rarely see it now because I know better. Great explanation of multiple enumerations.
@MicrosoftLifecam1 1 year ago ⁺⁵
Agreed. Before using Rider I had no idea, lol
@ivaniliev93 1 year ago
Me too
@TonoNamnum 1 year ago ⁺⁴
Visual Studio also gives you the same warning.
@tonyschoborg 1 year ago ⁺⁶
@TonoNamnum There must be a setting for that. We recently upgraded from VS to Rider and have caught a few cases we had missed. I had never seen the warning until we upgraded.
@crack8160 1 year ago ⁺²
Haha, same. When I installed ReSharper I became aware of this situation.
@marcotroster8247 1 year ago ⁺⁴⁷
This enumeration style is called a coroutine, for those who didn't know. You basically have a function on hold that can give you the next element right when you need it 😄
Actually, this is a crazy efficient way to represent long or even endless streams, like the indices from 1 to n. For n = int.MaxValue that is (2^31 - 1) × 4 bytes. Your PC would simply explode if you called ToList() on it, because it's 8 GB of data. But a coroutine like Enumerable.Range() can do it with just 2 int variables, i.e. 8 bytes.
It really makes a huge difference, as you can keep this little 8-byte chunk in the faster cache levels of your CPU and crank on it like crazy. One ToList() too few or too many can make your program run for 2 hours instead of 1 ms 😅😅😅
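To illustrate the Enumerable.Range point above, a small sketch: the range nominally covers int.MaxValue numbers, but because Range, Where and Take are all lazy, only the few values needed to find three multiples of 7 are ever produced.

```csharp
using System;
using System.Linq;

// Range, Where and Take are lazy: values are produced one at a time
// and only as long as the consumer keeps asking for more.
int[] firstMultiplesOfSeven = Enumerable.Range(1, int.MaxValue)
    .Where(i => i % 7 == 0)
    .Take(3)
    .ToArray();

Console.WriteLine(string.Join(", ", firstMultiplesOfSeven)); // 7, 14, 21

// Enumerable.Range(1, int.MaxValue).ToList(), on the other hand, would try to
// hold all 2,147,483,647 ints (about 8 GB) in memory at once, so it is not run here.
```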
@fred.flintstone4099 1 year ago ⁺³
I don't think that is called a "coroutine"; I think it is called an "iterator". It is like a class that implements an interface with a next() method (MoveNext() plus Current in C#), and the foreach loop calls that method on every iteration.
@marcotroster8247 1 year ago ⁺¹
@fred.flintstone4099 Historically speaking, when coroutines were invented, all machines were single-core. Multi-core processors didn't become mainstream until the mid-2000s.
So the programs back then were fetching from something really similar to what we call an iterator nowadays. It's all about the illusion of concurrency by decoupling the consumer from the producer.
And honestly, most time in compute is still spent waiting: waiting for operations to write results back, waiting for registers to be loaded from cache, waiting for jumps because of control flow, waiting for cache synchronization between processor cores, etc. Our code in C# is really just an illusion of what's actually happening.
But feel free to tell me where I'm wrong. Maybe I can learn something profound.
@fred.flintstone4099 1 year ago ⁺²
@marcotroster8247 No, I think you're right. I think an iterator that waits might also be called a "generator". In C# you can use the "yield" keyword for iterators. There is also IAsyncEnumerable.
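For the iterator/generator discussion above, a sketch of roughly what a foreach does with a yield-based method: the loop calls MoveNext() and reads Current, and the iterator body only advances to its next yield when asked. The real compiler lowering adds more (for example try/finally handling), so treat this as a simplification.

```csharp
using System;
using System.Collections.Generic;

// A yield-based iterator: the compiler turns it into a state machine
// that implements IEnumerable<int> and IEnumerator<int>.
IEnumerable<int> Numbers()
{
    Console.WriteLine("producing 1");
    yield return 1;
    Console.WriteLine("producing 2");
    yield return 2;
}

// Roughly what `foreach (var n in Numbers()) { ... }` does.
var received = new List<int>();
using (IEnumerator<int> e = Numbers().GetEnumerator())
{
    while (e.MoveNext())         // resumes the iterator body until the next yield
    {
        received.Add(e.Current); // the value that was just yielded
        Console.WriteLine($"consumed {e.Current}");
    }
}
// Output order: producing 1, consumed 1, producing 2, consumed 2
```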
@rafaelm.2056 1 year ago ⁺¹¹
I've been programming in C# for about 15 years and there are parts of it that still mystify me. Your example of obtaining a count via an IEnumerable reminded me of a similar situation I figured out on my own. In my case I was loading over 100k records. EF was new to me and I couldn't understand why my app was taking a performance hit until I discovered the difference between IEnumerable and IQueryable. From then on it forced me to take into consideration the overall purpose of the program and how to use IEnumerable properly. You are better versed in the language than I am, even after I've worked with C# for so long.
On a side note, back when I was learning programming in 1991, I asked a senior developer of our mainframe why people are sloppy with their code. He told me that it would only get worse, because as computers get faster they compensate for bad coding practices, and the end result is lazy programmers. I came from learning to program in a mainframe environment where every byte counted. We ran accounts payable and payroll for 300 employees. All of it was done on a 72 megabyte hard drive.
@marcusmajarra 1 year ago ⁺⁹³
A common recurring problem among programmers is not knowing how the code they're using works. At the very least, they should understand what the API commits to doing. Deferred enumeration of IEnumerable is a great feature in C#, but if you're using any API that exposes an IEnumerable object, you should always assume that you need to enumerate at some point, unless your objective is to merely chain subsequent operations to perform on the object.
In fact, if you never actually enumerate the sequence, it will never actually execute, and this is also an easy trap to fall into. So my best advice would be to write your API according to what client code should expect. If you're returning a finite object, rather than returning IEnumerable, you should return IReadOnlyCollection or IReadOnlyList (or any read-only interface). That way, client code knows that enumeration has already been performed. If you return IEnumerable, client code should assume that enumeration will be required, and even the implementation should probably avoid enumerating to a terminal operation.
JetBrains.Annotations also has the [NoEnumeration] attribute that you can assign to an IEnumerable method parameter to indicate that your method isn't performing a terminal operation over the parameter.
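A sketch of the API-design advice above, with hypothetical LoadActive and QueryActive methods: the IReadOnlyList<string> version filters once inside the method, while the IEnumerable<string> version re-runs the filter every time the caller enumerates. The predicate counter exists only to make that difference visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int predicateCalls = 0;

// Hypothetical filter that counts how often it runs.
bool IsActive(string name)
{
    predicateCalls++;
    return name.Length > 0;
}

// Hypothetical API: a read-only list says "the work is already done".
IReadOnlyList<string> LoadActive(IEnumerable<string> source) =>
    source.Where(IsActive).ToList();

// Hypothetical API: an IEnumerable says "nothing has run yet".
IEnumerable<string> QueryActive(IEnumerable<string> source) =>
    source.Where(IsActive);

string[] names = { "Nick", "", "Anna" };

IReadOnlyList<string> loaded = LoadActive(names);
Console.WriteLine($"{loaded.Count} items, first is {loaded[0]}");
int callsForLoaded = predicateCalls; // 3: one pass over the source

predicateCalls = 0;
IEnumerable<string> deferred = QueryActive(names);
Console.WriteLine($"{deferred.Count()} items, first is {deferred.First()}");
int callsForDeferred = predicateCalls; // 4: Count() filters all 3, First() filters again
```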
@youcantsee3867 1 year ago
What do you mean by 'you should always assume that you need to enumerate at some point, unless your objective is to merely chain subsequent operations to perform on the object'? I don't understand the part 'you need to enumerate at some point'. Could you explain it for me?
@marcusmajarra 1 year ago ⁺⁹
@youcantsee3867 It means that if you're dealing with an API that provides an IEnumerable object to you, you should assume that no actual query operation has happened yet behind the scenes. It is only when you enumerate that the query is actually executed.
For example, if the API operation is fronting a database call, no query is run against the database until you first enumerate over the results.
This is different from working with a list or an array, which has already been materialized with contents. The enumerable object has no contents until you enumerate.
@youcantsee3867 1 year ago
@marcusmajarra Thanks for your reply. I have one more question: does 'enumerate' mean calling a method like Count() or ToList(), or running a foreach? Am I right?
@marcusmajarra 1 year ago ⁺¹
@youcantsee3867 Essentially. If you're not digging into the results, you're not enumerating.
@Tekner436 1 year ago ⁺²
@youcantsee3867 A good example: in C# you call a function that returns an IEnumerable, IEnumerable<T> list = GetList(); You would think that doing foreach (var item in list) twice would use the same results from GetList(), but the IEnumerable interface doesn't actually grab any data until it is 'enumerated' in a foreach. That means each foreach over list will execute all the actions GetList() did to build the IEnumerable result. You could do something like
IEnumerable<Customer> query = customers.Where(c => c.Active);
List<Customer> result1 = query.ToList();
List<Customer> result2 = query.ToList();
It's possible that result1 differs from result2. If the customers data is backed by a database, each call to ToList (or any foreach statement) on the IEnumerable will build the query, execute it, and return a new result.
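A runnable version of the example above, assuming an in-memory List stands in for the database-backed customers (the tuple fields are made up for the sketch): the same query object gives different results because the source changed between the two ToList() calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the customers table.
var customers = new List<(string Name, bool Active)>
{
    ("Nick", true),
    ("Anna", false),
};

IEnumerable<(string Name, bool Active)> query = customers.Where(c => c.Active);

List<(string Name, bool Active)> result1 = query.ToList(); // Where runs now

customers.Add(("Marek", true)); // the source changes between enumerations

List<(string Name, bool Active)> result2 = query.ToList(); // Where runs again

Console.WriteLine($"result1: {result1.Count}, result2: {result2.Count}"); // result1: 1, result2: 2
```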
@mastermati773 1 year ago ⁺¹⁷
When I started to think about it more deeply, I realized this system is actually very, very good:
If we have some enumerable thing A given to a consumer B, how could B assume that it has enough memory to hold all elements of A?
Answer: it cannot, and so it protects itself with this model of multiple enumerations. If the file read in this video were gigantic (let's assume millions of lines), then multiple enumeration IS desired!
The solution is just to use IReadOnlyList, whose elements are already in memory before the enumeration.
@SmoothSkySailin 1 year ago ⁺¹
Great video! I always feel good about myself when I know exactly what the problem is and what your solution is going to be at the start of the video... It doesn't happen often, but when it does, I give myself a gold star :-) Thanks for posting such good content!
@michaellombardi3638 1 year ago
You have no idea how much this helped me today! I was looking at a problem where counting an IEnumerable with zero elements in it resulted in a significant delay and I thought I was going crazy! I had no idea that IEnumerable would be lazily evaluated. Thanks for the help! :)
@jamesmussett 1 year ago ⁺²⁶
The biggest problem I have with LINQ in general, rather than with IEnumerable, is the heap allocation that takes place when evaluating queries with ToList() and the like in memory-sensitive hot paths. In almost every other scenario it's absolutely fine, but it makes my life hell when I have to do rate calculations on 100-500 messages/s.
Would be good to see you cover MemoryPool and ArrayPool at some point; those types have truly saved my bacon!
@nickchapsas 1 year ago ⁺⁵⁶
I have a video about object pooling coming, probably around October or November. It's a really interesting topic
@jamesmussett 1 year ago ⁺³
@nickchapsas Perfect, I look forward to it! =)
@sealsharp 1 year ago ⁺¹
@nickchapsas Sweet!
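Since ArrayPool comes up here, a minimal sketch of renting a scratch buffer from ArrayPool<T>.Shared instead of allocating a new array (or calling ToArray/ToList) on every call. The Median helper and the sample rates are hypothetical.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: the median needs a sorted scratch copy of the samples.
// Renting that copy from the shared pool avoids a fresh allocation per call.
double Median(ReadOnlySpan<double> samples)
{
    if (samples.IsEmpty) throw new ArgumentException("No samples.", nameof(samples));

    double[] buffer = ArrayPool<double>.Shared.Rent(samples.Length);
    try
    {
        samples.CopyTo(buffer);                // Rent may return a longer array
        Array.Sort(buffer, 0, samples.Length); // sort only the slots we filled
        int mid = samples.Length / 2;
        return samples.Length % 2 == 1
            ? buffer[mid]
            : (buffer[mid - 1] + buffer[mid]) / 2;
    }
    finally
    {
        ArrayPool<double>.Shared.Return(buffer); // hand the array back for reuse
    }
}

double[] rates = { 120, 480, 250, 310 }; // made-up messages/s samples
double median = Median(rates);
Console.WriteLine(median); // 280
```

Rent can return an array longer than requested, which is why the sketch only sorts and reads the first samples.Length elements; the buffer must not be touched after Return.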
@asteinerd 1 year ago ⁺¹²
Great illustration of how/why this happens. Something I can send to my peers who get confused as to why their code is hitting an API twice when passing IEnumerable or IQueryable around.
@asdasddas100 1 year ago ⁺²
I feel like you can explain this in 1 minute if they're already experienced programmers, but if they're new this would be helpful
@emmanueladebiyi2109 1 year ago ⁺¹
Great stuff Nick. Your impact on my programming has been tremendous!
@stephajn 1 year ago
This is something I knew about and have been working to pass on to others as well. Thanks for making this video. I will share this with them in the future!
@nocgod 1 year ago ⁺⁵
It really is quite clear; there is even a warning (at least in Visual Studio): CA1851, "Possible multiple enumerations of IEnumerable collection".
It just requires the developer to read the warning and handle it (or the senior developers can elevate it from a warning to a compilation error).
@andytroo 1 year ago
The IEnumerable approach is the only sensible one in some situations, if there are too many items to fit in memory.
I find myself using a 'batchBy(int n)' approach: it turns an IEnumerable of items into an IEnumerable of batches, so you can work on a smaller list at a time; if things are too big, you can take them in bite-sized chunks.
It does mean that something like 'count' (or anything else that requires global knowledge) can only be accumulated and discovered at the end of the list.
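A sketch of the batchBy idea, written as a hypothetical BatchBy iterator (not a framework API): only the current batch is held in memory, and a global value such as the total count is accumulated as the batches arrive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a framework API): group a sequence into lists of
// at most `size` items, pulling from the source only as each batch is needed.
IEnumerable<List<T>> BatchBy<T>(IEnumerable<T> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

    var batch = new List<T>(size);
    foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == size)
        {
            yield return batch;
            batch = new List<T>(size); // fresh list, so callers may keep the previous one
        }
    }
    if (batch.Count > 0)
    {
        yield return batch; // last, possibly smaller batch
    }
}

// A global value like the total count is only known once every batch has been seen.
int total = 0;
var batchSizes = new List<int>();
foreach (var chunk in BatchBy(Enumerable.Range(1, 10), 4))
{
    total += chunk.Count;
    batchSizes.Add(chunk.Count);
}
Console.WriteLine($"{string.Join(",", batchSizes)} -> total {total}"); // 4,4,2 -> total 10
```

On .NET 6 and later, the built-in Enumerable.Chunk(size) does much the same thing, yielding arrays instead of lists. Because BatchBy is an iterator, its size check only fires once enumeration starts.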
@joost00719 1 year ago
I learned this the hard way too, but it was a very important and interesting lesson to learn.
Glad you made a video on this, because it is a very important feature in .NET that can make or break your application.
@MiroslavFrank 1 year ago ⁺²
IReadOnlyCollection
@kenbrady119 1 year ago ⁺¹⁰
I seem to remember the LINQ documentation explicitly stating that enumerables are lazily evaluated. It is a feature, one that all developers should be cognizant of so that they can force one-time evaluation when appropriate.
@DanStpTech 1 year ago
Yes, I knew, and it caused me a lot of trouble. Thank you for your explanation, always appreciated.
@marna_li 1 year ago
Great point! There was a riddle posted not long ago showing what would happen if you essentially call Task.Run in a LINQ query. LINQ uses deferred execution, so you have to be careful. The author of the riddle told me about a nasty bug related to this: logging stuff in a query. Always make sure that you evaluate the query, or else you might run code multiple times.
@mariorobben794 1 year ago ⁺¹⁵
My personal choice is to return an I…Collection, so that the consumer knows that the “inner” code isn’t deferred. Of course, there are situations where an IEnumerable is better, for instance when implementing repositories. But such repositories are mostly consumed from other application specific services.
@megaFINZ 6 months ago ⁺²
It's fine if result is supposed to be mutable because ICollection exposes things like Add and Remove. Otherwise you'd want to return something read-only: IReadOnlyCollection or ImmutableList or similar.
@krftsman 1 year ago ⁺¹
I was just giving my developers a lesson on this exact topic last week. I wish I could have just pointed them at this video! Thanks so much!
@epiphaner 1 year ago
This was exactly the solution I was hoping to see because that is what I have been using for years :)
As for the return type, I always return as specific as possible while accepting as generic as possible.
Worked well for me so far!
@Max-mx5yc 1 year ago ⁺¹⁰
IEnumerable - The fast-food restaurant of programming
@Arekadiusz 1 year ago ⁺²
Whoa, for the past year I've sometimes been getting "Possible multiple enumeration" warnings and never knew what they meant :V Thank you!
@harag9 1 year ago
Thanks for sharing this, didn't know about it, but I personally never use IEnumerable. However, when looking at colleagues' code during code review, I can now point this issue out to them when I spot it. Cheers.
@Victor_Marius 1 year ago
It's only a possible issue: if they're not using the Count() method on the IEnumerable, there's no problem. But you should point out the repeated resource access if they intentionally iterate multiple times.
@levkirichuk 7 months ago
Great and very important point Nick
@blazjerebic8097 1 year ago
Great video. Thank you for the information.
@markinman3119 1 year ago
Totally didn't know about it. Thanks Nick.
@OwenShartle 1 year ago
A key phrase, which is maybe more of a LINQ term, that I was also hoping to hear was "deferred execution". Great topic to be privy to, and great video, Nick!
@MrSaydo17 1 year ago
If I hadn't already been using R# for the last 4 years I wouldn't have ever known about this. Great explanation!
@andreast.1373 1 year ago
I've seen that warning before and had no idea what it meant. That was a great explanation, thanks for the video!
@harag9 1 year ago
Is that warning just a JetBrains warning, or does it appear in VS2022 now?
@andreast.1373 1 year ago ⁺¹
@harag9 To be honest, I'm not sure, but I believe it's only a JetBrains warning.
@harag9 1 year ago ⁺¹
@andreast.1373 Thought it might be; I've never seen the warning in VS. Cheers.
@LordXaosa 1 year ago ⁺²³
Materialization isn't always a good option either. What if your file is 200 GB? Or what if there is a pseudo-infinite enumerable, like network data or a database cursor? You can't always convert to a list, because of memory. So yes, watch your code and do what you understand. yield return is not bad if you know what you are doing.
@henrikfarn5590 1 year ago ⁺⁵
I agree! Understanding your code is the mantra - at one point in time IEnumerable was THE way to do it in my company. For large payloads IEnumerable is great but applying it everywhere is an antipattern
@battarro 1 year ago ⁺⁵
Then treat it as a stream. In the scenario he gives, if the file is 200 GB, ReadAllLines will create a 200 GB in-memory array of strings, so ReadAllLines is not the appropriate method for reading such a large file; you have to stream it in.
@billy65bob 1 year ago ⁺¹
@battarro It would actually be closer to 400 GB: the file is likely ASCII/UTF-8, whereas .NET strings are UTF-16 in memory, i.e. 16 bits per character.
@DanielLiuzzi 1 year ago
@battarro ReadAllLines won't stream, but _ReadLines_ will
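To illustrate the ReadAllLines/ReadLines distinction, a minimal sketch that writes a tiny temporary file (standing in for the huge file in this thread) and counts matching lines with File.ReadLines, which streams, next to File.ReadAllLines, which loads every line into an array. The sample lines are made up.

```csharp
using System;
using System.IO;
using System.Linq;

// A tiny temporary file stands in for the 200 GB file from the comments.
string path = Path.GetTempFileName();
int nicks;
int lineCount;
try
{
    File.WriteAllLines(path, new[] { "Nick,Athens", "Anna,Prague", "Nick,London" });

    // File.ReadLines returns a lazy IEnumerable<string>: lines are read as
    // Count() pulls them, so only about one line needs to be in memory at a time.
    nicks = File.ReadLines(path).Count(line => line.StartsWith("Nick,", StringComparison.Ordinal));

    // File.ReadAllLines returns a string[] holding every line at once.
    lineCount = File.ReadAllLines(path).Length;
}
finally
{
    File.Delete(path);
}

Console.WriteLine($"{nicks} of {lineCount} lines start with Nick"); // 2 of 3 lines start with Nick
```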
@tonyschoborg 1 year ago
Funny you should come out with this video. We recently switched to Rider and have caught a few cases we missed when using Visual Studio. Thanks for the content as usual!
@carldaniel6510 1 year ago ⁺¹
I rolled my own "CachedEnumerable" which lazy-caches the results of an enumeration - it's a wrapper over IEnumerable which tests the underlying enumerable (e.g. is it IList) and skips the cache for enumerables that are already cached/array-based. Using it gives me the best of both worlds - lazy enumeration and automatic caching.
@nanvlad 1 year ago
By introducing a caching layer you lose the live source data, so if the file/DB changes between your enumerations, you have to implement your own cache updating to give consumers the latest set of items.
@carldaniel6510 1 year ago ⁺¹
@nanvlad Yep. So I don't use it there.
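A minimal sketch of a lazily caching wrapper along the lines carldaniel6510 describes (the same idea alexanderkvenvolden4067 raises later in the thread). The Cached helper is hypothetical, not the commenter's code: it skips sources that already implement IList<T>, pulls items from the underlying enumerator only on demand, and replays cached items on later enumerations. As nanvlad points out, the cache goes stale if the source changes; the sketch is also not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: enumerate the source lazily, at most once, and replay
// cached items on later enumerations. Not thread-safe; if every caller stops
// early, the underlying enumerator is simply left open.
IEnumerable<T> Cached<T>(IEnumerable<T> source)
{
    if (source is IList<T>) return source; // arrays and lists are already materialized

    var cache = new List<T>();
    IEnumerator<T>? enumerator = null;
    bool finished = false;
    return Replay();

    IEnumerable<T> Replay()
    {
        for (int i = 0; ; i++)
        {
            if (i < cache.Count)
            {
                yield return cache[i]; // already fetched: serve from the cache
                continue;
            }
            if (finished) yield break;

            enumerator ??= source.GetEnumerator();
            if (!enumerator.MoveNext())
            {
                finished = true;
                enumerator.Dispose();
                yield break;
            }
            cache.Add(enumerator.Current); // fetch exactly one more item on demand
            yield return cache[i];
        }
    }
}

int produced = 0;

// Hypothetical expensive source that counts how many items it really produces.
IEnumerable<int> Expensive()
{
    for (int n = 1; n <= 3; n++)
    {
        produced++;
        yield return n * 10;
    }
}

var cached = Cached(Expensive());
int first = cached.First();            // pulls a single item from the source
string all = string.Join(",", cached); // pulls the remaining two
int count = cached.Count();            // served entirely from the cache
Console.WriteLine($"{first} | {all} | {count} | produced {produced}"); // 10 | 10,20,30 | 3 | produced 3
```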
@stefanvestergaard 1 year ago ⁺⁴
You could also address how yield returns are dangerous in that the source list can change between enumerations, e.g. items removed between the .Count() and the output.
@superior5129 1 year ago ⁺⁷
A bigger problem with methods that return IEnumerable is when they take parameters like a Stream or any IDisposable.
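A sketch of the IDisposable pitfall mentioned above, using a hypothetical ReadNames iterator over a TextReader: the iterator reads nothing until it is enumerated, so if the caller's using block has already disposed the reader, enumeration fails with ObjectDisposedException. Materializing inside the using block avoids it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical lazy parser over a reader it does not own.
IEnumerable<string> ReadNames(TextReader reader)
{
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        yield return line;
    }
}

IEnumerable<string> names;
using (var reader = new StringReader("Nick\nAnna"))
{
    names = ReadNames(reader); // returns immediately; nothing has been read
}                              // the reader is disposed here

bool failed = false;
try
{
    Console.WriteLine(names.Count()); // reading only starts now
}
catch (ObjectDisposedException)
{
    failed = true;
    Console.WriteLine("The reader was disposed before the sequence was enumerated.");
}

// Fix: finish enumerating while the reader is still alive.
List<string> safe;
using (var reader = new StringReader("Nick\nAnna"))
{
    safe = ReadNames(reader).ToList();
}
Console.WriteLine(safe.Count); // 2
```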
@leahnjr 1 year ago
I did not know/understand this, but now I totally do. Thanks!
@xavier.xiques 1 year ago
Very useful video Nick, thanks again
@akeemaweda1716 1 year ago
I didn't know about this before and will be more careful about the usage going forward.
Thanks Nick
@stoino1848 1 year ago
I knew about that and have also fallen into its downsides. Since then I am cautious when I get an IEnumerable and check my call stack to see whether it is used (aka enumerated) multiple times.
But I also remember reading in the official C# best-practice guide to use IEnumerable as the return and parameter type (I didn't look it up again).
@Max_Jacoby 1 year ago ⁺¹
The most surprising fact is the number of people who don't know that. If you ever used the "yield" keyword, it's kind of obvious that IEnumerable<T> MyMethod() returns Ts one by one and doesn't store the whole thing in memory, hence a second call to this method will calculate the Ts one by one again. I can see the confusion, though: if you always get IEnumerable from a third party and never used the "yield" keyword yourself, then yes, it's not obvious.
@jongeduard 1 year ago ⁺¹
Basically it's very simple: LINQ is a pipeline (and so are chained yield-returning function calls). It's a series of enumerators chained together like a single expression, and nothing runs until you loop over it to perform actual work.
A function like Count() is a terminating operation, because it does not return an IEnumerable itself but a computed numeric result, meaning it has to loop over the preceding expression.
And a self-written foreach loop is basically another terminating operation.
ToList and ToArray are as well: they create a new collection in memory and run a loop to fill it with data.
This means that ToList and ToArray come with the disadvantage of extra memory allocations.
Not using them and repeating loops over the expression, on the other hand, comes with a time and CPU usage penalty, as shown in the video.
@billy65bob 1 year ago ⁺¹
Count() is actually smart: if the underlying object implements ICollection<T>, it returns ICollection<T>.Count instead of enumerating the IEnumerable.
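A small sketch of the Count() behavior described above. It also uses Enumerable.TryGetNonEnumeratedCount, available from .NET 6, which reports whether a count can be obtained without enumerating; the Lazy iterator is a made-up example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up lazy sequence with no stored count.
IEnumerable<int> Lazy()
{
    yield return 1;
    yield return 2;
}

IEnumerable<int> list = new List<int> { 1, 2, 3 };

// List<T> implements ICollection<T>, so Count() just reads its Count property;
// the iterator has to be walked to the end.
int listCount = list.Count();
int lazyCount = Lazy().Count();

// .NET 6+: ask for a count only if it is available without enumerating.
bool listIsCheap = list.TryGetNonEnumeratedCount(out int cheapCount); // true, 3
bool lazyIsCheap = Lazy().TryGetNonEnumeratedCount(out _);            // false

Console.WriteLine($"{listCount} {lazyCount} {listIsCheap} {cheapCount} {lazyIsCheap}"); // 3 2 True 3 False
```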
@Kazyek 9 months ago
Also, for this specific example, or any time you want to return some kind of finite collection that you want to be able to enumerate and count, you can return an ICollection.
The advantage of returning an interface type is being able to change the underlying collection if needed without impacting the usage of the method.
For example, if you eventually have work to do on each entity and want to parallelize it, you might want to use a ConcurrentBag instead of a List, but both would satisfy the ICollection signature, so no refactor is needed for consumers of the original function.
@KingOfBlades27 1 year ago
This multiple-enumeration warning showed up for me in ReSharper as well, which is the sole reason I am aware of this behavior. Really good thing to teach people.
@abdellatifnafil 1 year ago
thanks man u r the best!
@josephizang6187 1 year ago
I didn't understand this problem this way. I'm usually conscious of it when using EF, but when just working with IEnumerables I tend to get sloppy with handling this. THANK YOU Nick🙃
@urbanelemental3308 1 year ago ⁺⁴
BTW, there's a CSV competition article that covers all the CSV parsers for .NET, and the last time I looked, the Sylvan.Data.Csv library was the winner and shockingly fast, even when using types.
@billy65bob 1 year ago
Oh, that's neat.
I'm still using TextFieldParser from the VisualBasic namespace, since it's the best one (and the only one) built into .NET itself.
@flybyw 1 year ago ⁺³
When you switch to .Select(), the file is only read once while each line is selected twice; and then you could just append .ToList() to the .Select() to return a list of Customers without splitting each line twice.
@FunWithBits 1 year ago
Yay, I was able to find the issue before Nick pointed it out. =) Though I only looked for it because Nick said there was a problem... I probably would not have caught it in real life.
@dolaudz3285 1 year ago
Just came across this warning a few days ago for the first time in Rider.
At first glance it might seem like something insignificant, but in my case this saved a few seconds of execution (at scale) for one flow.
@alexanderkvenvolden4067 1 year ago
I wonder if it's worth writing my own wrapper IEnumerable implementation that takes in an IEnumerable and caches the values as it enumerates. Then add an extension method to make it LINQ-y, like "CacheItems", and I'd have a drop-in way to make any IEnumerable safe to enumerate multiple times, while retaining the performance of not needing to convert to a list right away.
@Victor_Marius 1 year ago
And caching would mean saving to a list? Well, that is the same as calling ToList() when you create the IEnumerable, but more complicated 😅. What you could do is create an enumerable type that stores and updates the size in a single private int member while enumerating the first time, plus a Count method that returns that size. But this would return a size of 0 for enumerables that haven't been enumerated yet. Or just use an out argument (like in this video, where it was reading the lines from a text file: just put the number of lines into that out argument or an external variable). But I would avoid enumerating more than once, or even using Count, for types that are not supposed to have a known size before enumerating.
@alexanderkvenvolden4067 1 year ago
@Victor_Marius Those are good points. It would end up doing the same thing as ToList. However, it would preserve most of the benefits of using an enumerable over a collection type. You wouldn't need to fully enumerate the sequence before using it (like you would with ToList). This would improve performance for expensive enumerations, as well as in the case of a partially completed enumeration.
@Marfig 1 year ago
The general advice for any caller of iterators that return IEnumerable, is to not mix cursor calls like ForEach, with aggregate functions like Count if both results are in scope of each other, unless the first call casts the result to a collection or array and the second call uses that cast result instead. Not doing that is not just a matter of performance; that's even potentially the least of our worries. The problem is instead that most likely we just introduced a potentially hard-to-find bug if the source data can be changed by a third party between both calls. But if both calls are not related and they are out of scope of each other, do not cast. That's a potentially expensive operation in itself.
@paulovictordesouza1720 1 year ago ⁺²
Oh boy, this one hit me hard
Some time ago I had to import a 15-billion-line CSV into a database, and IEnumerable gave me the impression that it would help, but it actually didn't because of the multiple iterations.
The only solution that occurred to me at the time was to add values to a list a few at a time and then import them; otherwise it would throw an out-of-memory exception.
Without this approach, the entire process would take almost 3 hours to complete. After some modifications, making it more "listy", it ended up taking just a few minutes.
@quantum_net219 1 year ago
This video made me subscribe 😁
@HazeTupac 1 year ago
Thank you for the tip, quite interesting.
One question: do your courses come with a certificate on completion?
@CeleChaudary 1 year ago
Thanks 👍
@ChronoWrinkle 1 year ago
It never occurred to me that a method returning an enumerable behaves the same way, but indeed it does. Nice stuff, ty!
@tussle2k 1 year ago ⁺¹
Yay, new video 👍
@smwnl9072 1 year ago
The beauty of IEnumerables is lazy/deferred execution.
A trap (per this video's message) if you don't have a grasp of what it is.
Lazy/deferred execution, I believe, was borrowed from the functional paradigm.
The idea is that you have a piece of logic/an algorithm that won't be executed/evaluated unless there is an explicit intention.
In C# LINQ, you express the 'intention' by calling operators like
.First()
.ToList()
.Count()
.Any() etc.
Examples of lazy LINQ operators:
.Where()
.Select()
.OrderBy() etc.
These return an IEnumerable<T>.
Lazy/deferred execution shines when composing/chaining functions and when you intend to use your functions in the middle of a "pipeline". Hence the above 3 are often used in a query chain/pipe.
Pertaining to collections, lazy evaluation passes only 1 item at a time to each node/operator in the chain/pipe.
But with eager evaluation, the whole collection is evaluated and passed down.
If there were conditions for 'early breaks', the latter won't benefit, as the collection has already been evaluated prematurely.
E.g. a lazy pipe/chain:
products
    .Where(p => p.InStock())      // each product 'in stock' will flow down...
    .Where(p => p.Price < 3.14)   // ...but only 1 at a time, not the full list, because Where is lazy.
    .Select(p => p.ToShippable()) // Chained lazy operators act and behave as one (Select is also lazy).
// I often combine multiple individual lazy operators to solve complex problems with very little concern for performance penalties.
// Shifting the order of the operators around is also quite easy, as they are somewhat standalone.
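A runnable variant of the pipeline above, with made-up product tuples in place of the hypothetical InStock()/ToShippable() members and logging lambdas to expose the evaluation order: each product flows through both Where calls and the Select before the next one is looked at, and First() stops the whole chain at the first match, so the last product is never touched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical products; the fields mirror the chain in the comment above.
(string Name, bool InStock, decimal Price)[] products =
{
    ("Lamp", true, 25.00m),
    ("Cup", false, 2.00m),
    ("Pen", true, 1.50m),
    ("Clip", true, 0.20m),
};

string firstShippable = products
    .Where(p => { log.Add($"stock? {p.Name}"); return p.InStock; })
    .Where(p => { log.Add($"price? {p.Name}"); return p.Price < 3.14m; })
    .Select(p => { log.Add($"ship {p.Name}"); return p.Name.ToUpperInvariant(); })
    .First(); // pulls one product at a time and stops at the first match

Console.WriteLine(firstShippable); // PEN
Console.WriteLine(string.Join(" | ", log));
// stock? Lamp | price? Lamp | stock? Cup | stock? Pen | price? Pen | ship Pen
```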
@dusrdev 1 year ago
Hey Nick, I have recently switched to Rider and I am curious, which theme are you using?
@figloalds 1 month ago
I heavily use coroutines in my applications, especially for large operations. I can read 28k rows from the database, turn them into objects, convert them to JSON and send them over to clients without first loading 28k rows into memory, then making 28k objects, then making a 28k-item JSON array. It saves a lot of RAM and avoids high-generation GCs.
@Spartan322 1 year ago
Makes me think it would be nice to have an enumerable type that constructs the IEnumerable once, statically and with minimal overhead, when the function is called. While using ToList is clear, it would be nice to designate this in the function definition without having to produce a list or other container explicitly. Lazy loading and enumerable reconstruction are deceptive when you're used to how containers work, especially when this behavior is built into the language.
@knowledgeforfun838 1 year ago
I had to learn it the hard way when checking for a performance issue in production.
@za290 1 year ago
Thanks for this video. I don't use IEnumerable, and after this video I still won't :) but now I know why.
@TkrZ 1 year ago ⁺¹
loving the Barking joke at the start 😂
@nickchapsas 1 year ago ⁺¹
At least one person got it 🥲
@rpp1502 1 year ago
Great explanation. We need a Zero to Hero:
Microservices in C#
@kawthooleidevelopers 1 year ago
Hi Nick,
I just finished your minimal API course. What's the most optimized way to connect to and work with Cosmos DB? Is Dapper going to work with Cosmos DB? A lot of sample code uses DbContext, and the Microsoft docs just show how to call the database and container directly. Is that the best way to work with it? Maybe you've done a video on it and I just can't find it. I appreciate you sharing with us.
@nickchapsas 1 year ago ⁺¹
I would simply use the Cosmos DB SDK directly. It’s pretty good
@kawthooleidevelopers 1 year ago
@nickchapsas Thank you, brother. I will give that a go. I appreciate your help.
@Moosa_Says 1 year ago ⁺⁴
Hey Nick, shouldn't we just use List as the return type for collections every time, and IEnumerable only in cases where we are sure that we need it? Or are there disadvantages to using List everywhere? Would love to hear your thoughts. Thanks :)
@phizc 1 year ago ⁺¹
The only problem would be if you don't want the user to change the items in the list (use IReadOnlyList or IReadOnlyCollection then), or for interfaces or abstract implementations, though in that case I think you can still return the List, even if the interface says IEnumerable.
Basically, the only place I would ever have an IEnumerable return or out parameter is in an interface that might need it to be that way.
Of course, if you *are* enumerating something and it doesn't make sense to return a list, do return an IEnumerable, e.g. an "infinite" list.
Example: infinite Fibonacci sequence
IEnumerable<long> Fibonacci()
{
    long prev = 0;
    long curr = 1;
    while (true)
    {
        yield return curr;
        var p = prev;
        prev = curr;
        curr = p + curr;
    }
}
Of course it's not really "infinite". Since it grows by a factor of about 1.618 per step, a long overflows in fewer than 100 steps.
Consider BigInteger for a more painful experience 😁. Or enumerating every integer.
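The Fibonacci iterator above, paired with Take(), as a short usage sketch: Take stops pulling after ten items, so the infinite loop is simply never resumed again. The tuple assignment is just a more compact form of the original three-line update.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<long> Fibonacci()
{
    long prev = 0;
    long curr = 1;
    while (true)
    {
        yield return curr;
        (prev, curr) = (curr, prev + curr); // advance the pair in one step
    }
}

// Take(10) stops asking for items after the tenth, so the loop above stays suspended.
long[] firstTen = Fibonacci().Take(10).ToArray();
Console.WriteLine(string.Join(", ", firstTen)); // 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
```

Calling ToList() or Count() directly on Fibonacci() would never complete; in C#'s default unchecked context the long values silently wrap around after roughly 90 terms.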
@Moosa_Says Před rokem
@@phizc Thank you for sharing your opinion. So, I think I'm saying right that use IEnumerable only in particular cases while Lists actually have more use cases when considering real application case scenarios. I asked this question cuz I've seen a lot of the people always returning IEnumerable and then doing .ToList(); to use it. maybe they do it to maintain some level of abstraction ..?!!
@jackoberto01 Před rokem
I think it's up to the person using the code. Like Nick mentioned you might want to add in a Where clause or other instructions before enumerating. Deferred execution is a good feature of C# if you know how to use it. You can also avoid iterating the whole collection in case you use methods like First, Any or similar methods. In any case where you only need one iteration a IEnumerable works fine, if you need multiple iterations you can use ToList or ToArray first so for me an IEnumerable is best as it's flexible
@Moosa_Says Před rokem
@@jackoberto01 Yeah i think it depends on the case...but again i don't think you'll be able to use IEnumerable more than List as List cases are more in my experience.
@TheMonk72 Před rokem ⁺¹
@@Moosa_Says I deal with files that are large enough that they just don't fit in memory often enough that it's not worth writing different code just for that case. But it doesn't matter if the file fits in memory, if I don't need to access the data by index and can process it sequentially, I have no reason to load it all at once.
@co2boi Před rokem
Good stuff. Curious, at the end of the video you said "I probably wouldn't return IEnumerable, I would probably return the Type". Can you expand on that?
Also, in your example you are getting a count. Wouldn't it be better to use ICollection instead?
@borisw1166 Před rokem ⁺¹
I think what he means is an API designing question and what your intend is how to use your API. Ienumerables look like your you can filter and actually late execute the code behind that. While arrays or read-only lists express that there is no sense in filtering, because the "heavy" code is always execute, whether you filter or not. In his example filtering makes no "sense" since the file reading all the lines, creating and returning the object anyways. But ienumerable let's you think that you could filter in that and it actually makes a difference. Having an array or read-only list makes it very clear: the file is read anyways.
@siposz a year ago
If a function actually returns a List, then its return type should be List, not IEnumerable. That way the caller knows exactly what they get back and can consume it in the optimal way.
If you see an IEnumerable return type, you don't know what happens when you call Count() on it. Anything could happen, for example a five-second database call, or it could throw a FileNotFoundException.
A List is easier to deal with: if it's not null, Count() will be fine.
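One way to see the difference siposz describes, assuming .NET 6 or later for `Enumerable.TryGetNonEnumeratedCount`: a `List<T>` seen through `IEnumerable<T>` still reports its count cheaply, while a hypothetical iterator `Generated()` cannot report it without being walked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lazily generated sequence: its length is unknown until it is walked.
IEnumerable<int> Generated()
{
    yield return 1;
    yield return 2;
    yield return 3;
}

var list = new List<int> { 1, 2, 3 };
IEnumerable<int> listAsEnumerable = list;
IEnumerable<int> lazy = Generated();

// TryGetNonEnumeratedCount (.NET 6+) only succeeds when the count is known
// without enumerating, e.g. because the source implements ICollection<T>.
bool listKnown = listAsEnumerable.TryGetNonEnumeratedCount(out int listCount);
bool lazyKnown = lazy.TryGetNonEnumeratedCount(out int lazyCount);

Console.WriteLine($"List: {listKnown} ({listCount}), iterator: {lazyKnown} ({lazyCount})");
```

When the method returns false, a caller that still needs the count has to enumerate (or materialize) the sequence itself.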
@keithrobertson7579 a year ago
I would worry about using ToList() in the general case where the result set can be large. Imagine you're connected to a database querying millions of records. Unless you restrict the query so that you KNOW the data set will be small-ish, you shouldn't use ToList(). Also, the example here uses Count. This is an issue when applied to a custom iterator that LINQ can't translate into the query; but if you're just using Where, etc. on expressions that LINQ can translate, it should become a SELECT COUNT(*) query, which doesn't walk all the records. It's worth mentioning that Count plus a walk is not necessarily bad on its own; the issue is that it's being applied to a custom iterator. One should always step through code like this in the debugger to make sure it's working as expected, and take into account the possibility of a large data set.
@carducci000 a year ago
I do actually know about this, and typically take it into account; I'd be lying if I said I catch myself [or others] every single time :). It's one of those things you miss when you're working fast.
@ILICH1980 a year ago
Good to know, I didn't know that before.
@ltklaus5591 a year ago
I've switched to using IReadOnlyCollection or IReadOnlyList in most cases. The only time I use IEnumerable is when I don't need or want all items to be in memory at the same time, or if there could be a reason to enumerate only some of the items. If I had a CSV with 1,000,000 customer names and I wanted to know how many Nicks there are, I could read the file line by line, check if the name is Nick, increment a count, and move to the next line without storing all the names. Or if I wanted the addresses of the first 5 Nicks in the file, I could enumerate until I find 5 Nicks, and then stop enumerating.
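A sketch of that streaming approach, assuming a hypothetical temp file with one name per line rather than a real CSV parser. `File.ReadLines` yields lines lazily, so counting never holds the whole file in memory, and `Take(5)` stops reading once it has five matches:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical sample file: one customer name per line (a stand-in for the CSV).
string path = Path.GetTempFileName();
File.WriteAllLines(path, Enumerable.Range(0, 10_000)
    .Select(i => i % 3 == 0 ? "Nick" : $"Customer{i}"));

// File.ReadLines streams the file one line at a time,
// so only the current line needs to be held in memory.
int nickCount = File.ReadLines(path).Count(name => name == "Nick");

// Take(5) stops reading as soon as five matches have been found.
var firstFiveNicks = File.ReadLines(path)
    .Select((name, lineNumber) => (name, lineNumber))
    .Where(x => x.name == "Nick")
    .Take(5)
    .ToList();

Console.WriteLine($"{nickCount} Nicks; first five at zero-based lines " +
    string.Join(", ", firstFiveNicks.Select(x => x.lineNumber)));

File.Delete(path);
```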
@lifeisgameplayit a year ago
I haven't watched the video yet, but Nick "Epic" Chapsas is a legend.
@lpmynhardt a year ago ⁺¹
Hi Nick, love your channel. Could you explain why the following is 20x slower on my PC? I have seen it mentioned here and there but never seen a good explanation.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
var list = Enumerable.Range(0, 100_000_000).ToArray();
IEnumerable<int> enumerable = list;
Stopwatch sw = Stopwatch.StartNew();
foreach (var i in enumerable)
{
    var d = i + 1;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
sw.Restart();
foreach (var i in list)
{
    var d = i + 1;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
The IEnumerable loop is much slower than enumerating the array. If I change the order around so that the array is enumerated first and the IEnumerable second, the result is the same (the array is still much faster).
@ryan-heath a year ago ⁺¹
The foreach over the IEnumerable is implemented using the enumerator interface (the MoveNext method and the Current property).
The foreach over the array is compiled like a for loop, with no method calls involved.
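A simplified sketch of the two lowerings ryan-heath describes. The real compiler output also wraps the enumerator in a try/finally, which the `using` block stands in for here, and newer JITs may optimize the interface calls in some cases:

```csharp
using System;
using System.Collections.Generic;

int[] array = { 1, 2, 3, 4, 5 };
IEnumerable<int> enumerable = array;

// Roughly what `foreach (var i in enumerable)` becomes: interface calls to
// GetEnumerator, MoveNext and Current for every element.
long viaInterface = 0;
using (IEnumerator<int> e = enumerable.GetEnumerator())
{
    while (e.MoveNext())
    {
        viaInterface += e.Current;
    }
}

// Roughly what `foreach (var i in array)` becomes when the static type is an array:
// a plain indexed loop with no enumerator object and no interface calls.
long viaIndex = 0;
for (int i = 0; i < array.Length; i++)
{
    viaIndex += array[i];
}

Console.WriteLine($"{viaInterface} {viaIndex}");
```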
@lpmynhardt a year ago
@@ryan-heath Thanks, that makes sense
@phizc a year ago
@@ryan-heath Except the compiler, or at least the JIT, could know that it is an array, just presented as an IEnumerable.
For the JIT to optimize it that way, it might have to promote the method to Tier 1 compilation, though, and it only does that if explicitly told to or after the method has been called 30+ times.
@CrapE_DM a year ago ⁺³
Interesting. In the languages I work with, doing something like this simply fails, because an iterable can only be iterated over once, so you find out quickly that you need to cache the results to use them twice.
@cyril113 a year ago
@Adam M Java too.
@Crozz22 a year ago
This happens in C# with `IEnumerator`. However, `IEnumerable` is really just a factory of `IEnumerator`s.
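A small sketch of that distinction, using a `List<string>` as the sequence: an `IEnumerator<T>` is a single-use cursor, and each `GetEnumerator()` call on the `IEnumerable<T>` hands out a new one.

```csharp
using System;
using System.Collections.Generic;

IEnumerable<string> names = new List<string> { "Nick", "Chapsas" };

// An IEnumerator is a one-shot cursor: once MoveNext returns false, it is exhausted.
using var cursor = names.GetEnumerator();
int firstPass = 0;
while (cursor.MoveNext()) firstPass++;
bool hasMoreAfterEnd = cursor.MoveNext(); // still false: this cursor does not restart

// The IEnumerable is the factory: each GetEnumerator call returns a fresh cursor.
using var freshCursor = names.GetEnumerator();
int secondPass = 0;
while (freshCursor.MoveNext()) secondPass++;

Console.WriteLine($"{firstPass} {hasMoreAfterEnd} {secondPass}");
```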
@aaron4th2001 5 months ago
I questioned the system when I used a Where clause on a list, and then, when I modified a value in that list, it suddenly got added to or removed from the results based on the Where clause, even though I never updated the IEnumerable to reflect the changes. After some trial and error I debugged it, wrote out a before and after count, and my value change had somehow magically been reflected in the collection. I had an inkling of how this enumeration worked, and after watching this video I now understand that Count and iterating over it re-execute the Where clause every time.
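A minimal reproduction of what aaron4th2001 ran into, with a hypothetical `prices` list: the `Where` query is re-run on every `Count()`, so edits to the source show up, until the result is materialized with `ToList()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var prices = new List<int> { 5, 15, 25 };

// The query is only a description; it does not copy or filter anything yet.
IEnumerable<int> expensive = prices.Where(p => p > 10);
int before = expensive.Count();        // 2 (15 and 25)

// Mutating the source list is visible the next time the query is enumerated,
// because Count() runs the Where predicate over the current contents again.
prices[0] = 50;
int after = expensive.Count();         // 3 (50, 15 and 25)

// Materializing takes a snapshot that later changes to the list no longer affect.
List<int> snapshot = expensive.ToList();
prices.Add(100);
int snapshotCount = snapshot.Count;    // still 3
int liveCount = expensive.Count();     // 4

Console.WriteLine($"{before} {after} {snapshotCount} {liveCount}");
```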
@chazshrawder8151 a year ago ⁺⁷
I learned this the hard way playing with Entity Framework when it first came out years and years ago. It was not a fun or quick learning experience! Unlike this video, which was both fun and quick 👍
@MofoMan2000 a year ago
This misunderstanding comes from the fact that enumerating with "yield return" statements effectively turns the method into a coroutine. Once the end of the method is hit, the enumeration is finished and its local state is discarded. When you start enumerating again, it has to redo everything.
@user-tk2jy8xr8b a year ago
Strangely, there's no out-of-the-box class or extension method to do this better. ToList creates a list that reallocates as it grows. What would be cool is a linked list of exponentially growing blocks: it takes O(n) memory and time to build, but is more efficient than a plain linked list. And a common rule is: "iterate multiple times, use IReadOnlyCollection".
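A sketch of that rule, using a hypothetical helper `ShareAboveAverage` that needs two passes over its input. Taking `IReadOnlyCollection<T>` pushes materialization to the caller, so neither pass can silently re-run a lazy query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that needs two passes: one for the average, one to count values above it.
double ShareAboveAverage(IReadOnlyCollection<int> values)
{
    if (values.Count == 0) return 0;
    double average = values.Average();            // first pass
    int above = values.Count(v => v > average);   // second pass
    return (double)above / values.Count;
}

int[] scores = { 10, 20, 30, 40 };
double share = ShareAboveAverage(scores);         // arrays and List<T> qualify directly

// A lazy query has to be materialized by the caller before it can be passed in.
IEnumerable<int> query = scores.Where(s => s > 10);
double shareOfQuery = ShareAboveAverage(query.ToList());

Console.WriteLine($"{share} {shareOfQuery}");
```

Passing `query` directly would not compile, which is the point: the signature forces the single materialization to happen at the call site.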
@the_wilferine a year ago ⁺⁶
Awesome video as always!
It's worth noting, however, that the implementation of GetCustomers using Select behaves subtly differently from yield return. With yield return, the body of GetCustomers is deferred until enumeration, whereas with Select it runs only once, when the result is assigned to the customers variable. It's still absolutely a performance issue, as the iteration over the lines still happens twice, but in the Select example the file is only loaded into memory once.
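A sketch of the difference the_wilferine points out, assuming a hypothetical `LoadLines()` that stands in for reading the file and counts how often it runs. The iterator version defers the whole body; the `Select` version loads once at call time and defers only the projection:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int loads = 0;

// Hypothetical stand-in for File.ReadAllLines: counts how often the "file" is read.
string[] LoadLines()
{
    loads++;
    return new[] { "Nick,29", "Jane,31" };
}

// Iterator version: the whole body, including LoadLines(), is deferred to enumeration.
IEnumerable<string> NamesWithYield()
{
    foreach (var line in LoadLines())
        yield return line.Split(',')[0];
}

// Select version: LoadLines() runs as soon as the method is called;
// only the projection is deferred.
IEnumerable<string> NamesWithSelect() =>
    LoadLines().Select(line => line.Split(',')[0]);

var viaYield = NamesWithYield();
int loadsAfterYieldCall = loads;       // 0: nothing has run yet
_ = viaYield.Count();
_ = viaYield.Count();
int loadsAfterYieldUse = loads;        // 2: one load per enumeration

loads = 0;
var viaSelect = NamesWithSelect();
int loadsAfterSelectCall = loads;      // 1: loaded eagerly at call time
_ = viaSelect.Count();
_ = viaSelect.Count();
int loadsAfterSelectUse = loads;       // still 1: only the projection can run again

Console.WriteLine($"{loadsAfterYieldCall} {loadsAfterYieldUse} {loadsAfterSelectCall} {loadsAfterSelectUse}");
```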
@Bankoru a year ago
IEnumerable is my favorite monad
@guiorgy a year ago
I knew about this, though I don't remember since when or why. Maybe from when I was trying to work with a database once in the past 🤔
@phizc a year ago
I've been working on IEnumerable/IEnumerator classes for the last couple of days, for getting the paths of the hard links (NTFS) of a specific file. It works, but dang, is it annoying. I tried to make only an IEnumerator, but that's not good enough for "foreach": it really wants to call that shiny GetEnumerator method on IEnumerable.
After watching the video, I've decided to just get all the links at once and return an array. The NT kernel methods are set up as enumerating (GetFirst/GetNext), but there are only ever going to be fewer than 1024 links (hard-coded in Windows), and realistically fewer than 10.
Also, for my purposes, and probably everyone else's, it doesn't make sense to get only the first few, or not to turn the result into a List/array anyway.
There's even a WinAPI method to get the count before trying to enumerate them.
@wiktormaek9973 a year ago
Very similar topic to IQueryable and materializing a query too soon. The first time you load a whole big table, you'll learn. I've seen it in production with user profiles: it worked great until we got a lot of Asian customers, who can have some really common and short first and last names. Effectively, the query was loading everything matching the search phrase, because it was mapped in a way that materialization happened at some point and only then were Skip and Take applied.
@phizc a year ago
Similar. A common pitfall is to declare the variable as an IEnumerable instead of using _var_. Been watching too many NDC videos lately 😄
@masonwheeler6536 a year ago
8:45: "Know that the warning might be there but there might not be multiple enumeration in every single one of those occasions."
Enumeration is the process of _going over the elements of the enumerable,_ not of creating it. When you have a List, LINQ's Count() can call the Count property directly and not have to enumerate the list to count it. But if you had multiple foreach loops or LINQ queries, that would indeed be multiple enumeration even if it's of a List or an array.
As the video says, the warning is confusing. The problem isn't multiple enumeration; it's multiple _generation,_ which can be a problem for more reasons than just the performance hit. If the generation of the enumerable is non-deterministic for whatever reason (maybe you have a call to a random number generator in there, or you're querying a database twice and someone else INSERTs something into it between your two calls), you can end up enumerating two different sequences of values when you intuitively thought you'd be enumerating the same values twice, which can cause bugs in your code.
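A sketch of the non-deterministic case, using a hypothetical `Tickets()` iterator that hands out increasing numbers (a stand-in for a database query or a random number generator). Enumerating the same variable twice yields different data, while a materialized list stays stable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int nextTicket = 1;

// Hypothetical non-deterministic source: every enumeration regenerates the values,
// much like re-running a database query or calling a random number generator.
IEnumerable<int> Tickets()
{
    for (int i = 0; i < 3; i++)
        yield return nextTicket++;
}

var tickets = Tickets();
var firstRead = tickets.ToList();    // [1, 2, 3]
var secondRead = tickets.ToList();   // [4, 5, 6]: same variable, different data

// Generating once and reusing the list keeps every later pass consistent.
var materialized = Tickets().ToList();                        // [7, 8, 9]
bool consistent = materialized.Sum() == materialized.Sum();   // both passes see 7, 8, 9

Console.WriteLine($"{string.Join(",", firstRead)} | {string.Join(",", secondRead)} | {consistent}");
```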
@goremukin1 a year ago
I know about this behavior and always take it into account, but I know too many developers who don't. It's easier to list those who do.
I most often see the multiple enumeration warning on projects where people use Visual Studio. I think it's partly Microsoft's fault that Visual Studio still doesn't warn people about possible multiple enumeration, so people don't care.
@vertxxyz a year ago
I feel like you should also show the Rider-specific "why am I seeing this warning" link they usually build into the Alt+Enter menu for warnings like these (when one exists).
@nickchapsas a year ago
This is such a good idea for a video or a short actually
@crifox16 a year ago ⁺¹
So basically, yield returning an IEnumerable is great when you work with transient data that doesn't get reused? Just checking that I understood correctly.
@doneckreddahl a year ago
Can anybody tell me what Nick is using to show things like "x: "Nick Chapsas, 29"" and "splitline: string[2]" when he debugs? It seems to show the count as he debugs as well.
@nickchapsas a year ago ⁺¹
It’s just part of the Rider debugger
@EvaldasNaujikas a year ago
Great video, but I think it is also important to mention that ToList() should be called only if the underlying implementation is not already a materialized collection. For example, when your IEnumerable was returning a List (instead of using yield), a call to ToList() would copy that same list, which increases memory usage. And new developers could start thinking after the video that ToList() should always be called when they use a method that returns IEnumerable.
@nickchapsas a year ago ⁺¹
There are checks in ToList to prevent the extra allocation so you won’t increase the memory
@EvaldasNaujikas a year ago
@@nickchapsas But then why does Rider show an additional allocation of System.Int32[] and a new array in memory? And that additional +1 only happens AFTER ToList(). See the image here: snipboard.io/XkNj4A.jpg
@EvaldasNaujikas a year ago
And it does the same even if I use List as the return type for GetNumbers. After ToList, a new array is allocated in memory.
@EvaldasNaujikas a year ago
And just for fun, I added four ToList calls one after another. dotMemory still sees the allocations: snipboard.io/e2NrTl.jpg
@stempy100 a year ago
@@nickchapsas incorrect. .ToList() will create a new list.
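A quick check consistent with that last reply: in current .NET, `ToList()` on a source that implements `ICollection<T>` can presize the new list and bulk-copy into it, but it still returns a new list. If a copy is unwanted, a type check can reuse the instance, at the cost of sharing mutable state:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var original = new List<int> { 1, 2, 3 };
IEnumerable<int> asEnumerable = original;

// ToList always returns a new List<T>, even when the source already is one.
List<int> copy = asEnumerable.ToList();

bool sameInstance = ReferenceEquals(original, copy);  // False
copy.Add(4);                                          // does not touch the original

// If the caller only needs a list, a type check avoids the copy
// (but both variables then share the same mutable instance).
List<int> reused = asEnumerable as List<int> ?? asEnumerable.ToList();

Console.WriteLine($"{sameInstance} {original.Count} {copy.Count} {ReferenceEquals(original, reused)}");
```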
@EverRusting a year ago
I love that JetBrains catches possible multiple enumerations.
BUT OH MY GOD, if you don't enumerate the same IReadOnlyList multiple times, it will NAG you endlessly to change the parameter to IEnumerable.
Which is annoying, because your parameter already is an IReadOnlyList, and then it will nag you to change it back again when you enumerate it one more time.
@akumaquik a year ago
I've known about this for a while, and I always thought it was a problem with .Count(). I have creatively coded around .Count() in many projects.
@corinnarust a year ago
I really want a Nick Chapsas for Rust!
@hero1v1 a year ago
ReSharper taught me; now I understand it.
@EPK_AI_DUBS a year ago
What happens if I want to return an empty list? You previously said it was better to use Enumerable.Empty, but I can't do that if the method returns a List directly, right?
@phizc a year ago ⁺¹
You can return an IList, an ICollection, or a read-only version of those interfaces.
Then you can return Array.Empty<T>(). That one also doesn't allocate, or at least only once.
@billy65bob a year ago ⁺³
If you must return a List, one with a count of 0 wraps a shared empty array; not great, but the overhead isn't too bad.
Using IReadOnlyCollection or similar, so you can return Array.Empty directly, is preferable though.
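A sketch of the `IReadOnlyList<T>` approach, with a hypothetical `FindCustomers` lookup: the empty case returns `Array.Empty<T>()`, which hands back a cached instance per element type instead of allocating a new collection each time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lookup returning IReadOnlyList<T>, so the empty case can reuse Array.Empty<T>().
IReadOnlyList<string> FindCustomers(string name)
{
    if (name == "Nick")
        return new List<string> { "Nick Chapsas" };
    return Array.Empty<string>();
}

var none = FindCustomers("Nobody");
var alsoNone = FindCustomers("Someone else");
var found = FindCustomers("Nick");

// Array.Empty<T>() returns the same instance every time for a given T.
bool sharedEmpty = ReferenceEquals(none, alsoNone);

Console.WriteLine($"{none.Count} {found.Count} {sharedEmpty}");
```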
@teckyify 7 months ago
The yield in the example does not have the same behavior as return list.Select().
@codefoxtrot 5 months ago
Yes, I knew about this problem because Rider told me :)
@zdavzdav86 a year ago
I could be wrong but I get the impression that most people tend to materialize `IEnumerable` instantly and that there's a lack of understanding on how `IEnumerable` works. Also, that was me some years ago...
@sergiuszzalewski1947 a year ago ⁺²²
The rule is simple: return precise types, and accept abstract types. If you return a List, then your method's return type should be List, not IEnumerable. That way consumers know exactly what the actual type is, and if they want to, they can narrow it to an interface implicitly.
@andreikhotko5206 a year ago ⁺¹⁰
That's right, I follow the same approach. Just one note: there are also interfaces like IReadOnlyCollection, IReadOnlyList, IList, which I prefer to use for returning type.
@MirrorBoySkr a year ago
Which is better to use in such cases,
ToList() or ToArray()?
@nickchapsas a year ago
It depends on what you wanna do with the result
@MirrorBoySkr a year ago
@@nickchapsas I just want to enumerate, so it seems to me that ToArray() is more suitable. But I see that most people around me use ToList().
