NativeAOT in .NET 8 Has One Big Problem

  • Published 10 Dec 2023
  • Use code CLEAN20 and get 20% off the brand new "Deep Dive in Clean Architecture" course on Dometrain: dometrain.com/course/deep-div...
    Check out the Complete Clean Architecture bundle: dometrain.com/bundle/from-zer...
    Get the source code: mailchi.mp/dometrain/gjcpqdbkf90
    Become a Patreon and get special perks: / nickchapsas
    Hello everybody, I'm Nick, and in this video, I will show you how fast or slow NativeAOT is in .NET 8 and explain why that's the case.
    Workshops: bit.ly/nickworkshops
    Don't forget to comment, like and subscribe :)
    Social Media:
    Follow me on GitHub: github.com/Elfocrash
    Follow me on Twitter: / nickchapsas
    Connect on LinkedIn: / nick-chapsas
    Keep coding merch: keepcoding.shop
    #csharp #dotnet

Comments • 93

  • @DamianPEdwards
    @DamianPEdwards 5 months ago +111

    A few things cause the slightly lower performance in Native AOT apps right now. First (in apps using the web SDK) is the new DATAS Server GC mode. This new GC mode uses far less memory than traditional Server GC by dynamically adapting memory use to the app's demands, but in this first generation it impacts performance slightly. The goal is to remove the performance impact and enable DATAS for all Server GC apps in the future.
    Second, CoreCLR in .NET 8 has Dynamic PGO enabled by default, which allows the JIT to recompile hot methods with more aggressive optimizations based on what it observes while the app is running. Native AOT has static PGO with a default profile applied and by definition can never have Dynamic PGO.
    Third, the JIT can detect hardware capabilities (e.g. CPU intrinsics) at runtime and target those in the code it generates. Native AOT, however, defaults to a highly compatible target instruction set that won't have those optimizations, though you can specify them at compile time based on the hardware you know you're going to run on.
    Running the tests in the video with DATAS disabled and Native AOT configured for the target CPU could improve the results slightly.
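
As a sketch of the DATAS toggle mentioned above (assuming .NET 8, where DATAS is controlled through the `System.GC.DynamicAdaptationMode` runtime knob documented in the GC configuration docs), a project could opt out for an A/B comparison like this:

```xml
<!-- csproj fragment: opt out of the DATAS GC mode for a benchmark comparison.
     Assumes .NET 8's documented runtime knob; verify against your SDK version. -->
<ItemGroup>
  <!-- 0 disables dynamic adaptation (DATAS); omit the item to keep the default -->
  <RuntimeHostConfigurationOption Include="System.GC.DynamicAdaptationMode" Value="0" />
</ItemGroup>
```

The same value can also be set per machine with the `DOTNET_GCDynamicAdaptationMode` environment variable, which avoids republishing between runs.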

    • @parlor3115
      @parlor3115 5 months ago +5

      My thoughts exactly

    • @proosee
      @proosee 5 months ago +4

      Same old, same old: you trade startup time, executable size and memory consumption for performance, like in almost every piece of software before. But it was nice to see the details; I especially didn't know that the CLR is able to recompile some paths with different settings, so thank you for sharing. It's quite smart, actually.

    • @SlackwareNVM
      @SlackwareNVM 5 months ago +6

      I'm curious, would it be possible in the future for a JIT application with Dynamic PGO that has run for a while and has made all kinds of optimizations to then create a "profile" of sorts that could be used by the Native AOT compiler to build an application that is both fast in startup time _and_ highly optimized for a given workload?

    • @proosee
      @proosee 5 months ago +2

      @@SlackwareNVM You said it yourself: you need to create a profile, keep it updated, save it and load it on startup. There is always a trade-off, but you can always make it smarter, for sure.

    • @terjeber
      @terjeber 5 months ago +2

      I also think this type of performance testing is not particularly good. For example, the client hits the same endpoint every time, which gives the JIT compiler ample opportunity to radically tune performance for that specific code path. In theory the JIT might stop executing most of the code, since nothing changes along the way.
      It would be far more interesting to put a DB in there, fill it with a few million records, and vary the test so that it retrieves a different (or random) dataset each time. That would remove an opportunity for the runtime to optimize the code; having a server respond with the same data on the same URL over and over is nowhere near realistic.

  • @jimmyxu3819
    @jimmyxu3819 5 months ago +13

    Your Docker base images are different, so your results can't be compared with each other. Make sure you're using the same Linux version.

  • @robwalker4653
    @robwalker4653 5 months ago +1

    This is my go-to channel for all things .NET. Gets to the point straight away!

  • @FatbocSlin
    @FatbocSlin 5 months ago +8

    When comparing Docker performance, you are comparing apples to oranges.
    The Docker base image does make a difference: you use Ubuntu 20.04 as the base for your native image, while the .NET 8 SDK uses Debian 12 as its base.
    I have compared standard .NET Docker images against one based on Clear Linux, and there was a 9% difference, more than the difference you found in your test.
    .NET depends on libraries included in the Docker image.

  • @wknight8111
    @wknight8111 5 months ago +26

    JIT is interesting because on one hand you're starting an un-optimized application and expecting it to compile and optimize at runtime, so people think it's going to be slower. BUT the JIT has access to all sorts of runtime statistics and runtime type information that the AOT compiler does not have. This enables some very interesting and aggressive optimizations, in theory. I don't know the full details of everything Microsoft's CLR JIT attempts to do, but the possibilities are there for the JIT to perform better, especially for long-running applications. AOT will always win for startup time and short-lived applications, but for long-running applications it's not as clear, and JIT often has some advantages.

    • @_iPilot
      @_iPilot 5 months ago

      So if we share those statistics with the AOT compiler, it will produce even more efficient application code, won't it?

    • @NicolaiSkovvart
      @NicolaiSkovvart 5 months ago

      @@_iPilot It seems extremely likely that static PGO + AOT would be competitive with, if not better than, Dynamic PGO + JIT. Sadly the static PGO experience is pretty poorly supported.

    • @wknight8111
      @wknight8111 5 months ago +9

      @@_iPilot The problem is that you can't get runtime statistics until runtime. Everything else is just a guess, and if you guess wrong the AOT may optimize for the wrong types and make the situation worse.

    • @modernkennnern
      @modernkennnern 5 months ago

      @@wknight8111 You could theoretically run the app in JIT mode and then use that metadata to compile for AOT

    • @_iPilot
      @_iPilot 5 months ago

      We are in the age of telemetry, so runtime data can be uploaded somewhere, like logs (it actually is logs, btw), to be analyzed by an external application.

  • @caunt.official
    @caunt.official 5 months ago +43

    A 4% loss in performance doesn't really matter. What is interesting here is the actual bottleneck. Does NativeAOT perform better or worse with encryption algorithms? Does it perform better or worse with heap allocation? What exactly affects the performance?

    • @BlTemplar
      @BlTemplar 5 months ago +2

      AOT will always perform slower than the CLR because it doesn't have a JIT and can't optimize hot paths.
      But it will consume less memory, because the code is already compiled and optimized to some extent by default. The CLR needs to do all that work at runtime, which is why it will also consume more memory and some extra CPU resources until the code is optimized.

    • @nocturne6320
      @nocturne6320 5 months ago +4

      @@BlTemplar AOT should absolutely be faster than JIT. If AOT is performing slower, then the compiler is garbage. If a program written in C++ is slower than one in Java, it means the C++ code is bad, not that Java is faster than C++

    • @BlTemplar
      @BlTemplar 5 months ago +1

      @@nocturne6320 I'm not talking about C++, I'm talking specifically about AOT in C#. It's a highly dynamic, object-oriented runtime that is hard to AOT-compile. It won't be faster than the CLR in the near future.

    • @nocturne6320
      @nocturne6320 5 months ago +1

      @@BlTemplar True, but with smarter compilation it definitely has the potential to outclass JIT, I wonder how much the MethodImpl attribute affects the performance currently

    • @maxdevos3201
      @maxdevos3201 3 months ago +2

      Yes, it does! 4% matters a lot! This type of thinking is why software bloat has managed to completely undermine the hardware advancements of the last 30 years

  • @MatteoGariglio
    @MatteoGariglio 5 months ago

    Hi Nick, thanks for your nice work and videos, very instructive and helpful. Could you do one about JIT compiler and the CLR? THANKS!

  • @BozCoding
    @BozCoding 5 months ago +1

    I'm interested in using it within chiseled docker containers :) I'm sure that more changes will happen in the future to improve these too, especially as we'll see less memory usage and probably less CPU usage.

  • @dimitris470
    @dimitris470 5 months ago +6

    I have a feeling that any such difference is going to be swamped by I/O latency IRL anyway

  • @_iPilot
    @_iPilot 5 months ago

    It looks like Microsoft is focused on reducing container startup time, including delivery to registries and to the host machine. Some huge applications can have container sizes of several GB, and when split into microservices they inevitably have duplicate container layers, which leads to huge overhead in data transfer during deployment.

  • @emjones8092
    @emjones8092 5 months ago

    Where did we land on the memory and CPU consumption comparisons? A smaller distribution already conserves lots of resources.
    Which is one of the big points:
    Scale to zero, cold boots, better memory efficiency, and smaller binaries are what I'm after.

  • @tarun-hacker
    @tarun-hacker 5 months ago

    Hey Nick,
    You should probably check out profile-guided optimization for AOT in .NET for better results 😅

  • @zoltanzorgo
    @zoltanzorgo 5 months ago

    That was interesting! I am currently working on a project that has one component running on PLCs. Yes, it is a PLC with an embedded RT Linux on top of a 600 MHz(ish) single-core ARM. The flash is somewhat limited, and it is also cumbersome to install the runtime, because there is no app repository like you have for mainstream distributions. Hence I decided to publish to linux-arm with AOT. As it will also run the CoDeSys 3.5 PLC runtime alongside, I need to be careful not to stress the resources. I was very curious what difference I could expect. It is a somewhat different workload, but still, it is good to know that I might have to consider installing the runtime anyway.

  • @astralpowers
    @astralpowers 5 months ago +6

    I really want to use Native AOT in our AWS Lambdas. In my testing using the .NET 7 AOT Lambda template, the startup is faster and the performance is more stable. For one application, in the normal non-AOT Lambda, the performance deltas are all over the place, ranging from 2ms to 400ms, but the AOT version had performance between 1.2ms and 4ms, all while using less memory.

    • @Denominus
      @Denominus 5 months ago +3

      We are doing early experiments with .NET 8 AOT. So far the latency stability, lower resource consumption and startup time improvements, even in long-running apps, dramatically swing the cost/performance ratio in AOT's favor (in our tests). Sacrificing some theoretical TechEmpower peak performance for perf that actually matters is completely worth it.
      We have some services that were rewritten in Go some time ago. The .NET AOT side has a ways to go before it can match that cost/perf ratio, but it's looking promising.

  • @viko1786
    @viko1786 5 months ago

    The AOT might be a great idea for something like Lambda in AWS. Quick spawn, go and kill process

  • @raduncevkirill
    @raduncevkirill 5 months ago +8

    I am wondering if the comparison is consistent when having different base images for the two APIs. Default one running on debian-slim and native-aot running on ubuntu. It shouldn't make a significant change, though, as Microsoft's benchmarks yield the same results.

    • @nickchapsas
      @nickchapsas  5 months ago +6

      It doesn't matter. The biggest difference is at the OS level. The only real difference between the slim or alpine versions is image size, which doesn't play a role in runtime performance

  • @FraserMcLean81
    @FraserMcLean81 4 months ago

    Thanks Nick. What's your terminal plugin that shows different file types in different colors?

  • @lylobean
    @lylobean 5 months ago

    @Nick Did you check whether these differences still hold when your project uses the OptimizationPreference=Speed setting for its AOT compilation? I think it defaults to size.

    • @warrenbuckley3267
      @warrenbuckley3267 5 months ago

      I'm also wondering if you can specify what CPU instruction sets are available for a given target in the build settings (like you can for a C/C++ application), e.g., AVX2 or AVX512 etc.
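
Both knobs exist as documented MSBuild properties for Native AOT publishing; a hedged csproj sketch (the property names are from the .NET 8 Native AOT docs, the instruction-set value is illustrative and should match hardware you know you'll run on):

```xml
<!-- csproj fragment: Native AOT optimization knobs -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <!-- Default blends size and speed; "Speed" trades binary size for throughput -->
  <OptimizationPreference>Speed</OptimizationPreference>
  <!-- Raise the instruction-set baseline (AVX2-era x64 here); "native" targets the build machine -->
  <IlcInstructionSet>x86-x64-v3</IlcInstructionSet>
</PropertyGroup>
```

Note the trade-off: a binary built with a newer baseline will not start on older CPUs that lack those instructions.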

  • @protox4
    @protox4 5 months ago

    How does it compare with ReadyToRun? It's a mixture of AOT + JIT so you should get the best of both worlds in terms of speed (maybe not file size).
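
For anyone wanting to try the comparison, ReadyToRun is enabled at publish time; a sketch using the standard properties (the runtime identifier is illustrative):

```xml
<!-- csproj fragment: ReadyToRun publish (IL pre-compiled ahead of time,
     but the JIT can still re-optimize hot methods at runtime) -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- Optional: compile framework and app into one composite image -->
  <PublishReadyToRunComposite>true</PublishReadyToRunComposite>
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
</PropertyGroup>
```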

  • @wangshuo8619
    @wangshuo8619 5 months ago

    Does Native AOT support reflection? Some docs say no, some say it supports some of it. I am not sure if I should migrate my code, which heavily uses MediatR, to NativeAOT. The docs are confusing

  • @psaxton3
    @psaxton3 5 months ago

    The runtime also changed from Windows to Linux when you ran containerized. Would be interested to know the numbers on a Windows container.

  • @VoroninPavel
    @VoroninPavel 5 months ago

    What about comparing with ReadyToRun/CompositeReadyToRun mode?

  • @davidtaylor3771
    @davidtaylor3771 5 months ago +2

    It is one of those things that will only be used in 2%-3% of deployments, which makes it seem not important. But those 2%-3% will probably be really important systems (with huge scale) that benefit hugely from the low latency startup time. It is great Microsoft is putting this effort in, but most teams are probably dealing with team productivity issues rather than scalability issues. But those that have a need for it will really appreciate it.

    • @younesskafia4189
      @younesskafia4189 5 months ago +1

      This will also be useful for software that has to run in environments that prohibit JIT like consoles. Being able to run a game engine written in C# fully on a console is a holy grail

    • @parlor3115
      @parlor3115 5 months ago

      @@younesskafia4189 Sounds really important

  • @TheAzerue
    @TheAzerue 5 months ago +2

    Will Native AOT create any issues if it is used with other NuGet packages like FluentValidation, MediatR, Serilog etc.?

    • @VoroninPavel
      @VoroninPavel 5 months ago

      If a library is not marked as trim- or AOT-friendly, you'll get warnings from the trim analyzer when publishing the application. Unless those warnings are disabled, as is currently the case with Blazor in .NET 8
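
For library authors, the opt-in looks roughly like this (the property and warning codes are from the .NET 8 trimming/AOT compatibility docs):

```xml
<!-- csproj fragment for a library: enable the trim/AOT analyzers so that
     consumers publishing with PublishAot get accurate IL2026 (trim) and
     IL3050 (dynamic code) warnings instead of silent runtime failures -->
<PropertyGroup>
  <IsAotCompatible>true</IsAotCompatible>
</PropertyGroup>
```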

  • @msafari1964
    @msafari1964 4 months ago

    Hi, which CLI do you use for publishing and so on?

  • @magashkinson
    @magashkinson 5 months ago +1

    You can drag and drop csproj file from explorer to editor tab to open it

  • @zwatotem
    @zwatotem 5 months ago +1

    I would love to hear how exactly these JIT optimizations work. Right now this sounds like black magic to me.

  • @dukefleed9525
    @dukefleed9525 5 months ago

    OK, interesting, but WHY is it happening? I suppose the JIT can better keep track of the *register pressure*, and in a resource-constrained environment this makes the difference. Is that the reason? It would be interesting to see what happens for a single-threaded application (or, anyway, apps with a different code path per thread)

  • @jwbonnett
    @jwbonnett 5 months ago +1

    Unfortunately a lot of the NuGet packages I need use reflection and will never be reflection-free, so I will not be able to use AOT. Personally I would use AOT even if it had a slight drop in performance, but it's just not there for me.

  • @mauriciobarbosa3875
    @mauriciobarbosa3875 5 months ago +1

    I'm wondering, is the performance hit the same in a non-WSL environment? WSL is known for being I/O-slow with Docker; what if the Docker images are run on a full-blown distro? Just thinking.
    Also, I think you used `dotnet publish` to publish the AOT version for the Docker image and `dotnet publish -c Release` for the non-AOT one; isn't the default publish configuration Debug for AOT?
    I have not coded in dotnet for a while, so sorry if I misunderstood

    • @nickchapsas
      @nickchapsas  5 months ago +2

      To your first question: it doesn't make any difference; it aligns with MS's full-environment performance delta. Both of them are published using "dotnet publish", because in .NET 8 -c Release is the default.

    • @mauriciobarbosa3875
      @mauriciobarbosa3875 5 months ago +3

      @@nickchapsas
      I've tried running the same benchmark on my machine; it's an M3 Pro base model (11-core/18GB/512GB).
      The results are actually surprising:
      M3 Pro - no docker
      AOT 139596.985677/s
      Normal 139472.800011/s
      M3 Pro - docker (colima on vz)
      AOT 45329.935323/s
      Normal 44474.530778/s
      So running outside of WSL did impact the result; on my machine AOT is still slightly faster 🤔
      EDIT: (using the stress test with 100 VUs for 60s as well)

    • @nickchapsas
      @nickchapsas  5 months ago

      @@mauriciobarbosa3875 Were your tests hitting over 100% CPU utilization at the container level? Was your MacBook's CPU utilization less than 100%? There are many variables. NativeAOT for this particular example will always be slower if run correctly.

    • @mauriciobarbosa3875
      @mauriciobarbosa3875 5 months ago

      @@nickchapsas I've run the test again, but now on my M1 from work; I got similar results 🤔

  • @BigYoSpeck
    @BigYoSpeck 5 months ago

    Requests per second is obviously useful for an application you expect to process lots of simple requests, but I would find it a more useful benchmark to see how fast computationally and memory-intensive requests can be processed.
    I currently work on an application that gets a relatively small number of requests per day, but those requests involve huge data models that then go through a lot of very time-consuming processing, somewhere in the region of 15 minutes for datasets in the tens of thousands. So how does AOT compare with JIT when the responses aren't simple pieces of data but there is actually some heavy computation performed on large data?

    • @simonegiuliani4913
      @simonegiuliani4913 2 days ago

      The benchmark he's using is just really bad, and he shouldn't generalize the results so much. If that is the benchmark we should refer to, then using .NET doesn't even make sense and we should all switch to Go

  • @another_random_video_channel
    @another_random_video_channel 5 months ago +1

    I noticed that the base images are not the same: one is Ubuntu while the other is Debian. Also, the running containers may have different resource constraints

    • @nickchapsas
      @nickchapsas  5 months ago +2

      It doesn’t make any difference, feel free to grab the code and check for yourself

  • @maxpuissant2
    @maxpuissant2 5 months ago

    Is AOT somewhat safe, or safer, for delivering DLLs to clients without fear of decompilation?

    • @souleymaneba9272
      @souleymaneba9272 5 months ago

      Yes. Blazor WASM already got its AOT (WASM AOT, not Native AOT). These technologies are very good, especially for .NET developers, because IL code is easily decompiled.

  • @Hoop0u
    @Hoop0u 5 months ago

    What about when hosted in IIS?

  • @gregoirebaranger1696
    @gregoirebaranger1696 5 months ago +1

    Performance is good enough in all cases; if you run into this kind of requests-per-second figure in prod, I doubt you should be running a serverless/cloud container. I'm much more interested in the reduced resources required to run the app; that's the big selling point of AOT in my opinion.

  • @T___Brown
    @T___Brown 5 months ago +7

    I think it's a new thing and MS will make it super fast with each new release. But they want to see us using it before they put effort into it.

  • @cwevers
    @cwevers 5 months ago

    You did the warmup call after the k6 test started

    • @nickchapsas
      @nickchapsas  5 months ago

      It doesn't change the results, k6 takes that into account

  • @yoanashih761
    @yoanashih761 5 months ago +2

    Any reason for switching from Postman to Insomnia?

    • @nickchapsas
      @nickchapsas  5 months ago +13

      I prefer the UX, it is correctly responsive, and I hate Postman's forced account stuff

    • @Quique-sz4uj
      @Quique-sz4uj 5 months ago +1

      @@nickchapsas Insomnia changed and now it's quite shit like postman. It doesn't let you save your collections as files and is pushy about the account too. I prefer Bruno which is a fork of Insomnia and it saves the collection files on the file system as markdown files, which is good if you want to version control it.

    • @raykutan
      @raykutan 5 months ago +4

      Bruno isn't a fork of Insomnia, it's a completely different project.
      It also doesn't store requests in markdown but in a special ".bru" format

    • @mad_t
      @mad_t 5 months ago

      You wanted to ask if there's any reason for NOT switching from Postman to anything else, right?

    • @IncomingLegend
      @IncomingLegend 5 months ago +1

      @@nickchapsas why delete my comment? I didn't say anything bad, wtf? you're on their payroll or something?

  • @patfre
    @patfre 5 months ago +1

    Fun fact: the change in the csproj was bugged; it should have been only in the NativeAOT template but was in all API templates. I reported it and got it fixed. Talking about InvariantGlobalization

  • @simonegiuliani4913
    @simonegiuliani4913 2 days ago

    Your corollary only applies to application endpoints that are not computationally intensive. It's really wrong to say "it's faster, it's slower"; it should be contextualized better by the type of workload.

  • @BlTemplar
    @BlTemplar 5 months ago

    AOT isn't supposed to be faster. It offers lower memory consumption and fast startup, but not better performance.

  • @jimmymac601
    @jimmymac601 5 months ago

    Just here for the comments from the Microsoft apologists.

  • @sikor02
    @sikor02 5 months ago

    I have the same CPU :) It's hard to saturate this beast

  • @the-avid-engineer
    @the-avid-engineer 5 months ago

    I'm sure the 1% of devs who are affected by the loss of 85k RPS are pushing MS to address the issue. Possibly a way to sample the JIT optimizations once they stabilize and then apply them to AOT at compile time. Kinda sounds like a form of ML

  • @LukasPetersen-bm4ep
    @LukasPetersen-bm4ep 5 months ago +3

    First :D

  • @AhmedMohammed23
    @AhmedMohammed23 5 months ago

    cpu buddies

  • @IllidanS4
    @IllidanS4 5 months ago

    To be fair, I don't get why MS tries so hard to make NativeAOT the "modern thing" everything has to revolve around. Sure, you might run .NET in constrained environments or architectures where the JIT cannot run, but I really feel that without the JIT there is so much "power" that .NET loses. For quick startup they had ngen for ages, so even that point is moot.
    How often do you need to run a .NET program that changes so often and needs to be restarted so quickly that ngen or JIT are actually the bottleneck? Without JIT some code has to be interpreted, which has huge performance downsides. I don't see much point in pushing NativeAOT when it breaks when using full .NET features like Linq.Expressions, reflection and MakeGenericMethod, or the DLR.

    • @ByronScottJones
      @ByronScottJones 5 months ago

      In lambda and other on demand invocation environments, it can make a huge difference. It's not for routine Windows desktop apps.

    • @IllidanS4
      @IllidanS4 5 months ago

      @@ByronScottJones That indeed sounds like a very constrained environment. Still, I am not convinced that all of these improvements are just due to ditching the JIT. There could still be a way to use ngen to pre-compile a lot of what is used, or other tricks: for example, you can run .NET in WebAssembly, where I have seen people run warm-up code that JITs a few important methods, runs some static constructors etc., and then takes the whole memory image, so you essentially end up with a pre-compiled image without any effort on .NET's part.

  • @ilanb
    @ilanb 5 months ago +2

    I don't think NativeAOT will be adopted quickly... it's like nullables: too much of a pain in the ass to use, and the benefits aren't worth the work IMO

    • @juliendebache8330
      @juliendebache8330 5 months ago +14

      How are nullable references a pain in the ass? They're pretty straightforward to use, and not having to worry about NREs anymore is quite nice.

    • @EraYaN
      @EraYaN 5 months ago +3

      I mean if you are using serverless it might be worth it pretty quickly.

    • @protox4
      @protox4 5 months ago

      @@juliendebache8330 It's a pain in the ass to convert huge older projects. It's just fine for new projects.