Sparse voxel octree modification and benchmarking [Voxel Devlog #3]

  • Published 25 Jul 2024
  • In this video, I talk about how I designed and optimized an algorithm for modifying sparse voxel octrees in realtime. I then compare the performance of my voxel code in C# and C++.
    Music used in the video:
    Bill Brown - Windows XP
    Vindata - Good 4 Me
  • Gaming

Comments • 41

  • @hollyy238 · 2 years ago +11

    I've had a similar experience with switching from C# to C++ for my dual contouring project. The recursive contouring part of the algorithm showed no difference in performance between the languages, but everything else was at least twice as fast. I'm not sure what it is about C# that's so slow, because I didn't have a working profiler at the time, but I think it might be accessing fields and array items like you suggest. I know it wasn't any calculations I was doing, or the cost of constructing objects, so it's got to be something more general.

    • @juresrovin4174 · 1 year ago +2

      One language is compiled ahead of time and the other uses a JIT. C++ does not use a garbage collector.

    • @faultboy · 1 year ago +2

      @@juresrovin4174 That is correct, but it's not the whole answer; depending on what you are doing, performance can be similar. He is talking more about how to optimize C# functions, but without a profiler this is cumbersome. The garbage collector is also only sometimes the problem.

  • @sjoerdev · 2 years ago +1

    awesome

  • @Conlexio · 2 years ago +7

    do you feel like your performance is being limited by the c# language? i know most approaches would write the core in c++ or similar and then have c# interop for stuff other than the renderer/meshing code
    edit: OH NEVERMIND, you directly talked about it right after i posted that T.T

  • @decemberfrostpaindine7987

    Motivated! I wuv u

  • @kirillgurevich8671 · 2 years ago +3

    Can you please share your shader code? It would be very interesting to have a look =)

  • @ChipboardDev · 2 years ago +1

    When's the next devlog? Love the progress!

    • @DouglasDwyer · 2 years ago +3

      Very soon - I've been busy at university, so I haven't been able to work on this project as often. Nonetheless, I have some very interesting techniques for rasterization-based voxel rendering that work even on integrated GPUs, and I'm excited to share them. I'm currently working on re-adding realtime lighting to the aforementioned technique. The semester should be over in a few weeks, so look out for a video then! :)

  • @philippst.8543 · 2 years ago +1

    This has probably already been commented, but I believe a big contributor to the speed difference is the fact that C# is a garbage-collected language.
    That means any C# program is effectively running a "second program" in the background that makes sure all your variables get deallocated.
    This is also, I believe, why C# has higher memory usage.
    I tried writing a 3D game in C# and ended up using 600 MB of RAM; this was partly my bad code, but also just C#.
    I have since moved to C.
    Thanks for the videos btw. I think I might now actually understand what an octree is :)

    • @DouglasDwyer · 2 years ago +3

      I'm glad to hear you enjoyed the video! If enough people are interested, I might also make a video at some point just explicitly explaining the tech (like octrees). I try to do that in each devlog, but I also have to focus on the new things I've added, so I don't go into as much detail.
      It's true that C# is a garbage-collected language, but in this case, the collector is not the bottleneck. Unless you are using server-side GC, the GC only runs when a large number of allocations have been made, meaning it shouldn't slow down all program execution - just cause occasional hiccups. This means that, when writing games in C#, one has to be careful to avoid allocations so as not to trigger the GC. My algorithms do that - they don't create allocations and only employ stack-based data structures (like structs), which shouldn't trigger the GC. To verify this empirically, I used the System.GC class to turn off the garbage collector during benchmarking, and it did not make any significant difference. That said, I would agree that allocations are something to look out for when using managed languages.
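      For anyone curious what that looks like in practice, here is a minimal sketch of a no-GC benchmarking region using the System.GC API; the allocation budget and the placeholder workload are illustrative and are not taken from the video's actual benchmarks:

      using System;
      using System.Diagnostics;
      using System.Runtime;

      public static class NoGcBenchmark
      {
          public static void Main()
          {
              // Ask for a region in which the runtime promises not to collect,
              // as long as we stay under the requested allocation budget (bytes).
              bool noGc = GC.TryStartNoGCRegion(16 * 1024 * 1024);

              var sw = Stopwatch.StartNew();
              long result = RunWorkload(); // placeholder for the code under test
              sw.Stop();

              // Only end the region if we actually managed to enter it.
              if (noGc && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
              {
                  GC.EndNoGCRegion();
              }

              Console.WriteLine($"Result {result}, elapsed {sw.ElapsedMilliseconds} ms, no-GC region: {noGc}");
          }

          private static long RunWorkload()
          {
              // Allocation-free, struct/stack-based work stays within the budget.
              long sum = 0;
              for (int i = 0; i < 100_000_000; i++) sum += i;
              return sum;
          }
      }

      If the runtime can't grant the region, the benchmark still runs; the printed flag just records whether GC pauses could have crept into the measurement.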

    • @philippst.8543 · 2 years ago +2

      @@DouglasDwyer Those specific explanations sound nice
      It makes sense the GC would behave this way, learn something new every day :)

    • @diadetediotedio6918 · 1 year ago

      The GC in C# can run in parallel if you specifically tell it to do so; otherwise it runs in the middle of your code's execution flow. The performance gap between C# and these low-level languages is more about how much optimization the JIT can do while keeping startup time acceptable. The JIT needs to be fast, so its optimizations are minimal, and that makes the code slower than simply compiling everything ahead of time for the target platform. C# can, however, generate code as fast as or faster than C/C++ using different runtimes and code-generation methods; for example, Unity's HPC# with Burst can produce highly optimized assembly that runs at absurdly high speeds, at the cost of requiring an explicit target runtime. Eventually we might see something like this in ordinary C# on the common .NET 6+ environment. With the new AOT features we should see slow, gradual improvements in the speed of non-JIT code, if the .NET team is really interested in making the language faster. Several little things add up here: the lack of escape analysis like Java's is a complicating factor for C#, since reference objects are always allocated on the heap, whereas in Java objects whose lifetime is scoped can be stack-allocated, and fixing that alone would noticeably improve performance. Likewise, being able to remove the GC completely in performance-critical scenarios could bring huge benefits; currently we can only do that in a strictly limited way, with various restrictions.

    • @darkengine5931 · 1 year ago +1

      ​@@diadetediotedio6918 I think the problem is more object overhead in managed languages. The C# optimizer actually does a pretty awesome job at instruction selection and register allocation on par with GCC when only involving PODs in every single one of my tests (and I'm a C++ programmer), and in spite of its much faster JIT compilation speed. Java as well in my tests (including analyzing the disassembly).
      For example, C# actually comes very close to C++ in Debian benchmarks that don't involve any user-defined objects, like 0.71 secs for the best C++ entry of spectral-norm and 0.9 secs, not too far behind, for the best C# entry (and it actually beats all the other C++ entries).
      Where the C# benchmarks get substantially worse is when they involve objects which I don't think is directly the fault of GC, but the general overhead of managed, polymorphic objects (loss of spatial locality, size inflation by an additional 64-bits on 64-bit architectures + up to 7 bytes of padding if the structure required less than 64-bit alignment, etc). For example, with the binary-tree benchmark, the fastest C++ version is about 5 times faster than the fastest C# version since the C# version uses objects for the tree nodes. The C++ version also uses objects but C++ has zero-overhead objects unless they require a vtable with the introduction of virtual functions which isn't needed there, and it also uses an efficient arena allocator in the C++ version to allocate the tree nodes contiguously.
      The C# version could probably get very close to the C++ version if it stored all the nodes as structs stored contiguously in an array and just used indices to point to them in the tree without involving objects and object references.
      Even 8 bytes of overhead per branch node is a ton, and a performance killer for an SVO/SVDAG. My SVDAG node only uses 4 bytes with 32-bit alignment in C++, so if I turned it into an object in C#, it would be 4 times its size at 16 bytes (8 bytes of metadata plus 4 additional bytes of padding), and I'd be looking at 4+ times the compulsory cache misses and probably at least double the non-compulsory ones. Last time I checked, all objects carry that 8 bytes of metadata and 64-bit alignment overhead in managed languages on 64-bit architectures. That object overhead is usually not a big deal for complex aggregates like an Image or a Tree, but it is a very big deal for teeny objects like a Pixel or a voxel node.
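      As a rough illustration of that struct-plus-indices layout (the type names here are invented for the example, not taken from any benchmark entry), compare a reference-typed node with a contiguous pool of value-typed nodes:

      using System;
      using System.Collections.Generic;

      // Reference-typed node: every instance pays the object header (8 bytes on
      // 64-bit) plus 8-byte alignment, and lives wherever the GC happened to put it.
      public sealed class NodeObject
      {
          public NodeObject Left;
          public NodeObject Right;
      }

      // Value-typed node: 8 bytes total, no header, stored contiguously in one
      // array. Children are referenced by index; -1 means "no child".
      public struct NodeStruct
      {
          public int Left;
          public int Right;
      }

      public static class TreeDemo
      {
          public static void Main()
          {
              var pool = new List<NodeStruct>();
              int root = Build(pool, 3); // complete tree of depth 3
              Console.WriteLine($"{pool.Count} nodes in one contiguous pool, root at index {root}");
          }

          // Builds a subtree bottom-up and returns the index of its root node.
          private static int Build(List<NodeStruct> pool, int depth)
          {
              if (depth == 0)
              {
                  pool.Add(new NodeStruct { Left = -1, Right = -1 });
                  return pool.Count - 1;
              }

              int left = Build(pool, depth - 1);
              int right = Build(pool, depth - 1);
              pool.Add(new NodeStruct { Left = left, Right = right });
              return pool.Count - 1;
          }
      }

      The pool plays a similar role to the arena allocator mentioned above: nodes end up packed next to each other, so traversals touch far fewer cache lines.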

  • @Tyradius · 28 days ago

    Wonder how Rust would perform in comparison. 😊

  • @emmanueldolderer1225 · 2 years ago

    Where did you find a way to modify the octree, i.e. paste stuff in without regenerating it? I don't fully understand your example, so I wanted some additional context, but I can't find anything on the topic anywhere.

    • @DouglasDwyer · 2 years ago +1

      I designed the algorithm myself, so I unfortunately don't have any additional resources that I can provide at this time. If there is enough interest, I'd like to make a video or produce a written description of the procedure in the future.
      To clarify, the approach I currently take does indeed create a new octree. When an octree modification occurs, a new buffer is allocated and the contents rewritten. I represent octrees as packed byte buffers, so I believe this to be the best approach for the data structure. The alternative approach, trying to dynamically modify and resize the octree buffers, would probably just result in lots of copying anyway. As such, octree modification does require a full reallocation of the data structure.
      That said, this approach has the benefit of conserving memory and keeping the total number of allocations low while still being very fast. The point I make in the video is that the procedure I have designed does not take the naive approach of "regenerating" the octree from scratch. That would mean manually sampling every voxel location and running the function that converts a flat array to an octree. Instead, when I create the edited octree, I work from the top down. Taking advantage of the sparse data structure, I don't "recreate" target suboctants that do not overlap with the source - I simply copy the suboctant contents to the new buffer verbatim. Similarly, if a target suboctant is fully inside a homogeneous source suboctant, the target's material is overwritten and I don't examine the target's voxels any further. This results in octree modification that is multiple orders of magnitude faster than generating an octree from a flat array, or generating by sampling every voxel.
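      To make the copy-versus-recurse idea a bit more concrete, here is a heavily simplified sketch; it uses reference-typed nodes and an "empty" sentinel material instead of the packed byte buffers described above, so it only illustrates the traversal logic, not the actual buffer rewriting:

      // Simplified octree node for illustration: either homogeneous (Children == null)
      // or subdivided into eight children. The real octrees are packed byte buffers.
      public sealed class Octant
      {
          public const byte Empty = 0;   // sentinel: the source doesn't touch this region

          public byte Material;          // meaningful when Children == null
          public Octant[] Children;      // length 8 when subdivided, otherwise null

          public bool IsHomogeneous => Children == null;

          public static Octant Leaf(byte material) => new Octant { Material = material };
      }

      public static class OctreeMerge
      {
          // Pastes `source` into `target` and returns a new tree, leaving both
          // inputs untouched. Both nodes are assumed to span the same cube.
          public static Octant Merge(Octant target, Octant source)
          {
              if (source.IsHomogeneous)
              {
                  // The source never touches this cube: keep the target verbatim,
                  // without examining any of its voxels.
                  if (source.Material == Octant.Empty)
                      return target;

                  // The source covers this whole cube with one material, so the
                  // target's contents below this point are irrelevant.
                  return Octant.Leaf(source.Material);
              }

              // The source is subdivided here, so recurse into the eight suboctants.
              var merged = new Octant { Children = new Octant[8] };
              for (int i = 0; i < 8; i++)
              {
                  // A homogeneous target acts like eight identical children.
                  Octant targetChild = target.IsHomogeneous
                      ? Octant.Leaf(target.Material)
                      : target.Children[i];

                  merged.Children[i] = Merge(targetChild, source.Children[i]);
              }
              return merged;
          }
      }

      The two early-out branches are what keep the cost of an edit roughly proportional to the region being pasted rather than to the whole octree.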

  • @clemdemort9613 · 2 years ago

    Could you elaborate on your rasterization and ray marching algorithm? I'm not exactly sure what you meant.

    • @DouglasDwyer · 2 years ago

      Because older hardware has trouble with ray tracing, I've begun to experiment with rendering voxels using the standard triangle-drawing methods on GPUs. In an upcoming video, I intend to discuss these alternate approaches, which include essentially using ray marching in the rasterization pipeline :)

    • @clemdemort9613 · 2 years ago

      @@DouglasDwyer you do know that Vulkan only supports more recent cards?
      The oldest supported is the GTX 600 series, so there isn't any "old" hardware you'll have to worry about, since Vulkan doesn't support it anyway.
      If you want compatibility with old hardware, OpenGL 3.3 is the only way to go... :/

    • @DouglasDwyer · 2 years ago

      @@clemdemort9613 Yeah, when I said older hardware I specifically meant lower-end GPUs (including integrated GPUs) and graphics cards from 2010+. Even on modern low-end hardware (like my 2020 Intel UHD Graphics), the ray marcher shown in the video struggled for a number of reasons I hope to talk about in another video. Wanting to have a deliverable even for computers without fancy graphics cards, I'm exploring other options.
      As for Vulkan, I've actually moved to targeting OpenGL ES 3 so that the project can be used in browsers with WebGL 2.0 and Emscripten. It's definitely not ideal, but in the future we will hopefully see WebGPU standardized, and that can be a modern web-friendly alternative to Vulkan.

    • @clemdemort9613 · 2 years ago

      @@DouglasDwyer fair enough yeah, I'm all for extra performance! :D
      I personally will stick to raymarching in my own projects since this way I can have more realistic lighting and effects.
      Your project has this Voxlap feel, which I kinda like too, so you might not need super realistic lighting if that's the style you're going for. Anyways, good luck with your implementation!

    • @TomTom-du5qv · 2 years ago +1

      @@clemdemort9613 Look up "VXGI 2 minute paper" on youtube. It's a combination of rasterization and then computing the GI lighting using voxel representation of polygonal objects. I think it's a good mid-tier approach for those people who can't get the newest video cards with RTX which are extremely hard to get right now for a decent price (Pre-corona).

  • @Conlexio · 1 year ago

    i’m curious why you decided to go with an octree per object instead of modifying the original terrain octree. was it to retain information that would have been lost due to tree intersections (in case they were to relocate later on)?
    was also thinking about whether having "layers" of octrees would reduce complexity at all. e.g. instead of 1 terrain layer and 8 tree layers you just have 1 terrain layer and 1 tree layer that new trees are added to, with the assumption the trees would not intersect

    • @Conlexio · 1 year ago

      i suppose due to the nature of octrees the intersection checks between octrees aren't too expensive. wondering if it would scale better to have just 1 layer with 250 objects instead of 250 octree layers that need to be diff-checked

    • @DouglasDwyer · 1 year ago +2

      Completely agree - a singular octree where all of the objects are merged is the ideal. That's what I implemented in this video - when each tree is placed, it becomes part of the same octree as the terrain.
      Of course, the trees are originally their own octrees, because the tree data has to come from somewhere. But when they enter the scene, they become one big octree.

    • @darkengine5931 · 1 year ago

      @@Conlexio A benefit of using a separate octree per object is that you can transform those objects and instance them. Like you could put 1,000,000 trees in the world made from the same tree model and only have to store a single octree for all one million of them in spite of scaling and rotating and translating each tree differently in the world (each tree only needs to store a model matrix and a pointer to the octree it instances). To avoid having to linearly loop through every single one of those trees to perform frustum culling for a rasterizer or perform ray/voxel intersections in a raytracer, you can create one loose octree or spatial hash for the objects in the world (this spatial index would not store voxels at the leaves, but only objects).
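      A rough sketch of that layout (all names here are invented for the example): each placed tree is just a transform plus a reference to the shared octree, and the world-level index stores these lightweight instances rather than voxels; a flat list stands in for the loose octree or spatial hash.

      using System.Collections.Generic;
      using System.Numerics;

      // One shared voxel model (e.g. the tree), stored exactly once.
      public sealed class VoxelModel
      {
          public byte[] PackedOctree;    // the model's octree data, whatever the encoding
      }

      // A placement of a model in the world: no voxel data of its own, just a
      // transform plus a reference to the octree it instances.
      public struct VoxelInstance
      {
          public Matrix4x4 ModelMatrix;  // per-copy translation / rotation / scale
          public VoxelModel Model;       // shared by every copy of the same model
      }

      public sealed class World
      {
          public VoxelModel TreeModel = new VoxelModel();

          // Stand-in for the loose octree / spatial hash of objects: it indexes
          // instances, not voxels, so culling and ray queries only visit instances.
          public List<VoxelInstance> Instances = new List<VoxelInstance>();

          public void PlaceTree(Vector3 position, float scale)
          {
              Instances.Add(new VoxelInstance
              {
                  Model = TreeModel,
                  ModelMatrix = Matrix4x4.CreateScale(scale) * Matrix4x4.CreateTranslation(position)
              });
          }
      }

      Placing a million trees then costs a million small instances rather than a million octrees.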

  • @delphicdescant · 2 years ago +4

    Yeah... When C# devs tell you how C# is "pretty much" as fast as C++ these days, I think what they mean is "it isn't *orders of magnitude* slower anymore."
    And that's cool and all, but when you're working on a real-time system, C++ being "merely" twice as fast ends up mattering lol.
    Luckily for me, I like C++ anyway.

    • @diadetediotedio6918 · 1 year ago

      Well, if your goal is pure raw speed, there are significantly less problematic languages than C++, like Rust or even C. That's a blessing for C# devs, who can now write safe, readable code and leave the performance-critical parts (such as voxels) to more performant languages that are harder to write and maintain.

    • @diadetediotedio6918 · 1 year ago

      And as for C#'s performance, it's pretty close to C/C++ and other low-level languages. The biggest performance problem with the language comes down to the many small decisions the .NET team makes about optimization: the lack of suitable escape analysis, for example, or the limited ability to disable the GC at strategic moments, the way interop is done, and even the various runtime checks that ensure code safety, in addition to other important things like the need to keep the JIT fast (and consequently being unable to optimize the code as well as a static compiler could). With some optional changes we would see absurd performance gains that would make C# a serious competitor to low-level languages, losing out only on things like embedded use (which is possible, but I wouldn't recommend it because it's quite hacky).

    • @delphicdescant · 1 year ago

      @@diadetediotedio6918 The "problems" that rust "fixes" are greatly overstated, and the additional workload of conforming your program architecture to rust's unyielding ownership demands are greatly understated.
      Rust was an exciting proposition five or eight years ago (and I was excited too), but after the industry has had some experience with the reality of using the language, the largest factor behind its adoption turns out to just be evangelism. Beginning new projects in C or C++ is just as reasonable as beginning them in Rust.
      The so-called "problematic" nature of C++ just isn't anywhere near as big a deal as rust evangelists claim it is. Generally these evangelists *never were* highly experienced with C++, and are only aware of the talking points academically.

    • @diadetediotedio6918 · 1 year ago +1

      @@delphicdescant
      1. It's not "underestimated"; it's simply something done to make the software more solid with respect to its own safety. Naturally I didn't say you should switch existing systems to Rust; that would make no sense, it would be too much work, and no experienced programmer would tell you that the first thing to change in a system should be the language (it doesn't matter if it's Java, C#, C, C++ or Rust). Rust is something to build new systems in, adapting your programming style to what is coming out (which is exactly why Linus Torvalds is seriously considering using the language for new Linux features). Rust effectively makes programming safer, as it eliminates multiple vectors of human error with a solid, opinionated compiler, unlike C++, where you can literally do anything by default (and yes, I've programmed in C++).
      (2 and 3). Certainly not. Rust is being massively adopted by companies like AWS, and alternatives to C++ are popping up every day; why do you think companies are favoring the emergence of new languages like Rust, Go, Carbon, Zig and the like? The reason is extremely obvious: C++ is the elephant in the room that no one has the courage to call out for how bad it is, naturally because the C++ fandom is an absurdly disgusting thing, full of arrogant people who are unable to recognize the dozens of flaws in their own language. One of the most-used kernels in the world for servers and mobile devices is literally programmed in C, the supposed predecessor of C++, so you can see how "reasonable" a choice C++ is.

    • @delphicdescant · 1 year ago

      @@diadetediotedio6918 It doesn't take courage to find bad things to say about C++. Everyone does that. And I said "understated," not "underestimated."
      A minority of companies are adopting rust, yes. It remains a fad choice, propelled largely by team leads and architects that have been bitten by the rust evangelism bug.
      There is no C++ fandom. C++ programmers are nearly ubiquitously aware of the flaws of the language, and used to working around them. Rust evangelists, on the other hand, are called evangelists for a reason. They are like some combination of vegans and Jehovah's Witnesses. *They* truly are unable to see the flaws of the language they espouse, and every time someone in a management position is won over by Rust's academic talking points, a new company "adopts" the language.
      Are they better off for it afterward? Maybe, maybe not. If you ask the evangelist, of course they are! But the reality *I've* experienced is that Rust's safety mechanisms have a minor, often even imperceptible, effect on overall productivity. And the cost is more constraint on architecture.
      Teams should feel free to use Rust. It's neat. But it's not vastly better, nor vastly worse, than any of its competition. Language choice was never the most significant factor in success.

  • @vladyslavkryvoruchko · 2 years ago +1

    Use GLSL to run the engine on the video card; it will be quite a bit faster, because video cards were made to render graphics.

    • @SerBallister · 2 years ago

      It's probably easier to get a C++ version running first; it's a real pain debugging complicated shader code compared to C++.

  • @Enter_channel_name · 10 months ago

    This is why I basically never use C# and exclusively use C and C++

  • @Im0rb · 2 years ago

    Hello youtuber

  • @_k1r1t0_ · 2 years ago +1

    Hi bro, I am working on a similar project. I think it would be useful for us to work together. What do you think of the offer?

    • @DouglasDwyer · 2 years ago

      Hi Codas,
      Thanks for reaching out. I'm glad you liked the video! Right now, this project is just a hobby/learning experience for me, so unfortunately I'm not looking to partner with anyone else at the moment. In addition, the project is also undergoing a structural redesign and is not in a collaboration-ready place (I am rewriting it in Rust using wgpu). I hope you understand.
      That said, I am more than willing to share advice and techniques - that's why I am making YouTube videos! I had a quick look at your channel, and must say that the path tracing project looks very cool. Path tracing is not a technique that I have explored, so I would be interested to see how you develop it. Continue posting updates on your project, and let me know if there's anything that you want to discuss!