Drawing MILLIONS of voxels on an integrated GPU with parallax ray marching [Voxel Devlog #4]

  • Published May 9, 2022
  • Over the course of this video, I describe a technique I've developed called parallax ray marching that can be used to draw detailed voxel volumes on integrated graphics hardware. Parallax ray marching sidesteps the overhead incurred by generating and processing the large numbers of triangles typically needed to render voxels. This is accomplished with perspective-aware ray marching in the fragment shader, which adds voxel detail to cheap bounding boxes. I discuss the benefits and drawbacks of the approach, and describe my vision for the engine's future.
    Music used in the video:
    Peyruis - Swing
    Chase - Deep thinking music
    AdhesiveWombat - 8 Bit Adventure
  • Gaming

Comments • 139

  • @AngeTheGreat · 2 years ago +78

    This seems like an exciting new field with a lot of people making advances. Maybe we'll finally get a decent voxel engine with so many people working on it. It looks like you're making great progress, nice video!

    • @ragskola · 6 months ago +3

      hello engine man

  • @xXParzivalXx · 2 years ago +57

    This is so cool! I've been learning graphics programming and have been dabbling in both rasterization and raymarching projects, so I understand how awesome it is to come up with your own rendering technique. Keep up the amazing work!

  • @asp-uwu · 2 years ago +62

    This is wonderful stuff! A very clever new technique you've developed; amazing that it can run on integrated GPUs! I'm not sure of the specifics of your GL environment (or if you're using GL at all, though I'm sure Vulkan has something similar), but GLSL allows you to specify that you'll only _increase_ a fragment's depth using layout(depth_greater) out float gl_FragDepth, which allows it to still do early depth tests, since it knows you'll only ever push fragments further away rather than move them closer to the camera. Maybe you've already tried this and it didn't work for you, but if not, maybe give that a try!

    • @DouglasDwyer · 2 years ago +21

      Thanks for the interest! I am currently targeting OpenGL ES 3.0 in order to make things run both in the browser and on the desktop. You are absolutely spot on regarding the fragment depth - I only ever increase it, so the GL_ARB_conservative_depth extension would be applicable here. I did experiment with it, and found that drawing with immediate depth modification was only slightly worse than drawing with deferred depth, when the extension was enabled. The main issue is that the extension is not available on the web, so I cannot rely on it for performance. Nonetheless, I do enable it on desktop when doing immediate depth writing, which is still utilized to draw smaller dynamic objects.
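
      For reference, a minimal sketch of the redeclaration under discussion, embedded as a GLSL source string the way a Rust renderer might carry it (illustrative only, not the engine's actual shader):

      const MARCH_FRAG_SRC: &str = r#"
          #version 330 core
          #extension GL_ARB_conservative_depth : enable
          layout(depth_greater) out float gl_FragDepth; // promise: depth only moves away
          out vec4 color;
          void main() {
              // ... ray march from the bounding-box surface here ...
              gl_FragDepth = gl_FragCoord.z + 0.001; // must honor the depth_greater promise
              color = vec4(1.0);
          }
      "#;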

  • @zanzlanz · 2 years ago +10

    This is really interesting!! It crossed my mind when you mentioned raymarching, but I didn't expect you'd actually go through with a bunch of raymarched chunks - I'm really impressed you were able to work through those implementation issues and still came out with a fast system. With shadows, no less! Can't wait to see how this develops. Much luck!

  • @sjoerdev · 2 years ago +6

    This is extremely underrated

  • @majorsquidgaming6965 · 2 years ago +9

    I am obsessed with voxel engine devlogs, and you are awesome at them. You just gained another sub

  • @hi_its_jerry · 1 year ago

    That's so damn clever! I am fascinated by this project, really cool stuff you came up with there!

  • @liamrealest · 2 years ago

    So smart and so much potential! Will be following this project from now on

  • @Mavrik9000 · 1 year ago

    Brilliant techniques! I can't wait for more.

  • @ChipboardDev · 2 years ago +1

    Wow! What a genius solution! I'm impressed!

  • @chucksneedmoreland · 2 years ago

    Fantastic work! Not many people explain their techniques as clearly as you do.

  • @Dannnneh · 2 years ago +3

    I appreciate the approach these videos have, with a clear, orderly presentation and proper articulation, as opposed to other channels who insert silly/comedic things which I find annoying. And I notice you also credit the music in the description, which itself is more than what a lot of channels manage to do.

  • @VoidloniXaarii · 1 year ago

    This was so interesting! Thank you very much! Made me remember old BSP tree stuff and Crystal Space octree implementations, thank u

  • @JoshuaBarretto · 2 years ago +1

    Nice work! Also good to hear that you've chosen Rust, I can promise that you won't regret the move in the slightest. Games like Teardown use a very similar technique to this, but with the inclusion of a deferred rendering system. Taken in combination, both can allow for a lot of effects such as shadow ray marching and other depth-based techniques like SSR without too much extra overhead.

  • @timbomb374 · 8 months ago

    That's a crazy sneaky trick. Can't even tell when looking at it

  • @adamhelberg9228 · 2 years ago +2

    This is great, you'll have 1k+ subscribers in no time.

  • @williamist · 2 years ago

    woah this is cool! can't wait to see what this game becomes :D

  • @jordanserres-gonzalez9634

    Been watching since the 2nd episode and I can say that I LOVE THIS! This is my dream project, and I fit very well into the "Rust is the best programming language" stereotype, so once I saw that .toml file I literally freaked out

  • @Alexey_Pe · 2 years ago

    This is amazing! I subscribed and turned on notifications, looking forward to new videos

  • @sonryle5738 · 1 year ago

    This is so cool :D making the game in a browser is really smart too. Here's the 907th sub

  • @TristanPopken · 2 years ago

    Great video!

  • @Batiman85 · 2 years ago

    Keep up the good work ur soooo underrated

  • @humorousfool215 · 2 years ago +6

    This is super cool! I’m working on my own voxel engine using only rasterization and I’m starting to run into problems with large scenes so I might look into this if it gets too bad. If you wrote a paper or just a more detailed explanation I expect people would be pretty interested.

  • @kurtkuehnert · 2 years ago +1

    Wow, this is really interesting! I am definitely looking forward to seeing how this project turns out. I am currently building a large-scale terrain plugin with Rust, wgpu, and the Bevy game engine. I can really recommend this tech stack. With wgpu you could get compute shader support back, which might be useful for generating the octrees. Thank you for taking the time to upload these awesome videos.

    • @DouglasDwyer · 2 years ago +3

      WebGPU is an awesome technology, and I will continue to monitor its development in the future. It was something that I considered using when beginning this iteration of the engine, but I ultimately shied away from it because the technology was too new. One major problem with WebGPU currently is how limited WGSL is - it doesn't have features like arrays in vertex outputs yet. Once the standard is finished, the API is well-documented, and the technology is not a moving target, I may consider porting to it.

  • @tritoner1221 · 2 years ago

    this is amazing!

  • @ludothe · 2 years ago

    Sounds perfect!!!

  • @JoseRomagueraM · 2 years ago

    Really cool video!! I'm also developing my own voxel game, and my technique for rendering far-away chunks is really similar to this "parallax ray marching". But it needs more optimization; it's not fast enough for my dedicated GPU :P. I'll keep following your project; for sure I will learn a lot from you!

  • @bingusbongus9807 · 10 months ago

    very cool, so much information on voxel renderers, I wonder where the limits are

  • @jtbgames6440 · 2 years ago

    Whoa that was cool.

  • @shlagon3554 · 2 years ago

    Very underrated channel

  • @dottedboxguy · 2 years ago

    eey another voxel project, that's nice

  • @keptleroymg6877 · 1 year ago

    Good man!

  • @HarhaMedia · 1 year ago

    That's very cool.

  • @Bebeu4300 · 1 year ago

    Parallax Raymarching is honestly a really clever solution

  • @MESYETI · 2 years ago

    cool engine

  • @axu6207 · 2 years ago +3

    I would love to see this solution compared with greedy meshing; it's probably slower, but I'd love to know by how much. Criterion.rs is a good tool to measure performance.

  • @AntonioNoack · 2 years ago

    Most of the 8x8x8 cubes shown here are relatively planar, so you could compute a plane per 8x8x8 cube and start your ray on it (intersected with the AABB).
    And for visualizing depth, you could use fract(log2(depth)). It's less clear about absolute differences, but it shows local differences well :) By adjusting the logarithm, e.g. log2(depth) * a (a ∈ ℝ), you can make it look a little different
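
    For reference, the suggested visualization as a tiny Rust function (the same expression would run per-fragment in a shader; the function name is just illustrative):

    // fract(log2(depth) * a): absolute depth is obscured, but local differences stand out.
    fn depth_to_gray(depth: f32, a: f32) -> f32 {
        (depth.log2() * a).fract()
    }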

  • @Bruiserjoe · 2 years ago

    This is so fucking cool

  • @nou5440 · 2 years ago

    subbed

  • @netcore10 · 2 years ago +1

    So much potential! Have you considered phones in the future, too?

    • @DouglasDwyer · 2 years ago +1

      I've thought about phones, but the goal of targeting integrated GPUs and the web on desktop already provides plenty of challenge for me. Given that phone graphics is even more limited than desktop iGPUs, I don't want to constrain my tech to such a small feature set. In the future, if the game does manage to run on phones, it should not be that difficult to port.

  • @abdou.the.heretic · 1 year ago

    Ken Silverman would be proud

  • @diligencehumility6971

    C# can also be compiled to WebAssembly

  • @MeltedIce_ · 2 years ago

    Wow!

  • @Cheesecannon25 · 1 year ago

    I wonder how much these techniques could be applied to distant chunks in Minecraft
    The Distant Horizons mod uses typical LODs with greedy meshing, on top of 1 pixel textures

  • @GamerSaga · 1 month ago

    what is the size of each voxel block in terms of pixels? and is each block one color?

  • @YuHayate · 4 months ago

    now make the voxels like 1.6 times bigger and boom, you can render a larger area (in size) with just as many voxels

  • @eirik6502 · 10 months ago

    Maybe you have already explained it in one of your videos, but I am wondering out of curiosity why you use ray marching in favor of ray tracing or good old triangle rasterization?

    • @mnxs · 2 months ago

      Performance; he was quite clear about targeting toasters in web environments

  • @two2fiv67 · 2 years ago

    This is great! How did you create the world shown in the video, did you manually place each voxel object?

    • @DouglasDwyer · 2 years ago +2

      The world was created parametrically (I described the voxel volume using code) with some sine/cosine curves by iterating over every voxel, and then generating a voxel octree. I should mention, of course, that the world is just a placeholder. I have some much more powerful techniques I plan to apply for actual world generation. If you're curious, the actual code is below. It was executed for each voxel in the volume:
      // arr is an array of voxels; the [[x, y, z]] indexing suggests an ndarray::Array3.
      use rand::Rng; // for rand::thread_rng().gen()
      // Two sine-based height fields choose between materials 1 and 2.
      if (y as f32) < 10.0 * (f32::sin(x as f32 / (25.6 * 3.0)) + f32::sin(z as f32 / (25.6 * 5.0))) + 30.0 + x as f32 / 5.0 {
          if (y as f32) < 10.0 * (f32::sin(x as f32 / (25.6 * 3.0) + 2.0) + f32::sin(z as f32 / (25.6 * 5.0)) + 2.0) + 15.0 + z as f32 / 5.0 {
              arr[[x, y, z]] = 1;
          } else {
              arr[[x, y, z]] = 2;
              // Randomly extend the surface upward by one or two voxels.
              let d: f32 = rand::thread_rng().gen();
              if d > 0.9 && (y as f32) < 214.0 {
                  arr[[x, y + 1, z]] = 2;
                  if d > 0.9875 {
                      arr[[x, y + 2, z]] = 2;
                  }
              }
          }
      }
      // A vertical cylinder of material 3 at (x, z) = (120, 100), radius 20, below y = 200.
      if (z - 100) * (z - 100) + (x - 120) * (x - 120) < 20 * 20 && y < 200 {
          arr[[x, y, z]] = 3;
      }

  • @adicsbtw · 7 months ago

    If you aren't already doing it, could you use the bounding boxes that you use for rendering as an early discard, rejecting entire chunks of the rendered image and saving on performance that way? Also, what about dynamically sized bounding boxes that can change size to cover large areas with a single block, to reduce rendering overhead for large solid structures?

    • @DouglasDwyer · 7 months ago

      The bounding boxes can't quite be used for early-discard because the bounding boxes are maximal. That is, they surround all solid voxels and in addition may contain some transparent air sections too. Hence, rendering the bounding boxes early would occlude portions of the screen that should actually be visible.
      In this iteration of the engine, I chose to go with fixed-size bounding boxes for two reasons. One, it was much simpler to program and maintain the rendering pipeline. Two, having a fixed-size bounding box allows one to limit the maximum number of ray-steps, which is important for achieving good performance on low-end devices with the parallax ray marching technique.

    • @adicsbtw · 7 months ago

      @@DouglasDwyer That's fair

  • @ImpossibleEvan · 1 year ago +1

    How do computers store the voxel data? Is it like just one massive array?

    • @DouglasDwyer · 1 year ago

      I store the data in a data structure called a sparse voxel octree, which compresses regions of similar voxels to save on space and allow for more efficient editing. I talk more about octrees in my first few devlogs :)
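
      A minimal sketch of that idea, with hypothetical names (the devlogs don't show the engine's real types):

      // One node per region: a uniform region stores a single material value,
      // however large it is; mixed regions recurse into eight suboctants.
      enum OctreeNode {
          Uniform(u8),                  // e.g. 0 = air, 1+ = material ids
          Branch(Box<[OctreeNode; 8]>), // children, one per suboctant
      }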

  • @delphicdescant · 2 years ago +3

    Have you considered setting up a discord server or something? I know there are already quite a few voxel game dev servers, but it would be cool to participate in one where the focus is on *raymarched* or *ray-based* voxel rendering specifically, since a lot of the time the regular voxel game dev kinds of servers seem to be 90% focused on meshing and data structures for meshing.

    • @DouglasDwyer · 2 years ago +1

      It's definitely something I might do in the future! For now, I'm hoping to continue working on the engine and making these devlogs. If they gain enough traction, a Discord server would be a wonderful way to interact with viewers and the greater voxel community.

  • @mrpedrobraga · 2 years ago +1

    Amazing! I didn't know Rust could compile to WASM.
    I am working on a voxel sprite editor, and I implemented raymarching within a box as my first approach, really, and I am adopting the name Parallax Raymarching.
    As I mentioned, I'm working on a voxel sprite editor and am planning to recreate a simple renderer for it;
    Is there a place where we can chat, discuss tech and whatnot?

    • @DouglasDwyer · 2 years ago

      Thanks for your interest in my work. Rust's cross-platform capabilities are one of its major strengths, in my view.
      I'm always more than happy to talk about coding! I am of course willing to talk here in the YouTube comments, and can also be reached on Reddit and Discord. On Reddit, I'm u/The-Douglas, so please DM me there if there's anything I can do for you :)
      Some people have asked about starting a Discord server, which is something I might try in the future, but I'm not ready for that quite yet.

  • @ShotgunLlama · 1 year ago

    How do you pass voxel data to the GPU? Do you use either an octree or just a 3D array that's set per invocation on a box-bounded object?

    • @DouglasDwyer · 1 year ago

      Great question! Let me refer you to an answer I made on a separate video:
      The data required for rendering chunks is twofold, and corresponds to the parallax ray marching implementation described in this devlog. First, face data needs to be stored for each bounding box upon which parallax ray marching is performed. There are typically around 3,000 to 6,000 faces stored, each with a vertex position and a per-quad pointer into a materials array. That brings me to the second part, which is the materials array. Surface-visible voxels (only those on the surface!) are uploaded to a 3D texture which is referenced in the VBO indices. On the CPU, data is stored in an octree, so a conversion process is necessary for this step.

    • @ShotgunLlama · 1 year ago

      @@DouglasDwyer Is that 3D texture just sparsely filled with the surface voxels, leaving all the empty space in between? Or are they compressed or consolidated in some way?
      Also, re: depth values, you mention you can render non-overlapping regions with an early z test, but the video shows overlapping objects. Does having overlap require you not to use that approach?

    • @DouglasDwyer · 1 year ago +2

      So, the voxels for each 8x8x8 bounding box (only the AABBs on the visible surface of the volume) are copied into the 3D texture. In texture memory, the voxel boxes are tightly packed, meaning the large empty spaces (or opaque, non-visible solid portions) of the volume are not stored in texture memory. This allows me to render a great number of volumes without an insane amount of VRAM.
      You are exactly right about the depth - to solve this, multiple passes are employed. The first pass draws the large static portion of the scene with an early z-test, allowing for optimizations. The second pass draws smaller dynamic objects with an immediate depth write to allow for objects to intersect properly.
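
      A sketch of the per-face vertex data implied above; the exact layout isn't given, so these field names are hypothetical:

      #[repr(C)]
      struct BoxFaceVertex {
          position: [f32; 3], // corner of an 8x8x8 bounding-box face, in chunk space
          brick_offset: u32,  // packed pointer to this box's voxels in the shared 3D texture
      }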

  • @clemdemort9613 · 2 years ago

    Woah this is pretty neat! You seem to know your stuff too; how long have you been a graphics programmer?

    • @DouglasDwyer · 2 years ago +1

      I started programming in Unity3D eight years ago. I was just a kid then, learning the very basics, but consequently I've been exposed to 3D graphics since the beginning. I've only really gotten into shaders and OpenGL in the past 3 years or so, but it's incredibly gratifying to play with. Even now, I learn something new almost daily!

  • @brian9498 · 8 months ago +1

    How do you send the voxel data to the shaders to render them on the surface of the bounding boxes? I'm trying to create a similar effect.

    • @DouglasDwyer · 8 months ago +1

      The voxel data for each 8x8x8 bounding box is packed tightly into a single, large 3D texture. The offset into this 3D texture is encoded as a vertex attribute for the bounding box geometry. Then, the 3D texture is sampled in the fragment shader using the offset for each bounding box.

    • @brian9498 · 8 months ago

      @@DouglasDwyer In my implementation I have a 'Voxel', a bounding box, and a 'Voxel Chunk', which is a 32x32x32 group of bounding boxes. So you have a large 3D texture for every 'Voxel Chunk' and each bounding box inside the 'Voxel Chunk' has an offset of where it is inside that 3D texture, and then you send the 3D texture to the fragment shader and then do the ray marching inside each bounding box?

    • @DouglasDwyer · 8 months ago

      @@brian9498 That's almost correct. The way you describe it, it sounds like the suggestion is to create an entire 3D texture and upload the entire voxel volume to it for each voxel chunk. While that might work for prototyping, it's wasteful in terms of memory (because there is a lot of empty space in the 3D texture that doesn't correspond to a bounding box) and also bad for performance (because you will have to change the texture binding between each chunk draw call).
      Instead, I have one single 3D texture for everything. When a voxel chunk is created, I allocate a region of that 3D texture and tightly pack the voxel data only for the visible bounding boxes into it. This ensures a low memory footprint and allows reusing the same texture in all draw calls. Let me know if you have any more questions!
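
      A minimal sketch of that single-shared-texture scheme, assuming a simple bump allocator over 8x8x8 brick slots (BrickAtlas and its fields are hypothetical; the real engine presumably also frees and reuses slots):

      struct BrickAtlas {
          size_in_bricks: [u32; 3], // 3D texture dimensions divided by 8
          next_free: u32,           // next unused brick slot
      }

      impl BrickAtlas {
          /// Reserve one 8x8x8 brick and return its voxel-space offset in the texture.
          fn allocate(&mut self) -> Option<[u32; 3]> {
              let [sx, sy, sz] = self.size_in_bricks;
              if self.next_free >= sx * sy * sz {
                  return None; // atlas is full
              }
              let i = self.next_free;
              self.next_free += 1;
              // Linear slot index -> 3D brick coordinate -> voxel offset.
              Some([(i % sx) * 8, (i / sx % sy) * 8, (i / (sx * sy)) * 8])
          }
      }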

    • @brian9498 · 8 months ago

      @@DouglasDwyer Thanks for the explanation!

  • @Kelvin-hh1ci · 2 years ago

    That's a really neat way to render everything! With chunks that big though does it take a while to generate them? I've been experimenting with voxels myself and it takes about a minute to generate a 2048^3 scene.

    • @DouglasDwyer · 2 years ago

      Each chunk is stored in an octree, which is utilized during mesh buffer generation to maximize performance. This allows me to generate the draw data for the volumes at real-time speeds.
      You might also be asking about generating the chunk data itself. Currently, the way I generate the volume is to create a flat array of voxel materials, and then convert that to a voxel octree. This generation process does take 500+ milliseconds, and is not suitable to run in real time. I hypothesize, however, that octree-based world generation could be made much faster by generating the octrees directly. The bottleneck in generating these chunks is simply iterating over every single voxel, since there are so many. If an octree-based world generation algorithm were used (that, say, filled in a large suboctant with a single material when appropriate instead of iterating over each voxel in the suboctant) this would be much faster. I plan to explore this in the future.
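
      A sketch of that octree-first idea, reusing the OctreeNode shape sketched earlier in the comments (Region and uniform_material are hypothetical stand-ins for the generator):

      // When a whole region provably maps to one material, emit a single leaf
      // instead of visiting every voxel inside it.
      fn generate(region: Region) -> OctreeNode {
          if let Some(material) = uniform_material(&region) {
              return OctreeNode::Uniform(material); // no per-voxel iteration here
          }
          OctreeNode::Branch(Box::new(region.octants().map(generate)))
      }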

    • @Kelvin-hh1ci · 2 years ago

      @@DouglasDwyer Just came up with a solution a few minutes ago that might solve the issue of having to iterate through every voxel. Since you can have large sections in a sparse octree that are marked as being the same block, you can place big blocks instead of individual small ones, so you can determine the size of the block you want to place by rounding the distance from the SDF to the closest power of 2. Then you can skip the blocks that are inside that area.

  • @notsoren665 · 2 years ago

    Da BEVY

  • @Conlexio · 2 years ago

    this is a cool technique you really had to think outside the box for! do you think your technique will have any more problems like the depth buffer when it comes to more visual effects and interactivity? ex. outlining an object, measuring distances between voxels, long distance geometry updates

    • @DouglasDwyer · 2 years ago +1

      That's a wonderful question. All in all, this technique is nice because it doesn't really break any steps of the rendering pipeline. I can still write to the depth buffer immediately, without requiring a second pass, for smaller objects. As long as there aren't too many, this doesn't result in performance degradation. As such, I don't anticipate that things like object outlining will be significantly different.
      I can tell you about a few other issues that I had to overcome. One major thing that this technique broke was texture filtering. GPUs typically calculate texture gradients to select LODs and filtering modes by taking texture coordinate derivatives between adjacent pixels, but this doesn't work for me, because I call discard in my fragment shader. As such, in order to ensure that far-away textures are linearly sampled and use the proper MIP levels, I calculate the texture gradients by hand (either using a heuristic or a formula for the exact derivatives) and then call textureGrad.
      One other problem that occurred with this technique was that there is a disconnect between the NDC-space voxel coordinates and ray marching voxel coordinates. This meant that, when viewed at the right angle, there were sometimes "seams" where the ray marcher thought voxels didn't exist, but the rasterizer did. To compensate for this, I scale all bounding boxes slightly so that the face edges overlap.
      All in all, I feel as though I've thought about this technique for long enough that I have addressed the major challenges. Now, I need to continue working to implement things like backface culling and LODs which will allow me to increase rendering distance.
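
      A sketch of the textureGrad workaround described above, as a GLSL fragment embedded in a Rust string (voxelUvGradients and albedoTex are hypothetical names standing in for the derivative math and the material sampler):

      const FILTERED_SAMPLE_SNIPPET: &str = r#"
          vec2 ddx, ddy;
          voxelUvGradients(hitUv, ddx, ddy);                    // hand-computed derivatives
          vec4 texel = textureGrad(albedoTex, hitUv, ddx, ddy); // explicit MIP selection
      "#;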

    • @Conlexio · 2 years ago

      @@DouglasDwyer cool, thanks for the detailed reply! :)
      i wonder if there are still any optimizations to be done with the standard vulkan compute route
      either way your new technique is really cool and i think well suited for a voxel environment
      i look forward to more updates on this :p

    • @Conlexio · 2 years ago

      also great that it works on low-end hardware, because that is really the goal optimization should be measured against, not just fps on modern hardware

  • @Jkauppa · 2 years ago

    paint the texture id and uv with z-depth distance, ray intersection normal, flat shaded with uv, in a buffer, then render color for the whole buffer

    • @Jkauppa · 2 years ago

      msaa is just rendering at a higher resolution and downsampling it, as an antialiasing filter

  • @TSK_Silver · 2 years ago +1

    if you want some good video music (non-copyright) I recommend "Bad Snacks", "David Cutter Music" and "Harris Heller"

    • @DouglasDwyer · 2 years ago +2

      Thanks - I'll be sure to check those out! I've just started making videos, so I don't yet have a library of video music which I can use offhand.

    • @TSK_Silver · 2 years ago +1

      @@DouglasDwyer well you have good production quality, not very many newer content creators have that, so I wish you the best of luck

  • @devwckd · 2 years ago

    Hey, could you link us the resources you used to learn this? I really want to learn it but I don't know how to start

    • @DouglasDwyer · 2 years ago +1

      Great question! I've been working in game development and graphics for a number of years now, so there wasn't necessarily one tutorial I followed. Nonetheless, I can try to give some general direction for how to learn game and graphics programming. In truth, the most important part is to pick a project and stick to it. Even if you never complete it, the knowledge learned is invaluable!
      As for resources for game programming, it depends upon your skill level. If you're newer to game programming, I can't recommend Unity 3D enough. Unity, Unity, Unity! It is an extremely powerful yet intuitive tool, and the documentation is top notch. There are also a ton of tutorials out there (Brackeys, for example), so learning Unity is only a Google search away. If you're looking for something more involved, and want to learn graphics programming, I recommend learnopengl.com (as well as the OpenGL documentation). It provides the basic steps for setting up raw 3D graphics applications. Of course, these are only a couple of resources - I have viewed many, many more webpages while developing this project. Let me know if there are specific things you want and I can post resources. As I stated above, though, I highly recommend picking a specific project (like, for example, building a simple 2D platformer) and then structuring your Googling and learning around that goal.

  • @neurenix5201 · 1 year ago

    Hi, I'm new to graphics programming, and I wondered why your implementation of parallax ray marching is more efficient than simply raymarching the whole scene. At least from how I understood raymarching, it seems that in the end the same work is performed (well, more work actually, now that you also need to generate those bounding boxes). Is there something I missed?

    • @DouglasDwyer · 1 year ago +1

      Sure, no problem. There's a big difference between parallax ray marching and "pure" ray marching - that is, where the ray begins.
      From my experience the biggest bottleneck in GPU-accelerated ray marching is memory. Every ray step requires loading data from a buffer or a texture, and memory latency becomes a big concern as each ray diverges. Integrated GPUs are especially bad at memory accesses. Naively marching voxel grids can easily lead to hundreds of steps and memory accesses (although there are techniques to reduce step count, like distance fields). This makes pure ray marching slow.
      With parallax ray marching, the bounding boxes are first rasterized, which is nearly instantaneous. Then, a ray is started on the surface of each bounding box. Since the bounding boxes are tight - conforming closely to the voxel surface - only a few ray steps are needed to hit a solid voxel. In the best case, it's a single ray step. In the worst case, it's 24. For the scenes in this video, I would guess that it averages around 4 steps. This means only 4 texture fetches, which is much better than pure marching! This is where the speedup lies - integrated GPUs are good at rasterization, but not at random-access memory fetches. So, we minimize memory fetches.
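
      To make the step-count point concrete, here is a CPU-side Rust reference of that inner loop; a sketch assuming a dense 8x8x8 brick of material ids, not the engine's actual shader code:

      // The rasterizer supplies `entry`, a point on the surface of an 8x8x8 box
      // in voxel units, so this simple 3D DDA starts right at the surface and
      // usually terminates within a handful of steps.
      fn march_brick(brick: &[[[u8; 8]; 8]; 8], entry: [f32; 3], dir: [f32; 3]) -> Option<[usize; 3]> {
          let mut cell = entry.map(|c| (c.floor() as i32).clamp(0, 7));
          let step = dir.map(|d| if d >= 0.0 { 1 } else { -1 });
          let mut t_max = [0.0f32; 3];   // ray distance to the next boundary per axis
          let mut t_delta = [0.0f32; 3]; // ray distance between boundaries per axis
          for a in 0..3 {
              let next = cell[a] as f32 + if dir[a] >= 0.0 { 1.0 } else { 0.0 };
              t_max[a] = (next - entry[a]) / dir[a];
              t_delta[a] = 1.0 / dir[a].abs();
          }
          for _ in 0..24 { // 8 cells per axis => at most 24 boundary crossings
              let [x, y, z] = cell;
              if brick[x as usize][y as usize][z as usize] != 0 {
                  return Some([x as usize, y as usize, z as usize]); // solid voxel hit
              }
              // Step across whichever boundary the ray reaches first.
              let a = if t_max[0] < t_max[1] && t_max[0] < t_max[2] { 0 }
                  else if t_max[1] < t_max[2] { 1 } else { 2 };
              cell[a] += step[a];
              if !(0..8).contains(&cell[a]) {
                  return None; // exited the box without a hit
              }
              t_max[a] += t_delta[a];
          }
          None
      }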

    • @neurenix5201 · 1 year ago

      @@DouglasDwyer Thank you for your input! I believe I understand your concept a bit better now. I want to create a more detailed YouTube video about your concept and talk about some of my own ideas. Would it be OK with you if I use your concept as a basis for my video? I will of course credit you.

    • @DouglasDwyer · 1 year ago

      @@neurenix5201 that sounds great! Feel free to use whatever you like. Let me know when you post it - I would love to take a look.

  • @doltBmB · 7 months ago

    1. ray tracing and ray marching are not the same
    2. msaa does not "blur" edges, but oversamples them

  • @AllExistence · 1 year ago

    Why are shadows jittering?

  • @TheQxY · 10 months ago +1

    Any plans to utilise the new WebGPU library for native GPU support on the web?

    • @DouglasDwyer · 10 months ago +1

      Yes - now that the technology is more mature (and browsers are enabling it by default) - I am very much looking forward to switching over to WGPU! There are lots of nice performance gains to be made, and I would have access to new features like compute shaders.

    • @TheQxY · 10 months ago

      Cool, would love to see it! Not that much content can be found of people implementing it yet.

  • @exsolutusdev5118 · 2 years ago

    The way you've explained your rendering technique reminds me of how rendering is done in Teardown. Have you seen any of the dev talks for that, and if so is that what gave you the idea to rasterize first, then ray march?

    • @delphicdescant · 2 years ago +1

      Teardown wasn't the first to do it either, nor does Teardown use it in the same way.
      Techniques like this have been discussed in the voxel dev community for years.
      It's a pretty straightforward idea that comes up naturally.

    • @dominicstocker5144 · 2 years ago

      @@delphicdescant is the implementation in this video more efficient?

    • @delphicdescant · 2 years ago

      @@dominicstocker5144 No idea since I don't have the source.

  • @No-vv7rp · 2 years ago

    Hey Douglas - I'm impressed watching your progress on this engine over time! I noticed that you have some custom depth logic in order to keep early z tests while being able to modify the depth in the fragment shader. Could you go in a little more detail as to how this works? I saw someone else commented about only ever increasing the depth. This makes sense to me if you render the front faces of each chunk - the depth will only ever increase as a ray traverses through the chunk, no problems there. However, what if the camera is inside the chunk? Assuming we're not culling faces, then what we see technically is the back face of the mesh representing the chunk. Then, our assumption that the depth will only increase is no longer true. Does your method deal with this problem? I'm working on my own voxel engine currently, and this problem has left me rather stumped.

    • @DouglasDwyer · 2 years ago

      Haha - you are absolutely right! This is a problem that I gloss over in the video. The method in the video does not have a natural solution for rendering when the camera is inside a bounding box. However, I do have a way to deal with this problem that I intend to implement in the future.
      Let me try to explain a bit more about how I do the rendering. You seem to already grasp the idea since you've asked such an insightful question. I break the voxel world up into 8x8x8 voxel "bounding boxes," and then render faces for the outsides of those bounding boxes. The small bounding box size reduces the number of necessary ray steps and allows this to run on low-end hardware. All back faces are culled for performance reasons, so when the camera goes inside a bounding box, the content just disappears. This little annoyance COULD be solved by rendering with back faces instead of front faces, which is what the game Teardown does. However, rendering with back faces makes culling much more difficult, and would require many more triangles on the whole to render properly.
      Instead, I propose to solve this problem by adding a traditional raymarching step to the rendering pipeline. On the CPU side, I will detect the AABBs that intersect the camera's near clipping plane. Then, I will probably draw a fullscreen quad or something similar, and perform standard raymarching for those AABBs, skipping the rasterization step. Because the camera will only ever be intersecting a few small AABBs, I don't anticipate major performance issues.
      Let me know if you have any questions - I'd be happy to talk about it more! That's the gist of it, though - I am going to sidestep this problem by rendering the offending bounding boxes differently.
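
      A conservative sketch of that CPU-side detection (illustrative, with hypothetical names; it treats the near-plane rectangle as a bounding sphere around the eye, which slightly over-includes):

      // Flag any AABB the camera's near plane could clip into, so it can be
      // drawn with a fullscreen ray-march pass instead of rasterized faces.
      fn needs_near_plane_fallback(box_min: [f32; 3], box_max: [f32; 3], eye: [f32; 3], radius: f32) -> bool {
          let mut d2 = 0.0f32; // squared distance from the eye to the closest point of the AABB
          for a in 0..3 {
              let d = (box_min[a] - eye[a]).max(0.0).max(eye[a] - box_max[a]);
              d2 += d * d;
          }
          d2 <= radius * radius
      }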

    • @No-vv7rp · 2 years ago

      @@DouglasDwyer I see, that definitely makes sense. I could accept that the number of voxel chunks that intersect the camera's clipping plane would be far less than the total amount. One trick you could do to render these still using the same fragment shader you use for tracing voxel chunks would be to, for only the offending voxel chunks, render back faces instead of front faces. In the shader, you'd have to #ifdef away the depth definition specifying increase only, so technically you'd need 2 shader programs (with almost exactly the same source code). That way, for only the offending AABBs, you can render the chunks using the "same" fragment shader (albeit at slightly slower speeds w/o early depth test, however I don't imagine many of these chunks would get culled anyway). Then, you can switch back to the increase-only shader and render anything whose front faces are fully visible.
      One thing I noticed is that I didn't see any artifacts that I would expect from the camera being inside a voxel chunk in the video. Did you just position the camera cleverly in the video to avoid the issue, did you record w/o depth testing, or something else?
      Looking forward to seeing how this project goes. Might be interesting to see how you handle secondary rays. Those wouldn't be able to take advantage of rasterization cutting down on the distance traced, but you could use some other scheme like a BVH to trace through the chunks in world space, or DDA through the chunks until you reach a non-empty chunk. Thanks for the response!

    • @DouglasDwyer · 2 years ago

      ​@@No-vv7rp Because each AABB is only 8x8x8 voxels, the camera has to come within 8 voxels of the rendered surface (or even less, since AABBs fit the surface as tightly as possible). The camera doesn't come that close to any of the AABBs in the video, so there aren't any problems. Again, though the data is organized into larger voxel chunks, the small AABBs are what are rasterized and ray marched. Have a look at 7:09 for a visual example - in the final product, those boxes are even smaller, as they fit their contents tightly.
      As for secondary rays, I don't have any current plans. My goal is to keep this project compatible with low-end computers, and I don't think the integrated GPUs could handle BVH/DDA schemes, but it might be worth a try in the future. I will be using more traditional rasterization tricks (like shadow mapping) to improve visual fidelity. Some upcoming issues I do intend to address in the future are transparency and multiple lights, which should be interesting problems in their own right.

    • @stephenbaynham1762 · 1 year ago

      If you're rendering the front faces to the depth buffer, aren't there potentially cases (such as a case where each chunk has just enough voxels to push the AABB to the full 8x8 size) where chunks will be erroneously concealed? I guess the actual fragment shader will retract that somewhat, do you just deal with this by rendering near-to-far? And if so, wouldn't the reverse (render backfaces, push depth toward the camera in the fragment shader) work the same? I imagine that being able to render fully-opaque backfaces without discard first and then render the rest afterward would provide some sort of performance advantage.

  • @delphicdescant · 2 years ago

    What's wrong with cross-platform C++?
    I've only ever had trivial issues getting that to work under a CMake build system, for example.
    And there's always cross-compiling.
    In fact, I get faster executables for Windows by using GCC in Linux to cross-compile *for* Windows than I do by using MSVC itself.
    And for web there's Emscripten.
    But now that you've started with Rust none of that probably sounds very necessary. Just thought I'd speak up in defense of C++ :)

    • @DouglasDwyer · 2 years ago +1

      There's nothing wrong with C++ - it's a useful tool that has been successfully applied in a variety of contexts. What I'm referring to in particular is the difficulty of targeting WebAssembly as a platform using C++/Emscripten. Though Emscripten provides emulation for a number of system functions, if you want to consume some other library - like a compression or serialization library - it can be difficult to find a platform-agnostic implementation that's easy to incorporate and compile with Emscripten. Rust is much nicer in this regard. The whole ecosystem is WASM-aware, and the way that crates are distributed means that many libraries work on both web and native out-of-the-box. In addition, there are quite a few libraries written for Rust that specifically target WASM.
      Also, I have personally found coding in Rust much more ergonomic than coding in C++. Rust's modern language features and ownership model are easier to deal with, the syntax is cleaner, and my code just feels less error prone in general. That said, this is a completely personal preference for me.

    • @delphicdescant · 2 years ago

      @@DouglasDwyer Yeah that's fair. I'm not trying to target web with my own voxel raymarching project, so for me C++ is comfortable.
      Rust is cool though. I've tried to get comfortable with it a couple times over the last few years, but I keep going back to C++ because I guess I actually kinda like living fast and loose with pointer math.
      Anyway, I'm excited to see upcoming devlogs from you. I like the approach you're taking to cut down on the cost of raymarching for older hardware.

  • @dairin0d · 1 year ago +1

    I've been kinda wondering... if I understand your explanation correctly, this method does a depth buffer copy pass for each set of 8x8x8 chunks that overlap in screen-space (kind of like depth peeling, I guess). But your video shows a significant depth complexity at certain angles, which (I assume) would require many dozens of these rather expensive depth-copy passes... yet the performance stays smooth. Is there something more going on? 🤔

    • @DouglasDwyer · 1 year ago +1

      So, there's an algorithmic "trick" involved with the 8x8x8 chunks which overlap. There is only a single depth-copy that occurs in the scene - you are correct, otherwise the performance would be very slow. Let me try to clarify:
      When I draw the 8x8x8 AABBs, the point of the depth buffer is to ensure that closer AABBs are drawn first. If I don't immediately adjust the depth, then the depth of the *AABB* is written naturally to the depth buffer. This means that, if I draw two AABBs which don't overlap *in world space*, the closer one will always show up on top, no matter how I draw the voxels inside. Problems only occur when AABBs physically conflict, which only happens between dynamic, smaller objects in the scene (because they are not axis-aligned). Therefore, I perform two render passes: I draw the main axis-aligned world to a color buffer, and draw the actual depth to another color buffer. The AABB depth is recorded in the normal depth buffer. After that's done, I perform a depth-copy pass, and then draw the dynamic objects by setting gl_FragDepth. Single copy pass :)
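
      The single depth-copy pass described above, sketched as the fullscreen fragment shader it implies (an illustration assuming the ray-marched depth was written to a float color attachment; the sampler name is hypothetical):

      const DEPTH_COPY_SRC: &str = r#"
          #version 300 es
          precision highp float;
          uniform sampler2D marchedDepth; // depth written as color during the static pass
          void main() {
              // Promote the color-encoded depth into the real depth buffer so the
              // dynamic-object pass can depth-test against it.
              gl_FragDepth = texelFetch(marchedDepth, ivec2(gl_FragCoord.xy), 0).r;
          }
      "#;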

    • @dairin0d · 1 year ago +1

      @@DouglasDwyer Ah! I see. Though, if I understand correctly, this requires to draw the axis-aligned AABBs either in the back-to-front order with regular alpha-blending, or in front-to-back order with disabled depth test and (1 - destination alpha) blending? Which means that early-Z optimization cannot be utilized by axis-aligned AABBs anyway, and ray marching will be executed for every fragment of every AABB (regardless of overdraw) at that stage.
      Is there some additional trick you use here? Or is the ray marching cheap enough that you can afford having dozens or even hundreds of them executed per pixel?

    • @DouglasDwyer · 1 year ago +1

      I am not using alpha blending to draw the voxels inside the AABBs. The transparent "air" sections of the AABBs are drawn by calling discard() in the fragment shader, which seems to maintain the early-Z test. No particular AABB ordering is necessary.
      I currently haven't implemented any sort of transparency effects; that will be a separate (and interesting!) issue to tackle in the future. The current technique only supports opaque voxels.

    • @dairin0d · 1 year ago +1

      ​@@DouglasDwyer Hm... that's rather unexpected, since every resource on the Internet seems to suggest that discard() disables the early-Z test (the "Early z and discard" thread on Khronos forums had a good explanation why). Could this be specific to your Intel UHD Graphics hardware, somehow? Have you tested it on other low-end GPUs? 🤔

    • @DouglasDwyer · 1 year ago +2

      This is a very interesting point. Thanks for sending me the resources. I have tested on three Intel integrated GPUs and one 2011 Macbook Pro (performance is worse on this one, of course, but not comparatively worse than other games), and didn't observe anything unexpected. I have also tested on discrete cards, and attained great performance there.
      I think the reason for this discrepancy is that most of the internet resources on the topic - at least the ones I've seen - are relatively old. Note that the OpenGL Wiki only states "this will almost always turn off early depth tests on some hardware." Also, all modern hardware now includes conservative depth, which is a similar feature. Its existence implies that, at least theoretically, early depth testing with discard should be possible. I would assume that, on recent hardware (within the past 10 years or so) drivers now make the optimization to keep early depth with discard on. This would explain my observations - I find performance gets much worse when I write directly to depth (without conservative depth) than when I use discard. This is a good point of which to be aware, though - thanks for bringing it up!

  • @jonathanschenck8154 · 28 days ago

    Voxel scale?

  • @ianmoore322 · 2 years ago

    Will this be open source? I'd like to try it

    • @DouglasDwyer · 2 years ago

      Thanks for your interest! At this time, I am hoping to further develop my tech and build a fully-functioning engine. Until then, I'm not planning to release the source code in full. That said, I do eventually hope to make available most of the project - just not yet :)
      As an aside, I am definitely hoping to get a demo up and running on the web very soon!

  • @xaracen7207 · 7 months ago

    3d noita? 3D NOITA!

  • @johnthomas338 · 1 year ago

    This is amazing! Can I just say - as a friend - Friends don't let friends vocal fry... You have terrible vocal fry on your voice, it's an epidemic sweeping Americans...

  • @jjblock21 · 1 year ago

    Everyone is rewriting their voxel engine in rust.

  • @yellowduckgamedev · 2 years ago

    Please make a Mac version. I need this.

  • @sjoerdev · 2 years ago +1

    do you have a discord or github?

    • @DouglasDwyer · 2 years ago +1

      I do have a GitHub (@DouglasDwyer), but my voxel engine is not publicly available on it. As for Discord, I have a personal account (let me know if you want to talk), but not a server. I may create a server when my channel is a bit bigger.
      By the way, I've seen your channel before. You have some truly intriguing projects. I didn't even know you could deploy to PS Vita from Unity!

    • @sjoerdev · 2 years ago +1

      @@DouglasDwyer what is your discord?

    • @DouglasDwyer · 2 years ago

      Please feel free to send me a message at Douglas Dwyer#6326

  • @jbritain · 2 years ago

    You think UHD graphics is bad, my laptop has HD graphics which predate UHD...

  • @mmheti · 6 months ago

    Seems cool but audio quality is unbearable, couldn't watch : (

    • @DouglasDwyer · 6 months ago

      I hope you will consider checking out my latest video instead - I finally got a real microphone so the audio is better lol

    • @mmheti · 6 months ago

      @@DouglasDwyer Yup, already checked it out, noticed the quality difference, and enjoyed the content to the fullest ; ) Thanks!

  • @Salzui · 2 months ago

    Do. Not. Ever. Use. UserBenchmark. The page is created by an Intel/Nvidia fanboy.

  • @metaversegt-official · 1 year ago

    Hello good job ✨🦜✨ is it possible to chat together ? 👀

    • @DouglasDwyer · 1 year ago +1

      Hi, thank you for the interest in my work! You can find my contact information on GitHub: @DouglasDwyer github.com/DouglasDwyer