Doubling the speed of my game's graphics [Voxel Devlog #18]

  • Published 9 Jul 2024
  • Online demo: github.com/DouglasDwyer/octo-...
    Additional voxel models: drive.google.com/drive/folder...
    With some clever tricks, I've managed to halve the frame times in my voxel game engine! Join me as I explain three essential optimizations for voxel rendering. These optimizations include using DDA for traversal, bitwise masking to filter out potential intersections, and performing a low-resolution depth prepass. In addition, I talk about ray marching in general and discuss the other new features that I added to my engine this month.
    Music used in the video:
    Chris Doerksen - Perfect Parry
    Corbyn Kites - Orbit
    White Bat Audio - Athena
    White Bat Audio - Chroma
    Chris Doerksen - New Groove
  • Gaming

Comments • 162

  • @DouglasDwyer
    @DouglasDwyer  2 months ago +11

    Want to learn how to write powerful, high-performance code? Then be sure to check out CodeCrafters using the link below, and help support my channel:
    app.codecrafters.io/join?via=DouglasDwyer
    They have one project which is completely free to complete during their beta, and you can begin any of their projects for free! Get 40% off if you upgrade to a paid account within three days.

  • @GabeRundlett
    @GabeRundlett 2 months ago +73

    Great video as always! A couple of suggestions:
    Regarding using DDA on multiple levels, aka a "hierarchy": see HDDA by Ken Museth, with an implementation in OpenVDB. The basic idea is that the DDA process is pretty much restarted when the hierarchy level changes. An alternative (if you had a shallow tree) would be to keep a stack of the DDA traversal state, one copy per level. This alternative may be too register heavy (and might benefit from the stack being in shared memory instead).
    Regarding the voxel memory layout, you show that the 64-bit bitmask comes first, followed by the data segment. This means that when you fetch a brick, you always load 8 bytes of bitmask data plus an additional 120 bytes of data segment just to fill the cache line, so 93% of your cache during traversal will be filled with these data segments!!! Try to pack at least 16 bricks together so that you fill all 128 bytes of the aligned cache line with bitmasks!!
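The packing suggested above can be sketched in Rust. This is an illustrative layout, not the engine's actual one: the type names (`MaskGroup`, `occupancy`) are hypothetical, and the point is only that 16 adjacent 8-byte masks exactly fill one 128-byte-aligned cache line, with the data segments stored elsewhere.

```rust
/// Masks for a group of 16 bricks: 16 * 8 bytes = 128 bytes, one cache line.
/// The alignment attribute keeps each group from straddling two lines.
#[repr(C, align(128))]
struct MaskGroup {
    masks: [u64; 16],
}

/// Fetch the occupancy mask for brick `brick`; only mask data is touched,
/// so traversal that rejects a brick never pulls its data segment into cache.
fn occupancy(groups: &[MaskGroup], brick: usize) -> u64 {
    groups[brick / 16].masks[brick % 16]
}

fn main() {
    assert_eq!(std::mem::size_of::<MaskGroup>(), 128);
    let groups = [MaskGroup { masks: [0b1011; 16] }];
    assert_eq!(occupancy(&groups, 3), 0b1011);
}
```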

    • @andreirost4696
      @andreirost4696 2 months ago +2

      Eyooo hello

    • @oamnog
      @oamnog 2 months ago +3

      Ohh I get it, so the cache isn't wasted on data segments that may never be used (since the ray may skip the voxel brick), and the bitmask data of the surrounding voxel bricks is loaded ahead of time, reducing how often the cache has to be updated.
      Determining which neighbouring 15 voxel bricks' bitmask data to store could be done using 2x2x4, 1x2x8, 1x4x4 and 1x1x16 bounding boxes that are offset and rotated based on the ray's general direction and position. The ray's position in the bounding box would probably be in the far corner, and the shape of the bounding box (1x1x16, 2x2x4, etc.) would be determined by whether the ray is moving close to axis-aligned, sharply along two axes, etc. You would probably need to take into account the relative hierarchical voxel size that the ray is currently at.
      But that approach sounds complicated, and you could probably just pack 64 bricks' bitmasks together into four 128-byte-aligned cache lines (but I don't know how the cache works, so I may be wrong).

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +27

      Thanks for the suggestions! I'll be sure to check out the OpenVDB implementation.
      As for cache lines, that's a very good point. I'll have to experiment with alternate data layouts in the future. But there's one nuance that I didn't mention - data segments are variable length. So if a brick has a single non-empty child, then the data segment will only be 4 bytes. But coming up with a way to pack the bitmasks (maybe I could store the bitmasks inline alongside each child pointer) might also help with cache coherency! That said, I don't really think that I'm memory bound right now :)

    • @pusab2064
      @pusab2064 2 months ago

      This is some crazy low-level thinking, I aspire to be like this

    • @JoeTaber
      @JoeTaber 2 months ago +1

      @@DouglasDwyer You can be limited by memory bandwidth separately from being limited by memory capacity. :)

  • @dleiferives
    @dleiferives 2 months ago +33

    It’s amazing how much you’ve grown as a programmer. Keep up the good work

  • @frozein
    @frozein 2 months ago +11

    Great stuff, the bitmasking optimization was really clever. As for your hierarchical traversal question, it is definitely possible to avoid the ray-cube intersection test. The basic idea is that at every step, you first descend the tree until you reach a non-empty node. Then, perform your DDA step within that node, and if the ray exits the current tree node, ascend back up the tree until you are in bounds again. You do need to reinitialize DDA whenever you change levels, though. I actually just implemented this for octree traversal in my engine and it works great!

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +5

      Thanks for watching! I do the same thing in my engine that you describe. What you call reinitializing DDA is what I'm referring to as a ray-cube intersection test, since you use the cube and ray position data to determine the side of closest hit :)
      Edit: speaking of your engine, can't wait to see your next video!

    • @frozein
      @frozein 2 months ago +2

      @@DouglasDwyer Ah I see, makes sense. I avoid the ray-cube intersection by subtracting position within the node and multiplying by 2 when descending. I keep a stack of node positions to ascend. It would be slightly more complicated for a 64-tree like you have though.

  • @NathanNahrung
    @NathanNahrung 2 months ago +8

    Would you see further improvements by expanding on the beam technique? For example, what if your initial low-resolution image was based on the display's aspect ratio? A 16:9 screen would generate a 17x10 square grid of rays (rays sit at the screen edges, so the grid x and y are +1 over the aspect ratio) for the first low-resolution image. The next image, which would start its rays further from the camera, could have 16x more samples (a 68x40 grid), as this would force a step to the next brick layer, and this pattern could continue until the final image at the display's full resolution (including any multiple samples per pixel). Alternatively, you could determine the initial low resolution by working backwards from the display's full resolution, rounding up fractions of pixels by overdrawing the field of view for the lower-resolution images, and perhaps going all the way down to a 16-pixel image as the initial low-res image.
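The resolution ladder proposed above works out as follows (a sketch of the commenter's idea only; the 4x-per-axis factor and the stopping rule are assumptions from the comment, not anything the video specifies):

```rust
/// Build the proposed ladder of beam-pass resolutions: start one ray larger
/// than the aspect ratio on each axis, then quadruple each axis (16x the
/// samples) per pass until the display resolution is covered.
fn ladder(aspect: (u32, u32), display: (u32, u32)) -> Vec<(u32, u32)> {
    let mut levels = vec![(aspect.0 + 1, aspect.1 + 1)];
    loop {
        let (w, h) = *levels.last().unwrap();
        if w >= display.0 && h >= display.1 {
            break;
        }
        levels.push((w * 4, h * 4));
    }
    levels
}

fn main() {
    // 16:9 aspect, 1920x1080 display:
    // 17x10 -> 68x40 -> 272x160 -> 1088x640 -> 4352x2560
    let l = ladder((16, 9), (1920, 1080));
    assert_eq!(l[0], (17, 10));
    assert_eq!(l[1], (68, 40));
}
```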

  • @yiwmsh4393
    @yiwmsh4393 2 months ago +1

    The mountain top looks so distant to me, down here at the start of the trail, so I'm very thankful you're posting the pictures you took up there.

  • @linksword7110
    @linksword7110 2 months ago +11

    holy, i thought you said you were going to reduce quality for more episodes. But this is amazing

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +3

      I am not planning to reduce quality for discrete graphics cards. The lower quality will just be an option for the engine to run on iGPUs

    • @linksword7110
      @linksword7110 2 months ago +3

      i meant on the videos :p

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +9

      @@linksword7110 Oh gotcha haha. I'm glad to hear you thought the quality is still good. This video only took 8 hours total to produce!

  • @nitaki_
    @nitaki_ 2 months ago +55

    This is a great video, but you should refrain from using rotating or moving gradients as a background for your explanations. The youtube compression does not like it at all

    • @franzusgutlus54
      @franzusgutlus54 2 months ago +4

      You are so right!

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +14

      Good suggestion. I was curious to see how it would turn out, but perhaps YouTube's compression butchers it...

    • @danieles6684
      @danieles6684 a month ago

      @@DouglasDwyer I actually was going to say I liked it!

    • @dominobuilder100
      @dominobuilder100 24 days ago

      @@DouglasDwyer I really liked it. Thought that it was a really nice detail, even with compression.

  • @StiffAftermath
    @StiffAftermath 24 days ago

    This project is very exciting to me; it's great to see all the persistent progress you are making. I hope you continue to persevere and make the most awesome voxel engine. My game is waiting for an engine such as this. John Lin's engine is amazing, but I don't think he is very open to licensing it out. Thank you for your continued work!! Much appreciated and looking forward to the future! ❤❤

  • @c0pi
    @c0pi 2 months ago +1

    I recognize the place of the demo!! I had a pizza in that place. Nice!

  • @stevegta6830
    @stevegta6830 2 months ago

    I've always loved voxel engine graphics. Great work on this :)

  • @phylliida
    @phylliida 2 months ago

    Ur videos are so good!! Always excited to see another one

  • @adricklynn8882
    @adricklynn8882 a month ago

    Amazing work! Can't wait to see more

  • @stormy_vox
    @stormy_vox 2 months ago +1

    your optimizations are really cool! i don’t fully understand the bitmask stuff but it’s neat

  • @DanaTheLateBloomingFruitLoop

    Really cool stuff! The second optimization was genius!

  • @tommycard4569
    @tommycard4569 2 months ago +2

    I always look forward to your videos, keep up the great work!

  • @MatejVancoCG
    @MatejVancoCG 2 months ago

    Really cool. Thanks for such detailed explanation!

  • @platinumsun4632
    @platinumsun4632 2 months ago

    Dude I really love the beam optimization. Looks so cool. Like out of an old 2.5d game.

  • @247flashgames
    @247flashgames 2 months ago +1

    You are insane, and I love the quality of your videos & algorithms! You explain your strategies well, and I can’t wait to see more videos! ❤

  • @RyDawgE
    @RyDawgE 2 months ago

    This is AWESOME!!!! Very inspirational

  • @simongido565
    @simongido565 2 months ago +2

    I do something similar in my engine; I call them compressed and normal cells. A compressed cell is one that contains a subgrid of a selected size where all the voxels are the same. The first advantage is that instead of storing, for example, 16x16x16 voxels, I store only one that represents all of them. Then there's raymarching: I raymarch the compressed grid as if it were a normal voxel grid. If I get inside a cell that contains only one voxel, meaning it is compressed, I just return the color of the voxel that cell holds. If I get inside a cell that contains 16x16x16 voxels, meaning it was not compressed, I start a new raymarch where the compressed-cell raymarch ended and traverse the 16x16x16 subgrid; then, if no hit happened, I just continue raymarching compressed cells again, and so on. It increased performance a lot, and memory-wise it is also great. I am thinking about implementing multiple levels, but for now I am okay with just two.

    • @simongido565
      @simongido565 2 months ago +1

      Maybe I could draw a picture if you wanted 😆 it might make more sense then.

    • @sti3167
      @sti3167 2 months ago

      kind of like palette compression?

    • @simongido565
      @simongido565 2 months ago +1

      @@sti3167 more like grid compression? I do not know what exactly palette compression is :D. Imagine a 256x256x256 grid where each voxel has size 1.0. To compress it and traverse faster, I take some number that divides 256, say 8. I create a new grid with dimensions 32x32x32 (256 / 8) and voxel size 1.0 * 8. Then I go through the original grid checking each 8x8x8 subgrid. If a subgrid is uniform, I copy just a single voxel into the compressed grid instead of all of them. If it is not uniform, I copy all of the voxels. When raymarching, I raymarch the compressed grid as if it were a regular voxel grid, but once I get inside a cell which has more voxels inside, I switch to traversing that 8x8x8 subgrid.
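The build step described above can be sketched at a smaller, illustrative scale (a 16³ dense grid compressed in 4³ blocks; the commenter's actual sizes are 256³ and 8³, and the `Cell` type here is hypothetical):

```rust
const N: usize = 16; // dense grid side (commenter uses 256)
const B: usize = 4;  // block side (commenter uses 8); compressed grid is (N/B)^3

/// A compressed cell either stands in for a whole uniform B^3 block with a
/// single voxel value, or keeps the full block when it is not uniform.
enum Cell {
    Uniform(u8),
    Block(Vec<u8>),
}

fn compress(dense: &[u8]) -> Vec<Cell> {
    let idx = |x: usize, y: usize, z: usize| x + y * N + z * N * N;
    let mut cells = Vec::new();
    for cz in 0..N / B {
        for cy in 0..N / B {
            for cx in 0..N / B {
                // Gather the B^3 voxels of this block.
                let mut block = Vec::with_capacity(B * B * B);
                for z in 0..B {
                    for y in 0..B {
                        for x in 0..B {
                            block.push(dense[idx(cx * B + x, cy * B + y, cz * B + z)]);
                        }
                    }
                }
                let first = block[0];
                if block.iter().all(|&v| v == first) {
                    cells.push(Cell::Uniform(first)); // one voxel replaces B^3
                } else {
                    cells.push(Cell::Block(block));
                }
            }
        }
    }
    cells
}

fn main() {
    let mut dense = vec![0u8; N * N * N]; // all empty: every block is uniform
    dense[0] = 7;                         // make exactly one block non-uniform
    let cells = compress(&dense);
    let uniform = cells.iter().filter(|c| matches!(**c, Cell::Uniform(_))).count();
    assert_eq!(cells.len(), 64);
    assert_eq!(uniform, 63);
}
```

During traversal, hitting a `Uniform` cell resolves immediately, while a `Block` cell switches to marching its inner B³ subgrid, exactly as the comment describes.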

    • @simongido565
      @simongido565 2 months ago +1

      So in the end it is a grid of smaller grids, which I create from the denser grid.

    • @sti3167
      @sti3167 2 months ago

      @@simongido565 Sounds like you are generating and using LoDs/mips for voxels on a multilevel grid (or nested grid, sometimes called a brickmap) depending on distance/required detail. I would love to see some code of how you generate/store/traverse it, but it's up to you : )
      P.S. Afaik palette compression is about compressing the data that belongs to voxels instead of storing it in the voxels themselves - for example, the unique colors or normals in a chunk of voxels are written to a lookup table, and then every voxel indexes into this table instead of defining the value directly. That way you can minimize memory overhead using fewer bits, even without doing quantization/lossy compression.
      This might improve performance too, because it helps ensure you are not memory bound, and there are fewer cache misses since the data can fit into shared memory, cache lines, etc.
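The palette idea described above, as a minimal sketch (the `palettize` function is hypothetical; real implementations usually also bit-pack the indices, which is omitted here):

```rust
/// Split a chunk's per-voxel values into a lookup table of unique values
/// plus one small index per voxel. With <= 256 unique values an index fits
/// in a u8 instead of a full u32 per voxel.
fn palettize(voxels: &[u32]) -> (Vec<u32>, Vec<u8>) {
    let mut palette: Vec<u32> = Vec::new();
    let indices = voxels
        .iter()
        .map(|&v| match palette.iter().position(|&p| p == v) {
            Some(i) => i as u8,
            None => {
                palette.push(v); // first sighting: append to the table
                (palette.len() - 1) as u8
            }
        })
        .collect();
    (palette, indices)
}

fn main() {
    // Four distinct colours across eight voxels: the palette holds four
    // 32-bit entries, and each voxel needs only a tiny index (2 bits here
    // if packed).
    let chunk = [0xFF0000, 0x00FF00, 0xFF0000, 0x0000FF, 0xFFFFFF, 0x00FF00, 0xFF0000, 0x0000FF];
    let (palette, indices) = palettize(&chunk);
    assert_eq!(palette.len(), 4);
    assert_eq!(indices, vec![0, 1, 0, 2, 3, 1, 0, 2]);
}
```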

  • @owendavies8227
    @owendavies8227 2 months ago

    Wow, the beam optimization is absolutely genius.

  • @beholdergamedesign
    @beholdergamedesign 2 months ago +5

    I kind of wonder if we need another name for ray marching determined by grid boundaries or bounding volume intersections vs by distance fields.

    • @procedupixel213
      @procedupixel213 2 months ago +1

      "Grid traversal" algorithms is the name that I learned for some of these.

    • @BalintCsala
      @BalintCsala 2 months ago +4

      Ray marching is just the process of stepping the ray forward by some amount instead of using analytical formulas. If you visit the Wikipedia page, it gives names for those two specific variations: sphere tracing for SDFs, and cube-assisted ray marching for voxel traversal, although I personally see the latter referred to more as "voxel tracing".

  • @jonfriedl4786
    @jonfriedl4786 19 days ago

    Gosh all these awesome voxel engines I'll never make myself lol.

  • @hexadeque1101
    @hexadeque1101 2 months ago +1

    Here is my DDA on an octree implementation if it is helpful:
    I basically have a stack data structure holding the current node I'm looking at and its parents so that I can walk up the tree. The stack is initialized to just hold the root octree node.
    At the beginning of each DDA step in the loop, the current position (rayOrigin + rayDirection*distance) is guaranteed to be within the current octree node on the top of the stack, but the current node is not guaranteed to be a leaf node (the current position is inside one of its children).
    while (rayDepth < RENDER_DISTANCE) {
        Get the bottom-most node containing the current position (the current node is now guaranteed to be a leaf node containing the current position)
        Check if the current node is a filled voxel. If so, the ray hit something
        Step the ray with DDA (the current position is no longer within the bounds of the current node)
        Walk up the nodes in the stack until the current position is within bounds of the current node (sets up for the first part of the loop for the next iteration)
    }
    Hopefully that made sense. If not, my actual code is in the replies.

    • @hexadeque1101
      @hexadeque1101 2 months ago

      struct OctreeNode {
          uint data; // Basically an enum using the following constants
          uint children[8];
      };
      // Different materials are not implemented yet, so full and empty are the only two states a voxel can be
      const uint EMPTY = 0;
      const uint FULL = 1;
      const uint PARTIAL = 2; // For parent nodes that have some full (or partial) children and some empty children
      struct Ray {
          bool hit;
          uvec3 position;
          ivec3 normal;
          float depth;
      };
      layout(std430, binding = 1) buffer World {
          OctreeNode world[];
      };
      ...
      Ray castRay(vec3 rayOrigin, vec3 rayDirection) {
          ...
          // Ray-box intersection against the octree bounds (for rays starting outside it) is omitted
          ray.position = uvec3(floor(rayOrigin));
          ray.depth = 0;
          vec3 stepSize = vec3(
              sqrt(1 + (rayDirection.y*rayDirection.y + rayDirection.z*rayDirection.z) / (rayDirection.x*rayDirection.x)),
              sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.z*rayDirection.z) / (rayDirection.y*rayDirection.y)),
              sqrt(1 + (rayDirection.x*rayDirection.x + rayDirection.y*rayDirection.y) / (rayDirection.z*rayDirection.z))
          );
          ivec3 direction = ivec3(sign(rayDirection));
          vec3 rayLength = mix(rayOrigin - ray.position, 1 - rayOrigin + ray.position, step(0.0, rayDirection)) * stepSize;
          // The stack is stored as an array indexed by currentLevel, which is decremented every time we descend the octree
          // It stores uints which are indices into the world array
          uint nodes[OCTREE_LEVELS + 1];
          uint currentLevel = OCTREE_LEVELS; // Level 0 is an individual voxel
          nodes[currentLevel] = 0; // Root node is always index 0
          uvec3 currentNodePosition = uvec3(0);
          while (ray.depth < RENDER_DISTANCE) {
              // Descend until the node containing the current position is a leaf
              while (world[nodes[currentLevel]].data == PARTIAL) {
                  currentLevel--;
                  uint voxelWidth = uint(pow(2, currentLevel));
                  uvec3 center = currentNodePosition + uvec3(voxelWidth);
                  uvec3 subnode = uvec3(greaterThanEqual(ray.position, center));
                  currentNodePosition += subnode * voxelWidth;
                  nodes[currentLevel] = world[nodes[currentLevel + 1]].children[subnode.x + subnode.y * 2 + subnode.z * 4];
              }
              if (world[nodes[currentLevel]].data == FULL) {
                  ray.hit = true;
                  return ray;
              }
              // DDA step + store hit info in case we hit something
              // Possible optimization: instead of taking steps one unit long, take steps proportional to the size of the
              // current node, i.e. if the current node is empty and 4 voxels wide, step 4x the distance. I could not get
              // that working at the moment, though, as it brings extra complications. :/
              if (rayLength.x < rayLength.y && rayLength.x < rayLength.z) {
                  ray.position.x += direction.x;
                  ray.depth = rayLength.x;
                  rayLength.x += stepSize.x;
                  ray.normal = ivec3(1, 0, 0) * -direction;
              } else if (rayLength.y < rayLength.z) {
                  ray.position.y += direction.y;
                  ray.depth = rayLength.y;
                  rayLength.y += stepSize.y;
                  ray.normal = ivec3(0, 1, 0) * -direction;
              } else {
                  ray.position.z += direction.z;
                  ray.depth = rayLength.z;
                  rayLength.z += stepSize.z;
                  ray.normal = ivec3(0, 0, 1) * -direction;
              }
              // Bounds check (ray.position is unsigned, so stepping below zero wraps around
              // and is caught by the >= worldSize comparison)
              if (ray.position.x >= worldSize || ray.position.y >= worldSize || ray.position.z >= worldSize) {
                  return ray;
              }
              // I was lazy and went all the way up to the root node every time, leaving "only go up as far as necessary" for later
              currentLevel = OCTREE_LEVELS;
              nodes[currentLevel] = 0;
              currentNodePosition = uvec3(0);
          }
          return ray;
      }

  • @HerrDoktorWeberMD
    @HerrDoktorWeberMD 2 months ago +2

    Doing god's work, stuff I wish I had the spare time to achieve. I just pray you don't pull a John Lin by dropping the most gorgeous voxel world renderer anyone has ever seen and then vanishing.

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +4

      Well, I would love to "pull a John Lin" by dropping the most gorgeous voxel world renderer. But I don't intend to disappear!

  • @shadow_blader192
    @shadow_blader192 2 months ago +1

    Amazing video

  • @empireempire3545
    @empireempire3545 2 months ago +2

    Okay, I'm drunk but hear me out: a beam-tree.
    You initially cast a single square 'macro-beam' from the camera and advance it forward as usual until it hits the first voxel - this is the root node.
    The moment you hit the voxel, you split the macro-beam into sub-beams - ideally as few as possible; these are the 1st-gen children of the root.
    The sub-beam which hit the initial voxel gets handled more or less as you do at the moment, with low- and high-resolution beams.
    The rest of the camera projection proceeds onwards - but in the worst case you still need only 4 macro-beams to surround the beam (X) with the voxel which you hit (consider how to divide the area of M's):
    MMMMM
    MMXMM
    MMMMM
    MMMMM
    Looking from the side, the macro-beams are truncated square-based pyramids, subdividing into smaller truncated square-based pyramids.
    The idea of course is to reduce the number of beam steps you have to make. A smaller number of beams means a smaller number of steps to calculate - and even if each calculation is slightly more complicated (square vs point), a reduction of two orders of magnitude should more than make up for it.

  • @WlliamBrown
    @WlliamBrown 2 months ago

    This looks really awesome! I stumbled across your videos yesterday, and have viewed most of them. I have also been researching voxels with marching cubes, but it destroys precise environment manipulation. Have you considered applying marching cubes only within each of your 8x8x8 voxels?

    • @DouglasDwyer
      @DouglasDwyer  2 months ago

      Thanks for watching! I prefer the look of smooth cubic voxels, so I'm not planning to integrate marching cubes at this time (it would also make ray traversal more complicated)

  • @0memega
    @0memega 2 months ago

    Before this, I didn't know you could even use DDA for raytracing, as I had only used it for raycasting. Amazing video.

  • @CSPciccionsstibilepippons
    @CSPciccionsstibilepippons 2 months ago +1

    I tried the native demo today: with my Ryzen 3500u integrated GPU it runs at around 50 ms with default settings, but with 2x downscale it runs under 20 ms even with maximum lod bias and I can't even notice the difference in resolution 🎉🎉🎉

  • @bytesandbikes
    @bytesandbikes 2 months ago

    Very nice! 😀

  • @djordjepepic1656
    @djordjepepic1656 2 months ago +10

    You might want to check out Nvidia's paper 'Raytracing Sparse Voxel Database Structures on the GPU', they outline a hierarchical DDA approach for efficient empty space skipping that you might be able to implement in your engine. You should read the paper for a better explanation, but the basic idea is to do DDA for all levels of the hierarchy at the same time, going down a level on collisions and going up a level when leaving a 'chunk' of the current level.

    • @DouglasDwyer
      @DouglasDwyer  2 months ago

      Interesting - I'll be sure to check it out :)

  • @nathanruiz3424
    @nathanruiz3424 5 days ago

    Instead of standard ray/plane intersection marching, maybe you can use SDF ray marching when you aren't using DDA. SDFs are very GPU friendly, and the SDF of a cube does not use division. Plus you can round/bevel your cubes using SDFs for added artistic decisions.
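The division-free cube SDF the commenter refers to is the standard axis-aligned box distance function, which uses only abs, max, min and one square root. A sketch in Rust (the shader version would be the same formula in GLSL/WGSL):

```rust
/// Signed distance from point `p` to an axis-aligned box centred at the
/// origin with the given half extents. Negative inside, positive outside.
fn sd_box(p: [f32; 3], half_extent: [f32; 3]) -> f32 {
    // Per-axis distance past each face (negative while inside on that axis).
    let q = [
        p[0].abs() - half_extent[0],
        p[1].abs() - half_extent[1],
        p[2].abs() - half_extent[2],
    ];
    // Distance when outside: length of the positive parts of q.
    let o = [q[0].max(0.0), q[1].max(0.0), q[2].max(0.0)];
    let outside = (o[0] * o[0] + o[1] * o[1] + o[2] * o[2]).sqrt();
    // Distance when inside: the least-negative axis, clamped to <= 0.
    let inside = q[0].max(q[1]).max(q[2]).min(0.0);
    outside + inside
}

fn main() {
    // Unit cube centred at the origin (half extent 0.5):
    assert_eq!(sd_box([0.0, 0.0, 0.0], [0.5; 3]), -0.5); // centre is 0.5 deep
    assert_eq!(sd_box([1.5, 0.0, 0.0], [0.5; 3]), 1.0);  // one unit off a face
}
```

Rounding the cube, as the comment suggests, amounts to subtracting a constant radius from this distance.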

  • @Z_Z.t
    @Z_Z.t 2 months ago

    For DDA in different power-of-two-sized boxes, just replace multiplication with bit shifts and integer arithmetic.
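For power-of-two node sizes, the scale-dependent multiplies and modulos on integer cell coordinates do reduce to shifts and masks, as the comment suggests (the helper names here are illustrative):

```rust
/// Which node of size 2^level contains this cell (cell / 2^level).
fn node_at_level(cell: u32, level: u32) -> u32 {
    cell >> level
}

/// Offset of the cell inside that node (cell % 2^level).
fn offset_in_level(cell: u32, level: u32) -> u32 {
    cell & ((1 << level) - 1)
}

fn main() {
    // Cell 37 in nodes of size 8 (level 3): node 4, offset 5, since 37 = 4*8 + 5.
    assert_eq!(node_at_level(37, 3), 4);
    assert_eq!(offset_in_level(37, 3), 5);
}
```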

  • @seth111yta1
    @seth111yta1 a month ago

    As voxel engines advance and relative voxel size shrinks, is there an advantage to just rendering voxels as a point cloud with like face-camera colored squares or something? great work BTW and thank you for actually making browser demos

  • @moonshot3159
    @moonshot3159 2 months ago +1

    holy moly this guy's good! Have you also considered similar optimizations to those Vercidium (YT channel) did with his voxel engine? Or are those a different beast altogether?

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +1

      Thanks for watching! I think that most of Vercidium's optimizations are very good, but they mostly apply to rasterizing voxels. Since I'm ray marching, they don't apply.

  • @Z_Z.t
    @Z_Z.t 2 months ago

    10:48 Isn't using simple rasterization with depth going to give better depth values? (I'm aware of greedy meshing.)

  • @Wizarth
    @Wizarth 2 months ago

    You mention that you ensure the beams never go far enough into the geometry hierarchy to miss a voxel that is smaller than the grid/pixel size.
    Do you adjust the depth based on distance from the camera? I.e., does it go deeper into the hierarchy when closer to the camera?

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +2

      Yes, that's correct. The maximum distance that a ray can travel depends upon the size of the voxel (i.e. hierarchy depth) and the distance between adjacent rays. Adjacent rays "spread out" as they progress with a perspective camera, so that needs to be accounted for.

  • @victorwidell9751
    @victorwidell9751 a month ago

    The low resolution pre rendering pass was interesting. Would it help to do it multiple times? I imagine it could be done with each level of an octree.

  • @jujuteuxOfficial
    @jujuteuxOfficial 9 days ago

    For the beam optimisation: Have you considered that some single-voxel particles could be entirely missed?

  • @matanon8454
    @matanon8454 2 months ago

    It's amazing! Is water flow simulation something you're considering adding at some point? Would love to dig some rivers and see water flowing 😍

    • @DouglasDwyer
      @DouglasDwyer  2 months ago

      I haven't studied fluid simulation yet, so unfortunately I probably won't get to it for a while. But I would love to have water like John Lin's!

  • @guillaumemagniadas2472
    @guillaumemagniadas2472 2 months ago

    When you're speaking of a sparse tree, are you speaking of a sparse voxel octree? I wrote an alternative DDA algorithm that computes a different step depending on the current depth level in the SVO.

  • @Octal_Covers
    @Octal_Covers 2 months ago

    Which DDA algorithm are you using? Different algorithms have different performance characteristics, so it might be worthwhile to look at some other options.
    As for stepping through various sized voxels, would it be possible to multiply the amount you add to the ray position? Since it's a line, it should be the same relative positions.

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +1

      I am using a similar technique to the one described on this page: lodev.org/cgtutor/raycasting.html
      The problem is that when you step between grid levels, the distance from the ray to the current X/Y/Z planes changes in a non-uniform way. You need to update that distance somehow, and as far as I can tell updating the distance and recomputing it are pretty similar computationally. So every time I switch grid levels, I need to recompute the DDA variables. This is what I refer to as performing a box-intersection test, since it's the same thing mathematically.
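The DDA (re)initialisation being discussed can be sketched in the style of the linked lodev.org tutorial: per axis, the distance travelled along the ray per unit cell (the delta distance) and the distance to the first cell boundary (the side distance). These are exactly the values that must be recomputed when the grid level, and hence cell size, changes. A 2D sketch assuming nonzero direction components (names illustrative):

```rust
/// Initialise DDA state for a ray in a unit grid. Returns (delta_dist,
/// side_dist): per axis, the ray-parameter cost of crossing one cell, and
/// the ray parameter at which the first cell boundary is crossed.
fn dda_init(origin: [f32; 2], dir: [f32; 2]) -> ([f32; 2], [f32; 2]) {
    let mut delta = [0.0f32; 2];
    let mut side = [0.0f32; 2];
    for a in 0..2 {
        delta[a] = (1.0 / dir[a]).abs(); // assumes dir[a] != 0
        let cell = origin[a].floor();
        side[a] = if dir[a] >= 0.0 {
            (cell + 1.0 - origin[a]) * delta[a] // boundary ahead of the origin
        } else {
            (origin[a] - cell) * delta[a] // boundary behind the origin
        };
    }
    (delta, side)
}

fn main() {
    // Ray from the middle of a cell, direction (0.6, 0.8):
    let (delta, side) = dda_init([0.5, 0.5], [0.6, 0.8]);
    assert!((delta[0] - 1.0 / 0.6).abs() < 1e-6);
    assert!((side[1] - 0.625).abs() < 1e-6); // half of delta[1] = 1.25
}
```

Re-running `dda_init` with the origin rescaled to the new grid level is the "reinitialize from scratch" step; mathematically it finds the same plane distances a ray-box intersection test would, which is the equivalence described above.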

    • @Octal_Covers
      @Octal_Covers 2 months ago

      @@DouglasDwyer You could try updating it each time the voxel size changes, and if it's not as performant as box intersection tests, then you could just stick to box intersection tests.
      The algorithm I'm using for my voxel game is the Fast Voxel Traversal Algorithm, which doesn't use as many trig functions.

  • @spidermankey1398
    @spidermankey1398 a month ago +1

    You can use Bresenham's algorithm instead of DDA, which is more efficient.

    • @DouglasDwyer
      @DouglasDwyer  a month ago +1

      Bresenham's algorithm is not conservative - it may miss voxels on the line that rays pass through. As such, using Bresenham would result in gaps and artifacts appearing on objects.

  • @Lantalia
    @Lantalia 2 months ago

    Neat! Have you done anything with multiple, unaligned, grids, so you can, for instance, have a moving ship that is itself a voxel grid? It seems like your rendering solution may make that a lot easier than I would normally expect

    • @DouglasDwyer
      @DouglasDwyer  2 months ago

      While the renderer supports unaligned objects, I haven't re-added voxel entities on the game logic side. The functionality is there - hoping to take advantage of it in a future episode!

    • @JosephGleespen
      @JosephGleespen 2 months ago

      @@DouglasDwyer How do you ray trace the unaligned objects, or rather shoot rays through/past the unaligned objects? Do you trace them in separate passes and then composite them into a single image? I'm mostly trying to conceptualize how one would perform DDA while encountering things that are off-grid.

  • @owendavies8227
    @owendavies8227 2 months ago

    I guess for preserving DDA calculations between scales, if you're just going up or down a single level, the numbers would be 4 times greater or smaller with no other changes, and that could be done with just a bit shift, which would be pretty fast.

    • @owendavies8227
      @owendavies8227 2 months ago

      I guess you could keep track of how many levels you are stepping between and shift by twice the difference in hierarchy levels, but it would probably be more computationally efficient just to shift by two each time so you don't have to keep track of anything.

    • @DouglasDwyer
      @DouglasDwyer  2 months ago

      Unfortunately, I don't quite think that this works. With DDA, you store the distance along the ray to each plane of the current voxel. But if you move up/down tree levels, the planes may shift by non-uniform amounts. There are perhaps ways to update the plane positions based upon the initial and final voxel, but I don't know of any that are more efficient than just reinitializing DDA from scratch (which is equivalent to a ray-box intersection test).

    • @owendavies8227
      @owendavies8227 2 months ago

      @@DouglasDwyer I am willing to believe I made some kind of mistake.

    • @owendavies8227
      @owendavies8227 2 months ago

      ​@@DouglasDwyer I don't know if this would necessarily be faster, but you could, if nothing else, theoretically update the plane positions in linear time. If the axes are aligned to the voxel grid (which could be done with a matrix multiplication for the camera and ray just once in the beginning), it would mostly just be addition, subtraction and bitshifts to move the planes around.

  • @AIShipped
    @AIShipped 2 months ago

    That’s amazing progress

  • @EndroEndro
    @EndroEndro 2 months ago +1

    Sadly I'm getting an error: "panicked at 618" in the web-d2eddb7ca2dff036 js file (Could not load GPU adapter), and at 602:21 in Opera; other browsers and the exe have the same issue.

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +1

      Thanks for trying the demo! It sounds like your computer doesn't support Vulkan or WebGPU. I will add an error message in the future for unsupported devices.

  • @slent2410
    @slent2410 a month ago

    So do you plan on releasing this to the public once it's completed or in a usable state?

    • @DouglasDwyer
      @DouglasDwyer  a month ago

      There's already a demo playable online! But yes, I hope to turn this into an actual game someday.

  • @UnofficialFoneE
    @UnofficialFoneE 2 months ago +1

    Yes, this video made me "bricked" 👀

  • @steluste
    @steluste 2 months ago

    Are you using any engine/framework such as Monogame, or is it all from scratch? Also, do you have a public repository for the project?

    • @DouglasDwyer
      @DouglasDwyer  2 months ago +1

      The project is written entirely from scratch in Rust and uses WebGPU as the graphics API. The project is not open-source at present, but many components of it (like the event system and networking system) are open-source on my GitHub!

  • @amaryllis0
    @amaryllis0 Před 2 měsíci

    Game looks great, I gotta second that the rotating backgrounds make me feel nauseous though

  • @monstercameron
    @monstercameron Před měsícem

    is this signed distance fields? 3:19

  • @UltimatePerfection
    @UltimatePerfection Před měsícem

    You should try getting the voxel resolution higher, with each voxel being 10cm or less.

  • @teabow.
    @teabow. Před 25 dny

    1:45 Is the entire world a single tree?
    7:35 I understand the masking optimizations, but how do you figure out the masks for the rays? There are so many positions and directions.

    • @DouglasDwyer
      @DouglasDwyer  Před 25 dny +1

      Thanks for watching!
      1. The world is divided into 256³ chunks. Each chunk is its own tree. A pointer from each chunk to its tree structure is stored in a 3D array.
      2. The masks are procedurally generated at compile-time. There are 64 possible positions within a brick and eight possible octants for the direction (determined by -x or +x, -y or +y, and -z or +z). Hence, there are 512 masks. I generate them by looping over every combination and testing all voxels to see which ones the mask should include.
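The table of 512 masks described above can be sketched as follows. This is a hypothetical reconstruction: the exact inclusion rule used here (keep a voxel only if its displacement from the entry position matches the ray's sign on every axis) is an assumption for illustration, not necessarily the engine's actual criterion.

```rust
/// Sketch of generating the 512 = 64 * 8 traversal masks: one per
/// (entry position within a 4x4x4 brick, direction octant) pair.
/// Bit i of `octant` encodes the ray's sign on axis i (1 = positive).
fn generate_masks() -> Vec<u64> {
    let mut masks = Vec::with_capacity(512);
    for entry in 0..64u32 {
        // Decode the entry voxel's coordinates from its linear index.
        let (ex, ey, ez) = (entry % 4, (entry / 4) % 4, entry / 16);
        for octant in 0..8u32 {
            // A voxel survives if its displacement from the entry voxel
            // points the same way the ray travels on every axis.
            let keep = |d: i32, axis: u32| {
                if octant & (1 << axis) != 0 { d >= 0 } else { d <= 0 }
            };
            let mut mask = 0u64;
            for v in 0..64u32 {
                let (x, y, z) = (v % 4, (v / 4) % 4, v / 16);
                if keep(x as i32 - ex as i32, 0)
                    && keep(y as i32 - ey as i32, 1)
                    && keep(z as i32 - ez as i32, 2)
                {
                    mask |= 1 << v;
                }
            }
            masks.push(mask);
        }
    }
    masks
}
```

With this layout, entering at the corner voxel (0, 0, 0) while traveling in the all-positive octant keeps all 64 voxels, and the opposite octant keeps only the entry voxel itself, which matches the intuition behind the optimization.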

    • @teabow.
      @teabow. Před 25 dny

      @@DouglasDwyer Thank you for answering!
      I thought the rays were projected from a point behind the screen? How can their directions be so limited, then?

    • @DouglasDwyer
      @DouglasDwyer  Před 25 dny +1

      You're correct - the rays themselves can have virtually infinite directions. However, the masks don't need to be one-to-one. All rays that point toward the same octant are grouped. So a ray whose direction has a negative x-component, positive y-component, and negative z-component will be grouped with all other -x,+y,-z rays. The same goes for all other combinations of positive and negative cardinal axes.
      This way, we can eliminate all voxels that lie on the opposite side. If a direction has a negative x-component, then we know it cannot hit voxels that have a positive x-displacement from the ray, for example.
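As a sketch of the grouping described above, a ray direction can be reduced to one of eight octant indices by taking the sign of each component. The bit layout here (x in bit 0, y in bit 1, z in bit 2) is an illustrative choice, not necessarily the engine's.

```rust
/// Bucket a ray direction into one of eight octants, one bit per axis sign.
/// All rays sharing an octant can then share the same precomputed mask.
fn octant_index(dir: [f32; 3]) -> u32 {
    let mut idx = 0;
    for axis in 0..3 {
        if dir[axis] >= 0.0 {
            idx |= 1 << axis;
        }
    }
    idx
}
```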

    • @teabow.
      @teabow. Před 24 dny

      @@DouglasDwyer Oh, I get it now! What an awesome optimization.

  • @Deniil2000
    @Deniil2000 Před 2 měsíci

    you mentioned u32 and u64 in your video. are you writing this in Rust?

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci

      The project is written entirely in Rust and uses WebGPU as the graphics API.

  • @lumiey
    @lumiey Před 2 měsíci

    There is an LOD 'leaking' artifact when I fly up a bit high

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +1

      Yep, the LOD generation isn't perfect. The LOD leaking shouldn't happen as much with procedurally generated terrain and other solid models. It happens because the church has a lot of thin surfaces and so the engine isn't sure which surface to put on top when generating the higher LODs.

  • @user-cl1rq1sg8m
    @user-cl1rq1sg8m Před 7 dny

    Could you improve the performance of the voxel game Veloren?

  • @Midazc
    @Midazc Před 2 měsíci +1

    Did you slow down the audio track? Sounds much more natural on 1.25x

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +1

      Speaking slowly and clearly is important for public speaking, so that's what I aim to do. But you are more than welcome to enjoy the video at 1.25x if you'd like!

  • @Harald723
    @Harald723 Před 2 měsíci

    YES MORE...

  • @lumiey
    @lumiey Před 2 měsíci

    It runs super smooth on my PC, but I can't see how fast it actually runs because I can't disable vsync in the online demo

    • @lumiey
      @lumiey Před 2 měsíci

      My GTX 1060 can handle 1500x1500 resolution at 60 FPS (2.25 megapixels; 1080p is 2.0736 MP). There is no performance difference between web/native.

  • @paxcoder
    @paxcoder Před 2 měsíci

    So down to 60ms? But it would have to be ⌊1000/60⌋ = 16ms to run at 60fps right?

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +1

      Yep! As noted in the video, the benchmarks presented were on a very bad GPU (Intel UHD 750H). On any discrete card, the engine runs at a buttery-smooth 60 FPS - for example, the scenes in the video run at 4 ms (or 250 FPS) on my NVIDIA 1660 TI. It's possible to make the game run in realtime on the Intel GPU as well by lowering the resolution. But for benchmarking I wanted the frame times to be as big as possible, so I chose a worst case.

    • @paxcoder
      @paxcoder Před 2 měsíci

      @@DouglasDwyer Too bad a discrete card is a requirement, as I'm afraid I only use integrated cards, but that is a whole lot of voxels there, so I guess it makes sense.

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +1

      Discrete card is only a requirement for 1080p. If you play the game at a lower resolution, you can definitely hit 60 FPS on an iGPU. But don't take my word for it - try the demo and evaluate for yourself! There is an option in the settings menu to downscale the image resolution for better performance :)

    • @xezvcikgames1191
      @xezvcikgames1191 Před 2 měsíci

      @@DouglasDwyer Can you please show in pseudo-code how you ensure the low-res pass won't step too far (11:06)? I didn't quite get it. Also, what would it be like to trace coarser LODs in a pre-pass, for example on the CPU side? That could still avoid a lot of traversal iterations. How do you even make sure it's done in a conservative manner? Wouldn't it require supercover DDA or something? Hard to imagine, tbh.

  • @meetem7374
    @meetem7374 Před 10 dny

    Hello, I've actually stumbled across this exact problem of DDA being fixed, and I needed to traverse the octree in a more efficient manner. I actually found a solution; it's pretty dirty, but it works. IDK if you can contact me on YouTube, but you can try.

    • @meetem7374
      @meetem7374 Před 10 dny

      Other optimizations in mine and your raymarcher are actually pretty similar :) for bitmasks etc

  • @Stevidgry
    @Stevidgry Před měsícem

    vertically sectioned chunks?????

  • @StylishHobo
    @StylishHobo Před 2 měsíci

    Rather than your custom data structure, have you considered using a k-d tree instead?

  • @hombacom
    @hombacom Před 13 dny

    Online demo doesn't work on my iMac 27 Chrome/Firefox or iPhone

    • @DouglasDwyer
      @DouglasDwyer  Před 13 dny

      Thanks for trying out the demo! If you'd like to help debug the issue, please open a ticket at github.com/DouglasDwyer/octo-release. Open the developer console in Chrome (Firefox is not supported) and copy-paste what you see :)

    • @hombacom
      @hombacom Před 13 dny

      @@DouglasDwyer I'm too lazy so I do it here.. Uncaught (in promise) Error: Using exceptions for control flow, don't mind me. This isn't actually an error!
      at imports.wbg.__wbindgen_throw (web-1b9437e1ec3ce582.js:958:15)
      at web-1b9437e1ec3ce582_bg.wasm:0x5e1292
      at web-1b9437e1ec3ce582_bg.wasm:0x2dc81b
      at web-1b9437e1ec3ce582_bg.wasm:0x56f919
      at web-1b9437e1ec3ce582_bg.wasm:0x591e27
      at __wbg_finalize_init (web-1b9437e1ec3ce582.js:1948:10)
      at __wbg_init (web-1b9437e1ec3ce582.js:1984:12)
      octo-release/:1 No available adapters.
      octo-release/:1 No available adapters.
      web-1b9437e1ec3ce582.js:618 panicked at game\hal\src\gpu.rs:242:51:
      Could not load GPU adapter.
      Stack:
      Error
      at imports.wbg.__wbg_new_abda76e883ba8a5f (douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582.js:602:21)
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[4786]:0x5e5856
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[2553]:0x578261
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3660]:0x5d775c
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3610]:0x5d6b19
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[3809]:0x5da3ca
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[691]:0x38bb86
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[1936]:0x537364
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[1780]:0x52167d
      at douglasdwyer.github.io/octo-release/web-1b9437e1ec3ce582_bg.wasm:wasm-function[459]:0x229111

    • @DouglasDwyer
      @DouglasDwyer  Před 13 dny

      Thanks for doing this, I really appreciate it! For whatever reason, it looks like your web browser doesn't support WebGPU. Eventually, I'll get around to adding a proper error message for when devices aren't supported.

  • @thomasmcbride2287
    @thomasmcbride2287 Před 2 měsíci +3

    THREE DIFFERENT OPTIMIZATIONS??? YOU’RE A WIZARD!

  • @jamesoneill4607
    @jamesoneill4607 Před 2 měsíci

    Fewer rays

  • @shadow_blader192
    @shadow_blader192 Před 2 měsíci +2

    1:55 isn't it called raycasting?

    • @sjoerdev
      @sjoerdev Před 2 měsíci +2

      raymarching is raycasting in small steps

    • @shadow_blader192
      @shadow_blader192 Před 2 měsíci

      @@sjoerdev raymarching is raycasting based on minimum distance to objects?

    • @sjoerdev
      @sjoerdev Před 2 měsíci +2

      @@shadow_blader192 thats sphere tracing which is a form of raymarching

    • @shadow_blader192
      @shadow_blader192 Před 2 měsíci +3

      @@sjoerdev oh those names

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +2

      No idea, everyone seems to have their own names for these algorithms lol

  • @R.Daneel
    @R.Daneel Před 2 měsíci

    Wow. YouTube compression isn't doing your grey background any favours!
    I can't tell from the video. Have you fallen down the SIMD rabbit-hole yet?

    • @TheRyulord
      @TheRyulord Před 27 dny

      This is being rendered on GPU using a custom shader so everything is inherently SIMD

  • @beholdergamedesign
    @beholdergamedesign Před 2 měsíci

    A voxel brick has 8 64 bit occupancy masks?

    • @DouglasDwyer
      @DouglasDwyer  Před 2 měsíci +3

      Since a voxel brick has 4^3 = 64 voxels, with one bit per voxel, each brick has a single 64-bit occupancy mask.
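For illustration, addressing a single voxel within that one 64-bit mask might look like this in Rust. The linear bit layout (x + 4y + 16z) is an assumption; any fixed ordering works as long as it is used consistently.

```rust
/// Test whether the voxel at (x, y, z) inside a 4x4x4 brick is occupied,
/// using one bit per voxel in a single 64-bit mask.
fn is_occupied(mask: u64, x: u32, y: u32, z: u32) -> bool {
    debug_assert!(x < 4 && y < 4 && z < 4);
    // Linearize the 3D coordinate into a bit index from 0 to 63.
    let bit = x + 4 * y + 16 * z;
    mask & (1 << bit) != 0
}
```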

  • @mohammadazad8350
    @mohammadazad8350 Před 2 měsíci

    Premier!

    • @mohammadazad8350
      @mohammadazad8350 Před 2 měsíci

      So apparently commenting "First" in French earned me a heart... my studying is paying off!

    • @timmygilbert4102
      @timmygilbert4102 Před 2 měsíci +1

      ​@@mohammadazad8350je suis là pour tester ton français 👀 voyons voir si ça paye tant que ça 😂

    • @mohammadazad8350
      @mohammadazad8350 Před 2 měsíci

      @@timmygilbert4102 "I am ??? for test your (familiar) French. Let's see see if that pays ??? that that"
      By interpolating, we get:
      "I am going to test your French, let's see if it paid off".
      I can't believe a YouTube comment just gave an exam on something I do for fun :).

    • @mohammadazad8350
      @mohammadazad8350 Před 2 měsíci

      *gave me
      Also, I genuinely had to resist the urge to look up "tant" in the dictionary.

    • @timmygilbert4102
      @timmygilbert4102 Před 2 měsíci

      @@mohammadazad8350 good job 👍😁

  • @gsestream
    @gsestream Před 2 měsíci +1

    Try rendering your engine's nightmare scenario, i.e. the worst-case geometry. Multiplication is always a cheap operation; division might not be, but is usually also cheap on modern hardware. Still, no op required beats having to do an op. Also, if you use overlapping sphere collision volumes, you only need one test per cell, not six for all sides of a cube cell's plane boundaries; but I think you are testing coordinate-plane boundaries universally for the DDA.

    Instead of storing stuff in the data structure, why not use the 64 bits as a pointer to the voxel data, which could then be anything, not limited to 64 bits? A pointer to a data structure is very fast. Some optimizations are not worth it and make the logic a mess; order-of-magnitude optimizations should always be the focus. I see you compress the data using bit storage: if you assume large empty areas, you can compress it with RLE runs of ones and zeros (voxel / no voxel) for traces. So you are not using a BVH to traverse the voxel tree top-down; that would compress the data too. BVH top-down traversal works like DDA but processes the tree in occupancy order, and stops if there are no solid voxels in a tree leaf.

    Usually you can just do old-style draw-distance capping to limit the number of voxels drawn, even if you have billions (the old trick), even when doing true 3D traces, i.e. only render up to the draw distance, even for lighting bounce traces. Less is more. To get an idea of actual performance, try 4K resolution and see how badly it chugs, or not. Making an offline renderer to test features at high resolution with full lighting gives direction; you know what to do. Maybe use an offline pre-calculated SDF for the ray marches, at least for the static parts of the scene, and maybe test dynamic meshes separately. Instead of complex hit detection, you could just rasterize all voxels as triangles; the difference is generally not significant, not in the order-of-magnitude range.

    Try rendering a voxel cloud, which is a spongy, ultra-sparse voxel mesh. The SDF can be updated on the fly, locally, wherever voxel mesh edits happen.