Gaussian Splatting and Photogrammetry with 360/spherical imagery

  • Published Oct 4, 2023
  • 'Old Centaur' photogrammetry model: skfb.ly/oxITY
    Vernazza spherical imagery photogrammetry model: sketchfab.com/3d-models/verna...
    COLMAP: colmap.github.io/
    NerfStudio: docs.nerf.studio/
    3D Gaussian Splatting for Real-Time Radiance Field Rendering: github.com/graphdeco-inria/ga...
    Aras-p’s Unity Project: github.com/aras-p/UnityGaussi...

Comments • 81

  • @LaunchedPix
    @LaunchedPix 7 months ago +2

    Excellent video. I just stumbled onto your channel and see I have a lot to catch up on and learn (from you) in the 3D modeling and visualization space. Lots more to watch and then play with. You're reigniting my curious spirit to expand beyond photogrammetry into NeRF and Gaussian Splatting. Thanks for spending your time making and sharing all of these videos! 👏👏👏

  • @djorkez
    @djorkez 7 months ago +4

    Nice model of Portovenere 😉

  • @AndreasMake
    @AndreasMake 6 months ago +1

    Love this place, Porto Venere. Just south of La Spezia, Italy. Beautiful place. Nice 3D GS.

  • @marinomazor.adventures
    @marinomazor.adventures 6 months ago

    Nice work 🤟

  • @mankit.mp4
    @mankit.mp4 7 months ago +3

    Wonderful work, Matthew - thanks for showing us the possibilities and differences of photogrammetry and Gaussian splatting. I'm a product designer working with a lot of craft-making communities in stunning locations, where I'm dabbling in these kinds of documentation. Having studied your Insta360 workflow and a few other people's, I wonder whether using a full frame camera with an ultra-wide or fisheye lens for 4K video, then extracting stills for the software processing, would give more optimal resolution and a faster workflow at the same time?

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      Yes - a full frame camera with a good lens will work better! Coming from a photogrammetry workflow, I almost always use an A7Rii with a 12mm lens for architectural capture. I used stills instead of video, but I've been experimenting with both lately - it's always a balance of speed (in terms of capture but also processing) and quality.

    • @pixxelpusher
      @pixxelpusher 7 months ago +1

      @@MatthewBrennan Can you specify 180 degrees instead of 360 in the workflow? I have a Meike 6.5mm fisheye lens, which I imagine would be like using only one hemisphere of the Insta360, but at higher resolution since it uses the full sensor of the Sony.

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      @@pixxelpusher It depends on what type of scene you're trying to capture: a fisheye lens would not work well for something like a sculpture, because the object of interest would only occupy a very small amount of "real estate" on the sensor (so to speak) - but it should work very well for rapidly capturing a large urban scene like a piazza, which you should be able to do in far fewer photos than with something like a 20mm-35mm "wide" lens.

    • @mankit.mp4
      @mankit.mp4 7 months ago +1

      @@MatthewBrennan Oh wow, a 12mm. Didn't expect you'd go for something that wide, since there might be distortion, but I suppose it depends on how it's corrected in post - and obviously the wider it is, the less likely you are to miss something, which is super important. Yes, the optimal quality vs. speed workflow is something I'm constantly trying to reach! Will stay tuned for more of your content, keep it up!

  • @inkobako
    @inkobako 6 months ago +2

    Amazing stuff! I'm just getting into Gaussian splatting, and your videos are really insightful about just how capable it can be. I want to work on a project using splatting, but the constraints I'm working with are very limiting. What do you think is the absolute minimum number of low-res photos that could reproduce a scene in an 8m by 8m space?

    • @MatthewBrennan
      @MatthewBrennan  6 months ago +1

      It depends how many occlusions there are (for example, a very cluttered space with many objects) vs. an empty space (i.e. a gallery with flat art on the walls).

  • @bolloxim1
    @bolloxim1 6 months ago

    Very awesome, I've been looking at large-area renderers. Question: are there any issues capturing large areas with drones, where shadows might 'move' over the course of the capture? Any thoughts on capturing moving objects? I've been reading about 4D Gaussian splatting, which adds time as a dimension. Could you capture, for example, a savannah or a table-top mountain, but also capture the motion?

    • @MatthewBrennan
      @MatthewBrennan  6 months ago +1

      Shadows aren't too much of an issue as long as it's not drastic (i.e. trying to combine photos from 10am with ones from 5pm). Moving objects won't work from the perspective of photogrammetry, but it's not a problem if there are moving objects (e.g. cars, people, etc.) in a broader scene. The so-called "4D" Gaussian splatting is a bit misleading, because those captures were done with multi-camera rigs (10-20 cameras at least) in a controlled environment (such as a studio).
      I've combined photo sets of buildings taken years apart - as long as the key features don't change, it's quite possible to combine photos taken at different times of day and in different seasons. You may have to use manual control points to "force" some alignment.
      It's possible to capture some apparent "motion" in NeRF or GS scenes (like cars or people moving, or reflections changing) - but this is all based on the input imagery.

  • @identiticrisis
    @identiticrisis 7 months ago +2

    Is there a way to use a crude surface extraction technique to exclude those errant splats? I know they contribute to the detail in the reconstructed images, but they cause serious issues everywhere else. I imagine that removing them will be much more of a benefit given Gaussian splatting is intended to produce interactive spaces.
    Something like a low poly model as a "bounding box" to tidy up the point cloud? Clearly the point cloud itself is at fault, but I wouldn't know how to improve the outcome in this case.
    It seems like a combination of these techniques you've been showcasing would be very powerful indeed.

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +3

      The Unity project I've been using (by Aras-P, available here: github.com/aras-p/UnityGaussianSplatting ) was just updated with an editing tool to delete the "floating" splats...
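
      For readers who would rather script this than use an editor, floaters can also be pruned directly from the trained PLY. A minimal sketch, assuming the point_cloud.ply layout written by the graphdeco-inria trainer (raw pre-sigmoid "opacity" field) and the plyfile package; the opacity threshold and bounding box are per-scene guesses to tune:

          # Hypothetical floater pruning on a trained 3DGS point cloud.
          import numpy as np
          from plyfile import PlyData, PlyElement

          ply = PlyData.read("point_cloud.ply")
          v = ply["vertex"].data

          opacity = 1.0 / (1.0 + np.exp(-v["opacity"]))  # stored as a logit
          xyz = np.stack([v["x"], v["y"], v["z"]], axis=-1)

          # Keep splats that are reasonably opaque AND inside a crude bounding
          # box around the subject (corners are assumptions, not scene values).
          lo, hi = np.array([-10.0, -10.0, -10.0]), np.array([10.0, 10.0, 10.0])
          keep = (opacity > 0.05) & np.all((xyz > lo) & (xyz < hi), axis=-1)

          PlyData([PlyElement.describe(v[keep], "vertex")]).write("point_cloud_pruned.ply")
          print(f"kept {keep.sum()} of {len(v)} splats")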

    • @natelawrence
      @natelawrence 7 months ago +1

      @@MatthewBrennan Thanks for the heads up.
      I've been waiting for someone to implement cropping for these scenes.

  • @shark3D
    @shark3D 7 months ago +1

    All I know is this looks like the location I had to model for Fast & Furious 8 (the Iceland dock), but it's probably just a similarity.

  • @topvirtualtourscom
    @topvirtualtourscom 7 months ago +1

    Great video. It looks like 360 video is almost unusable for Gaussian splatting because the result is such low quality. Could I use a full frame Sony a7iii camera with a 7.5mm fisheye lens for video and photos for Gaussian splatting, or should it be a 12mm lens? What is the widest lens that still gives a good result, for both video and photos?

    • @MatthewBrennan
      @MatthewBrennan  7 months ago

      I think it really depends on the resolution of the video - in this case I was using an Insta360 One (5.7K), which produces pretty grainy video - the new 1" sensor Insta360 appears to take much better quality video. A full frame camera with a wide angle lens, shooting either video or stills, will definitely produce better results (as you can see from the statue scan at the end of this video)!

    • @topvirtualtourscom
      @topvirtualtourscom 7 months ago +1

      @@MatthewBrennan Thanks for the quick answer. I'm actually using the 8K Insta360 Pro 2, but I still don't think it will be good for Gaussian splatting. Do you think I could use a 7.5mm fisheye lens, or is it better to use a 12mm? And I think the village in the video is Portovenere.

    • @MatthewBrennan
      @MatthewBrennan  7 months ago

      @topvirtualtourscom3619 Fisheye lenses typically aren't great for photogrammetry because of the amount of distortion - and as with spherical imagery, you're squeezing a lot of scene onto a limited amount of sensor area. I have some fisheye and wide-angle datasets that I'm planning to process in the next week, and I'll post the results and a comparison.
      My opinion is that the 12mm lens would work better than the 7.5mm, while still capturing a very wide field of view.
      Also - you're right! It's Porto Venere! Good eye. How can I contact you?

  • @dreadthedrums
    @dreadthedrums 4 months ago

    Wow - amazing work, and thanks for the detailed run-through of your workflow. It looks like you've spent a fair bit of time with both Metashape and COLMAP - have you worked out whether it's possible to georeference a splat? I imagine this would ensure the rotation and scale are preserved. I use a Mavic with RTK for photogrammetry here in Australia and can get a regular point cloud to within 20mm accuracy with good ground control points; if you could do that with a splat, it would be an absolute game changer.

    • @MatthewBrennan
      @MatthewBrennan  4 months ago

      I don't see why you couldn't georef a splatted cloud - especially if you've got GCPs with scale bars. The applications currently are primarily visual (i.e. generating video).

    • @dreadthedrums
      @dreadthedrums 4 months ago

      @@MatthewBrennan Thanks for the response. Any idea of a workflow that might work? COLMAP doesn't support georeferencing to coordinate systems, if I understand correctly.

    • @MatthewBrennan
      @MatthewBrennan  4 months ago +1

      Metashape allows georeferencing, however all of that (including cloud/model orientation and transform) seems to be stripped during the Gaussian splat training process. It shouldn't be hard to integrate, but it'd require coding skills beyond what I possess! :)

    • @dreadthedrums
      @dreadthedrums 4 months ago

      @@MatthewBrennan Interesting. I'll have a look at the Metashape-to-Gaussian workflow for now. I assume there's a way to export the poses and sparse cloud in a format that can be trained. Interesting that the Gaussian training strips the info when every individual splat has a location relative to some coordinate system.

    • @MatthewBrennan
      @MatthewBrennan  4 months ago

      @@dreadthedrums here's the Agisoft export script (exports in "COLMAP" format that train.py expects): github.com/agisoft-llc/metashape-scripts/blob/master/src/export_for_gaussian_splatting.py
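
      For anyone experimenting with putting the georeference back after training, a rough sketch of re-applying a 4x4 similarity transform (e.g. one dumped from Metashape's chunk.transform.matrix) to the trained splat centers. The file names are placeholders, and note that splat rotations and log-scales would also need the rotation/scale parts of the transform applied; only positions are handled here:

          # Hypothetical: re-apply a georeferencing transform to splat positions.
          import numpy as np
          from plyfile import PlyData

          T = np.loadtxt("chunk_transform.txt").reshape(4, 4)  # assumed 4x4 dump

          ply = PlyData.read("point_cloud.ply")
          v = ply["vertex"].data
          xyzw = np.stack([v["x"], v["y"], v["z"], np.ones(len(v))], axis=-1)
          xyz = (xyzw @ T.T)[:, :3]  # row vectors times transposed matrix
          v["x"], v["y"], v["z"] = xyz[:, 0], xyz[:, 1], xyz[:, 2]
          ply.write("point_cloud_georef.ply")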

  • @vassilisseferidis
    @vassilisseferidis 7 months ago

    Great video Matthew. Thank you.
    I am following the same workflow with an Insta360 Pro, which supports a higher (8K) resolution. The result is good only if you follow the same path as the original recording camera, but it fails if you try to wander off. Is there a way to improve the quality, in your opinion?

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      Unfortunately I think the only way (at the moment - this technology is still new and will no doubt advance quickly) is to use higher resolution, low-distortion images. For example, an 8K 360 still only gives you eight ~1k images once you split them apart using NerfStudio, whereas a frame camera will give you a 7000px x 4000px image - you'll just have to take more photos.
      The Gaussian splatting method fails when you move off the camera path because those are the only locations the splats have been "trained" from - in other words, when viewed from a different angle, there's technically no data about what color a splat should be (because there was no photo of it).

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      I think another big factor is the number of points in the initial COLMAP alignment - look up some strategies for increasing points in the sparse cloud, since those are what seed the splatting. Fewer points = fewer splats (for example, the "Bike" scene from the GS paper had 6 million splats!).

    • @foolishonboards
      @foolishonboards 1 month ago

      That's what I'm also wondering after watching this video and reading the comments. Wouldn't there be any way to generate more points from those 8x1k images? @@MatthewBrennan

    • @MatthewBrennan
      @MatthewBrennan  1 month ago

      @@foolishonboards Yes - I've found that using Agisoft Metashape and simply upping the tie/key point limits works!
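
      Concretely, that corresponds to something like the following in Metashape's Python API. A minimal sketch, assuming a saved project; the limit values are illustrative (the defaults are around 40,000 key points and 4,000 tie points, and 0 means no cap):

          # Hypothetical: raise key/tie point limits before alignment in Metashape.
          import Metashape

          doc = Metashape.Document()
          doc.open("scene.psx")
          chunk = doc.chunk

          chunk.matchPhotos(
              downscale=1,            # match on full-resolution images
              keypoint_limit=100000,  # more key points -> denser sparse cloud
              tiepoint_limit=0,       # 0 = no cap on tie points
              generic_preselection=True,
          )
          chunk.alignCameras()
          doc.save()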

    • @foolishonboards
      @foolishonboards 1 month ago

      @@MatthewBrennan Thanks for the info. Did you try using an AI tool to bump up the resolution of those 8x1k images before feeding them into the photogrammetry process?

  • @RiccaDiego
    @RiccaDiego 2 months ago

    Hi! Amazing information! I think you can help me with something.
    I have point clouds from a Leica BLK360 scanner. Do you know if it is possible to turn these point clouds into Gaussian splats?
    Thanks a lot!

    • @MatthewBrennan
      @MatthewBrennan  2 months ago

      No, probably not, because Gaussian splatting is based on image data, not LiDAR data. You could use a photogrammetry program to align your LiDAR datasets to imagery, though.

  • @samueljames1511
    @samueljames1511 6 months ago

    With an Insta360 Pro, would it be better to use the images from the six fisheye lenses, undistorting them and then putting them into COLMAP, or would it be better to stitch them, use NerfStudio, and then use COLMAP?

    • @MatthewBrennan
      @MatthewBrennan  6 months ago

      You will have to use some method of splitting the equirectangular (360) images into a series of "flat" frames that can be understood by the GS scripts (the GS method on GitHub expects COLMAP format/structure, unfortunately). The fastest/easiest route I've found is to use NerfStudio to automatically split the 360 images into either 8 or 14 frames (depending on the amount of overlap you want). Then you can align these in COLMAP or in Metashape (and export in COLMAP format).
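
      The splitting step itself is just a gnomonic (perspective) projection sampled at several yaw angles. Below is a rough numpy/OpenCV sketch for the 8-view case, not NerfStudio's actual implementation; the FOV, output size, and file names are placeholder assumptions:

          # Hypothetical equirectangular -> perspective view splitter.
          import cv2
          import numpy as np

          def equirect_to_perspective(equi, yaw_deg, fov_deg=90.0, size=1024):
              f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # focal length, px
              # Pixel grid -> camera-space rays (x right, y down, z forward)
              xs, ys = np.meshgrid(np.arange(size) - size / 2,
                                   np.arange(size) - size / 2)
              dirs = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)
              dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
              yaw = np.radians(yaw_deg)  # rotate rays around the vertical axis
              R = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                            [0, 1, 0],
                            [-np.sin(yaw), 0, np.cos(yaw)]])
              dirs = dirs @ R.T
              # Ray -> longitude/latitude -> source pixel coordinates
              lon = np.arctan2(dirs[..., 0], dirs[..., 2])
              lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))
              eh, ew = equi.shape[:2]
              mx = ((lon / np.pi + 1) * 0.5 * ew).astype(np.float32)
              my = ((lat / (np.pi / 2) + 1) * 0.5 * eh).astype(np.float32)
              return cv2.remap(equi, mx, my, cv2.INTER_LINEAR,
                               borderMode=cv2.BORDER_WRAP)

          equi = cv2.imread("frame_0001.jpg")
          for i in range(8):  # eight views at 45-degree yaw steps
              cv2.imwrite(f"view_{i}.jpg", equirect_to_perspective(equi, i * 45))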

  • @HeadPack
    @HeadPack 1 month ago

    Very informative video. You are showing a textured model from photogrammetry in the end. How does one create that?

    • @MatthewBrennan
      @MatthewBrennan  1 month ago +1

      You need photogrammetry software. In this case I used Agisoft Metashape, but there are free/open-source options, such as COLMAP and VisualSFM.
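
      For a sense of what that involves, the textured-model steps in Metashape's Python API look roughly like this. A bare-bones sketch; the photo path, project name, and default quality settings are assumptions:

          # Hypothetical end-to-end photogrammetry pipeline in Metashape.
          import glob
          import Metashape

          doc = Metashape.Document()
          chunk = doc.addChunk()
          chunk.addPhotos(glob.glob("photos/*.jpg"))

          chunk.matchPhotos()   # feature detection + matching
          chunk.alignCameras()  # camera poses + sparse cloud

          chunk.buildDepthMaps()
          chunk.buildModel(source_data=Metashape.DepthMapsData)  # dense mesh
          chunk.buildUV()
          chunk.buildTexture()  # bake the photos onto the mesh

          doc.save("model.psx")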

    • @HeadPack
      @HeadPack 29 days ago

      @@MatthewBrennan Thank you very much for that information. Much appreciated.

  • @user-dg2tr1oh5l
    @user-dg2tr1oh5l 5 months ago +1

    Great video. Thank you.
    I tried NerfStudio to convert 360 pictures, but there was always a black line in pictures 0, 4, 5, 6, and 7 when 8 images per equirectangular image were used.

    • @MatthewBrennan
      @MatthewBrennan  5 months ago

      Strange, I’ve never seen that - I’ll try it again and see if I can reproduce it.

    • @panonesia
      @panonesia 3 months ago

      @@MatthewBrennan Can you share how you make the planar projections from an equirectangular image? I tried NerfStudio but the result wasn't good - COLMAP only found 4% of the poses, sad.

    • @MatthewBrennan
      @MatthewBrennan  3 months ago

      @@panonesia Try extracting 14 frames instead of 8, for more overlap. I haven't used anything other than NerfStudio, so I can't make any suggestion there unfortunately - of course it could also be an issue with COLMAP settings - try changing the number of features detected, etc... I stopped using COLMAP and only use Agisoft Metashape for alignment now.

    • @panonesia
      @panonesia 3 months ago

      @@MatthewBrennan Ah... so Metashape has good features for planar images? Any special settings for alignment? Generic preselection using source, estimated, or sequential?

    • @MatthewBrennan
      @MatthewBrennan  3 months ago

      @@panonesia Metashape is an industry-standard photogrammetry software. If you're working with video frames you can use sequential; otherwise I leave it on source (which will use GPS if you have drone or GPS EXIF data).

  • @PierreJeanLievaux
    @PierreJeanLievaux 5 months ago +1

    I know it - it's Porto Venere, La Spezia

  • @blackhatultra
    @blackhatultra 7 months ago

    Is it possible to use it in a regular 3D package - lighting, working on the model, and rendering? As of now I don't see any mesh, so how is this technique usable?

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      Right now you can edit the splats, but the point cloud cannot be re-lit, the lighting is baked in. As for its use: for the moment I think it’s a solution in search of a problem. I can see this being immediately useful in virtual production though.

    • @blackhatultra
      @blackhatultra 7 months ago

      is it possible to apply a z-defocus?
      @@MatthewBrennan

  • @natelawrence
    @natelawrence 7 months ago

    How many training iterations are you using when generating your 3D Gaussian Splatting scenes?
    Also, how much VRAM does the GPU you're calculating them with have?

    • @MatthewBrennan
      @MatthewBrennan  7 months ago +1

      30,000 iterations, on an A100 with 40GB.
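
      For reference, with the graphdeco-inria training script that corresponds to an invocation roughly like the sketch below; the dataset and output paths are placeholders, and 30,000 iterations is also the script's default:

          # Hypothetical launch of 3DGS training (train.py from
          # github.com/graphdeco-inria/gaussian-splatting).
          import subprocess

          subprocess.run(
              [
                  "python", "train.py",
                  "-s", "data/scene",    # COLMAP-format input (images/ + sparse/)
                  "-m", "output/scene",  # where point_cloud.ply will be written
                  "--iterations", "30000",
              ],
              check=True,
          )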

    • @natelawrence
      @natelawrence 7 months ago +2

      @@MatthewBrennan Hmm. Thanks for replying. That is definitely a bit disheartening.
      The cloudy results here can't be blamed on a lack of computing resources or (as a result) too few training cycles.
      I wonder to what extent using consecutive-neighbor matching for video frames during feature matching and scene reconstruction is to blame.
      I fully understand how much time that would save during image comparisons. I guess it's not clear to me to what extent non-consecutive input images are/aren't compared after the camera poses are estimated via matching consecutive images.
      In other words, after each image is compared with the one directly following it (and therefore has been compared to both the one that precedes it and the one that follows it) and the point cloud and camera poses are calculated, does the typical Structure from Motion bundle adjuster then look at the camera poses and say, 'Based on how I've reconstructed the scene so far, these frames are looking at the same area of the point cloud, so even though they don't directly neighbor each other temporally, I'm going to compare their features to each other to increase the accuracy of the reconstruction'?

    • @morganwwww
      @morganwwww 6 months ago

      jessuusss christ @@MatthewBrennan

  • @smcclure3545
    @smcclure3545 6 months ago

    How well would this work with 360 video walking through a building with multiple rooms and hallways?

    • @smcclure3545
      @smcclure3545 6 months ago

      Particularly if there's already an underlying point cloud generated by a previous LiDAR scan?

    • @MatthewBrennan
      @MatthewBrennan  6 months ago +1

      In this case I would use the 360 imagery to texture the lidar scan. I'm not sure Gaussian Splatting would add any value there.

    • @smcclure3545
      @smcclure3545 6 months ago

      @MatthewBrennan Thanks for the reply. Underlying the question is the issue of reducing the time it takes both to process repeat scans and to produce updated imagery and/or geometry for building maintenance and operations. 3D surface models from reality capture are more accurate, but seem more process-intensive and less relatable to an average user.
      I wonder, would you be willing to have a meeting with me? I'm conducting research as an innovator in this space for my job, and it would be helpful to "project a trendline" for where the tech is headed.

    • @MatthewBrennan
      @MatthewBrennan  6 months ago +1

      @@smcclure3545 Sure - feel free to send me an email.

    • @smcclure3545
      @smcclure3545 6 months ago

      @@MatthewBrennan done, thanks 😊

  • @Zanaga2
    @Zanaga2 3 months ago

    Is it possible to combine multiple cameras to get better quality while keeping the speed of the 360 capture workflow?

    • @MatthewBrennan
      @MatthewBrennan  3 months ago +1

      If you had multiple 360 cameras you could mount them vertically on a pole and move that through the scene, yes (getting numerous heights would be key).
      Another solution would be multiple mirrorless cameras mounted at 3 heights, so something like 12-15 cameras total, firing simultaneously as you moved through an environment.

    • @Zanaga2
      @Zanaga2 3 months ago

      @MatthewBrennan Oh, I forgot to specify, but I was thinking about regular (non-360) cameras. My bad.
      But thanks for answering, I might try it one day for a long shot with no cuts.

  • @crestz1
    @crestz1 6 months ago

    Hi, what's the application you're using to view the Gaussian splats?

  • @tamiopaulalezon9573
    @tamiopaulalezon9573 27 days ago

    What is your drone?

  • @infectioussneeze9099
    @infectioussneeze9099 23 days ago

    What are your computer specs?

    • @MatthewBrennan
      @MatthewBrennan  22 days ago

      The 3DGS was trained on a cloud workstation instance with an A100 GPU. The video rendering and photogrammetry were done on a desktop with a Ryzen 9 3900X, a 4070 Ti, and 32GB RAM.

  • @rafalfaro
    @rafalfaro 6 months ago +1

    This is cool but the low framerate made me sick at the beginning.

  • @nekosan01
    @nekosan01 2 months ago

    How accurate is it compared to Epic's RealityCapture? You talk as if old photogrammetry doesn't exist and this is something new and good, but how is it better? It requires a high-end video card, and the result is no better than old photogrammetry, which runs just fine on an old PC.

    • @MatthewBrennan
      @MatthewBrennan  2 months ago

      Not sure I understand what you're asking. NeRF/3DGS are completely different from photogrammetry; the only similarity is that they use the same initial camera pose estimation. NeRF/3DGS at the moment (AFAIK) don't have quantifiable accuracy and shouldn't be used for anything beyond visualization.

  • @RikMaxSpeed
    @RikMaxSpeed 6 months ago

    The town is Porto Venere, Italy. Beautiful! 🤩 en.wikipedia.org/wiki/Porto_Venere