K-d Trees - Computerphile

Sdílet
Vložit
  • čas přidán 20. 01. 2022
  • One of the cleanest ways to cut down a search space when working out point proximity! Mike Pound explains K-Dimension Trees.
    EXTRA BITS: • EXTRA BITS: More on K-...
    / computerphile
    / computer_phile
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Komentáře • 226

  • @JohnnyWednesday
    @JohnnyWednesday Před 2 lety +583

    It's worth noting for any budding game developers that K-d trees have to be rebuilt if objects move so if you're looking to accelerate a locality query between many moving entities? a quad/oct tree is your best choice as it is possible to maintain them in real-time with minimal computation.

    • @ssvis2
      @ssvis2 Před 2 lety +35

      Yep. You also have to factor in computational time to split your data into whatever structure. Dynamic data makes it a lot harder and in some cases can actually cause you to spend more time on tree rebuilding than brute force searches.

    • @SerBallister
      @SerBallister Před 2 lety +14

      Typically both can be used, KD-trees for static objects which are then placed (by their bounding boxes) in a spatial hierarchy like an Oct-tree.

    • @GilesBathgate
      @GilesBathgate Před 2 lety +5

      So actually, you can do some transformation on the k-d tree such as scaling all axis by a factor or translation. With translation, you just move the splitting planes, relative to your origin or query point.

    • @SerBallister
      @SerBallister Před 2 lety +4

      @@GilesBathgate Yeah. Or the opposite, the ray or whatever youre testing can be transformed from its world position into the kdtree's space.

    • @JohnnyWednesday
      @JohnnyWednesday Před 2 lety +4

      ​@@SerBallister - No matter how much you transform the KD tree? it's not going to help you manage a world full of moving entities at interactive frame rates. Transforming coordinates between two different reference frames is a moot point.

  • @nullablebool
    @nullablebool Před 2 lety +144

    Dr. Pound - the teacher I wish I had. What a guy.

    • @GamerMartin96
      @GamerMartin96 Před 2 lety +16

      I was lucky enough to be taught by Mike for a couple of modules at uni. Definitely one of the best in the department, alongside Steve Bagley and the other UoN professors on Computerphile

    • @ghosttwo2
      @ghosttwo2 Před 2 lety +10

      Looks like Toby Maguire's dad.

    • @paulbunyangonewild7596
      @paulbunyangonewild7596 Před 2 lety +1

      This sounds like a dirty innuendo. If that is the case..... I would also love to have him as a teacher.

    • @JamienM
      @JamienM Před 2 lety +1

      Pounding out the knowledge 😁

    • @danceswithdirt7197
      @danceswithdirt7197 Před 2 lety

      How funny. I was just thinking I'd love to take a course with the dude.

  • @jacob_90s
    @jacob_90s Před 2 lety +137

    Great video, but just want to note as well that multidimensional trees can also be used for things other than just geometric use cases; they can be also very useful if you need to search for data with multiple ranges. If, for instance, you had a database of retail transactions, and you wanted to search efficiently for all transactions within a certain date range AND with a minimum and maximum price, you would need to use a k-d tree or something like it, to be able to efficiently find the results.

    • @jacob_90s
      @jacob_90s Před 2 lety +2

      @Sky Gardener Yep, but the issue comes down to type. Trying to use a geometric data type to hold dates, numbers, strings, etc is possible (to a limited extent) but an unnecessary pain. The problem is that this is an index that should be generic, and let the designer choose the types (like regular indexes) yet most database developers are treating it like an application specific problem, which should only ever be used for geometric problems

  • @zacharychristy8928
    @zacharychristy8928 Před 2 lety +22

    The KD tree really shines for me when you represent it with a simple heap. You get an advanced data structure with range/radius searches, and nearest neighbor queries, all in logarithmic time for no extra memory! The kd tree would just be an array of points!
    If you want to pick the split dimensions more smartly (like if you have a long, thin pointcloud) you only need one array of bytes that represents which split occurs at each node.

  • @niks660097
    @niks660097 Před 2 lety +7

    K-d Trees are used extensively in RTS games, to calculate gameplay stats e.g damage based on range or units attacking closest enemy etc...

  • @astropgn
    @astropgn Před 2 lety +44

    10:37 I think that a good visualization to see if it could be in another rectangle is to draw a circle with radius = the distance from the point to G (the closest distance so far). The circle goes to the other rectangle, so it means that it could have a point in there with a smaller distance. But since it doesn't go to any other square, it is impossible to have a point there it a smaller distance.

    • @iwanadai3065
      @iwanadai3065 Před 11 měsíci

      does that accurately represent the process the computer takes with a kd tree?

  • @mickenev
    @mickenev Před 9 měsíci +2

    This is hands down the best explanation I have seen of K-d trees and it is done in a way that captivates attention. Well done and thank you from college students around the world.

  • @oneratdylan
    @oneratdylan Před 2 lety

    The timing of this is amazing. I learnt about KD trees last week and I'm using them in my game. Gotta love a KD tree

  • @Alig4realz
    @Alig4realz Před 2 lety +5

    Every time I see Mike Pound on Computerphile its an instant click

    • @tissuepaper9962
      @tissuepaper9962 Před 2 lety

      Same with the OEIS guy whose name I can't remember.

  • @RobLang
    @RobLang Před 2 lety

    Awesome video! The right balance of humour, description and algorithm. Brilliant.

  • @archivedandgone
    @archivedandgone Před 2 lety +2

    My favorite Computerphile professor!

  • @GilesBathgate
    @GilesBathgate Před 2 lety +4

    It was mentioned that this doesn't really work for edges and facets, but actually you can do this. Objects are either on one side of the splitting plane, or the other, or they span the plane, in which case they are added to both left and right cells/leafs of the tree.

  • @erickweil4580
    @erickweil4580 Před 2 lety +3

    I just finished implementing a Bounding Volume Hierarchy for my 4D Game and this video pops up. To build the BVH I use a very similar approach as is used to build the kd tree using the objects centroid as points, but then spliting is stoped once some amount of objects, like 4, are on the leafs, then work from bottom up calculating the combined AABB's of the leafs, and the AABB's of each branch node so that the AABB contains every child (they may overlap).
    In the end you have a structure that with a few AABB tests let you decide which objects are colliding with you, in the best case scenario, with 4 childs each node, you only need to do 60 tests to find a object in the middle of 1M others (The tree will have a depth of 15, each level you check for 4 nodes to decide where to continue), but in practice, the nodes may overlap leading to a few more checks.

  • @moralboundaries1
    @moralboundaries1 Před 2 lety +12

    These data structure videos are fascinating! More please! Can you look into the data structures that are used for meteorological and cosmological simulations?

  • @Mutual_Information
    @Mutual_Information Před 2 lety +22

    K-d Trees.. Computerphile doesn't shy away from the more esoteric topics!

    • @DantalionNl
      @DantalionNl Před 2 lety +1

      If looking for more esoteric tree data structures I'd suggest BSP trees or octtrees as well

    • @lancecoleman7440
      @lancecoleman7440 Před 2 lety

      brave is brave

    • @MysticPing
      @MysticPing Před 2 lety +3

      Both BSPs and KD-Trees are very common in computer graphics

    • @kiramaticc
      @kiramaticc Před 2 lety +6

      Esoteric? I thought K-d trees were commonly taught in most Comp Sci degrees. I know I learned K-d trees at my university.

    • @Mutual_Information
      @Mutual_Information Před 2 lety +3

      ​For CZcams, I'd consider an explainer on K-d trees pretty niche. It's one way to speed up nearest neighbor search - seems pretty specialized to me.
      But I see your point. A CS student would *not* think it's esoteric at all. And maybe I should factor that in. This is Computerphile after all

  • @NickAskew
    @NickAskew Před 2 lety +14

    Ah, now I see where I possibly went wrong. We were using a k-d tree (actually 3D tree) looking for points and indeed sometimes the points were over the border and I guess we were not working back up the tree. I'll have to see if that was ever fixed. Actually I think we were looking for the nearest triangle on a surface and if I remember correctly we used the tree to determine likely nearest points and then build a list of triangles that met at that point and then calculated the nearest perpendicular to the plane of the triangles (or to the edge if the nearest point on the plane was outside the triangle) and then we could decide which triangle the point was nearest to.
    What I remember from years ago is that building the spatial index is far more costly than using it. So for purposes such as lidar I'd expect they need to refresh the index so often that using it the way you suggest seems like it might be expensive in terms of processing.

  • @rickhackro
    @rickhackro Před 2 lety

    Absolutely amazing.

  • @normalPlayerSeverEarth
    @normalPlayerSeverEarth Před měsícem

    Thank you so much for explaining this instructively.

  • @victornoagbodji
    @victornoagbodji Před 2 lety

    This is an amazing explanation! Thanks 😊 🙏

  • @edmunns8825
    @edmunns8825 Před 2 lety +1

    As an embedded engineer I like decision trees, It's all we can run on an MSP430. But you can do a lot with them if you know how to train them. Thanks WEKA!

  • @kaiweiyeo4279
    @kaiweiyeo4279 Před 2 lety +2

    I love Computerphile. Thanks for educating the public! Have you considered a series on a confusing thing called Design Patterns by the Gang of Four?

  • @RonJohn63
    @RonJohn63 Před 2 lety

    The splitting reminds me of what's done in Quicksort, and the comparisons are tree pruning.

  • @jeremystott8188
    @jeremystott8188 Před 2 lety

    Great video, very interesting!

  • @mbakenemdusink9757
    @mbakenemdusink9757 Před 2 lety +3

    Sounds like by far the most expensive operation is going to be the calculation of the actual distance between to points, requiring two multiplications and a square root: sqrt(x^2 + y^2), and you'll have to do that a lot.
    I once had to solve this problem back in the 90s for an interactive graph editor, so we had to determine the node closest to the mouse.
    We used the much faster Manhattan distance (x + y), and that worked really nice.
    Algorithms also derail in time when your starting point is far away from the actual nodes.

    • @EmersonPeters
      @EmersonPeters Před 2 lety +3

      I was going to say that when comparing distances you can skip the square root, but then realized that the correct distance is needed if deciding if a branch can be pruned.
      Edit: Wait, no, you can just square the distance to the line instead lol

    • @tissuepaper9962
      @tissuepaper9962 Před 2 lety +1

      You don't need to take the square root. Squared distance works just as well. There are actually a *lot* of alternative distance functions you can use that match euclidean distance better than Manhattan distance does without breaking the computational bank.

  • @jameshamann465
    @jameshamann465 Před 2 lety

    I listen to these while doing other activities. Mostly just because I like Mike's calming British accent

  • @CryingMG
    @CryingMG Před 2 lety

    This guy is the best!

  • @slam_slam_
    @slam_slam_ Před 2 lety

    Awesome video!!

  • @todayonthebench
    @todayonthebench Před 2 lety +3

    The downside with the K-d trees is that the data needs to be prepared by making comparisons before it can be used to find the closest point to another point. Depending on how long it takes to prepare the data, it might actually be slower than other methods.
    Though, most methods also needs to prepare their data as well, making it rather an argument about how the data is prepared and how long that takes. (Though, for really small datasets the brute force approach can be statistically the fastest.)
    Another method of finding the closest point to another point or in a given direction of movement (as in ray tracing) is to group the points based on location. Quantizing it into smaller lists. Then we only need to check the points in the neighboring lists, or the lists relative to our direction. And quantizing data into lists is a relatively trivial task since it does no comparisons nor calculates any medians. This however has many downsides of its own, so it too won't be ideal in a lot of situations. (And one can recursively quantize the lists into new lists, how many items a list contains is though a debatable topic.)

  • @sudiptoborun
    @sudiptoborun Před 2 lety

    I'm studying Graph Theory and I find this fascinating.

  • @AndrewHelgeCox
    @AndrewHelgeCox Před 2 lety

    BVHs have become the basis for ray tracing hardware but at the turn of the century, kd-trees were the state of the art in interactive ray tracing. Later, an interest in build and refit times for dynamic meshes led researchers to find ways to make BVHs competitive for the static mesh case.

    • @AndrewHelgeCox
      @AndrewHelgeCox Před 2 lety

      Look for papers from Saarland by Ingo Wald, Carsten Benthin and others from twenty years ago to get the history of kd-trees and BVHs in fast ray tracing.

  • @davidm2.johnston684
    @davidm2.johnston684 Před 2 lety +3

    Very interested in computer graphics and the bounding volume hierarchy over here!

  • @seg-v6139
    @seg-v6139 Před 2 lety +1

    very informative video as always :)

    • @coolwei1427
      @coolwei1427 Před 2 lety +1

      How Tf you know it good? It only out 8min and the vid is 13 lmao

  • @ncb4_69
    @ncb4_69 Před 2 lety

    Thank ya
    Wish you all the best

  • @Voltra_
    @Voltra_ Před rokem

    That's a neat data structure

  • @nerdinator2807
    @nerdinator2807 Před 2 lety +2

    DR MIIIIKE POOOUND

  • @MGtvMusic
    @MGtvMusic Před rokem

    Amazing!

  • @usama57926
    @usama57926 Před 2 lety

    Nice explanation

  • @zxuiji
    @zxuiji Před 2 lety +5

    9:52, around this point it occurred to me there's an optimisation for distance checking, normally sqrt() is used but it's actually sufficient to just multiply the difference between the coordinate axis against each other and use that as a distance for comparing to other distances of the same kind

    • @alegian7934
      @alegian7934 Před 2 lety +2

      what if you had points with the same x value? then x1-x2=0 so your product is 0, while their distance is not. Also intrestingly, your product dx*dy could be interpreted as the area of the rectangle between the points. Kind of how you can click & drag your mouse on your desktop and it highlights a rectangle

    • @tesfabpel
      @tesfabpel Před 2 lety +7

      distance squared is used a lot to speed up the check... this avoids the costly sqrt.

    • @zxuiji
      @zxuiji Před 2 lety

      @@alegian7934 just add 1 to both values then multiply

    • @zxuiji
      @zxuiji Před 2 lety

      @@tesfabpel tbh, I think it's also possible to convert it to square root form using the distance between the furthest points on each axis, not gonna try to explain with my phone keyboard though

    • @the-mush
      @the-mush Před 2 lety +1

      @@tesfabpel wouldn't it be even more efficient to just use manhattan distances?

  • @CoolFire666
    @CoolFire666 Před 2 lety +1

    How do you pick the point you're going to use to split on? And what happens if you're splitting in the X direction and there's another point with the same X coordinate?

  • @MrGoatflakes
    @MrGoatflakes Před 2 lety

    Oh sheeet Dr. Mike explaining K-D trees 😄

  • @irfanjames6551
    @irfanjames6551 Před 2 lety +2

    I would really like you to do a video on "Bounding Volume Hierarchy".
    I think they implemented it for Unreal 5 (nanite).

  • @bejamartins
    @bejamartins Před 2 lety

    I remember implementing a k-d structure for a geocahing project in university. Got a 19 in that project :D due to suberb preformance.

  • @lerichroi4129
    @lerichroi4129 Před rokem

    A teacher like no other. My university teacher only made this simple concept extra hard to understand.

  • @defeatSpace
    @defeatSpace Před 2 lety

    The computer processor speed for lidar scanners makes me think about how for the brain, processes emerge and dissolve in parallel in different parts of the brain at different frequency bands like theta (5-8 Hz), alpha (9-12 Hz), beta (14-28 Hz) and gamma (40-80 Hz), where as it might be slower you still have millions of these processors that generate to complete tasks with additional asynchronous verification for accuracy. Not taking into account the conditioning that occurs for natural neural-networks, animal brains are so much better at unconscious data processing despite using slower processors because they have a ton of evolutionary experience that enables practically infinite capability to flexibly allocate further resources for multiprocessing, it all seems to really boil down to how efficiently programmers can diversify tasks and the efficiency of multithreading capabilities for a processor, while still having limitations because processors cannot really support multiprocessing like brains do.

  • @arkhoseu348
    @arkhoseu348 Před 2 lety

    nice explanation

  • @dodogaz
    @dodogaz Před 2 lety +9

    Really interesting algorithm.
    building the tree doesn't seem so fast: how do you choose the next point for the split? If it has to be in the middle, you still need to compute the position of every other point, right?
    Also, I guess the position of the starting point impacts the performance, i mean it will be faster to inspect closest points than the ones farther away.

    • @gothxx
      @gothxx Před 2 lety

      In the middle of the section. So half each time. Also insert might be slower, but search is fast.

    • @ILMTitan
      @ILMTitan Před 2 lety +5

      Building the tree shouldn't be too slow. You need two lists, sorted on X and Y. Then, when you split in one dimension, you just build two new lists, each maintaining the order from the other dimension. That would be O(n log(n) + n log(n) + n log(n)) = O(n log(n)). Mutation however, would be slow, as one extra point could easily change the entire tree.

    • @oscarsmith3942
      @oscarsmith3942 Před 2 lety +1

      @@ILMTitan That is a much less stupid n log(n) algorithm than the one I came up with. The other algorithm you can use is to use Quick Select to find the median in O(n) time, which gives a runtime of T(n) = 2T(n/2) + O(n) = O(n log(n)). I guess mine has the advantage of using O(1) space, but it still will be at least 10x slower in practice...

  • @cannonball7
    @cannonball7 Před 2 lety +1

    Hell yeah, K-d Trees in CRISP 360p

    • @seanski44
      @seanski44 Před 2 lety +1

      Sometimes CZcams ain't that quick at processing :)

  • @DAG_42
    @DAG_42 Před 2 lety

    In the case of a vast quantity of points, what should you do if you don't expand the tree all the way down? Or do you always add all points to the tree?

  • @kebman
    @kebman Před 2 lety

    I like the woolly sweater! I have one on just like that right now! Very tasteful!

  • @bramvandyck1273
    @bramvandyck1273 Před 5 měsíci

    So that was clear! Can you explain the Ball-Tree algorithm too?

  • @z4br4k98
    @z4br4k98 Před 2 lety

    just in time for my computer graphics exam!

  • @josiahsuartono3710
    @josiahsuartono3710 Před 2 lety

    I am currently doing something which requires me to calculate the distance to coastlines for every single pixel of water to a set of land (which includes the coastlines). This algorithm is something I've used and finding Mike talking about it really made me happy :)
    Any suggestion on what algorithm I could try instead?
    Detail of what I've done with KD-Tree: Given a reasonable map (2d array) of water and land, from which a set of land and water locations,
    1. Build a tree using the entire land locations
    2. Make a query for each water location and get the shortest distance

    • @finnaginfrost6297
      @finnaginfrost6297 Před 2 lety +1

      If you want each pixel of water to have a single 'distance to any land' number, you could write a flood-fill / A-Star compute shader to compute all your distances super quickly, and run it on a GPU each time you want to update it.
      Said differently, compute the Signed Distance Field (look it up on wikipedia) of your land/water 2d array.
      What are your 'land and water locations'? Are they special pixels on your array, are they vertices of your coastline?

    • @josiahsuartono3710
      @josiahsuartono3710 Před 2 lety

      @@finnaginfrost6297 Thank you for your suggestions, will look further into these.
      The land and water locations are two mutually exclusive and collectively exhaustive groups of pixels in my 2D array (snapshot of a map).

  • @nyuh
    @nyuh Před 2 lety

    oooohhh, I've actually been coding stuff that uses k-d trees. what a nice coincidence!

  • @rylaczero3740
    @rylaczero3740 Před 2 lety +4

    Wikipedia:
    In particular, when the number of points is only slightly higher than the number of dimensions, the algorithm is only slightly better than a linear search of all of the points. As a general rule, if the dimensionality is k, the number of points in the data, n, should be n ≫ 2^k.

  • @yiwenju2735
    @yiwenju2735 Před rokem

    Detecting the closest point from the video isn't robust, but the rest of the algorithm is correct. People refer to the wiki page of kdtree for finding nearest neighbors.

  • @gnagyusa
    @gnagyusa Před 2 lety +3

    Mike's incorrect about triangles. In fact, many ray-tracers use K-d trees to store triangles, including Equinox3D. They are a bit slower to construct than a BVH, but faster to traverse.

  • @Jkauppa
    @Jkauppa Před 2 lety

    gaussian mixture with noise explanation is the best, ie, probability distribution function estimation

    • @Jkauppa
      @Jkauppa Před 2 lety

      or extending to polynomial function (+ normal noise) fitting, polynomial function mixture model fitting, automatic amount of components based on normal/any noise explanation, any numbers of dimensions

    • @Jkauppa
      @Jkauppa Před 2 lety

      then you forgot what you tried to solve, initially, and got lost

    • @Jkauppa
      @Jkauppa Před 2 lety

      closest points by x-sort, y-sort, etc, ie, dimensional sort bins

    • @Jkauppa
      @Jkauppa Před 2 lety

      you get the closest, nearest, points very fast

    • @Jkauppa
      @Jkauppa Před 2 lety

      I assume O(n log n)

  • @Raptomic
    @Raptomic Před 2 lety +2

    If the point in the example (in the space defined by the lines of E, G and B) was closer the the border and the line of B, shouldn't we be checking for other closer points above B ? In such a case, the line to B wouldn't be the shortest to the space above B (kind of like the situation explained for the space right of E so we would have to check for points there I believe.
    Nice video as always

    • @Refract3d
      @Refract3d Před 2 lety +6

      I think (and don't quote me on this), he's comparing the distance to be in only 1 axis, therefore knowing that the minimum distance was already longer than the distance to G, therefore eliminating anything in that quadrant. I guess in that scenario though, if that distance was actually smaller, you'd have to start checking the points distance themselves...It's unclear how it would be usually calculated.

    • @ILMTitan
      @ILMTitan Před 2 lety

      @@Refract3d Right. If (B.y - X.y) < Distance(X, G), you basically repeat the algorithm on the other side of B, finding a best guess first leaf, seeing if it is closer, and working your way back up the tree.

  • @martins4555
    @martins4555 Před 8 měsíci

    Really great video and very well described. Tho the outro is a bit weird with two sounds running at the same time suddenly

  • @finkelmann
    @finkelmann Před 2 lety +5

    Nicely explained as always!
    However... I think you made a small mistake. Around 10:30, youre saying the distance from X to B IS already larger than G-B. But, If B were to move much to the left, but keep the same "vertical" distance from X, there could still be a point on that orange line which is quite close to X. In other words, not all points inside B are guaranteed to be farther from X than B is...
    I may be wrong here, but i think points above the B line should still be verified.

    • @Cookie-qu1gs
      @Cookie-qu1gs Před 2 lety +5

      Although he seems to say the distance from X to B, he's actually talking about the distance from X to the line through B, which is an important distinction. Any point above that line must be *at minimum* as far away from X as the line is, and since the distance from X to the line is greater than the distance from G to X, we know that no points above that line could possibly be the closest point. We can find the distance from X to the line through B by simply comparing the Y values of X and B, which is a lot better than having to calculate the distance using both X and Y co-ordinates. So not only does checking the distance to the line rather than the point ensure you know when you can and cannot avoid checking points beyond the line, it is also faster to check than checking the distance to the point.
      Hope this cleared up your confusion! :D

    • @finkelmann
      @finkelmann Před 2 lety +1

      @@Cookie-qu1gs exactly, you pretty much clarified what I pointed out. The distance to the line is what i referred to as the minimal "vertical distance". So yeah, in that case the area inside/above B really is all cleared.

  • @AlessXC-lz4kc
    @AlessXC-lz4kc Před 2 lety

    Very nice

  • @radoslawlanga1809
    @radoslawlanga1809 Před 2 lety

    This is not how CZcams works, one second you are watching kittens playing and next thing you see are K-d trees videos.;)

  • @baehyunsol3349
    @baehyunsol3349 Před 2 lety +1

    You had 15 points on the plane, which is 2^4 - 1. Once you chose A and divide others into 2, each side had 7 points, which is 2^3 - 1. 2^2 - 1 for the next iteration, and 2^1 - 1 for the final. Was it intentional to begin with 2^n - 1 points?
    Anyway, thanks for a great video!

    • @G12GilbertProduction
      @G12GilbertProduction Před 2 lety

      Probably he draw a 14 points, not 15 at all. So, 14 ^ 2 - 1 it could be final output before (O(x)) goes pretend to reverse this iteration.

  • @MattRiddell
    @MattRiddell Před 2 lety

    Locality sensitive hashing next!

  • @ac_nero
    @ac_nero Před 2 lety

    neat in fact!

  • @372leonard
    @372leonard Před 2 lety

    do a video on bounding volume hierachies please

  • @angelorf
    @angelorf Před 2 lety

    Is the next video gonna be on the bounding volume hierarchy?

  • @_mvr_
    @_mvr_ Před 2 lety

    Do a video on octrees!

  • @AldenLaslett
    @AldenLaslett Před 2 lety +1

    Can you do R-Trees next?

  • @TheEdwinpako
    @TheEdwinpako Před 2 lety

    So in nth dimension u would split with (n-1)th dimensional planes? For example in 3d with 2d planes

  • @aleksandar5323
    @aleksandar5323 Před 2 lety

    Hmm, what do we do, if the node on the right of E is not a leaf node? Do we go further down it and how does the algorithm go?

  • @adamkelemen4607
    @adamkelemen4607 Před 2 lety

    Is there a bound for k in a sense that for some k the method gets inefficient? I read somewhere that from k>6 it gets slow but wouldnt know..

    • @deanjohnson8233
      @deanjohnson8233 Před 2 lety

      Assuming well distributed points, each level of the tree reduces your problem space by a factor of 2 so I don’t think the algorithm would become inefficient for high values of k. It certainly gets slower as k increases, but I don’t think it would grow much worse than any other algorithm.

  • @domnik9062
    @domnik9062 Před 2 lety

    Love binary trees and there exponential cost efficiency

  • @Josh-tu9ji
    @Josh-tu9ji Před rokem

    Sure, it's O(logn) to search through a k-d tree, but what's the time complexity of constructing such a tree. It seems like it would still be close to O(n) if you have to visit a lot of nodes to accurately construct a k-d tree.

  • @sweepingtime
    @sweepingtime Před 2 lety

    Even making an error where G is the child of D is illuminating, because it made me think: why isn't this the child of E? Well, it got me to follow along.

  • @stewdean
    @stewdean Před 2 lety

    So it'll work in six dimensions. And what happens if you want to remove one of the dimensions dynamically - so everything on dimensions 2 and 5 doesn't matter?

  • @Hacktheplanet_
    @Hacktheplanet_ Před 2 lety

    Pound army 💪

  • @mouazq
    @mouazq Před 2 lety

    The search time was reduced at the expense of great spacial complexity (Storing all nodes in a tree structure in memory)

  • @norliegh
    @norliegh Před rokem

    reminds me so much of geohashing just slightly different.

  • @SSJ0016
    @SSJ0016 Před 2 lety

    Could you implement a circle of a growing radius around your point of interest? The first point to contact the circumference would be the closest point.

    • @fugoair
      @fugoair Před 2 lety

      Try implementing your idea. One of the issues you might discover is that the detection of that 'contact' is slower then all the complicated tree stuff in kd tree. As more points are added the time complexity gets worse while the kd tree gets better.

    • @deanjohnson8233
      @deanjohnson8233 Před 2 lety

      How would you determine which pints to check against? Without some spatial ordering, you can’t check in an efficient order. But if you have spatial ordering, you probably don’t need any explicit uses of a growing circle.

  • @runrickyrun157
    @runrickyrun157 Před 2 lety

    Man, that thumbnail. I really thought Gave from The Office was going to tell me about math.

  • @diegonayalazo
    @diegonayalazo Před 2 lety

    Thanks

  • @elietheprof5678
    @elietheprof5678 Před 2 lety +2

    What do you think of bubble-based space partitioning? Spheres which contain spheres which contain objects, for example. It seems like the easiest space-partition for beginner programmers to implement... but it must come with some disadvantages, right?

  • @BigSoul29
    @BigSoul29 Před 2 lety

    Hi, I noticed that when I try looking for coordinates nearest to the center, it won't give me the ones that I expected. I willl plot my coordinates on a scatterplot, I see that there are coordinates literally on (0.0) but when I try finding the 10 nearest points to the center it will give me points much farther than the ones I see exist on my scatterplot.. Can you explain why that is? Evem if the computer iterates through all the coordinates and finds the child coordinate how is it that it gives me different results than I am expecting, can someone explain?

  • @Theraot
    @Theraot Před 2 lety +1

    Two plus two is four minus one that's three, quick mafs. 11:33

  • @jacklee5876
    @jacklee5876 Před 2 lety

    If the root represents the starting point of your search, does it mean that one must create a K-d tree for every point so each point knows how to traverse on their own tree?

    • @tscoffey1
      @tscoffey1 Před 2 lety

      It's like a binary search. A properly chosen root node is optimal for most searches of the tree.

  • @cauchyschwarz3295
    @cauchyschwarz3295 Před 2 lety

    What has not been adressed is how you find the point that is in the middle when you decide the next split-point. Don't I have a similar problem as in the beginning of having to sort the points along that axis and check them individually?

    • @jtsiomb
      @jtsiomb Před 2 lety +1

      There are kd-tree construction strategies. You can pick a point at random, you can pick 3 points at random and pick the middle of the 3, or you can go through the whole lot of them. But also in some cases you might not even want to split down the middle. A common construction strategy is to optimize splits to minimize the surface area of each subtree bounding box (surface area heuristic). It all depends on what you're going to use your kd-tree for, and how much time you have to spend on constructing it.

  • @lucidmoses
    @lucidmoses Před 2 lety

    Clever

  • @MrChipmunk1
    @MrChipmunk1 Před 2 lety

    Wish this video had been published before my algorithms final a month ago

  • @Veptis
    @Veptis Před 2 lety

    How frequently do people come up with these smart algorithms and structures to reduce time, memory complexity?

  • @PMA65537
    @PMA65537 Před 2 lety

    8:55 Is there a difference if the closest is not a leaf node?

    • @KieronTaylor
      @KieronTaylor Před 2 lety

      That's what the backtrack up the tree tells you. You don't know all the possibilities until you've visited a leaf, but once you have, any of the nodes on the branch is a viable solution. The most wasteful case is a point near your top level node, but at least it's the same number of moves for all solutions - preferable in all but "lucky" situations.

  • @ForSquirel
    @ForSquirel Před 2 lety

    I just wanna know, If I buy that sweater, jumper, whatever you call it, will I automatically get a +1 to intelligence? Cuz I'm down for that.

  • @esquerte
    @esquerte Před 2 lety +3

    Now we finally know how to find G point.

    • @miallo
      @miallo Před 2 lety

      On that note: 11:42 "I need to not count my Exes"

  • @MysticPing
    @MysticPing Před 2 lety +1

    If only you'd done this video two weeks ago and I would've done better on my last exam haha. Could not for the life of me understand them.

  • @solsticeprojekt1937
    @solsticeprojekt1937 Před 2 lety +1

    Did he mention how to pick the point to split at? Because he's just picking optimals. If I have to search for the ones in the middles it's not actually worth it for dynamically moving items? Edit: Right. He picks the median point, he said.

    • @albertbatfinder5240
      @albertbatfinder5240 Před 2 lety +1

      Median implies the points undergo a sort every time the point array is presented for evaluation. If you’ve already sorted them by x-axis, then y-axis, any binary search is going to hone in on a set of nearest point candidates pretty quickly. Maybe that’s all he’s doing with the entire explanation. I may have implemented this myself, without calling it a k-d tree or even honouring it with a name at all.

    • @solsticeprojekt1937
      @solsticeprojekt1937 Před 2 lety +1

      @@albertbatfinder5240 Thanks for the reply!

    • @deanjohnson8233
      @deanjohnson8233 Před 2 lety

      You could also just pick a random splitting point. Some splits will not be very helpful, but on average it can work well. What strategy you use will depend on how often you have to rebuild the tree compared to the number of queries against the tree.

    • @solsticeprojekt1937
      @solsticeprojekt1937 Před 2 lety

      @@deanjohnson8233 Thanks! I think that should have been mentioned. I gotta try that some day. I think this can get really fast using cmovcc. I'm sure I'm not inventing anything new here. I knew of kd-trees, but never actually how to implement them. :D

  • @JBantha
    @JBantha Před 2 lety +1

    as a human this seems overly convoluted to compare a point but I'm happy for the computers to have a way to do it.

    • @tissuepaper9962
      @tissuepaper9962 Před 2 lety +1

      No fair, you've got a hardware accelerator for calculating euclidean distance.

  • @theprofessionalfence-sitter

    So, is this method just faster on average or even in the worst case?

    • @deanjohnson8233
      @deanjohnson8233 Před 2 lety +1

      Depending on how you pick the splitting points, it is theoretically possible that the tree just becomes a linked list where you could have to check every single point.
      In the video he mentioned using the median point as the split point. This would never lead to the linked list tree (and thus better worst case performance) but requires you to be able to sort the points along each dimension. Other strategies could remove the need to sort but could theoretically lead to very bad worst case performance. For example, you could pick a random splitting point - but if every random choice just happens to make the tree a linked list….

  • @richbuilds_com
    @richbuilds_com Před 2 lety +3

    Can only view in 360p. Is this retro episode? ;-)

    • @DantalionNl
      @DantalionNl Před 2 lety

      This is we don't wait for youtube to process the video before publishing and just hit publish as soon as the upload of the source file is finished~

    • @seanski44
      @seanski44 Před 2 lety +1

      @@DantalionNl or schedule it assuming CZcams have their act together to process something in a timely fashion...

  • @ChrisWalshZX
    @ChrisWalshZX Před 2 lety +3

    I thought Mike was just going to go into binary trees and I knew this wouldn't be enough to solve the problem. The splitting on a specific point's X/Y/Z axis is really interesting.
    As a side note, and I'm sure MIke will agree but it kind of off-topic but the program would only actually compare and store the SQUARE of the distances as this saves performing an expensive SQRT function for each distance measured but still allows you to compare which value is smaller between two different distances.

    • @APaleDot
      @APaleDot Před 2 lety +5

      Using a k-d tree, you don't use the square because you are mostly comparing along a single axis.

    • @the-mush
      @the-mush Před 2 lety +1

      @@APaleDot also, even if you where using more than one axis, wouldn't it be even faster if you used a "Manhattan" distance? Don't see the point of squaring the numbers if you don't need the real "euclidean" distance.

    • @APaleDot
      @APaleDot Před 2 lety

      @@the-mush
      Maybe I'm misunderstanding what you mean, but the goal of the algorithm is to minimize the euclidean distance. To find the closest point.
      The k-d tree just allows you to rule out certain points because you know they are further away along one axis than the current smallest euclidean distance.

    • @deanjohnson8233
      @deanjohnson8233 Před 2 lety +1

      @@the-mush if you are trying to find a “close enough” point, then you can use the Manhattan distance. If you want to have the point that is truly the closest in the usual sense, you have to use Euclidean.

    • @the-mush
      @the-mush Před 2 lety

      @APaleDot yeah! I wasn't talking about the k-d tree, just was thinking about the differences between squared distances and manhattan. Talking about it in other thread, I now realize that squaring the numbers first allows you to avoid a problem you get if you oversimplify the algorithm and only sum them up (as in manhattan distances).
      However, as you said, if you only consider one axis at a time, none of that matters since you only work with the deltas one-dimensionally. I think that detail of essentially working in straight lines along the axes made me remember the manhattan distances thing, but the relation is way thinner than I first thought.

  • @phyzix_phyzix
    @phyzix_phyzix Před 2 lety

    I'm confused. If B was all the way to the left (x = 0) then a point above B could potentially be at a shorter distance than B.

    • @Tumbolisu
      @Tumbolisu Před 2 lety

      Yes, but X is so far away from the line that goes through B, that nothing above that line could possibly be closer than G.
      He made the unfortunate mistake of placing B right above X. So whenever he was trying to point at the line, he also had to point at B itself.