Fibonacci Heaps or "How to invent an extremely clever data structure"
Vložit
- čas přidán 17. 05. 2024
- I want to tell you about a daunting, but truly fascinating data structure. At first sight, Fibonacci Heaps can seem intimidating. In this video, I'm going to show you all the necessary steps to invent a really clever data structure.
00:00 Introduction
00:50 Priority Queues and Binary Heaps
05:44 Fibonacci Heaps
08:24 Amortized Analysis
10:28 ExtractMin
16:54 DecreaseKey
22:02 3 Questions
28:16 Final Words
Animations created with Manim
Music: Goldberg Variations, J.S. Bach, Kimiko Ishizaka
Sources:
Cormen, Leiserson, Rivest, Stein: Introduction to Algorithms ‒ Third Edition (the chapter on Fibonacci Heaps is available for free on their website under "Material Removed from 3e")
Fredman, Tarjan: Fibonacci heaps and their uses in improved network optimization algorithms.
Vuillemin: A Data Structure for Manipulating Priority Queues
#SoME2 - Věda a technologie
There are a couple of things I wanted to mention but eventually decided against because I felt the would break the flow of the video. For those who are interested:
03:22 Throughout the video I'm assuming you can find any given heap element in constant time using some sort of lookup table. This is easier said than done. Depending on your data, you might need to rely on randomization (e.g. universal hashing).
06:08 The textbook Fibonacci Heap implementation uses circular linked lists, while I use regular linked lists. Circular linked lists reduce the space requirements, but they make the operations a bit harder to explain. Just be aware that my operations are slightly different to the ones found in most textbooks.
13:06 This analysis is subtly inaccurate. The size of the root list at this point is O(#trees + max degree), not #trees, because in phase 1, we moved the children of the minimum into the root list. In Big O notation, however, this makes no difference.
14:12 It's actually the maximum degree plus 1 because we need to include degree 0. Again, it doesn't matter in Big O notation.
20:03 One detail I included in the visualization but didn't actually mention: Tree roots are never cut out or marked. They are never cut out, because there part of the root list already. And they are never marked, because it's simply not necessary for bounding the node degrees (cf. 25:50).
27:30 You don't actually need to involve the golden ratio to see that the Fibonacci numbers grow exponentially. It's easy to see that the i-th number in the sequence is at least twice as big as the i-minus-2-th (due to the construction of the sequence and the fact that it grows monotonically).
28:16 There are more Fibonacci Heap operations, which I didn't mention: You can union two Heaps in constant time, simply by concatenating their root lists. You can also implement the operations "Remove" and "IncreaseKey" using a combination of the operations from the video.
28:31 That ExtractMin can't be improved only holds for the general case where your priority queue stores arbitrary values and you find the minimum using comparisons. In special cases (like if all values are integers within a certain range) this running time can be improved.
Unioning two heaps in constant time has potentially great value in constructive solid geometry, so I wonder if that may be one of the interesting use-cases, assuming other details aren’t too expensive for implementation to be worth it compared to other structures and algorithms.
Great video. You made it with Manim?
@@algorists yes he did to confirm you can see description.
@@ashutoshmahapatra537 yes. I overlooked it
it's sad he dont include the code it would be a wonderful documentation and resource
I don’t know what your background is with this sort of content creation, but you’ve done an incredible job with the videos on this channel so far! The production quality is extremely high, and I really appreciate the academic rigor. There’s so much watered down information out there, especially on CZcams and with regards to programming-related topics, but you’ve covered all your bases and that’s no small feat. I *especially* appreciate the pinned comment with commentary on your references. I hope you’re well and take care!
Side note, looking at your other videos I think it’s clear the algorithm is starting to appreciate all your hard work.
I’m here from recommendation. So, yeah it’s working. Anyone showing an interest for DSA will eventually be recommended. CZcams algorithm knows (probabilisticly) the level of a user depending on their searches. My search on CZcams these days are for intermediate to adVanced topics and I’m getting recommendations accordingly. It took two weeks for this video to hit me and I wasn’t even a subscriber. I subbed asap and hit like.
I think the background music fits tho, just look at Mathmaniac , 3b1b or vcubing, or reducible. Most of their videos use classical music of some sort
The only things I remember from school: Merging two Fibonacci heaps was fast.
Does these things really being taught in your school 😐
You know more than most developers
@@prabhavagrawal1712 Ye, data structure mentioned it as a minor improvement for some graph’s spanning tree algorithm I think, already forgot which lol.
@@piaoyugexia You probably mean Prim's algorithm, where you start from some vertex and build up a tree from there by repeatedly adding the edge with the smallest weight. It can be implemented in O(n²) using a Fibonacci Heap (just like Dijkstra).
@@sithdev8206 AHHH YES! Thanks for the information :D
The memories start to come back XD
28:45 There are a couple of extremely important use cases that I know of. In the interest of full disclosure, I've worked on both of these.
First use case, in bioinformatics. Machines that read strands of DNA are not perfect; they make errors. The way we typically fix this is to read lots of copies of the same DNA, and then correct the ones that have low counts using ones that have high counts. For de novo sequence assembly, the algorithm of choice is known as the Tour Bus algorithm.
Tour Bus involves putting the DNA sequences into a graph (called a "de Bruijn graph"), and then running Dijkstra's all shortest paths algorithm on it, traversing high-count paths first, then low-count paths, and then using counts and edit distance to see if it's a worthy correction. And because this is Dijkstra's all shortest paths algorithm at scale (billions of nodes!), we use Fibonacci heaps. See the Velvet paper for details: www.ncbi.nlm.nih.gov/pmc/articles/PMC2336801/
Second use case, in certain kinds of physical simulation. Physical simulation often involves a time step, which has to be carefully chosen: Too small, and the simulation takes ages to run. Too large, and you lose accuracy.
There are physical systems where the appropriate time step varies widely over the simulation domain. An example is carbon sequestration, were you take CO2 off a power plant (say) and inject it into an underground reservoir, such as an old oil well. These reservoirs are essentially rocks with pores and fractures, so fluid flows at geologic speeds. Even though you might inject CO2 at a speed measured in metres per second, the plume through the reservoir might advance at a speed measured in centimetres per year.
The solution is to use different time steps in different areas, a technique known as discrete event simulation (or DES for short). When you advance through time, you need to find the next part of the domain to perform a simulation step. When performing a simulation step, you may find that this affects surrounding areas such that they need to use a smaller time step for their next iteration (which is "decrease key" by any other name). Again, Fibonacci heaps are the tool of choice here. See, for example: papers.ssrn.com/sol3/papers.cfm?abstract_id=3365738
Thank you for sharing
vnice
Ooh, those papers look interesting, I think I know what I'll be reading this weekend.
Wow! I wonder if you might also lose accuracy with a time step that is too small? Like, if the coordinates of a thing and the velocity of the thing are many orders of magnitude apart, then adding a tiny time-step times that velocity will get rounding-errored to no change.
I've never thought much about having different parts of a simulation running on different time steps. That's such a cool idea to have basically an "update priority" for each object in your sim.
@@MCLooyverse There's almost always no penalty for using a time step that is too small, apart from performance. The only thing that comes close is probably catastrophic cancellation if values are too close, but numeric analysts are good at avoiding that.
This channel needs way more views and subscribers. I was shocked by the production quality relative to view count.
Really good! I like your thesis that sometimes we overvalue elegance of implementation - often things are just hard to get right. There's also a lesson here that sometimes a data structure will maintain an elegant invariant (e.g., Fibonacci heaps, Red-Black trees) that makes analysis easy, but actually implementing a data structure that maintains that invariant is kinda grungy.
Great video! Another thing I'd like to add though is that in a lot of cases you don't even have to use heaps for priority queues.
For example, in the example problem shown in the video, how many different priorities the router can have? 3? 10? 256? Either way, it is still a really small number in most cases, and it is know in advance. You can then just have a list (a bucket) for every possible priority, and put a message in the bucket with the corresponding priority once it arrives. That's O(1) Insert. You can also store a pointer to the lowest non-empty bucket, that's O(1) GetMin. DecreaseKey is also easy - just move a message from one bucket to another and maybe update the pointer. And ExtractMin just takes one message from the bucket and if it becomes empty, move the pointer to the next bucket until you hit a non-empty one. That operation is just O(#buckets), which is supposedly a small constant number that doesn't increase with the number of messages, so that's just O(1). This way you get a priority queue with _all_ operations being O(1), as long as your priorities are just small integers. Even if you don't know their maximum value in advance, you can just add new buckets later, and the queue will continue being fast as long as that maximum priority is still small.
This data structure is called a Bucket Queue and it is honestly the best choice for most of the applications, given how simple and fast it is. You can even optimize it if priorities slowly grow over time: all that matters is the number of priority values in the heap at any given time.
Of course, you would still need binary of Fibonacci Heaps if your priorities are floating point numbers or more complicated data, as is needed in some algorithms like Dijkstra's
Good point! In special cases you can do much better than O(log(n)) for ExtractMin. I added a note in my pinned comment. Something like your approach is probably the best solution for my motivating example in the beginning. So-called Van Emde Boas Trees extend your idea even further and allow you to implement ExtractMin in O(log(log(#buckets))).
I know it doesn't really add much value and is kind of trivial. But just in case someone is interested: In reallity, routers (And kernel priorities, for example) use even a dumber approach. Priorities are inmutable, never change. So they don't need the full set of operations. Just put and get.
Is usually solved with an array of piles for each priority, just pop form high to low while empty and return the first found.
@@framegrace1
It also kind of reminded me of how Linux Kernel used to do timeouts. The time to call the callback is kind of priority. They had a series of buckets where the time-resolution would be more coarse the further out. So series A would have 16 buckets each one jiffy resolution and would cover 0 to 15 jiffies into future. Series B would again have 16 buckets but with each bucket would have 16 jiffy resolution. , together they'd cover 16 to 255 jiffies into future. Series C, D, and E again increase resolution and range. Every 16 jiffies a bucket from B series would be distributed to the A series buckets. Every 256 jiffies a bucket from C series would be distributed to the B series buckets. Every 4096 jiffies a bucket from D series would be distributed to the C series buckets. So they'd get progressively better sorted as expiry time approached.
One reason that it finally got changed was the periodic latency bump.
A* is a good example where your priority que will have many possible priorities
Really interesting and smart - great presentation
Just a few things:
- You defined the runtime of DecreaseKey in the heap as O(1) by using a hash table in 4:50. But lookup in a hashtable doesn't come for free in many cases and often will be seen in O(log n) (if you're NOT looking at amortized complexity of a perfect implementation tailored for exactly the data you're using)
- You ARE looking at amortized complexity in your heap, but that is something that only experienced programmers (or the opposite if suited with enough ignorance) should be doing and it would have been great if you explained why this can be a trap in some situations. Depending on use-case you rather have a slower but smoother algorithm than one of amortized speed that will just freeze your code flow for a second or so every now and then.
Nothing wrong with amortized complexity per se, but the user needs to be aware of its constraints.
You're bringing up some valid points. I'm definitely being handwavy about how you actually find the value to decrease. I've added a note to my pinned comment to address this. Although I largely agree with your second point as well, this more general discussion of amortized complexity seems to fall outside the scope of this video.
@@sithdev8206 Yeah, the video is pretty long as it is, but as small comment could have made sense in that chapter.
Other than that, i really enjoyed your work.
Will watch your other stuff soon.
Frankly, this sort of stop-the-world algorithm is often a nightmare for user experience.
Criminally underrated channel. You explained everything so well and the animations are incredibly well-made!
Dude, this is amazing. Incredible quality, rigorous, well explained. By far the best CS video I've seen on CZcams
This is so damn interesting! I wish we covered stuff like this when I studied computer science. Unfortunately we spent wayy to much time drawing absolute useless and boring UML diagrams :/
Brilliant visuals, great use of Manim, and crystal-clear presentation! I hope you post more Comp Sci content!
Incredible quality, I hope you'll keep producing great content :)
I don't often use videos for learning about computer science, but I randomly got this recommended, and the animations were really helpful! Great work!
This is the kind of thing a standard library developer might write and just call "heap" without 99% of users ever knowing about it. Crazy complicated but very clever. I guess having at least one operation be O(log n) is unavoidable, but the impressive part is the DecreaseKey operation being constant time while still maintaining the rest. In practice though, a regular binary tree would probably do ok at it, because the priority probably doesn't change that much in a single iteration. It would likely only need to move up 0 or 1 levels. It's one of those times when big O notation can be deceptive, because if you have to balance 0-2 trees vs swapping 0 or 1 nodes (assuming certain constraints), yes those are both constant but one is still greater. But the Fib. tree guarantees it no matter what. Definitely an interesting structure.
And then in practice turns out that there are only 2 data structures that are actually good for performance: arraylist and hashmap, and everything else was basically just academic curiosity.
Maybe it was different back when RAM was actually true to its namesake, and didn't act like a fast magnetic tape. Then again, with RAM sizes and CPU speeds at the time you could simply scan the entire thing faster than the algorithmic overhead from using advanced data structures.
@@michaelbuckers Hmm, I would add quad trees and octrees to that list, they're very useful for selecting subsets of data by spatial proximity. But you have a point. Good use of the cache can often negate any theoretical benifit of more complex structures.
@@DFPercush Quadtree isn't really a data structure, it's more like a spatial index. You need it because you're shoving multidimensional data into a single-dimensions array. To that end honestly simple chunk partitioning works better, which is a spatial equivalent of plain array. Quadtrees can't even really boast not requiring hyperparameter tuning - that honor goes to KD trees and as you imagine they're not great performance-wise.
This is the best format I've ever seen for teaching a heap data structure!
29:20 „Fibonacci Heaps have shown that more often than not a solution to a sprecific problem is messy.“
Love that insight.
This is THE DEFINITIVE video on Fibonacci heaps and priority queues. Well done!
Fantastic choice of music! The Goldbergs with their perfect ratios are a very clever nod to the material you're covering as well. Well explained as well.
Any sort of background noice is distracting. I loose 10 iq points and can no longer follow the line of reasoning. There is no point in occupying part of my brain like this.
@@albertmagician8613 Good for you -- not even joking, due to the concerning decrease in people's attention span these days -- but you really have trouble concentrating on the voice over -10 decibels of the most chill piano suite ever? That's rough, buddy.
I don't know how much time you had spent in this content and awesome animation so it doesn't become college lecture. Thank you so much for this
This made learning Fib heaps so much easier. I was so confused by what my professor was trying to teach until I found this video. You explain everything so well, thank you so much, you're a life saver
very well done, was just researching this general topic of inventive computer science, love that I stumbled on this video.. 3b1b’s summer math exposition has really made a bunch of interesting video come to life 💜💜
Not just highly informative, but very stylishly composed and presented. (Quite a bit of work, I imagine, to produce something this well done.)
Absolutely fantastic, high-quality content! An outstanding example of an educational video.
I rarely write CZcams comments on such topics, but after watching this, I felt compelled to let you know how much I appreciated your work!
Absolute gem! You look like you just started and I wish you good luck, I'd love to see more of this!
Posting here before your channel blows up.
The quality of your videos is really good! Keep it up and in sure your channel will grow!
super nice quality, good explanation. Video duration fits the contents complexity.
Thanks! You deserve way more views/subscribers!
Ever since I first learned of red-black trees, I've been interested in the various ways trees can be used/maintained to keep data organized. This is an interesting approach.
I love implementing and working with data structures, they tickle my brain in just the right way. It's why they're my goto project when learning a new language.
I'd never actually heard of fibonacci heaps before but they do look super interesting looking forward to implementing one and playing around with it. Thanks for the video really well put together and informative.
Thanks for the content. Your explanations were very clear and accessible. Loved the music also, the only drawback was to chose to really listen to only one of them !
I was struggling to understand this concept and you saved me! Tons of thanks
Now that is what i want from a teacher. Explaining the why instead of just going with the how and what.
Gotta admit, when I checked your subscriber count I misread it as 'M' because there was no way this quality would go unseen.
Well, +1 to sub count and I can't wait to see what more you do!
This is the best made, most clear video on the topic I have ever seen. Amazing!
The main thing that came to mind when explaining why the FIbonacci Heap isn't more commonly used was entirely the amortized complexity thing - if you're running a server, life is great until your clients start doing a bunch of debt-accruing actions (inserts, decreases) and then calling extract min. Whoever called extract min won't be happy.
It's great in a world where you're only crunching numbers and waiting for a single final result after a lot of heap operations (which is why people listed a lot of scientific applications), but most of the computation in a normal day is business logic, which serve lots of individual requests, and therefore much prefer consistency over raw speed.
Super useful! I took an algorithms course once but never learned about this one--thanks for sharing
I love your videos so much, they are super well-produced, informative, and helpful. I hope you keep making them.
I enjoyed this video twice: while watching it and while reading the pinned comment. Excellent work, I hope this channel skyrockets. One final word for you: more
This is a lesson about the importance of accounting for constant time. If you only have a few elements (say, 100k) in your structure to worry about, it really doesn't matter that this super cool algorithm you found outperforms the others... starting at a 1B count and more.
Linked list == missed CPU caches (or any non-contiguous memory, really)
Great video, it made it so easy to grasp all the concepts.
Also it shows why it may not always be practical to use the most optimal structure for everything in a weirdly beautiful way. Thanks so much!
This is an incredibly intuitive way to explain these concepts, great video!
That was a rollercoaster. I couldn't repeat back any of that but that really helped to clear it up.
A beautifully explained video that has renewed my interest in CS! It makes me want to take an algos class now
Thank you so much for this video. I've just spent a beautiful half hour, laid in my bed, my brain in constant and calm concentration.
Awesome video! CZcams actually did a good job with recommendations!
I remember when I was in school a professor briefly mentioned this data structure, but I never looked into it.
Beautiful work! It’s really inspiring to see that there are people creating this kinds of fabulous content.
A beautiful and elegant explanation. Thanks!
Gorgeous video and amazing production and explanation. Congrats!
This is fantastic content, it deserves much more views!
Great explanation; I'll need to rewatch it again several times to really understand it. Superb Work 👍👍🙏🙏
Hey! Just wanna tell you! You are way much better than our professor speaking 3hours but still got me confusing!!!!!!!!! You literally killed it !!!
Why can't I find these videos while I'm sober? I'll be back...
This has the same level of good explanation as Reducible videos, that I love, really good and enjoyable video
Really great job! I really follow yoi second by second!
How did you produce such video? What tool did you use?
Great video, you should do more of these. Algo needs a lot of visualisations to grasp.
Great video. I remember encountering Fibonacci Heaps a long time ago and tried to implement them in C++ with a paper on them. It went fine, but this helped me understand them just a little more. The sad reality of these is, unfortunately, like you mentioned, that they are rarely that practical. Still, it's good to see them explored more as they really are innovative.
Good video! I still need to learn a lot in order to fully understand the explanation, but I'll get there some day. Untill then, I'll be watching more of your videos to learn more!
That was really easy to learn, Appreciated Bruh 😄
As to whether knowing about Fibonacci heaps is a useful thing, yes. It's something to have in your bag of tools. It may not be a tool you use more than a few times. But when you find that scenario that can benefit from it, you'll be glad you know it.
Another concept that you may find interesting to do a video on is a Hughes list. It's another tricky little item for your bag of tools you won't use often. Eric Lippert did a couple well written blog posts on them but I think they could use some CZcams presence.
Great video, easily my favourite SoME2 submission!
Thank you for such a thorough and thoughtful video! Great explanation. Watching this video makes me want to take another data structures course.
The algorithm is fascinating but I can see why it isn't used much. Since each element has precisely one Insert an one Extract operation on it (once it becomes the minimum) it seems to make sense only if there's substantial amount of DecreaseKey operations per element (such as proportional to number of elements in the heap) and in scenarios which don't have strict requirements on time spent on individual operations, typical for realtime environment.
I think the main problem with fibonacci heaps is the relative unpredictability of the extract min, because it can spend a lot of time on the merging
You're absolutely right. The problem with DecreaseKey is that, due to amortization, its overhead is quite substantial, which is why Fibonacci heaps are often slower than other data structures even if there are lots of DecreaseKey calls.
@@michmart9261 Yes, in modern stocking exchanging system, extraction operation needs to be very fast for every order. So we need use 2,4 tree or TreeMap in Java.
Everything was going smooth till 6:08, then madness ensues. Great video, btw
Wow~, I stopped reading Introduction to Algorithm since I cannot quite get Fibonacci heap. After watching your video, I think I can continue now. Thx a lot. And BTW, I enjoyed the piano very much
This data structure is my absolute favorite, and has been for years. Second place to Red Black Trees or Circular arrays (vectors). I remember when I first saw the algorithm for how trees are merged, I exclaimed that it's absolutely genius how it works.
Very nice video! Good explanation and no over-the-top animations.
But...
Why the background music?
I like Bach and have the Golberg Variations on CD (Glenn Gould)
Why not start the video with a recommendation of background music and leave it to the viewer to start a CD or Spotify?
My mind keeps switching between listening to the music and the narration.
And is not that the background music makes it difficult to follow the narration.
This was a lot of fun! Definitely subbed.
I was lost listening my professor introducing Fibonacci Heaps for 3 hours but have got better idea watching your 30-minute video!
Hey! This video is an absolute banger, so when I got to choose a topic for a homework in my uni I instantly picked Fib Heaps. No matter what I'm gonna use your video as a reference, but I don't have time for all the manim shenanigans. So maybe you can send me a repo or something of your code, I then could appreciate it enough
Best explanation about the fibonacci heap. It is incredible!
A really brilliant solution for an interesting problem.
Wow this was an extremely well-made video
I expect that in most use cases even for huge amounts of data a naive fibonacci heap will still be rather slow because especially at larger scales cache locality becomes the most crucial factor. But there are ways to massively improve locality of things like hash sets/maps (so called "dense" hash maps/sets) so I expect similar approaches might work for fibonacci heaps as well but it might be a lot trickier considering the much more complex contract.
so sad that this channel has not posted anything since last year. such a high quality content, i wish i could see more from this channel. I hope the admin is fine
Very nice explanation - thank you!
Oh my god, this is so beautiful!
Thank you for a great, very accessible deliver of the topic!
This inspired me to try and design a Fibonacci heap circuit. No idea if it’ll be useful, but it’s certainly interesting!
Awesome video! very well explained.
Best ds video ever. Great music, great animations, amazing expression. Thank you
Extremely great explanation, thanks!
Very fun and interesting video on the topic! Hopefully this one makes it into the pool of winners
Makes me want to to pay you tuition! Great content, brilliant explanation.
This is one of the most amazing video on algorithm
Excellent video!
Very nice.. great explanation !
👏👏thanks alot for this. your way of explaining things is very clear. Subscribed and will be waiting for your new videos. Hope you can post more videos
> How can anybody come up with that?
A data structures is merely the **negative space*(* to the picture of a particular algorithm.
Very interesting video and well presented. Personally I found the piano music created a competing point of mental focus which distracted a bit from your content … maybe have something more ambient and backgroundy, or have no music at all, but overall a very enjoyable video thanks!
Amazing video! Thank you for the great content :)
Really interesting video. I think for really big data where decreasing/changing node values becomes a regular thing this one can be handy. I usually work with small data packages on embedded machines. In that area this algorithm has quite a lot of memory footprint. But some functions could work really well with interrupts.
😅 but in practice we often use lists/linear search. Extremly simple algorithm, and many embedded chips can pipeline comparison's really quick.
For distributed storage systems this could be neat. Every machine can collect nodes without carying about the network. First pull we have a lot of traffic but with little overhead. Simple reads are also very fast.
best video of this year so far as a computer science student!!
This was a very nice explanation. 😄
very cute data structure, and nice video
This feels like Universal Search, where we add so many tasks and tricks to try to hide the real runtime behind big-O notation.
The threshold for fibonacci heaps to be faster than binary heaps grows O(2^n) where n = number of steps per method. This means that if one of the methods take on average 50 times as long as a single value swap, it would still be O(1), and one may be fooled into thinking it would be better than binary heaps for large inputs, but in reality, for that to be the case, the tree would need to contain at least 2^50 (or 1125899906842624) nodes. The fact is, for average input size n, an algorithm better than binary heap would need its Insert, DecreaseKey, etc... to be less than log2(n) instructions
Great video! How are you animating the data structures? Are you using manim? Any source I can look at?
I use Manim's Graph class (with tree layout). For the tree animations, I basically animate the vertices and edges individually from a set of start trees to a set of end trees, based on the vertex and edge IDs. The only real issue was to get the z indices working correctly (drawing order).
29:08 the diss xD, amazing video!
Never learned fibonacci heaps in my undergrad. Holy shit this is awesome.
Beautifully explained! 😲
Phenomenal explanation
great video, even better music choice