Find Median from Data Stream - Heap & Priority Queue - Leetcode 295
Vložit
- čas přidán 5. 07. 2024
- 🚀 neetcode.io/ - A better way to prepare for Coding Interviews
🐦 Twitter: / neetcode1
🥷 Discord: / discord
🐮 Support the channel: / neetcode
Coding Solutions: • Coding Interview Solut...
Problem Link: neetcode.io/problems/find-med...
0:00 - Read the problem
1:39 - Drawing Brute Force
5:02 - Drawing Heap Solution
15:15 - Coding Solution
leetcode 295
This question was identified as an interview question from here: github.com/xizhengszhang/Leet...
#median #heap #python - Věda a technologie
Anyone who can come up with solutions like this, on the fly, without having first studied the problem, is a genius and I salute them. Also thank you Neetcode for this video
no such human exists
The only purpose of asking this kind of question in interview , probably that have you mugged up all leetcode solutions.
Testing preparedness. This kind of problem solving is healthy and great but not a predictor of success as a developer.
@@thePontiacBandit hard disagree. I think asking this in an interview is extremely unhealthy and a waste of time for everyone invovled
@@APudgyPanda96 I agree with you. I'm trying to imagine why they would even have something like this. If I structured interviews like this, I'd be testing preparedness. It wouldn't be a sign of a good developer at all.
I dealt with a slightly different problem in the 1980s. I was to find the "moving median" of N points in a data sequence that keeps on coming. But I wasn't going to do this with programming, but with dedicated hardware, like gate arrays. At any instant, there were N saved points. The oldest was discarded, the sample just arrived was inserted, and the median of the N points in the list was reported. I used a min-heap and a max heap. The exciting thing was that all the comparisons and data movements in working with a heap can be done in parallel, so the O(log N) time per sample reduces to O(1).
You're on another level, Mr Radar - God Mode.
That's so fascinating.
wow!!
Hey I just solved this on leetcode..no. 480. It's the same problem?
haha, you started DSA since 1980s, even implementing with low-level hardware, so this channel is too easy for you, I think.
This is just incredible... for some reason I either never learned, or don't remember learning, Heaps in my computer science education at university. Throughout the years I have always heard the word Heap but never really drilled down into what it did or where it is useful. This really inspired me to fill the gap!!
Thanks for making this clear!!!
10:18
But two, that's "two" big of a difference 😁
Thank you NeetCode, your videos have helped me understand hard problems a bunch!
a fantastic explanation as always, Thanks!
thanks neetcode for amazing explanation !!!!.....
you are doing a great job man.....there are a few youtube channels which implement problem code in python...YOU ARE THE BEST OF THEM 👑
Excellent work on explaining this!
I was wondering if you can do a follow up video expounding upon the other approaches to the problem? IMHO, the primary reason this a leetcode hard problem is because of the myriad of approaches to solving it especially given different constraints (i.e. memory bound - how can do we do it with Reservoir Sampling, Segment Trees, etc.) I saw the Leetcode solution to this problem mentions these different approaches but I think you will do a much better job of explaining it :)
is leetcode premium worth it to see leetcode offcial solutions?
Man you're actually so goated. These explanations legitimely make me understand the structures and concepts and I can code out the solution from understanding the logic before you go through writing the code. While I'm sure Google was great, I'm glad you're focusing full time on this because you're the absolute best resource I've come across
Very nicely presented, thanks!
I'm a bit confused on why to add the new numbers directly to small heap by default. Couldn't this be optimized by comparing the incoming number to both of the root nodes and adding the new one to the small/large heap depending on if the new value is less than or greater than the root nodes?
Like let's take 3547:
First add 3 to the small heap - O(1)
Next 5 comes in, is greater than 3, add to large heap - O(1)
Next 4 comes in, is between the values, add to small heap - O (log n)
Next 7 comes in, is larger than 4 and larger than 5, add to large heap - O (log n)
if that last number was a 1, its smaller than 4 and 5, add to small heap. Now the heaps are unbalanced so pop 4 from small heap, push to large heap - O (log n) + O (log n)
This way whenever your trees are imbalanced you can just pop the root of the larger one and push it to the smaller one to keep them always balanced and at most you have to push and pop once rather than twice with the "swap" method in the video.
I may be missing something, but it seems inefficient at first glance.
Other than that, great video, you always explain these so concisely 👍
You're right.
i may have figured out why.. in case either of the 2 heaps keeps increasing due to the value we keep pushing and it becomes too large. We are already making a swap why not do it without all the lookup.
@@dumdumbringgumgum2940 You are correct, but remember the lookup is merely an O(1) and it can save you a couple of O(log n) operations in case the numbers are sort of balanced such as in the example
totally!! I guess he just wanted to to make the code cleaner and easier to understand.
Thank you so much for made this amazing video!
Best explanation. Thank you
amazing explanation, thank you very much sir !
best explanation ever! thank you so much for this.
Awesome explanation... thank you!!
Thanks, much appreciated
Thank You for the amazing explanation Neetcode
whenever i came to see your approch i just had to watch 1/3 of the video cuz your explanation is soo good
awesome explanation.. thanks a lot!
very well explained !!!
Best Explanation!
Very nice visual explanation. I would suggest, that instead of saying size is approximately equal, can we say size difference must be at MOST 1 ?
Yes if its not, simply remove from whichever is larger and insert into the other.
perfect solution and illustration, thanks so much
Nice explanation!
Why do you do this thing where you first add to the left by default, and then check & rebalance between the two heaps? Can't you peek at the min/max in either tree and decide where it should go based on that, and based on the current sizes of each tree without having to actually go through an `add` operation?
Does using heapq._heapify_max a standard for implementing a max heap or we need to use -1 with min heap?
no, first, this is not a standard method that subject to be changed without any notice. Second, it will introduce bugs if used outside the implementation of heapq.nsmallest
It's an implementation specific method alongside heapq._heappop_max and heapq._heappush_max used to aid the implementation for heapq.nsmallest
They will not maintain proper max heap structure when used in ways other than inside heapq.nsmallest
Great stuff, thanks mate! I've been thinking off migrating to Python for interviews but the edge case with max heap again made me stop doing that :)
Good point, the language used should add more time ( as there's usually more compilation errors going from Python to Java to say C++ )
we can also do with sortedlist from sorted containers
An alternative solution could be to use self-balancing trees (AVL, Red-Black, etc.) where all the operations are log_n. However, we don't need find and instead just need to maintain a separate pointer to the median, then after each insert, depending on whether the inserted number is greater or less the current median, we move the median pointer to the element adjacent to the previous median.
Thank you for this great explanation! I was wondering why can't we use a self-balancing BST like AVL or Red Black Tree ?
You can definitely use them but heaps are more optimal
I have 0 knowledge of dsa or oop yet i think this is the easiest leetcode problem there ever exists.
I can do this only using C !
question: if you're doing a bunch of logn operations for every add, isn't this a nlogn algorithm? if you're adding n elements, and each time you'll do logn operations, this becomes nlogn. you may as well have just sorted it.
Well spotted
sorting 1 time takes nlogn , sorting n times will take (n^2)log(n) :)
why you dun use sortedcontainers? is it we are not allowed to use this library?
The solution is understandable, but it would take weeks of full time work for me, to find it by myself. As every test, mock code verifies your knowledge of patterns.
Why can't I think of such amazing solutions 😢
Hi, I have a question about this question. Can I use binary search to do insertion sort when adding a new number? This will also be a O(logn) algorithm
But only to find the insertion index. To actually insert it would still be O(n) because you potentially have to move/shift every item in the list.
@@peterparker892 Make sense! Thank u so much!
What is the time complexity of this solution ?
anyone can tell me that , at 23:16, is there any case where the input will enter into line no 24? if so what could be that sample input?
Go to 12:51, it will answer your query.
For some interesting reason it runs a bit slower, if we add incoming num into the big heap instead of small, and make the necessary amendment to the code to pop from big instead of small.
I had this in an interview and was asked:
- "If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"
- "If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?"
How would you go about this? :) Thanks!
bump, I'm also curious how one would do this.
Did you give any answers or did you intervieweres give any response Bernhard?
My only thought was you could use an 8-bit integer or float.
Maybe initialise an array of size 100, where the key is the number and the value is the frequency. You can then get the answer in constant time by iterating through until you reach the halfway point.
If 99% of the numbers are in the range you could do the same thing but have two extra variables for "over 100" and "under 100" and this would work as long as the median number was in the range 0-100
@@mongoose2014 wow that's great. Had you done something like that on a previous problem? Very clever of you.
Yes, as @mongoose2014 said, one can use set/dict approach to solve this.
Even tho this is a hard problem and I have trouble with mediums and even some easy problems, I was able to get this one pr easily. So I guess everyone has a type of problem that just clicks. No point beating yourself up over not getting all of the patterns immediately.
many thanks
Thank you so much🥰🥰🥰
What is time complexity for this solution?
Great Explanation! - Question: Why can't we find the index to insert the element to array list using Binary Search and add it to that index as we go? Is it because insertion to a particular index in ArrayList is O(n) operation?
Yup, that's correct! You will need to shift the elements so it will be O(n).
A bit shorter and easier:
import heapq
class MedianFinder:
def __init__(self):
self.min_heap = [] # To store the larger half of numbers
self.max_heap = [] # To store the smaller half of numbers
def addNum(self, num: int) -> None:
# Push the number to the max-heap (negated) to simulate a min-heap
heapq.heappush(self.max_heap, -num)
# Pop the smallest number from the max-heap and push it to the min-heap
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
# If the min-heap has more elements than the max-heap, balance the heaps
if len(self.min_heap) > len(self.max_heap):
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
def findMedian(self) -> float:
# If the total count of numbers is odd, the median is the top of the max-heap
if len(self.max_heap) > len(self.min_heap):
return -self.max_heap[0]
# If the total count of numbers is even, the median is the average of the tops of both heaps
return (-self.max_heap[0] + self.min_heap[0]) / 2
Got asked this during a 20 minute technical interview for a FAANG internship. Got the brute force solution but didn't get anywhere close to optimal. Safe to say I didn't do well 😔
Great explanation. Couldn't heaps be avoided in this case? We can insert the values from our stream into our list in log time by using binary search (by definition empty and lists of size 1 are always sorted) and then find the median that way. Am I missing something?
N log N
Yes.
For 1 elem -> log1
2->log2
3->log3
...etc
N->logN
Do the sum:
Log1+log2+log3+...+logn=log(1×2×3×4×..×n)=log(n!)~nlogn
So it is
NlogN
@@dadisuperman3472 isn't that the same for the heap solution?
@@mirkowachter3323 yes almost.
The video's solution is 3*NlogN which at infinity is equivalent to NlogN.
But sorted insertion as in the question above, is exactly NlogN
@@dadisuperman3472 the video solution is O(logn), not O(nlogn).
I Don't understand when we are chcking if every num in small is
not too bad! I thought I had to implement my own heap
Thanks!
Thank you so much, I really appreciate it! 😊
I don't understand why line 16 doesn't output "IndexError: list index out of range" when he tries to get the first index of the empty large heap. What am I missing?
this is amazing
Great video
Can we use library function of heaps in interview?
yes
cant u use an order statistic tree for this?
You are amazing!!!!!
Wouldn't you get an index out of range if the first action was to find median? before adding at least two elements?
Line 34 would run but there wouldn't be anything to return from the heaps.
line 34: return ( -1 * self.small[0] + self.large[0]) / 2
This is correct. The problem states that addNum() will be called first so index out of range won't happen. You could add a guard clause that returns 0 if both heaps are empty.
Thank you
Is this not the same as using a single heap?
Binary search actually works for this problem.
U a God
Awesome!
i inow it's silly, but inserting into an empty heap is O(1).
Getting the max or min is O(1) if it doesn't involve removing it. I think you covered that though. Awesome video. I liked and subscribed!
instead of using heap or priority queues. You can use segment tree . Let me explain , each node from l to r represents the position where each value in the segment l , r lies in the sorted array . Then you can use lazy update to update the position and binary search in segment tree to find the median :"D have fun.
which board you are using?
How can we create a heap in Js.
the addNum function should be SO MUCH simpler. You don't need to check anything if you keep everything on track.
if "small" is larger, push element to there and pop from there and push it to "bigger", VICE VERSA.
The heaps themselves will make the work of whos bigger/smaller.
thank you
@NeetCode ,
What do you think is the problem with using re-balancing BST directly which keeps root as the median (in odd case)?
I mean ,it seems like you are trying to do the exactly same thing indirectly
Maintaining the balanced bst is O(n) no? Since you may have to shift all nodes.
@@atd233 why would it take average o(n) time ?, its like calling the heapify function .
So,its technically the same thing
@@RS-vu4nn you only heapify once. In fact you don't even really need to call it since you start with an empty array. You don't heapify afterwards.
When adding an item to the heap, it takes logn. Popping is O(1).
With the balanced BST, you have to re balance on both add and pop.
@@atd233 Yeah you are technically right, in case of balanced tree that would be the case. I think user-defined balanced tree would be the right approach but that would be too much of hassle ,that's why people generally use two heaps instead.
Thanks for helping with clarification.
@@atd233 Popping from a heap is also O(lg n) because the heap "nodes" have to be repositioned to maintain the min or max heap property. Peeking at the top of a heap is O(1).
heya! im not sure why the solution works for you, but i had to divide by 2.0 instead of 2 because of some floating error - took very long to debug this :/ had to read through the discuss section in leetcode to find some answers - but thanks for doing this!
Can you link it here?
Thanks for this I was wondering why mine was working too. This fixed it.
isn't heappop() an O(logn) operation, since the heap has to be rearranged later, instead of O(1) ?
We have to peek not pop
The brute force solution is actually O(n^2). The reason is because it takes n time to search the array to know where to insert the current number, and then the actual insertion takes another n because you have to shift over all the elements after it.
Then it means that we have to do 2*n operations, not n^2, so it is still O(n)
@@Lulit999 No, the insertion takes n, and you insert n times, that is O(n(n)) = O(n^2)
@@electric336 but you need to calculate the time complexity for the given function ( one function that would insert it 1 time ), not for all of the possible utilizations of the function (having continuously calling the function )
Thanks for the explanation!
I found the below solution a bit cleaner yet still intuitive.
The idea is that for every new number,
1. we first push it to the small heap
2. then pop from small heap and push to big heap
3. check if the small heap is too small(not balanced)
the code can be reduced largely in this case:
class MedianFinder:
def __init__(self):
self.maxHeap = []
self.minHeap = []
def addNum(self, num: int) -> None:
heapq.heappush(self.maxHeap, num * -1)
heapq.heappush(self.minHeap, heapq.heappop(self.maxHeap) * -1)
if len(self.minHeap) > len(self.maxHeap):
heapq.heappush(self.maxHeap, heapq.heappop(self.minHeap) * -1)
def findMedian(self) -> float:
if len(self.maxHeap) == len(self.minHeap):
return (self.minHeap[0] + self.maxHeap[0] * -1) /2
else:
return self.maxHeap[0] * -1
i like the way you think ,its impressive
Thanks for making this look easy and Isn't that O(n) as we are finding lengths of small and large?
len() function in Python runs in O(1) complexity
Cant we do it in set in which insert in sorted order and will be less painful than using two heaps. max insert complexity is still log(n) and median will also be O(1)
Sets don't insert in sorted order...
@@thewatcherlollol Are you sure ?
Check difference between
and in C++. if a certain language does not have an ordered set then it does not mean it does not exist. You can always make one if a language's std library does not have it.
Jesus, on an interview I have no idea how they expect us to come up with this if we never saw this heap solution before. I'm just gonna give them the in-order sorting solution and talk to them about the 2 heap solution if I ever get this, cause this is tough to remember.
Why didn't you use -1* in find median def while calculating median ?
How to come up that we require 2 priority queue one of min and one of max pls give intuition also
I believe in C++, we cant delete arbitrary elements from heap, so we would have to modifications in this approach.
Definitely the best videos. Los mejores vídeos sin duda.
Your explaining is good as f
so maximum time complexity for add operation will be : 3 log N , correct ?
If in interviews this is how we have to explain but in any OA or CP we can use this
bisect.insort(nums,num) this will do all the minheap stuff
What do OA and CP abbreviate?
@@PippyPappyPatterson Online Assessment and Competitive Programming
@@consistentthoughts826 ohhhh wow thank you. Seems like every company has two rounds of OAs, so that makes a lot of sense.
I really need to dig into `bisect` because I keep seeing it but don't know how to use it yet.
Do you know of any Easy Leetcode problems with solutions that use `bisect`? That's normally how I learn the fastest.
@@PippyPappyPatterson try " first and last occurrence of an element in an array" question in leetcode. You will get an idea
@@consistentthoughts826 Thanks chief, happy tech-whale hunting.
Shouldn't the return self.small[0] should be -1* self.small[0] in the findMedian method. As we are artificially adding them for our purposes.
Great... Although you code in python, which is above my head, thanks for the intuition though;
i enjoyed
Hello sir could you please explain why sometimes it's self.small but sometimes it's just small?? I'm really confused, thank you!
He fixes it in the end for all of them to have the self.
Instead of a heap can't we maintain a simple list. For every insert we add the element in the sorted order using binary search.
GOAT
Had to make it a 2.0 instead of 2 because of some bullshit rounding error at the end
Heaps are implemented using lists? That's misleading, at best.
Still a great vid :)
Well implemented with arrays
@@NeetCode I think they are implemented as trees, but sometimes have the interface of a list / array. I may be mistaken though.
that's fucking beautiful
Hey
Great explaination!
Why is this code not working: Can you please help!
Thanks in Advance.
class MedianFinder {
public:
vector arr;
MedianFinder() {
arr.clear();
}
void addNum(int num) {
arr.push_back(num);
}
double findMedian() {
int n = arr.size();
std::nth_element(arr.begin(), arr.begin()+n/2, arr.end());
std::nth_element(arr.begin(), arr.begin()+(n/2-1),arr.end());
if(n&1) return arr[n/2];
return ((arr[n/2-1]+arr[n/2])/2);
}
};
Speed will be a little be faster if we check item is large or equal to self.large[0], then decide which heap to add
import heapq
class MedianFinder:
def __init__(self):
self.small = [] # maxheap
self.large = [] # minheap
heapq.heapify(self.small)
heapq.heapify(self.large)
def addNum(self, num: int) -> None:
if (self.large and (self.large[0]) 1:
temp = -1 * heapq.heappop(self.small)
heapq.heappush(self.large, temp)
if len(self.small) - len(self.large) < -1:
temp = -1 * heapq.heappop(self.large)
heapq.heappush(self.small, temp)
def findMedian(self) -> float:
if len(self.small) > len(self.large):
return -1 * self.small[0]
if len(self.large) > len(self.small):
return self.large[0]
return (-1 * self.small[0] + self.large[0])/2
Came up with this solution without watching the coded solution -
class MedianFinder:
def __init__(self):
self.maxHeap = [] # lower half of the sorted array
self.minHeap = [] # upper half of the sorted array
def addNum(self, num: int) -> None:
heappush(self.maxHeap, -num)
# if all elements in maxHeap are not smaller than all elements in meanHeap
if self.minHeap and self.maxHeap and (-self.maxHeap[0]) > self.minHeap[0]:
heappush(self.minHeap, -heappop(self.maxHeap))
# check if difference in size of min heap and max heap is within allowable limit
sizeDiff = len(self.maxHeap) - len(self.minHeap)
if sizeDiff>1: # maxHeap has 1+ extra elements
heappush(self.minHeap, -heappop(self.maxHeap))
elif sizeDiff < 0: # minHeap has extra elements
heappush(self.maxHeap, -heappop(self.minHeap))
def findMedian(self) -> float:
if (len(self.minHeap) + len(self.maxHeap))%2==0: # even size
return (self.minHeap[0] - self.maxHeap[0])/2
else:
return -self.maxHeap[0]
I wonder why not use a binary search instead? Since we can build the list sorted, we can binary search to find the position to insert new num in O(logn)? Then retrieve is just the same as in-order insertion which is O(1). Is it because inserting an item into an array itself takes O(n) due to copy operation in the underlying data structure?
I coded this using a binary search based approach. The downside is that elements will need to be shifted right once you get the position and that can potentially be O(n).
we got to shift elements again right!! thats O(n)
Another solution: Implement a BST. When you invoke addNum() insert into the BST (which is lgN) operation (assuming it is balanced).
Then findMedian is similar to implementKthSmallest element in BST.
const medianUsingBST = () => {
let root = undefined;
let size = 0;
// Ensures sort during insertion
const insertIntoBST = (num) => {
if (!root) {
root = new TreeNode(num);
size++;
return;
}
const helper = (curr, parent) => {
if (!curr) {
if (num < parent.val) {
parent.left = new TreeNode(num);
} else {
parent.right = new TreeNode(num);
}
size++;
return;
}
if (num > curr.val) {
helper(curr.right, curr);
} else {
helper(curr.left, curr);
}
}
helper(root, undefined);
}
const findKthSmallestElementInBst = (k) => {
let counter = 0;
let result = undefined;
const inOrder = (curr) => {
inOrder(curr.left);
counter++;
if (counter === k) {
result = curr.val;
}
inOrder(curr.right);
}
inOrder(root);
}
const addNum = (num) => {
insertIntoBST(num);
}
const findMedian = () => {
if (size % 2 === 1) {
return findKthSmallestElementInBst(Math.ceil(size / 2));
}
return (findKthSmallestElementInBst(size / 2) + findKthSmallestElementInBst(size / 2) + 1) / 2;
}
return { addNum, findMedian}
}
I think the find is O(n) here though? Because you're going through half the nodes to get the median. Which is linear. With heap it's constant. And insertion will be O(logn) for both.
You can store the iterator pointing to the middle and ++ or -- it.
The heaps solutions is easier to implement though
-1 * val = -val