Kth Largest Element in an Array - Quick Select - Leetcode 215 - Python
- Published 8 Jul 2024
- 🚀 neetcode.io/ - A better way to prepare for Coding Interviews
🐦 Twitter: / neetcode1
🥷 Discord: / discord
🐮 Support the channel: / neetcode
⭐ BLIND-75 PLAYLIST: • Two Sum - Leetcode 1 -...
💡 CODING SOLUTIONS: • Coding Interview Solut...
💡 DYNAMIC PROGRAMMING PLAYLIST: • House Robber - Leetco...
🌲 TREE PLAYLIST: • Invert Binary Tree - D...
💡 GRAPH PLAYLIST: • Course Schedule - Grap...
💡 BACKTRACKING PLAYLIST: • Word Search - Backtrac...
💡 LINKED LIST PLAYLIST: • Reverse Linked List - ...
💡 BINARY SEARCH PLAYLIST: • Binary Search
📚 STACK PLAYLIST: • Stack Problems
Problem Link: neetcode.io/problems/kth-larg...
0:00 - Read the problem
3:39 - Drawing Explanation
11:40 - Time Complexity
13:25 - Coding Explanation
leetcode 215
This question was identified as an interview question from here: github.com/xizhengszhang/Leet...
#quick #select #python
Disclosure: Some of the links above may be affiliate links, from which I may earn a small commission. - Science & Technology
This now exceeds the time limit due to the last test case added by leetcode.
Yes, it will pass all the cases if we use Arrays.sort or a max heap
If you add at the beginning "if k == 50000: return 1" It works haha
Thank you, thought I was the only one stuck there. So no way to answer the follow up of doing it without sorting anymore, right ?
@business_central You can, using the three-way partitioning method: divide the array into three parts, elements less than the pivot, elements equal to the pivot, and elements greater than the pivot. This way we can skip over duplicates quickly.
Thank you for the perspective, @kidusBerhanu!
Can you share the code to make sure I am doing it right?
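A sketch of the three-way partitioning idea described in this thread (this is an illustration of the technique, not code posted by the commenter; `find_kth_largest` is a hypothetical name):

```python
import random

def find_kth_largest(nums, k):
    """Quickselect with three-way (Dutch national flag) partitioning.

    Partitions around a random pivot into <, ==, > regions so that runs of
    duplicate pivot values are consumed in a single pass.
    """
    k = len(nums) - k          # index of the kth largest in sorted order
    l, r = 0, len(nums) - 1
    while True:
        pivot = nums[random.randint(l, r)]
        # Invariant: nums[l:lt] < pivot, nums[lt:i] == pivot, nums[gt+1:r+1] > pivot
        lt, i, gt = l, l, r
        while i <= gt:
            if nums[i] < pivot:
                nums[lt], nums[i] = nums[i], nums[lt]
                lt += 1
                i += 1
            elif nums[i] > pivot:
                nums[i], nums[gt] = nums[gt], nums[i]
                gt -= 1
            else:
                i += 1
        if k < lt:
            r = lt - 1          # answer is in the < region
        elif k > gt:
            l = gt + 1          # answer is in the > region
        else:
            return pivot        # k lands inside the == region
```

Because the whole run of pivot-equal elements is resolved at once, arrays full of duplicates no longer degrade to quadratic time.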
To me, this is one of the best leetcode solution channels I have found so far!
For me as well
agreed
not just to you. this is objectively the best leetcode yt channel
I literally stopped looking at other solutions.
It probably is the best one full stop. At least in english.
I am preparing for my coding interview and your tutorials are helping a lot! Thanks!
Same here.. I also use this channel to prep for coding rounds
Like others have pointed out, this will time out on a test in leetcode, though the solution is pretty quick and easy.
Problem: if you pick a static pivot, it has a possible worst case. To fix this, instead of picking the rightmost pivot, just pick a random pivot between l and r and then swap that with the right most pivot. This significantly decreases the odds of catching the worst case time complexity.
this still possibly fails the test though... no?
Clear explanation, visuals, codes, and also not condescending at all.
Thank you for your videos
I usually don't even look at your code because explanation is clear enough to implement it by my own , thanks
Awesome explanation! I agree that if we understand the quicksort algorithm first, the quickselect approach will be much more intuitive. Basically, in the quicksort partition, we know that the "pivot" is a point where the left-side number(s) are less than the pivot and the right-side number(s) are larger than the pivot. We keep recursively running until the pivot aligns with the (length - k)th position. This would definitely have been challenging if I had not reviewed quicksort :)
max heap solution
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        heap = []
        for num in nums:
            heapq.heappush(heap, -num)
        while k > 0:
            res = heapq.heappop(heap)
            k -= 1
        return -res
thanks a lot
Adding these two lines at the top of the recursive function can improve the run time from over 2500ms to
The choice of your P might be the reason it runs so badly, if the array is already sorted it will run in O(N^2), better select a random index.
yup, I randomized the pivot and it beat 95% of solutions in python. Randomized quickselect is much faster with high probability
@@chinesemimi Can you please give the code as I am getting error
Picking a random index doesn't fix it if all or most of the elements have the same value. If every element is the same, swapping in a random element leaves the list unchanged. An additional improvement is needed to avoid quadratic performance in that case. My solution to that is to keep track of the right-most index < pivot (initialized to -1 in case every element is the same). That handles all duplicates of the pivot in a single iteration.
import random
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        n = len(nums)
        k = n - k
        def quickselect(l, r):
            pivotIndex = random.randint(l, r)
            nums[pivotIndex], nums[r] = nums[r], nums[pivotIndex]
            pivot = nums[r]
            p = l
            lessEnd = -1
            for i in range(l, r):
                num = nums[i]
                if num < pivot:
                    lessEnd = p
                if num <= pivot:
                    nums[p], nums[i] = nums[i], nums[p]
                    p += 1
            nums[p], nums[r] = nums[r], nums[p]
            if k > p:
                return quickselect(p + 1, r)
            elif k > lessEnd:
                return nums[p]
            else:
                return quickselect(l, lessEnd)
        return quickselect(0, n - 1)
Your videos get etched in my brain forever! So clean, very systematic and easy to follow.. Thank you very much for such great content!
Thanks for the explanation. This was the greatest Quick Select explanation that I found on youtube. Love your explanation and content. And thank you for explaining the Time Complexity in so much detail.
As always, one of the best leetcode solution explanations. You are absolutely the best.
Might be worth reuploading this problem. A test case was added that causes this input to fail.
for those who use c++:
if you push elements one by one into the priority queue it will take O(n log n) time, because pushing one element takes O(log n)
an efficient way to do this is:
priority_queue<int> pq(arr, arr + N);
it takes only O(n) time
You can also do
// uses heapify constructor
std::priority_queue<int> q(std::less<int>(), nums);
One of the Best Teachers on YouTube! The way of the explanation is Awesome! Keep it up Sir!
Great video! Many people have mentioned the new test case LeetCode has implemented, which causes quickSelect to time out. In Python, even changing the algorithm to use a random pointer instead of the rightmost element did not fix the issue. The max heap solution still works with the new test cases. Love your videos!
Alternative:
1) With each number, insert it to a minheap; if the size is over K, remove the minimal (aka. "pop").
2) get all items in the heap and return the largest.
Another alternative: The array needs not to be sorted fully -> QuickSelect algorithm.
Exactly this! We can limit the capacity of the heap by K, in the end it will not be O(K LogN) but O(N LogK), which for large N and small K will be very close to O(N).
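The size-k min-heap described in this thread can be sketched like this (an illustration of the idea, not the video's solution; `find_kth_largest` is a hypothetical name):

```python
import heapq

def find_kth_largest(nums, k):
    # Keep a min-heap of the k largest values seen so far; its root is
    # the kth largest overall. O(n log k) time, O(k) extra space.
    heap = list(nums[:k])
    heapq.heapify(heap)                   # O(k)
    for num in nums[k:]:
        if num > heap[0]:
            heapq.heapreplace(heap, num)  # pop the min, push num: O(log k)
    return heap[0]
```

Elements smaller than the current root can never be the kth largest, so they are skipped without touching the heap at all.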
akshay told me this
@@pawansomavarpu5273 sounds like a genius to me
If I see this question in the interview, I would definitely show the interviewer 2 solutions
* heap O(nlogk) worst case
* quick select with average O(n) but worst case O(n^2)
Interviewer needs to choose which one to code.
@@nguyen-dev show off 😂
Thank you so much for this video, i got this question for my meta on site interview today.
I really like your explanation. Your tone is very clear and both the drawings and codes are clean and neat! Must give you an upvote!
this is just a brilliant explanation! thank you very much for your videos - it's a huge help!
for heap solution, O(n log k) since you have to keep a heap of size k.
Yeah, the time complexity analysis is wrong. It's O(1) to pop and O(log(n)) to insert with a binary heap. It's n insertions, but you can limit the heap to size k by popping in O(1) time when it gets too large. Since each insertion costs log(k), it's O(nlog(k)) in total.
What you describe is a different solution.
Approach 1: Just like in the video, you use a max heap of size n and then pop k times. It costs O(n) to heapify a list and it costs O(k log n) to pop k times.
Approach 2: Alternatively, you can use a min heap of size k by pushing until it reaches size k and then always push and pop. If you encounter an element that is smaller than the current min, it won't affect the kth largest element and it will be pushed off the heap immediately. If you encounter a bigger element, the kth largest element will be replaced to whatever comes next. The heap will always stay at size k (except at the beginning)
Time: "push" + "pushAndPop" = (log1 + ... + log k) + (log k + ... + log k) = O(k log k + (n-k) log k) = O(n log k)
@MatthewLuigamma032 I think the pop operation is never O(1), since you need to rebalance the tree after you remove the minimum element. The way neetcode did it, by heapifying the whole array, makes it O(log n) to pop and push on the heap. Keeping only the top k elements in the heap would lead to O(n log k) time complexity.
@@MatthewLuigamma032 There is no reason to put every value into the heap if you are USING a min Heap to store largest values. Peek() is O(1). Therefore, the solution is O(n log k). For leetcode, my minheap solution beats 96.23 % of java submissions. (As of today).
this is best coding questions channel ever, love you man!!
It's a pretty common case to have to work on data that is pre-sorted or nearly pre-sorted, so this implementation using the last element as the pivot isn't a good choice. Many people have suggested picking a random element and switching it for the last element, but for arrays where every element has the same value that still leaves the input array sorted and still takes quadratic time. An additional improvement is to keep track of the last index < pivot and use that for the left window, instead of p-1. This eliminates processing duplicates of the pivot repeatedly. E.g.:
import random
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        n = len(nums)
        k = n - k
        def quickselect(l, r):
            pivotIndex = random.randint(l, r)
            nums[pivotIndex], nums[r] = nums[r], nums[pivotIndex]
            pivot = nums[r]
            p = l
            lessEnd = -1
            for i in range(l, r):
                num = nums[i]
                if num < pivot:
                    lessEnd = p
                if num <= pivot:
                    nums[p], nums[i] = nums[i], nums[p]
                    p += 1
            nums[p], nums[r] = nums[r], nums[p]
            if k > p:
                return quickselect(p + 1, r)
            elif k > lessEnd:
                return nums[p]
            else:
                return quickselect(l, lessEnd)
        return quickselect(0, n - 1)
👍👍
thank you
Great explanation for QuickSelect, thank you so much!
Love the explanation and complexity analyzations!
As of today, 22/10/23, leetcode is making this solution TLE. So they have probably made some changes, either in the test cases or in the time limit. But it was worth learning anyway.
I did QuickSelect and got TLE and came here to see they solved it with QuickSelect. -_-
It seems like so. I coded quick sort in C++ and it timed out. Heap solution works but it is much slower than the sort solution.
Have you figured out a better solution given the new changes leetcode made?
@@TUMATATAN Use Min heap
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        heap = []
        c = 0
        for i in nums:
            c += 1
            heapq.heappush(heap, i)
            if c > k:
                heapq.heappop(heap)
                c -= 1
        return heapq.heappop(heap)
Great videos, thanks for making these! May I ask which software you're using for drawing the explanations? Seems very useful.
Thanks and I use Paint3D
just a note : there is an implementation of quickselect in c++ i.e. std::nth_element which actually gives faster results compared to other approaches in the leetcode submissions
Thanks a lot! Helped me!!
I think the big O for the heap approach is off. If you maintain the heap size to be no more than k, then each insert/pop is O(log(k)), making the overall worst case complexity O(n * log(k)) (n inserts/pops, and then just return heap[0])
It's not off. He's talking about a different heap solution from you. He's saying to put all N elements in a max-heap and then pop off the k largest. That will be O(N) to build the heap and O(k log N) to pop k elements. Your solution works too and has the time complexity that you said.
@@nikhil_a01 Can you explain how building a max-heap can be O(N)?
@@sixpooltube You can search for Floyd's method for more information. To build the heap we percolate down any element that's not in the correct position.
In a perfect binary tree, 1/2 the elements are leaves so they don't move at all. 1/4 are in the second last level and they move at most 1 place down. Binary trees are bottom-heavy. 3/4 of the nodes are in the last two levels. It's only the root which may need to move the full log(N) height of the tree.
If you sum the number of moves for each level, you find the bound is 2N operations, which is O(N).
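The bottom-up build described above can be sketched in a few lines (an illustrative version of Floyd's method, not code from the video; `build_min_heap` is a hypothetical name):

```python
def build_min_heap(a):
    """Floyd's bottom-up heap construction: sift down each internal node,
    starting from the last one. Total work sums to O(n)."""
    n = len(a)

    def sift_down(i):
        while True:
            smallest, left, right = i, 2 * i + 1, 2 * i + 2
            if left < n and a[left] < a[smallest]:
                smallest = left
            if right < n and a[right] < a[smallest]:
                smallest = right
            if smallest == i:
                return
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest

    # Leaves (the back half of the array) already satisfy the heap property.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i)
    return a
```

The per-level argument in the comment above shows up here directly: the loop skips the n/2 leaves entirely, and only the few nodes near the root can sift down the full height.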
Great Explanation as always . Thank you very much !
You deserve lots of subscription from people !
Thank you very much !
I believe the last part of the coding solution may be wrong. When p > k, you should be looking at the right side of the array instead of the left. In the drawing explanation, p = 3 while k = 2, you look at the right side. The flip is because the kth largest element counts from the right side instead of the left.
The question is whether "the kth largest element" you mentioned is for the original array or the new subarray. Another thing is that if p > k, we should search the LEFT, because k, i.e. the target index that we're trying to find, is lying in the left subarray.
Best explanation! Thank you!!
Thank you for explaining quick select!
No kidding, best and concise explanation and implementation I've seen for years.
Yeah, True.
how many years have you been stuck on this problem?
We had to solve a similar problem (kth smallest) in my algorithms class. This helps, thank you :)
Glad it helped!
Actually, for the heap solution I think the time complexity is going to be O(n log(h) + k log(n)), where h is the height of the heap at the time of the insert operation!!
Good point. Someone can correct me if I'm wrong, my understanding is we don't need to do a Heap Insert operation, we can just run Heapify algorithm which is O(n).
@@NeetCode Yes there are two ways to build a heap. Create by inserting the elements one by one O(nlog(n)) or heapify O(n).
@@NeetCode we will have trade off for space complexity right? Using Insert: space O(k), but for using Heapify: space O(n). So it will depend on what we want :D
@@kissdoing1696 you can hepify input array without extra space and pop k times to the same variable, so constant time. But the tradeoff is the depth of the heap (k vs n).
A wonderful explanation. Thank you
Thanks! Wonderful explanation
Man, you are so good at this!
You are GEM bro keep up your hard work its very neat explanation
Thank you for this one!
Great explanation !
The quickselect gives TLE as of now.
“We know for sure” that this is a great video and great explanation 😀
The actual time complexity for Quick Select is 2N and N+K log N for Heap Sort. 2N is not always better than N + K log N. So we can't say Quick Select is faster than Heap Sort?
I think the best case is that the pivot ends up in the middle every time, so it should be log(n), isn't it?
This is nlog(n) time complexity.
If you randomly select a number and sort its location as shown above, and it happens to be the minimum number in the set, and successive random selections keep picking the minimum of the remaining unsorted set, then once you finish you have basically performed quicksort on the complete array, hence N log(N).
Will the random selection always lead to this? No, it depends on how the random function is implemented.
Great video, Thank you!
In my opinion, you can use a while loop instead of a recursive function for the implementation of Quick Select because the space complexity would be reduced from O(n) to O(1).
The leetcode question has too little information to think of this algo. They should update their questions after receiving feedback from the community.
But anyway, this is a nice explanation.
Dumb question but why has this problem been tagged as Heap/Priority Queue? It does not use a heap.
(Using maxHeap) ,push the values in the maxHeap , then pop till k times , the last popped element is the kth largest element , Heap has a sort nature in them
Correction on 1:48, it needs to be min heap instead of max heap
Thanks for making it easy
Thanks about the time complexity!
Sadly this method is no more available on leetcode now...
They added a testcase which leads to time limit...
Has anyone figured out the solution with ne changes?
your vibe is infectious
This worked for me; it's very similar to what we did in "Kth Largest Element in a Stream", but we aren't inserting anything here, we're just popping elements.
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        heapq.heapify(nums)
        while len(nums) > k:
            heapq.heappop(nums)
        return nums[0]
so did I. Is it still O(n) time complexity?
@timlee7492 no, it doesn't perform as well. Its runtime is more like O(k*log(k))
if anyone asks me about this que i will definitely recommend this one,great one
Excellent explanation
6:15 - what if input looks like this? [3, 2, 1, 5, 6, 1, 4], so an additional 1 close to the end? We would have to somehow backtrack to p=3 and move all items one space to the right? So move "6" from position 4 to 5, then move "5" from position 3 to 4, and then finally insert that 1 at position 3 and increase p to 4 (to point to "5")?
[edit]
Scratch that. Looks like if nums[i] < p, then i and p will be in sync and that swap do nothing. But what is lacking in this example is if there is some number less than pivot AFTER a bigger number is found. Add "1" just before last "4" and then that line 9 seems handy as it will swap "5" with "1", so for input [3, 2, 1, 5, 6, 1, 4] we'll end up with i=5, p=3 so after last iteration we'll have [3, 2, 1, 1, 6, 5, 4], i=5, p=4 and the last swap outside for loop will give us [3, 2, 1, 1, 4, 5, 6], p=4
Thanks a lot dude !
I think the worst case for your quickSelect is not O(n^2), but O(k*n). Really nice videos, thanks a lot.
If you are still struggling with why we use "nums[p], nums[r] = pivot, nums[p]" or "nums[p], nums[r] = nums[r], nums[p]" rather than "nums[p], pivot = pivot, nums[p]", here is the explanation:
In Python, a tuple assignment like a, b = b, a does swap correctly: the entire right-hand side is evaluated before any assignment happens. The real problem is that pivot is just a local variable holding a copy of nums[r]. Writing nums[p], pivot = pivot, nums[p] updates nums[p] and rebinds the variable, but it never writes anything back to nums[r], so the array still contains the pivot value at index r.
As a result, the pivot value ends up duplicated in the array and the value that was at nums[p] is lost from it. Swapping nums[p] with nums[r] is what actually moves the pivot into its final position in the array.
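A tiny demo of the difference, using hypothetical helper names (`partition_wrong` / `partition_right` are illustrative, not from the video):

```python
def partition_wrong(nums, p, r):
    # Swapping with the local variable updates nums[p] and rebinds `pivot`,
    # but never writes back to nums[r]: the pivot value stays duplicated at r
    # and the old nums[p] value is lost from the array.
    pivot = nums[r]
    nums[p], pivot = pivot, nums[p]
    return nums

def partition_right(nums, p, r):
    # Swapping two array slots actually moves the pivot out of position r.
    nums[p], nums[r] = nums[r], nums[p]
    return nums

print(partition_wrong([1, 9, 4], 0, 2))  # [4, 9, 4] -- 4 duplicated, 1 gone
print(partition_right([1, 9, 4], 0, 2))  # [4, 9, 1]
```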
awesome stuff
I could reproduce the QuickSelect based algorithm in C# in a single attempt after listening to your detailed explanation. Appreciate the effort :-)
Finally! Thanks for listening! I asked this in the poll you submitted a week ago. U a God
can I take a queue of size K, push while parsing the array only when the current element is bigger than the head of the queue, and after parsing just return the bottom of the queue?
How is heapifying an array O(n) time? Adding an element to a heap takes logn time. So adding all the elements in the array to a heap would take O(nlogn) time where n is the number of elements in the array
Can you please make a video on master method to calculate complexity
is it alright to use a global variable for actual interviews? I notice you aren't passing the array in to your helper functions. Just curious.
In my experience at FAANG interviews, it's usually fine. But if you are unsure, feel free to ask your interviewer and they will let you know.
Thank you!
What will happen if the array would be [5,6,3,2,1,4]? Looks like 5 and 6 will be left exactly where they are, on the left. Shouldn't they be moved to the right?
Hey @neetcode , QuickSelect solution is giving TLE . So you might wanna add that to your description. However Heap Sort still works fine.
Neetcode's algorithm isn't actually O(n) average time unless every input is randomly ordered (which is rare both in real life and in Leetcode test cases). For real-world data it's closer to the O(n^2) worst case of quickselect due to a few implementation details. His algorithm can be fixed, but it needs 2 changes.
1. You need to pick a random pivot instead of the last - pick a random element and swap it for the last as many have said in other comments
2. Additionally - which I haven't seen anyone mention - you need to use the *first* occurrence of the pivot to determine whether to use the left window and to calculate the boundary of the window, not the last occurrence (referenced by `p`) as the video does. That requires an additional variable. You can also just remember the right-most element < pivot if one exists and use that for the new right boundary (I found this a bit simpler to implement, the logic is just slightly different). You need to do this because you may have an array where every element has the same value (still a sorted array - it's monotonically sorted). In that case swapping the random element doesn't fix anything.
Both improvements are required to take O(n) average time for monotonically sorted arrays.
I think there was a way to solve this in worst case linear time using the chunk median of medians approach.
You are correct.
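A sketch of the median-of-medians approach mentioned above, written for clarity rather than speed (the list-comprehension partition uses O(n) extra space; `select` and `find_kth_largest` are hypothetical names):

```python
def select(nums, k):
    """Kth smallest (0-indexed) in worst-case O(n) via median of medians."""
    if len(nums) == 1:
        return nums[0]
    # Median of each chunk of 5, then recursively the median of those medians.
    # This pivot is guaranteed to discard a constant fraction of the array.
    chunks = [nums[i:i + 5] for i in range(0, len(nums), 5)]
    medians = [sorted(c)[len(c) // 2] for c in chunks]
    pivot = select(medians, len(medians) // 2)
    lows = [x for x in nums if x < pivot]
    pivots = [x for x in nums if x == pivot]
    highs = [x for x in nums if x > pivot]
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return select(highs, k - len(lows) - len(pivots))

def find_kth_largest(nums, k):
    return select(list(nums), len(nums) - k)
```

In practice the constant factor is large, which is why randomized quickselect usually wins despite its O(n^2) worst case.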
You can improve the performance 6x with just 1 extra line, by doing randomized quickSelect: pick the pivot to be the midpoint
def quickselect(l, r):
    nums[r], nums[(l+r)//2] = nums[(l+r)//2], nums[r]  ###
Hi, can you explain the reasoning for this?
Thanks!
The explanation is unparalleled. One thing I would suggest: try to avoid mutating the input parameters.
Sorting the array and then taking the element at index len(nums) - k can also be done in linear time.
Use radix sort!
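In the same linear-time spirit, a counting-sort variant works here because the problem's values are bounded (LeetCode 215 constrains nums[i] to roughly [-10^4, 10^4]; radix sort generalizes the idea to wider ranges). This is an illustrative sketch, not the commenter's code:

```python
def find_kth_largest(nums, k):
    # O(n + m) time where m is the value range; assumes bounded values.
    lo, hi = min(nums), max(nums)
    counts = [0] * (hi - lo + 1)
    for num in nums:
        counts[num - lo] += 1
    # Walk from the largest value down, consuming counts until k is exhausted.
    for val in range(hi, lo - 1, -1):
        k -= counts[val - lo]
        if k <= 0:
            return val
```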
Great vid
can we choose a random value as our pivot instead of the leftmost element since that might lower the probability of running into a worst case time scenario?
Of course. You can choose a random, the last or the first element as your pivot.
Can I use bubble sort? In bubble sort, after every iteration the largest remaining element is bubbled to the top, so running k iterations will yield the result. Time complexity is O(kn) and it is in place.
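The k-passes-of-bubble-sort idea from the question above would look something like this (an illustrative sketch; `find_kth_largest` is a hypothetical name):

```python
def find_kth_largest(nums, k):
    # k passes of bubble sort: after pass i, the i+1 largest elements are
    # settled at the end of the list. O(k * n) time, O(1) extra space.
    nums = list(nums)
    n = len(nums)
    for i in range(k):
        for j in range(n - 1 - i):
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
    return nums[n - k]
```

It does work, but O(kn) only beats the O(n log n) sort when k is much smaller than log n.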
great vid
How did I just hear about you, and why do you not have 10 mil subs already??
In the heap solution, you mentioned that we are popping "k" times. How is that? We are popping more than 2 times in the first example. Instead here "k" is the limit for how many numbers can exist in the heap at a given point in time, isn't it? If the size of heap is more than "k", we pop.
i believe @kruger that will happen if you use a min heap, right? since the min heap needs to start evicting elements once its size > k. Here he explicitly mentions he uses a max heap. In the case of a min heap of size k, each operation is O(log k), and since there are N elements in the input array, the total is O(N log k). Someone please let me know if my thinking is wrong.
Runtime: 55 ms, faster than 99.62% of Python3 online submissions for Kth Largest Element in an Array.
from heapq import heapify, heappush, heappushpop

class Solution:  ## complexity nlogk
    def findKthLargest(self, nums: List[int], k: int) -> int:
        x = nums[:k]
        heapify(x)
        while len(nums) > k:
            heappushpop(x, nums[k])
            k += 1
        return x[0]
you should select pivot randomly to improve leetcode solution ranking
Rather than doing length - k, we could have partitioned the array the other way around:
greater elements - elements equal to the pivot - elements less than the pivot.
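A sketch of that descending partition, which lets you target index k - 1 directly instead of converting k (illustrative only; `find_kth_largest` is a hypothetical name):

```python
import random

def find_kth_largest(nums, k):
    # Partition in descending order: elements >= pivot go to the left,
    # so the kth largest sits at index k - 1 with no len(nums) - k step.
    target = k - 1
    l, r = 0, len(nums) - 1
    while True:
        idx = random.randint(l, r)
        nums[idx], nums[r] = nums[r], nums[idx]
        pivot, p = nums[r], l
        for i in range(l, r):
            if nums[i] >= pivot:            # note: >= instead of <=
                nums[p], nums[i] = nums[i], nums[p]
                p += 1
        nums[p], nums[r] = nums[r], nums[p]  # pivot into final position
        if target < p:
            r = p - 1
        elif target > p:
            l = p + 1
        else:
            return nums[p]
```

The partition logic is identical to the video's; only the comparison direction and the target index change.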
Thank you so much
15:23, shouldn't we leave the pivot as it is when nums[i] < pivot?
Your explanation is so fucking good, thank you
The time complexity of the algorithm with the heap data structure is T(n) = n*log(n) + k*log(n).
That's because adding to a heap takes log(n) time (heapifyUp), and removing from a heap also takes log(n) time (heapifyDown), and we need to add 'n' elements from the array to the heap.
@neetcode, isn't it?
no, the heapify process doesn't take n log n time; it takes O(n). What you're describing is not heapifying an array directly but building a heap by inserting values one by one, which is why that takes n log n. When heapifying an array in place, most of the sift-downs are on small subtrees, so the total work sums to linear time.
I think it's best to use a Priority Queue, limit its size to K, and return the head of the queue.
Thank you
One thing I could not understand, you are making use of recursion. Recursion does occupy space in stack memory. Still, how could you say space complexity is O(1) ? Please correct me if I am wrong/mistaken?
Yes in Python it is. In a language with tail-call recursion it would be O(1) because we don't perform any additional operations on the return value of the recursive call. It's very easy to convert the algorithm to an iterative one:
import random
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        n = len(nums)
        k = n - k
        l = 0
        r = n - 1
        while True:
            pivotIndex = random.randint(l, r)
            nums[pivotIndex], nums[r] = nums[r], nums[pivotIndex]
            pivot = nums[r]
            p = l
            lessEnd = -1
            for i in range(l, r):
                num = nums[i]
                if num < pivot:
                    lessEnd = p
                if num <= pivot:
                    nums[p], nums[i] = nums[i], nums[p]
                    p += 1
            nums[p], nums[r] = nums[r], nums[p]
            if k > p:
                l = p + 1
            elif k > lessEnd:
                return nums[p]
            else:
                r = lessEnd
Great explanation!!!! Just a small typo in the code: shouldn't line 13 be "if p > k: return quickSelect(0, p-1)" instead of quickSelect(1, p-1)?
It's L not 1
in your quickSelect function, within that for loop (line 9), why do you need that line? don't you just need to move the pointer by one?
I was about to ask the same. Looks like if nums[i] < p, then i and p will be in sync, hence that swap do nothing. But what is lacking in this example is if there is some number less than pivot AFTER a bigger number is found. Add "1" just before last "4" and then that line 9 seems handy as it will swap "5" with "1", so for input [3, 2, 1, 5, 6, 1, 4] we'll end up with i=5, p=3 so after last iteration we'll have [3, 2, 1, 1, 6, 5, 4], i=5, p=4 and the last swap outside for loop will give us [3, 2, 1, 1, 4, 5, 6], p=4
Solution using either a min-heap or a max-heap, depending on which needs fewer pops:
class Solution:
    def findKthLargest(self, nums: List[int], k: int) -> int:
        N = len(nums)
        ## python default is minheap
        if k >= (N + 1) // 2:
            ## minheap: kth largest is the (N-k+1)th smallest
            heapq.heapify(nums)
            for i in range(N - k + 1):
                ans = heapq.heappop(nums)
            return ans
        else:
            ## maxheap: negate values and pop k times
            nums = [-1 * el for el in nums]
            heapq.heapify(nums)
            for i in range(k):
                ans = heapq.heappop(nums)
            return -1 * ans