Top K Frequent Elements - Bucket Sort - Leetcode 347 - Python
- Added 24 July 2024
- 🚀 neetcode.io/ - A better way to prepare for Coding Interviews
🐦 Twitter: / neetcode1
🥷 Discord: / discord
🐮 Support the channel: / neetcode
💡 CODING SOLUTIONS: • Coding Interview Solut...
💡 DYNAMIC PROGRAMMING PLAYLIST: • House Robber - Leetco...
🌲 TREE PLAYLIST: • Invert Binary Tree - D...
💡 GRAPH PLAYLIST: • Course Schedule - Grap...
💡 BACKTRACKING PLAYLIST: • Word Search - Backtrac...
💡 LINKED LIST PLAYLIST: • Reverse Linked List - ...
Problem Link: neetcode.io/problems/top-k-el...
0:00 - Read the problem
2:58 - Drawing Explanation
9:42 - Coding Explanation
leetcode 347
This question was identified as an interview question from here: github.com/xizhengszhang/Leet...
#sorted #array #python
Disclosure: Some of the links above may be affiliate links, from which I may earn a small commission. - Science and technology
I appreciate the time you put into making and sharing all your content for free. Here is the $10 I might have spent on your Udemy course.
Thank you so much!!!
Does he have udemy course???
@@onlysubscriptions2152 nah, just a hypothetical $10 he would've spent, since most people paywall this content but he does it for free.
As a side note, consider donating directly to the creators if they have a donation link, because YouTube takes a whopping 30% of your donation.
In this case, Neetcode accepts Patreon donations, which takes a more reasonable commission of about 8%.
@@hamdi_even 8% is too high tbh. Not for neetcode specifically, he sells his own courses and is set. But for someone else who might be in need of the money 8% is ridiculous
I have never practiced DSA in my life, not even in college. After getting laid off, I stumbled across your videos to learn DSA. They are so crisp, informative, and to the point. I can't thank you enough.
Hi you got any job?
@@e889. not yet.
Update?
Hey guys, yes I did. Wouldn't have been possible without Neetcode.
I love you man. You're an actual angel. Your explanations are always so clear. And your drawings are so easy to understand.
Thanks, appreciate the kind words 🙂
@@NeetCode you are an angel. :)
Amazing content! The best algorithms channel, one that focuses on logic and thinking in a clear way. Happy to have found this channel, been writing neetcode ever since.
Best youtube channel for leetcode problems hands down.
I used your previous video on groupAnagrams to solve this: I just hashmapped the array into a defaultdict(int), then sorted the dictionary entirely in descending order. Your videos have been really helpful, first time I solved a medium all by myself.
That's so cool that python has a convenient way to sort hashmap by value. I looked into it for java, and it is a nightmare. I would have to create my own comparator. I think if I'm doing that, I may as well just learn bucket sort at this point.
haha, did the same, cheers
@@trenvert123 Java is sooooo verbose......
@@trenvert123 In Go I faced the same problem, spent 20 minutes trying to sort a hash map by values (failed). So I just copied values into new array and sorted them there lol.
@@Albert-nc1rj you can also insert the contents of your hashMap into a second hashMap, but use the values from frequency hashMap as the keys of the second hashMap.
Once you have done that, take all the keys out into an array, sort the array and retrieve first k values, use first hashMap to get integers and return that.
Your explanation is like art! Thank you!
This is perfect, thank you bro, took me a while to understand this problem, gonna need to redo it without looking at the solution in a couple days.
Thank you, liked and commented again to support
I was wondering where you were going with the bucket. This is so clever !!! Brilliant !!
Thanks for the video! I came up with the same solution except I assumed each element is "repeated unique number of times" from the problem statement - "It is guaranteed that the answer is unique.". So instead of looping over each lists, I just considered the first element.
The one thing I don't like about heap questions is that most of the time you have to default to some library, because I doubt any of us could code up a heap in a phone screen.
Am I the only one who is a little confused as to how this solution is O(N)? You loop through the freq array, whose size is about N, and at each index you might have to loop through up to N values. So how is this not O(N^2)?
Edit: Never mind, I think I get it now, so I'll write it out for anyone who might still be confused. The outer loop traverses the whole array, which is O(N). But we aren't doing N operations at each stop: the inner loop does at most N operations in total across the entire traversal, because each value sits in exactly one bucket. So even though the for loops are nested, the total is just N + N, which simplifies to O(N).
Thanks a lot buddy! I was scratching my head off to find out this same doubt.
Now that I saw your comment, I was able to understand it.
Thanks again!!
So essentially the inner loop is just operating on the subset of N elements?
I would say it's n+k rather than n+n, because the size of the res array is k. So after looping through the freq array of size n, you only need to fill the res array k times then stop, so k more operations. Still it's O(n)
@@quanmai5759 Yah I also think so it will be n+k
@@ahmedmansour5032 but in the worst case you will face frequency = 1 for each element in nums, and big-O is always calculated for the worst case. I have made a comment on the video, please go through it and you will understand what I am saying.
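To make the O(n) argument in this thread concrete, here is a minimal sketch of the bucket approach with a step counter bolted on. The function name `top_k_frequent` and the `steps` counter are illustrative assumptions, not the exact code from the video.

```python
from typing import List, Tuple


def top_k_frequent(nums: List[int], k: int) -> Tuple[List[int], int]:
    """Return (k most frequent values, loop steps used by the final pass)."""
    count = {}
    for n in nums:
        count[n] = count.get(n, 0) + 1

    # freq[i] holds every value that occurs exactly i times.
    freq = [[] for _ in range(len(nums) + 1)]
    for value, c in count.items():
        freq[c].append(value)

    res = []
    steps = 0  # hypothetical counter, only here to illustrate the cost
    for i in range(len(freq) - 1, 0, -1):
        steps += 1  # one step per bucket visited
        for value in freq[i]:
            steps += 1  # one step per value read out of a bucket
            res.append(value)
            if len(res) == k:
                return res, steps
    return res, steps


nums = [1, 1, 1, 2, 2, 3]
result, steps = top_k_frequent(nums, 2)
print(result, steps)
```

The outer loop visits at most len(nums) buckets, and the inner loop reads each distinct value at most once over the whole run, so `steps` never exceeds 2 * len(nums) even when every element has frequency 1.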
Such an awesome explanation and solution. Thanks, Man! Love it.
The algorithm that you explained at 3:15 was counting sort and not bucket sort. What you did, however, towards the end was similar (not same as) bucket sort.
is it different?
@@namoan1216 no, he's wrong
For some reasons you have solved all blind-75 related Heap problems in some other way :D
Best one is this for better runtime.
We can optimize it more by storing the maxFrequency while creating the HashMap (which has the integer and their corresponding frequency). Then, the next iteration to get the required elements can start from this maxFrequency instead of N.
Made it simple but efficient as always!
This solution blew my mind! excellent video as always :)
Thank you man! Blessed to have you!
The solution and thought process is genius! Can't come up with this optimal solution by myself, thanks a lot.
I got a similar question on my onsite interview with amazon (not the same question but same concept). I did not know bucket sort so I used the sorting method. The interviewer said there was a way of getting a linear time complexity and I did not know what to do .
Incredible video, keep up the awesome work! :)
Fantastic explanation, appreciate it!
Holy this is so much clearer than the quick select one.... Thank you so much
But not constant space complexity
Trust me bro, you are amazing explaining things. Thanks a lot for such content. Pls keep posting..
Excellent solution, thank you!
Thank you so much, your explanations are so easy to understand, I would be lost without you
While counting, you can keep track of the max occurrences and then you only need to initialize freq to that max instead of len(nums)
Good one ...I spent a lot of time understanding this but finally got it..🤗🤗
Sure but it's still O(N).
thx
count = {}
maxFreq = 0 # or 1
for each in nums:
    count[each] = count.get(each, 0) + 1
    maxFreq = max(maxFreq, count[each])
freq = [set() for i in range(maxFreq + 1)]
for the heap solution, it's better to use a min heap of size k rather than using a max heap and then removing max k times. Using the min heap, you would remove min and add the next frequency. by the end, you are left with k most frequent ones and removing the min gives you the answer. You can reduce this to n log k and not n log n
and even for values like k = 1e9, logk is around 30 so the complexity is around O(30*n) which is basically O(n)
@@gurmukhsinghnirman4935 indeed!
How would you cap the size of the heap `h` at size `k`?
As you're adding frequencies, `if len(h) > k: heapq.heappop(h)`?
@@PippyPappyPatterson so let's suppose k = 3 and you have numbers [1 , 2 , 3, 4, 5]. You can find the k-largest or in this case 3rd largest using a min-heap of size 3. As you add in numbers, your heap can grow like this:
[ 1 ]
[ 1 , 2 ]
[ 1, 2 , 3 ] ** you're capped at 3 ***
[2, 3, 4] ** add next( 4 ) and remove min (1)**
[3, 4, 5] ** add 5, remove 2 **
Now the head of the heap will be the 3rd largest element.
@@PippyPappyPatterson here
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        num_to_count = collections.defaultdict(int)
        for num in nums:
            num_to_count[num] += 1
        min_heap = []
        for num in num_to_count:
            if len(min_heap) < k:
                heapq.heappush(min_heap, (num_to_count[num], num))
            else:
                heapq.heappushpop(min_heap, (num_to_count[num], num))
        res = []
        while min_heap:
            _, val = heapq.heappop(min_heap)
            res.append(val)
        return res
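For comparison, the standard library's `heapq.nlargest` can select the k highest-count values without a hand-written heap loop. This is a minimal sketch; the function name `top_k_frequent_nlargest` is made up for illustration, and this is an alternative to the code above, not the approach from the video.

```python
import heapq
from collections import Counter
from typing import List


def top_k_frequent_nlargest(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)  # value -> number of occurrences
    # Rank the distinct values by their count and keep the k largest.
    return heapq.nlargest(k, count.keys(), key=count.get)


print(top_k_frequent_nlargest([1, 1, 1, 2, 2, 3], 2))
```

The result comes back ordered from most to least frequent, whereas the explicit min-heap version above pops the least frequent of the k first.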
I'm speechless! Thank you, NeetCode!
The explanation was so amazing that I understood how to solve half way through the video!
i had actually thought of the 2nd array implementation you said with N array size.. but i didnt think on how I would extract the top K as you did by going backwards! you're a genius!
I'd like to suggest a minor improvement:
count = {}
max_val = 0
for n in nums:
    count[n] = 1 + count.get(n, 0)
    if count.get(n, 0) > max_val:
        max_val = count.get(n, 0)
freq = [[] for i in range(max_val + 1)]
The max_val variable is used to track the maximum frequency instead of using the length of nums.
This can potentially save some space if the maximum frequency is significantly less than the length of nums.
Addition of max_val adds a small const time overhead
People say you are supposed to learn enough to be able to figure out leetcode problems, as opposed to memorizing leetcode. Are we seriously supposed to have figured this method out ourselves? This was so specific...
When you encounter a similar problem next time, you will think , well, i already saw it somewhere. I can do this.
@@edd4851 Really? How many leetcode problems are solved this way?
@@netanelkaye3014 1😅
its best not to think about it lol
This is the best explanation!! Thanks!
just as i was going to tackle this problem, you released a new video :)
The best explanation I've seen! Thank you so much man!
Got this question for google, what to do then if input is streaming (like a log)? guess we keep updating the count (histogram), and rebuild the freq array everytime?
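One possible answer to the streaming question, as a rough sketch rather than anything shown in the video: keep the counts and the frequency buckets up to date on every new item, so nothing has to be rebuilt. The class name `StreamingTopK` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


class StreamingTopK:
    def __init__(self) -> None:
        self.count: Dict[Hashable, int] = defaultdict(int)
        # buckets[f] is the set of items seen exactly f times so far.
        self.buckets: Dict[int, Set[Hashable]] = defaultdict(set)
        self.max_freq = 0

    def add(self, item: Hashable) -> None:
        """Record one occurrence; moves the item up one bucket in O(1)."""
        old = self.count[item]
        if old:
            self.buckets[old].discard(item)
        new = old + 1
        self.count[item] = new
        self.buckets[new].add(item)
        self.max_freq = max(self.max_freq, new)

    def top_k(self, k: int) -> List[Hashable]:
        """Walk the buckets from the highest frequency downwards."""
        res = []
        for f in range(self.max_freq, 0, -1):
            for item in self.buckets.get(f, ()):
                res.append(item)
                if len(res) == k:
                    return res
        return res


stream = StreamingTopK()
for x in [1, 1, 1, 2, 2, 3]:
    stream.add(x)
print(stream.top_k(2))
```

Each `add` is constant time, while `top_k` may still walk down from the current maximum frequency, so this trades cheap updates for a query cost that depends on how spread out the frequencies are.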
Fabulous explanation! I love your videos!
This is amazing explanation. Thank you for sharing this video.. Learnt something new today!
Clever solution! I came up with the nlogn solution immediately and thought the problem was over since the Leetcode page only wanted that. Then I watched your video and I was shook when you said there was an O(N) solution haha.
leetcode page says to find a solution that is BETTER than n log n
Hey! Could I receive a bit of clarification please? Why is your approach at 3:10 not O(n)? I thought it would be as would first need to find the max value in the given array (nums), then create our bucket array with that number as the upper limit? But since max( ) takes O(n) time - I'm quite confused. Thanks in advance!
An awesome solution! Beautiful!
Man you light up my leetcode
Is the bucket sort always about having the number of buckets as the size of the highest frequency number or which kind of bucketing we use in the standard bucket sort?
I got my first job after following your neetcode 150, 2 years ago. now after the layoff i am here again learning the dsa.
We can do a little space optimization by having max(counts) size for bucket instead of nums.length
Good point!
I was solving 692, and I got AC, I remembered this video came back to this. Thanks for such detailed videos.
Such a fun explanation, loved it!!
for the return function, an alternate way is to use the extend() method in python:
res = [ ]
ptr = len(frequency)-1
while len(res)
Result size will be wrong if len(frequency[-1]) > k I think
@@kestiv2429 The question guarantees that there is a unique solution. Hence every time we extend the result array, at some point it will(has to be) be equal to k. If it were not guaranteed, then you would be right.
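A sketch of what the extend()-based loop discussed above might look like. The snippet in the original comment is cut off after `while len(res)`, so the loop condition and body below are assumptions, as is the function name `top_k_frequent_extend`.

```python
from typing import List


def top_k_frequent_extend(nums: List[int], k: int) -> List[int]:
    count = {}
    for n in nums:
        count[n] = count.get(n, 0) + 1

    frequency = [[] for _ in range(len(nums) + 1)]
    for value, c in count.items():
        frequency[c].append(value)

    res = []
    ptr = len(frequency) - 1
    # Assumed condition: the original comment is truncated at this point.
    while len(res) < k and ptr > 0:
        res.extend(frequency[ptr])  # take the whole bucket at once
        ptr -= 1
    return res


print(top_k_frequent_extend([1, 1, 1, 2, 2, 3], 2))
```

When the answer is unique, as the problem guarantees, the result has exactly k items. On an input with a tie at the cut-off, such as `top_k_frequent_extend([1, 2], 1)`, the whole bucket is taken and the result has 2 items, which is the overshoot the first reply describes.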
I'm thinking you can just invert the frequencyMap to {frequency: list of values with that frequency} and then sort the keys in that inverted map. This sort would be O(sqrt(n) log(sqrt(n))) (which is < O(n)) because there cannot be more than O(sqrt(n)) different frequencies: if there are c distinct frequencies, the values need at least 1 + 2 + ... + c = c * (c+1) / 2 elements in total, so c * (c+1) / 2 <= n.
Then it's just a matter of iterating over the reverse sorted keys and adding values to a resultArray until that array reaches length k
lookupDict = defaultdict(int)
for n in nums:
    lookupDict[n] += 1
inverseDict = defaultdict(list)
for key, v in lookupDict.items():
    inverseDict[v].append(key)
sortedKeys = sorted(inverseDict.keys(), reverse = True)
sortedKeysIndex = 0
res = []
while k > 0:
    values = inverseDict[sortedKeys[sortedKeysIndex]]
    if len(values) > k:
        res.extend(values[:k])
    else:
        res.extend(values)
    k -= len(values)
    sortedKeysIndex += 1
return res
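The bound on the number of distinct frequencies used in this comment can be written out as a short counting argument. Here n is len(nums) and c is the number of distinct frequencies, both symbols taken from the comment; the ordering f_1 < ... < f_c is notation introduced only for this sketch.

```latex
% c distinct positive integer frequencies, sorted increasingly, satisfy f_i >= i.
% Every frequency is realised by at least one value, so the n elements cover their sum:
n \;\ge\; \sum_{i=1}^{c} f_i \;\ge\; \sum_{i=1}^{c} i \;=\; \frac{c(c+1)}{2}
\quad\Longrightarrow\quad
c \;\le\; \frac{\sqrt{8n+1}-1}{2} \;=\; O(\sqrt{n})
```

Building the counts still takes O(n), so the approach as a whole stays O(n); only the sort step is sublinear.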
really love the approach taken, thank you
Cannot believe I figured this on my own, definitely took some time but I was able to figure it out on my own.
Such a genius solution!
Also, you can do res += freq[i] in line 13. The problem description mentions the solution will be unique. So we know that after each bucket is added, the number of elements in res will either equal k or be less than k. So there is no need to run an inner loop.
Thank you ! Beautiful explanation
without knowing what is bucket sort I was only able to come with the hashmap sorting solution thanks mate for the o(n) soln.. great explanation
Firstly, thank you for being so amazing with your videos!
Actually, I was wondering about the for loop on line 12: since inside that loop we keep checking the length of res, doesn't that increase the time complexity beyond the expected linear time?
No, the inner loop runs at most n times in total, so that's n (outer loop) + n (inner loop) = 2n, which is O(n).
Watching this after attempting on NeetCode is soooo good!
Nice explanation. Thanks for sharing !~
Love this solution!
Really helpful. Thank you so much.
You can also use count = Counter(list)
This will count in itself. No need for a loop. Another day thanking Guido van Rossum for making my life easier.
Best explanation this helped me in solving Top K Frequent Elements and Sort Characters By Frequency
I think the runtime of using heap is O(n log k), we need O(n) to construct the heap and remove an item cost O(log k) ?
Yes, this is even what the leetcode official answers says
Isn't it O(log n) to remove from a heap with n elements? And we do that k times, so that makes O(k log n).
@@michaelchen9275 If we restrict the heap to be of size k (since we only care about k most frequent), at worst case we'll end up popping n elements. i.e. O(n log k)
nick white, kevin, neetcode best guys to get the perfect explanation...
Does this solution also work when there's no limit on the values of the initial array? Or does it assume that the values in the array are bounded?
Hey man, great videos and I appreciate neetcode a lot. I had a question about this in Java though. I tried to implement the buckets with an ArrayList of Arraylists but the algorithm was really slow. Do you have java code for this to compare to? I see you have it posted but using a different algorithm than this bucketing strategy. I would love to see how this is actually done efficiently in Java with this method in particular.
i have done the same but getting indexout of bound error. also can you tell how to append value of arraylist by adding new int to it.
List li=new ArrayList(nums.length+1);
What about the last nested (quadratic-looking) for loop? Will it not make the complexity O(N^2)?
My man! I've spent HOURS watching you. FYI you can do the counter with collections, and save few lines of implementation
How is that?? Can you pls share
@@yossarian2909 from collections import Counter, then Counter(nums) gives the frequency dict
@@hamzasayyid8152 continue the code
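Continuing the Counter suggestion, here is a minimal sketch of how the rest of the bucket approach could follow from `Counter(nums)`. The function name `top_k_frequent_counter` is an illustrative assumption.

```python
from collections import Counter
from typing import List


def top_k_frequent_counter(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)  # replaces the manual counting loop

    # freq[i] holds every value that occurs exactly i times.
    freq = [[] for _ in range(len(nums) + 1)]
    for value, c in count.items():
        freq[c].append(value)

    res = []
    for i in range(len(freq) - 1, 0, -1):  # highest frequency first
        for value in freq[i]:
            res.append(value)
            if len(res) == k:
                return res
    return res


print(top_k_frequent_counter([1, 1, 1, 2, 2, 3], 2))
```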
Great explanation!
Thank you for your time teaching. Can i ask what software you use for the black board in the background, or anyone know? Thank you all.
man this was the video , great great explanation
Hi Neetcode, I got following solution. Can please tell me it is optimal or not?
dic = Counter(nums)
dec = dic.most_common()
res = []
for i in dec:
    if len(res) != k:
        res.append(i[0])
return res
Man I could not do it myself because I didn't understand problem clearly, after 50 seconds of the video I understand and did it with ease, thanks a lot. Now I am going to watch rest of the video to learn optimal solution)
Your video got a me a job as an SDE at AWS!!
Congratulations 🎉🎉
Hey John, which neetcode 150 questions did they ask? I have a phone interview coming up
I didn't see anybody come up with this solution in the discussion on leetcode; all of the solutions used a heap, or maybe some of them used quick select.
So I was afraid that I had analyzed my algorithm wrong, but after watching you I know that I was right about my solution.
Infact the top solution in the discuss is using bucket sort itself.
I came up with this solution originally but really appreciated the thoughtful description of the linear solution!
result = defaultdict(int)
for num in nums:
    result[num] += 1
result = dict(sorted(result.items(), key=lambda item: item[1]))
return list(result.keys())[-k:]
Very amazing explanation bro
I love how your videos are always so damn clear :) ! Thanks alot
Btw, here is a weird(?) quicksort version that beats 93%:
def topKFrequent(self, nums: List[int], k: int) -> List[int]:
    count=list(Counter(nums).items())
    def quick(l,r):
        pivot,p = count[r][1],l
        for i in range(l,r):
            if pivot>=count[i][1]:
                count[i], count[p] = count[p], count[i]
                p+=1
        count[r], count[p] = count[p], count[r]
        if p>len(count)-k:
            return quick(l,p-1)
        if p=ind]
Nice!
Mine beats 98.62 time and 90 on space and is two lines long :)
@@heathergray4880 would you share your code for learning please?
Thanks for the explanation.
What is the space complexity of the solution? Is it O(n + k + n) == O(n)? n for hashmap of counts, then array of k size, but an array item can contain a list of n items in the worst case being all numbers being distinct? How does space complexity work with list of lists? Thanks.
is log(n) better solution than nlog(n) and log(n)?.... can you create a video on how to find big o and which one is better than which? i know there are many videos in youtube, but yours will be the best
Can we not use a heap/priority queue instead of using an array? Wouldn't that automatically keep the most frequent elements at the top if the sorting was done according to the count?
Can someone explain why in line 12 of the code you have to -1 from len(freq)? since with the example of nums = [1, 1, 1, 2, 2, 3], the length of freq is 6, if you decrement from len(freq) - 1, which is 5, wouldn't you go 4, 3, 2, 1 and completely miss the largest number? I'm so confused, please help
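A quick check of the indices involved, assuming `freq` is built with `len(nums) + 1` slots (as quoted elsewhere in this thread) and the loop is `range(len(freq) - 1, 0, -1)`:

```python
nums = [1, 1, 1, 2, 2, 3]

# One slot per possible frequency 0..len(nums), so 7 slots, not 6.
freq = [[] for _ in range(len(nums) + 1)]

# The last valid index is len(freq) - 1; the stop value 0 is excluded.
indices = list(range(len(freq) - 1, 0, -1))

print(len(freq))  # 7
print(indices)    # [6, 5, 4, 3, 2, 1]
```

So the loop starts at the last valid index, 6, which is the largest possible frequency, and nothing is skipped. Starting at `len(freq)` itself would be out of range.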
i love your videos keep leetcoding
I am not able to find the excel file that all the notes are written. Can someone share the link?
Is there C++ code of the bucket sort solution used in this video anywhere? I’m trying to learn to use C++ and I’m struggling with this problem. If someone could assist me, that would be great!
For the heap the time complexity should be n.Log(k) since the max size of the heap can only be k and n is the number of elements
Nope, if you are using a maxHeap it will be k log n. Because for that you need to heapify the whole thing first so n elements would be in the heap.
Then you would pop 'k' times from the heap of size 'n'. So O(k log n).
This is Valid if you use a maxHeap!!!
If you are using a minHeap then you would be correct, then the heap would have a max size of k as you said. And you would loop n times and push then poll when size reaches k.
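The max-heap variant described in this reply can be sketched as follows. Python's `heapq` is a min-heap, so the counts are negated to simulate a max-heap; the function name `top_k_frequent_max_heap` is an illustrative assumption.

```python
import heapq
from collections import Counter
from typing import List


def top_k_frequent_max_heap(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)
    # Negate counts so the most frequent value sits at the top of the min-heap.
    heap = [(-c, value) for value, c in count.items()]
    heapq.heapify(heap)  # linear in the number of distinct values
    # k pops, each logarithmic in the heap size; assumes k <= distinct values.
    return [heapq.heappop(heap)[1] for _ in range(k)]


print(top_k_frequent_max_heap([1, 1, 1, 2, 2, 3], 2))
```

Here the whole input is heapified once and only k items are popped, which is where the O(k log n) term for the pops comes from, on top of the O(n) counting and heapify.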
Awesome video! One small thing, shouldn't it be O(n log k) instead of O(k log n) for the heap solution since there's n elements, heap of size k means log k time to heapify and n calls so n * log k?
Yes, it isn’t O(n) like he said
I did the solution with a priority queue and a hashmap, and in practice it seemed to run faster and use less memory than bucket sort. I feel like this is tricky because, when we are designing solutions for problems, we start by analyzing which data structure we are going to use, its time complexity, etc., based on how those data structures are regularly implemented. The thing is, algorithms and built-in functions in languages have improved so drastically that they can take less time than the theory suggests.
It's tricky but are those are things that we should consider as well?
Thanks Neet! I burned my brain cells trying to come up on my own. This helped.
where do you get that the max input array size is 6? The problem says the length of the array can be up to 100k
What I don't quite understand is that when I implemented this solution, the speed and memory is not as efficient compared to my initial solution in
```
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = 1 + count.get(num, 0)
        res = sorted(count, key = count.get, reverse = True)
        return res[:k]
```
Even though using sorted() here causing it to be n log n... Can someone explain why this one is appearing to be much quicker than the solution in the video?
sorted() uses Timsort, which has O(n log n) worst-case time complexity in CPython.
I believe the length of the frequency/bucket array could be lower than len(nums) + 1 because of the rule that the # of distinct elements >= k. This tells us that there will be at least k distinct elems and since each one must have at least one frequency, no single number could occupy all len(nums) spots (unless k is 1). Therefore I think it could be further optimized (albeit minimally lol) to:
freq = [ [ ] for i in range( (len(nums) + 1) - (k - 1) ) ]
Could you explain how the for loop works within the array, I didn't understand that part?
freq = [[] for i in range(len(nums) + 1)]
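On how that line works: it is a list comprehension, which builds a new list by evaluating the expression on the left (`[]`) once for every `i` produced by the `for` part. A small sketch, with `freq_loop` and `shared` as illustrative names:

```python
nums = [1, 1, 1, 2, 2, 3]

# List comprehension: one fresh empty list per index 0..len(nums).
freq = [[] for i in range(len(nums) + 1)]

# The same thing written as an ordinary loop.
freq_loop = []
for i in range(len(nums) + 1):
    freq_loop.append([])

# Pitfall: multiplying a list repeats the SAME inner list object,
# so appending to one "bucket" shows up in all of them.
shared = [[]] * (len(nums) + 1)
shared[0].append("x")

print(freq == freq_loop)  # True
print(shared[1])          # ['x']
```

That shared-object pitfall is why the comprehension form is used for the buckets rather than `[[]] * n`.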
how does it work for negative numbers ? For eg : arr [ -1, -1] and k = 1
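Negative numbers are fine because the values themselves are only ever used as dictionary keys; the bucket array is indexed by frequency, which is always at least 1. A minimal sketch, with the function name `top_k_frequent` assumed for illustration:

```python
from collections import Counter
from typing import List


def top_k_frequent(nums: List[int], k: int) -> List[int]:
    count = Counter(nums)  # keys may be negative, that is fine for a dict
    freq = [[] for _ in range(len(nums) + 1)]
    for value, c in count.items():
        freq[c].append(value)  # index is the count, never the value
    res = []
    for i in range(len(freq) - 1, 0, -1):
        for value in freq[i]:
            res.append(value)
            if len(res) == k:
                return res
    return res


print(top_k_frequent([-1, -1], 1))  # [-1]
```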
That's very clever !
Is it the outer for loop or the inner for loop in the nested for loops that determines the time complexity?
We don't need to look at the loops because, it is guaranteed that N number of elements are distributed in the hashmap. For 10 elements say if one entry of the hashmap has 8 elements, then only 2 elements are gonna be present in the remaining hashmap.
is it just me or after struggling on a particular part for a while.... then it hits!!!!! best feeling ever. NeetCode, I am following along every problem and have the confidence that I'll get my dream tech position. thank you, its the same feeling I had when I found out about khan academy in high school
we're both gonna get the dream job homie!
Damn. I solve this problem for hours and a lot of code. It turns out could be this simple. Thankss, I learn something new. THE BUCKET SORT 😇
Just a doubt that can we reduce time complexity here if we use minheap?