VERY USEFUL Python Trick!!
- added Jul 25, 2023
- ⭐ Join the Byte Club to practice your Python skills! ($2.99/mo): / @b001
🐦 Follow me on Twitter: / b001io
Music:
Mystic Mountain by Purrple Cat | purrplecat.com
Music promoted by www.free-stock-music.com
Creative Commons / Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
creativecommons.org/licenses/...
finally, a python trick that doesn't start with "first let's import".
😂😂😂
Haha, it depends. For beginners, knowing how to do these basic things is a must, but as you progress and need to write more complicated code, importing is recommended. Importing is just a layman's way of saying "don't reinvent the wheel."
Right, but in this case, using Counter is absolutely the correct move. This approach is O(n²), which is a staggeringly inefficient way to find the answer.
@@joshualee1685 False. It is important for everyone at any stage to understand how to efficiently handle data. If you are not maintaining the code you're importing, and have not researched how it works, you cannot be sure that it is doing what you think it is.
I thought using counter from itertools is easier
It's also worth learning what's in your standard library for any language. `from collections import Counter`
yep yep
it's also just faster in this case
Every time i see one of these coding shorts the comment section feels like reading through a stack exchange forum lol
I love it so much man
Which is good. Stack Exchange dunking on beginners is bad, but criticism of an advanced programmer's code is good and needed, so that beginners and experienced folks alike get to see different opinions/options from various programmers rather than just believing one opinion from a YouTuber.
The x.count function needs to go through the whole list for each item, so you have O(n²) complexity (if there is no cache). It may be faster to use a simple for loop, since the complexity would be O(n), or even better use the built-in Counter, which may be written in C for better performance.
Using a Counter is a great idea! max(Counter(x).items(), key=operator.itemgetter(1))[0] runs in O(len(x)) time rather than O(len(x) ** 2) for passing in x into max or O(len(x) * len(set(x))) for passing in set(x) into max.
Edit: minor code fix
Edit 2: @MeBeBonkers has provided a more Pythonic way to do this below. Thanks for pointing out that solution!
@@jeffreyh.1436 Counter(x).most_common(1)
Yeah I would say this trick is “neat” but definitely not “VERY USEFUL”.
How would you implement this more efficient loop?
@@puskar2.014 You could use collections.Counter, which doesn't require writing a loop at all; the loop is hidden inside the Counter call. Hope it helps
Your code complexity is O(n²); after using the set it's O(n² log n) in the worst case.
Just use a hash map to count the frequency in O(n)
The complexity was O(n²); where n is the length of the list.
After using the set, it becomes O(nm) where m is the number of distinct values. This is better because m is never larger than n.
If the data-type has a finite set of possible values, m is constant or bounded by a constant.
Using a dictionary for counting seems to be the best idea like you said.
@@aouerfelli I assumed that the set in Python is implemented using a balanced tree structure, but it is implemented using a hash map, so you are correct: the complexity will be O(nm).
@@aouerfelli It is still O(n^2) because the list could have all unique numbers. Big O is always about worst case.
@@aouerfelli Like others have commented in different threads, collections.Counter does all the work in O(n) time. It runs one loop over the iterable and constructs different data structures which can solve basically all imaginable problems in constant time, then makes those available as members of the Counter object.
In this case this code would solve the problem fastest:
from collections import Counter
x = [1, 2, 1, 3, 4, 1, 2, 4, 1]
c = Counter(x)
most = c.most_common(1)[0][0]
------------------------------------
most_common(n) returns a list of the n most common (item, count) tuples: the first element of each tuple is the item, and the second is its number of occurrences.
@@wy100101 The big O notation is used for asymptotic complexity. Best case, worst case and average case have each their corresponding asymptotic complexity.
As for worst case, there is no worst case; because n can grow arbitrarily. We can talk about worst case given a fixed parameter that we can track. Like the worst case with given (m,n) or worst case with given n.
The worst case with given (n,m) is O(n m)
The worst case with given n is O(n²)
There is no absolute worst case but there is a worst case for fixed parameters.
If you want the most generality, there will be no worst case. If you want to track the worst case for a given value of n and m, you get O(n m). If you want purposefully to ignore the parameter m and take the worst case as for a specific n then you get O(n²).
A very interesting trick. But not one I've ever needed in 25 years of being a professional programmer.
I prefer Counter class from collections library
Your work is a Blessing - thank you for sharing!
Iterate through all values. While iterating, store counts in a hashmap and keep the running max in a variable (curr). After every update, check whether map[key] > curr; if so, update curr. This solves it in one pass
There's an even faster way to get the counts and most frequent in one pass of the array for true O(n) performance. (Other comments suggest using Counter(), which is O(n) for counting elements, but grabbing the most frequent is another O(n) operation, making it O(2n))
For the one pass solution:
1. Initialize "mostfreq" and "mostfreqnum" variables
2. Initialize "freq" dict (hashmap)
3. For Loop over the list
4. For each x check if it is in 'freq', if yes add one to its count, if no add it to 'freq' with count of 1.
5. Access freq[x] to see if it is greater than 'mostfreq', if yes then assign it to 'mostfreq' and assign x to 'mostfreqnum'
6. End loop and return 'mostfreqnum'
In the event of a tie, it will return one of them, which is technically correct. If all of the most frequent items are desired in the event of a tie, then fall back to the O(2n) Counter.most_common() method.
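The numbered steps above can be sketched in a few lines (a sketch, the function and variable names here are my own):

```python
def most_frequent(items):
    """Single-pass most-frequent element: count and track the leader together."""
    freq = {}                                  # element -> occurrence count
    best_item, best_count = None, 0
    for item in items:
        freq[item] = freq.get(item, 0) + 1     # steps 3-4: count each element
        if freq[item] > best_count:            # step 5: new leader found
            best_item, best_count = item, freq[item]
    return best_item                           # step 6: ties keep the earlier leader

x = [1, 2, 1, 3, 4, 1, 2, 4, 1]
print(most_frequent(x))  # 1
```

As noted in the comment, a tie returns whichever element reached the winning count first.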
we can use counter from collection module
In a case where two elements appear the same number of times, it'll return the first element
If you don't use `set`.
@@megaing1322 it'll still only give the first one
Please write a book!
These shorts are awesome
Why would be set(x) more efficient? Does the conversion not also need to go through all elements, so it would not matter?
something something don't care for order no index something bla bleh fast
for real lookup operations on sets are fast as holy shit normally
with the original code, max will go through
1 2 1 3 4 1 2 4 1, and for each of those call x.count (which needs to go through each element of x). So you're doing 9 * 9 = 81 iterations
With set(x), you indeed need to go through all the elements to create the set, but then max only goes through
1 2 3 4, and for each of those call x.count: 4 * 9 = 36 iterations.
So it seems like the set(x) code might be doing less work overall. However, in a real world application where performance is important, you should use benchmarks to determine which code is faster for expected inputs.
@@ZantierTasa it would be 5*9 because the set call also needs to go through all elements.
@@SourceOfViews Sure, you can call it 4*9 or 5*9. It's the same big-O, so it doesn't matter.
@@ZantierTasa In actual real world situations you will most likely use either numpy, scipy or collections to entirely solve these problems, not hack together a solution with clever use of native data structures.
Let's be real: at best, the difference of using sets here is a teaching moment to help understand big O, which he doesn't do in the video.
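The 81-vs-36 figure from the thread above is easy to check by wrapping the key function and tallying how many times max() calls it (a sketch, `counting_count` and the tally variable are mine):

```python
x = [1, 2, 1, 3, 4, 1, 2, 4, 1]

calls = {"n": 0}
def counting_count(v):
    calls["n"] += 1        # one call per candidate that max() examines
    return x.count(v)      # each call itself scans all len(x) elements

max(x, key=counting_count)
without_set = calls["n"]   # 9 candidates -> 9 * 9 = 81 element scans

calls["n"] = 0
max(set(x), key=counting_count)
with_set = calls["n"]      # 4 distinct candidates -> 4 * 9 = 36 scans

print(without_set, with_set)  # 9 4
```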
from collections import Counter; Counter(x).most_common(1)
rule34b: if problem exists, there is Python package for it
@@ArmirMitrandir its from stdlib
anyone know what font that is?
It's better to use your own function or a module. max will call x.count n times, and every call goes through the entire list.
I didn't understand something! If you convert the list into a set, the set removes all the repeated elements, so the count method goes through a set which doesn't have any repeated elements. My question is: how does the count method find the most frequent item?
The set will remove duplicates, and the key will count from the original x, not from set(x). So set(x) only removes duplicate candidates (saving time)
It still tests each value but only once now. And for each value it counts the occurrence in the *initial* list (x.count not set(x)...)
To be honest, I love this guy; he teaches things that no one else notices or teaches, but that are very important at the same time
a little trick I didn't know! thx +1 ❤️
whats the font?
uhm anybody else gonna mention the mode() function
that's what i'm saying
I wondered about that too. Like median, it's available in Python's statistics module.
You're creating massive overhead by converting to a set. So what is considered 'optimised' is subjective.
what font are you using
Very helpful
Anyone know what font this is?
Casting list to a set adds to a complexity
Good information
Here's a two line solution which does the same thing but *much faster*:
from collections import Counter
most, _ = max(Counter(x).items(), key=lambda a: a[1])
Even faster:
(most,_), = Counter(x).most_common(1)
Doesn't seem much faster here, in fact both these solutions are slower.
Python 3.11.4 (tags/v3.11.4:d2340ef, Jun 7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)] on win32
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]" "max(set(x), key=x.count)"
500000 loops, best of 5: 601 nsec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]; from collections import Counter" "most, _ = max(Counter(x).items(), key=lambda a: a[1])"
200000 loops, best of 5: 1.51 usec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]; from collections import Counter" "(most,_), = Counter(x).most_common(1)"
200000 loops, best of 5: 1.85 usec per loop
And if we make the inputs a lot larger..
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000" "max(set(x), key=x.count)"
1 loop, best of 5: 221 msec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000; from collections import Counter" "most, _ = max(Counter(x).items(), key=lambda a: a[1])"
1 loop, best of 5: 244 msec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000; from collections import Counter" "(most,_), = Counter(x).most_common(1)"
1 loop, best of 5: 247 msec per loop
And on Linux:
Python 3.11.4 (main, Jun 7 2023, 10:13:09) [GCC 12.2.0] on linux
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]" "max(set(x), key=x.count)"
500000 loops, best of 5: 477 nsec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]; from collections import Counter" "most, _ = max(Counter(x).items(), key=lambda a: a[1])"
200000 loops, best of 5: 1.25 usec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1]; from collections import Counter" "(most,_), = Counter(x).most_common(1)"
200000 loops, best of 5: 1.59 usec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000" "max(set(x), key=x.count)"
2 loops, best of 5: 180 msec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000; from collections import Counter" "most, _ = max(Counter(x).items(), key=lambda a: a[1])"
1 loop, best of 5: 213 msec per loop
timeit -s "x = [1, 2, 1, 3, 4, 1,2, 4, 1] * 1000000; from collections import Counter" "(most,_), = Counter(x).most_common(1)"
1 loop, best of 5: 217 msec per loop
Sort the list and go to the last by index simple
Can anyone explain to me the whole thinking for making the list into a set?
How can this make our code optimized ?
Im curious about your color theme
a for loop does it in O(n) (or better with some common-sense optimization, like stopping the count when the remaining members of the list are fewer than the difference between the most and second most common so far).
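The early-exit idea in that comment might look like this (a sketch; the leader/runner-up bookkeeping is my own invention, and the worst case is still O(n)):

```python
def most_frequent_early_exit(items):
    """Count with a dict, but stop once the leader can no longer be overtaken."""
    counts = {}
    best_key, best_count, second_count = None, 0, 0
    for i, item in enumerate(items):
        c = counts[item] = counts.get(item, 0) + 1
        # Maintain the top count and the best count among all other keys.
        if item == best_key:
            best_count = c
        elif c > best_count:
            second_count = best_count
            best_key, best_count = item, c
        elif c > second_count:
            second_count = c
        remaining = len(items) - i - 1
        # Even if every remaining element went to the runner-up (or to an
        # unseen key), it could not catch the current leader: stop early.
        if best_count > second_count + remaining:
            break
    return best_key
```

On a list like [5, 5, 5, 5, 1] this stops after the third element; on adversarial input (all distinct values) it degrades to a plain O(n) counting loop.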
Depending on how your branch prediction and speculative execution play out that may end up performing worse. Really depends on the size and complexity of the data.
Which shell prompt you are using ?
Awesome ❤
mode method: am i a joke to you?
isn't there a mode function in the math or statistics library?
What is the theme you are using ,sir
How is it different from iterating over the list and storing the element with the highest frequency? And does this have less time complexity than iterating over the array?
Those who use mode from statistics
how do i uncatenate a string?
what is the time complexity of this approach?
can it rival the speed of numpy?
the time complexity is O(n²)
@@wy100101 thanks
good solution. i would've made a real problem for future me by implementing iteration over a dictionary or something lol
I forgot how fun python was, to the point where I taught myself how to make a whole ass text adventure game off it
Wow. I even know what is going on! I am amazed
Which VS theme do you use???
Synthwave 84
What font is that you’re using?
That was nice, but you could have used the Boyer-Moore majority voting algorithm
I didn’t understand how making it a set works
What theme do u use?
Can you tell me the name of the application you’re using?
just import statistics and
most = statistics.mode(x)
Why not use mode?
hashmap -> greatest value's key ? idk why you would do your solution
Cool 👍👍
what theme is that?
There's no 'mode' in Python?
Seriously! The whole time I was chanting 'mode, mode, mode' to myself. Disappointment. Maybe this version is more efficient, or plays nicer with text? Still feels like a huge oversight. That or it's ignorance-bait (ibait, I'm coining that one here) to get comments/engagement.
@@Internet-Antics haha
How can i do this with c++
In statistics theres Mode
Here we used set; basically set will remove duplicates, so finally we get 1,2,3,4. max(1,2,3,4) should output 4, but how does it come out as 1? Can you please explain it 🤔
It will be max({1,2,3,4}, key=x.count)
So max uses 1 first, so x.count(1), stores the number of occurrences. Next is 2, so x.count(2), stores the number of occurrences. And so on. Then it selects which number in our set, {1,2,3,4} returned the most occurrences. Which is what gets returned from the max() function. Hope this helps!
@@b001 understood, Thanks 😊
@@b001 it explains very well how the key helps the max() func work...
But I didn't understand the difference between using just x or set(x)
Let’s say list “x” was 1000 items (instead of 9 items in the example).
Let’s also say that this list (like the example) is only made up of integers from 1 to 4:
x = [1, 3, 2, 4, 2, 1, 4, 4, 2, 3, 1, … etc for a thousand items].
Using set means the count only tests for the four integers in the set:
set(x) is (1, 2, 3, 4)
…so only four iterations needed over the original list “x”, which remains unchanged, and is still 1000 items long.
“set(x)” only specifies the items to look for (and count), but “x” is still the original 1000 item list, not the set.
Using just plain “x” in place of “set(x)” would require the code to iterate for every item in the list “x” (1000 items), even though there are only 4 unique possibilities.
@@rootytuners oh thanks 👍
That's more clear!
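The point in that explanation can be demonstrated in a couple of lines: the set only changes the candidates max() considers, while key=x.count always counts within the original list, so both forms agree on the answer:

```python
x = [1, 2, 1, 3, 4, 1, 2, 4, 1]

# The candidates differ (9 items vs 4 distinct values)...
print(sorted(set(x)))            # [1, 2, 3, 4]

# ...but key=x.count always counts occurrences in the ORIGINAL list,
# so both calls return the same most frequent element.
print(max(x, key=x.count))       # 1
print(max(set(x), key=x.count))  # 1
```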
How tf do i Get the [ on keyboard
Why not just use the mode function?
Hii sir, how do you record and edit your reels and shorts?
Please answer 🙏
You're on YouTube, why not just search for your answer? There's a ton of methods and guides out there
I wanted to cry seeing how simple python is compared to C. I don't think my brain can handle any more memory allocation 😭
Literally everything is undefined behavior in C. *I still get PTSD flashbacks*
seeing people freak out and whine about how hard C is makes me think I should do C programming and probably make more money with less competition... wtf is so hard about learning good allocation practices? If you can learn to use a new framework you can learn to manage memory..
@@craigslist6988 im just guessing but maybe it's like one of those little things that add up over time
python is easier to use than js, and my first language is javascript. it took me 2 months to learn it.
i already knew that; that trick is in the Python book Fluent Python. great book by the way, the most complete and well-structured book i have ever read related to Python. it's huge but worth it
I hope there was a big disclaimer why this is an awful solution and should only be done for lists under your control where you know an upper bound for size.
@@voodoo1094 you're right
can someone tell me how would u find the most common occurrence in O(n) complexity
I think u need to be Ramachandran
Use dictionary and keep elements as key
Value as count of occurrence
Update it using one loop of tracing elements
C = dict()
for i in range(len(x)):
    C[x[i]] = C.get(x[i], 0) + 1
@@yashasmn245 Works, but you forgot to find the highest value:
most_common = (None, 0)
for key, val in C.items():
if val > most_common[1]:
most_common = (key, val)
---------------------
This is still O(n) complexity. One might think that since we iterate over the list once and over the distinct keys again it would be something like O(n) + O(m), but since m is at most n, the O(n) term dominates on big datasets and the extra pass becomes irrelevant. Therefore it's still considered O(n).
@@yeetdeets
import random
x = [random.randint(5, 7) for _ in range(20)]
C = dict()
for i in range(len(x)):
C[x[i]] = C.get(x[i], 0) + 1
print(x)
print(max(C, key=C.get))
--------
O(n) was asked for, and this is O(n)
where's the "BIM BIM BAM BAM" part?
why didn’t you just use mode instead?
Python doesn't have a MODE function?
The statistics module does.
Subbing as i am learning. Thanks
Serious doubt on the time complexity
what vscode theme do you use?
I need too 😮😮
Isnt it just mode?
Leaving a comment for better algorithm 😅
Why can't we use 'mode' simply
Quadratic complexity. Fine as a quick one liner or for small lists. Otherwise don't do this.
Nah I'd quadratic
This is only useful for small lists.
Does Python not have a built in mode function?
It’s python so you probably got to import it lol
No, but the statistics module does.
Why did it return a byte tho 😭
I tried this trick and Dua Lipa finally gave me a reach around.
Nice
Interesting 🤔🤔
To those getting worked up about this tip: it's not an optimized solution, and I don't think the intent is to be optimal, but rather just a step above beginner. It's probably not meant to be a solution used in a backend server for billions of requests.
Nice.
Uhh or just use mode?
Not a python programmer but I will find a way to apply this in C++.... maybe....
So we are calling libraries as tricks now huh.
That's called a mode :nerd:
Theme ???
Synthwave 84
Just began with Python and I'm finding it awesome. Its syntax is surprisingly straightforward, which makes it an ideal language for coding starters. What's even better, Python is a highly versatile language, opening ways to many fields.
How is this different from mode??
from scipy import stats
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x = stats.mode(speed)
print(x)
pythON
so 1 is not the loneliest number that you’ll ever do
Or just convert it into a pandas series and use .value_counts()
Keep in mind this isn't an optimal solution, but overall it is nice and pythonic, so a nice trick to have in your bag.
@@michaw2209 Hi. Populate a dictionary with keys as the elements and values as their frequency, and keep track of most frequent element that way. Requires one pass.
@@michaw2209 Counter(x).most_common(1)
Especially since the runtime complexity of this is O(n), while the runtime complexity of the approach shown in the video is O(n²).
C64 Assembly is so much easier...
what's your color theme name?
I think is something related to synth80's (just a guess)
thx
@@jojota_p
@@football_land560 the correct name is SynthWave84 🌊
Not quite sure about that folk
that's O(n²), and it sucks