How Sets Can Truly OPTIMIZE Your Python Code

Sdílet
Vložit
  • čas přidán 29. 04. 2023
  • Today we will be looking at how sets can be used to significantly optimize your Python code.
    ▶ Become job-ready with Python:
    www.indently.io
    ▶ Follow me on Instagram:
    / indentlyreels

Komentáře • 86

  • @Indently
    @Indently  Před rokem +77

    As one of you were kind enough to point out, the speed_difference calculation was completely off. It should have been obvious to me immediately, but sometimes when you're writing tutorials these details pass you. In the example I provided, sets are MUCH faster than 99%. Sorry for that bad snippet of silly math :)

    • @Dent42
      @Dent42 Před rokem +31

      If it makes you feel any better, GitHub has been using the same incorrect calculation for their Copilot efficiency gains for months now, saying their product increases efficiency by 55% (presumably coming from the fact that tasks took 55% less time), when in reality it increases efficiency by 127%

    • @Indently
      @Indently  Před rokem +11

      That's a hilarious little piece of trivia, thanks for sharing 😂

    • @callbettersaul
      @callbettersaul Před rokem +2

      I've thought about it and I'm still quite certain, that the math is correct in the video. My reason for believing this is that the 100% represents the time the list took to complete. That means 100% faster would mean that set took 100% less time to complete (which would be 0s). However, if you compared it the other way around (list took x% longer to complete), then it would be quite a large number, for example, at 3:00 it would be correct to say, that the list was 2,848,073% slower, than the set or that the list took 2,848,073% more time to complete. But the other way around, it can't go over 100%, because it would go into negatives that way.

    • @callbettersaul
      @callbettersaul Před rokem +1

      @@Dent42 Isn't it true tho? When they're comparing to the old value and get the new value, that is 45% of the original value. That means the old value was reduced/improved by 55%. Their efficiency is based on the time it took and the time can't be 127% less than the original time, because nothing takes below 0s to complete.
      Edit: Nevermind, when talking about efficiency/performance increase, then it would be 122% I think (tried 3 different formulas, all gave 122%, not 127%).

    • @Indently
      @Indently  Před rokem +5

      I like the idea of something taking 100% less time to complete

  • @callbettersaul
    @callbettersaul Před rokem +43

    To add to this, when lists check if an element is there, it uses '==' operation on each member, which is why it takes so long. However, set takes the hashcode of the element (hash(element)) and checks if that hashcode is in the table. In your own classes you can modify the behaviour of the operations, by writing your own implementation of the magic methods __eq__() and __hash__().

  • @BrianStDenis-pj1tq
    @BrianStDenis-pj1tq Před rokem +13

    I'd suggest that for many applications, using a dict() is what you want. Not only do you get the speed of a set, you can store a value as well. Dict or list are the go-tos, while set can be used if you really need the speed of a dict and you really don't want to store anything.

  • @GabiruM
    @GabiruM Před rokem +1

    I love your videos, man. They are simple, effective and very useful. Hope you get the recognition you deserve.

  • @alidev425
    @alidev425 Před rokem +1

    I'm, as always, learning something new from your short videos. great content keep it up 👌👌

  • @vuvu7005
    @vuvu7005 Před rokem +3

    if you whant to handle more infos (like counts, index in an array, order, etc...) you can use dicts, the keys of the dict will behave practicaly the same as the set and you will be able to store aditional infos

  • @CatalinSoare
    @CatalinSoare Před rokem +2

    While I did know that sets were a lot faster than lists, I didn't know by how much. This was very interesting and informative.
    However, what I didn't know and just saw in your video was pep-515; being able to separate digits with underscores for visual aid seems awesome!

  • @abdultheseekerofknowledge4453

    Wow bro, everytime i see one of your videos i immediately know i'll learn something new, thank you

  • @k.chriscaldwell4141
    @k.chriscaldwell4141 Před rokem

    Good info. Thanks.

  • @octonwoomy1150
    @octonwoomy1150 Před rokem +1

    W video as always, thanks for the great content

  • @dcollett
    @dcollett Před rokem +4

    Thanks so much! Your videos are the best on the internet.

    • @WinfriedKastner
      @WinfriedKastner Před rokem

      And he speaks the best and understanding English of the whole world!!

    • @Indently
      @Indently  Před rokem

      That's too kind, Winfried :)

    • @WinfriedKastner
      @WinfriedKastner Před rokem

      @@Indently That's absolutely true what I said. I'm assuming you're not British born or something because of your first name. Ialso bought your Udemy course last weekend, even though I know Python well by now. But it is a pleasure to work through and follow. And thank you for your great videos and hopefully many more to come on CZcams!!!

  • @LogicVertix
    @LogicVertix Před rokem

    Thanks For this valuable practical , I have only studied this in books only.
    Do we have any other way to find out the time of execution of a program so that I can find the efficient programs

    • @Indently
      @Indently  Před rokem

      Maybe this will help: czcams.com/video/qXLh5sZLpHE/video.html&ab_channel=Indently

  • @pistolangpaltik
    @pistolangpaltik Před rokem +8

    I'm new to programming and I'm, as always, learning something new from your short videos. However, there is something wrong with the calculation on speed increase. As the result of 'set' approaches zero, the calculation approaches 100% and never beyond. I think you need to have the result of set as base. Thanks for the videos!

    • @Indently
      @Indently  Před rokem +8

      That's absolutely my fault for not double checking that bit, I used ChatGPT for inspiration and didn't notice that until you brought it up. I need to be more strict on the code it generates.
      Thanks for bringing it up!

    • @billvvoods
      @billvvoods Před rokem

      @@Indently Not sure if it was gpt3 or 4 that you used because 3 is notorious for math mistakes. In any case your video was very good. Straight forward and to the point.

    • @callbettersaul
      @callbettersaul Před rokem

      It approached zero, because the list was the base and going above 100% would mean, that the set took negative seconds to complete. If he used set as base, he would have to write that the list was slower by 2,848,073% for example. In video it stated that set was faster by

  • @saiakhileshande8486
    @saiakhileshande8486 Před rokem +1

    Hey @Indently, what if I am given a very long list and asked to check if an element exist in the list or not. In this case, which one is optimal - checking in the list or converting the list to set and checking in the set? The question is that is it better to convert list into set and check?

    • @dmytrk
      @dmytrk Před rokem

      Converting a list to set takes N hashing calls, so you don't win in efficiency. Either store your data in a set, if order doesn't matter, or just use a list

  • @sirynka
    @sirynka Před rokem

    My first thought was that you're going to talk about list vs tupple comparation and I was confused about how immutability could make iteration faster.

  • @devocracy1089
    @devocracy1089 Před rokem +1

    Very cool!

  • @the_infinity_snake
    @the_infinity_snake Před měsícem

    The thumbnail showed the word "why",
    so i thought this video would explain the reason why a set is faster to search for a specific value than a list.

  • @vvv3876
    @vvv3876 Před rokem

    Python 3.10.4
    Windows 8.1
    I dont have the library timeit and when try to install it through pip i get the error:
    No matching distribution found for timeit

  • @NickDanger3
    @NickDanger3 Před rokem +1

    The fact that sets are implemented as hash tables, and that the performance is O(1) are not two separate “reasons”: they are the exact same reason expressed in two different ways.

  • @chx75
    @chx75 Před rokem

    How about dicts?

  • @KeiraraVT
    @KeiraraVT Před rokem

    Looking at this makes me wonder if I can optimize my function by making a set instead of a dict for speed. Or if a dict is faster than a. List but slower than a set.

    • @appelnonsurtaxe
      @appelnonsurtaxe Před rokem

      From an implementation perspective, a set is often implemented as a simple hashmap (dict) where the keys are the set element and the values are all void (empty). So the performance of indexing into a dictionary and checking if an element is in a set is pretty much the same under the hood

  • @mysticplayz5649
    @mysticplayz5649 Před rokem

    Doesn't that mean if sets are unordered the number we are finding will be at 2nd index or in the middle where in list we know that what we are finding is always at the last index so set find it quicker than list ?😮

    • @Efebur
      @Efebur Před rokem +2

      No. If it were in the middle (on average) it would be half as slow as the list, instead it is really fast, always. It's because sets are using a completely different algorithm than lists (hashing).

  • @apsifiable
    @apsifiable Před rokem +3

    You should search zero instead of 999_999. Or in other words, not that your conclusion isn't correct, but it is almost like some marketing department came up with your example: the most unoptimal case from the point of view of lists.

    • @Efebur
      @Efebur Před rokem +2

      That is the point though. You should compare worst cases (or at least average), not best cases. Obviously if he searched for 0, it would come up immediately for his list. Instead, he should shuffle the list and search for multiple random numbers and make an average.

    • @apsifiable
      @apsifiable Před rokem

      @@Efebur Indeed. What I learned is that sarchasm is a difficult sport.

    • @Indently
      @Indently  Před rokem

      The amount of friends I've lost through using sarcasm in text form 😂

  • @laz272727
    @laz272727 Před rokem +3

    One thing sets do as opposed to lists is eat twice the ram. It's not twice of a lot of ram, since lists tend to be storing pointers and those are int sized, but it's still more.

    • @Indently
      @Indently  Před rokem

      Definitely interesting :)

    • @infiniteplanes5775
      @infiniteplanes5775 Před rokem

      There are always trade offs, and I am interested in hearing some more stuff about lists vs sets

    • @laz272727
      @laz272727 Před rokem

      @Infinite Planes
      There are generally two types of lists used commonly.
      Lists you see by default in OOP languages are generally Array-backed - literally an array of a size similar to the list, but rounded to some pre-set number. They're more expensive to edit, since you need to make a new array every time, but can be accessed by index O(1) (it's a sum operation of array's beginning pointer and the index). They are generally the best if you need an array you would want to edit later.
      However, in Functional languages, the default is a Linked List - where every list member has a pointer to the next one. This means editing it is easy, since the next member can be anywhere in memory and you just have to change the pointer; however, finding a specific member is very difficult since you literally have to iterate over the entire piecemeal list. Also, since they store one extra pointer, they're slightly bigger. They are better for FP-like usages of lists, though.

    • @laz272727
      @laz272727 Před rokem

      @Infinite Planes There are two types of construction for tables (the video talks about sets, but values don't have to be unique and the difference in speed is very minor - in fact, many languages implement sets as tables without a value, or with the value being the actual value and key being hashed).
      Note that while calculating space taken by tables is relatively easy (they, like array lists, are slightly bigger than their elements put together), explaining them isn't quite as easy. They have a lot of optimizations related to the fact that they're meant for really fast lookups by keys. Hell, in a perfect world, hashmaps are O(1) (if you use an array indexed by the hash - obviously that is nowhere near space efficient).

    • @IlariVallivaara
      @IlariVallivaara Před rokem +1

      ​@@laz272727 Array-backed lists are not necessarily or even usually more expensive to edit: it all depends on how you edit it. Further, linked lists offer asymptotic removal/addition benefits only if you can obtain the corresponding pointer for free or very cheaply, which is not usually the case. It is a common misconception, though, because people don't realize the cost of traversing the linked list to obtain the pointer.
      You also seem to mix random (index-based) access to finding a specific number in the collection. These are completely different operations, and the latter is O(n) for both (unsorted) arrays and linked lists.

  • @tanmaypatel4152
    @tanmaypatel4152 Před rokem +3

    Which IDE are you using in this video?

    • @mailonbido
      @mailonbido Před rokem

      im curious too

    • @tanmaypatel4152
      @tanmaypatel4152 Před rokem +1

      @@mailonbido I got it. It's pycharm. It looks different because of the latest UI update in the 2023 version I guess.

    • @mailonbido
      @mailonbido Před rokem

      @@tanmaypatel4152 oh nice, thanks!

    • @tanmaypatel4152
      @tanmaypatel4152 Před rokem +1

      @@mailonbido You're welcome

  • @sushantpawar3703
    @sushantpawar3703 Před rokem +3

    What about tuples compared to lists?

    • @k.kaiserahmed8013
      @k.kaiserahmed8013 Před rokem +1

      Somewhat slow... Just a lil bit....

    • @MilChamp1
      @MilChamp1 Před rokem +1

      tuples are faster to create and index, and very similar to search through. Lists are useful for collections that need to be edited.

  • @Mulakulu
    @Mulakulu Před rokem +5

    I believe you made a mistake with the percentage code and result. Your result doesn't show how many times faster the fast code was. That would be determined by 9.16/(3.2•10^-5)=286250, so a quarter of a million times faster. Basically the fast code could run a quarter of a million times in the same time the slow code could do it once.
    I recognize your way of calculating it to be the way to find percentage error, where 100% seems like a reasonable result, but it's not what you're looking for I presume.

    • @James_08_07
      @James_08_07 Před rokem +1

      Just about to say exactly this, the limit in his approach is 100% faster which is accurate but not the best way IMO to show how much faster something is.

    • @Indently
      @Indently  Před rokem

      The math is silly, I was fooled by the shiny ChatGPT-4 model. I need to be much more harsh on it.

  • @artistpw
    @artistpw Před 2 měsíci

    Sets seem far more efficient than lists. Nice you can convert them back and forth to get items sorted.

  • @kbcat
    @kbcat Před měsícem

    what ide is this

  • @computerfun6724
    @computerfun6724 Před rokem

    Love from India bro 🇮🇳

  • @thegothaur
    @thegothaur Před 5 měsíci

    funny. it is O(1) because of sets being hash tables :)

  • @link_safe
    @link_safe Před rokem

    is a frozenset() faster than a set()?

    • @laz272727
      @laz272727 Před rokem +1

      They take slightly less ram, since they don't need to be mutable.

    • @link_safe
      @link_safe Před rokem

      @@laz272727 Thanks 🧑‍💻

  • @Smbrine
    @Smbrine Před rokem +1

    Just use ctypes. Don't be lazy, c/cpp is about 100-200 times faster (it takes about 0.2sec to load a cpp function in python but even considering this it will be 10 times faster at least)

    • @Indently
      @Indently  Před rokem

      We use Python because we are lazy.

    • @Smbrine
      @Smbrine Před rokem

      @@Indently python is also lazy so that sums into a very poor performance :)

    • @Indently
      @Indently  Před rokem +2

      @@Smbrine Poor performance, big salary 😎

  • @ThankYouESM
    @ThankYouESM Před rokem

    Is there a way to generate pixel-by-pixel smooth, colorful images incredibly fast with set()?

    • @waylonbarrett3456
      @waylonbarrett3456 Před rokem +1

      You could have a set for every integer value of a color channel. So, one for 0 red, one for 1 red, one for 2 red, etc. And the same for green and blue channels. Then, in each set, store the pixel indices for that color value.

  • @rishiraj2548
    @rishiraj2548 Před rokem

    👍