Python lists remember what you did to them

Sdílet
Vložit
  • čas přidán 24. 07. 2024
  • Two equal lists can take differrent amounts of memory!
    Two Python lists of the same size can take different amounts of memory, even if they contain exactly the same elements. A larger list can even take less memory than a smaller list. How can this be? Lists appear to remember more than just their elements, and somehow contain some information of their history. In this video we investigate this phenomenon. We see what the system getsizeof function returns and how the internal structure of a list is represented in order to find the answer.
    ― mCoding with James Murphy (mcoding.io)
    Source code: github.com/mCodingLLC/VideosS...
    libcst: libcst.readthedocs.io/en/late...
    AST linting vid: • Python AST Parsing and...
    Structural pattern matching vid: • The Hottest New Featur...
    SUPPORT ME ⭐
    ---------------------------------------------------
    Patreon: / mcoding
    Paypal: www.paypal.com/donate/?hosted...
    Other donations: mcoding.io/donate
    Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, John Martin, Casey G
    BE ACTIVE IN MY COMMUNITY 😄
    ---------------------------------------------------
    Discord: / discord
    Github: github.com/mCodingLLC/
    Reddit: / mcoding
    Facebook: / james.mcoding
    CHAPTERS
    ---------------------------------------------------
    0:00 Intro
    1:01 Let's investigate
    3:04 Under the hood of a list
    4:23 How sys.getsizeof is calculated
    5:06 Capacity
    6:13 Why capacity != length
    7:08 Geometric resizing
    7:34 The answer in terms of capacities
    9:19 Reality check
    9:49 Thanks
  • Věda a technologie

Komentáře • 198

  • @scoates
    @scoates Před 2 lety +344

    As requested: I just worked on a micropython ("embedded python") project where it's the actual [re]allocation that took a LOT of time (or so it seems without great profiling tools). I wasn't running out of RAM, but it was the changing of capacity/allocation (in my case, in strings, not lists, but probably similar; maybe just not with spare capacity) that was the bottleneck. I like this deep-dive stuff. Even if it's not immediately practical, it's still useful to know how things work.

    • @gormster
      @gormster Před 2 lety +18

      I would guess that it’s not particularly similar as strings in Python are immutable. There will never be a reallocation to extend capacity for a string because *any* mutating string operation creates a completely new object.

    • @scoates
      @scoates Před 2 lety +1

      @@gormster yeah; that’s a good point. Definitely no over-capacity on strings, but lots of (new) allocation when concatenating.

    • @Konyad
      @Konyad Před 2 lety +2

      In Java there is a special class StringBuilder created for this reason that the strings are immutable. So you build strings from parts and join them once when you need it.

    • @QuantumHistorian
      @QuantumHistorian Před 2 lety +2

      @@Konyad C# has the same thing. Which, considering how much it took from Java, is not exactly surprising.

    • @nonchip
      @nonchip Před rokem

      this keeps happening to me with "std::string" on embedded C++... also one day i'll find out why the heck the standard library hard-depends on crap like glibc-locales on embedded targets and refuses to compile about a random half of string operations...
      but hey at least there it does the thing when you more or less expect because no GC/... so it's not all too hard to at least shove the expensive code somewhere it doesn't run too often

  • @voxelfusion9894
    @voxelfusion9894 Před 2 lety +56

    list a: "I know what you did last runtime."

  • @megaing1322
    @megaing1322 Před 2 lety +76

    One interesting extra detail: For the last example with `for x in range` it probably doesn't reallocated at all, but only allocates once. This is what the `length_hint` special function is for. It's supposed to return a guess of how many elements the iterator returns

  • @jeroenritmeester73
    @jeroenritmeester73 Před 2 lety +133

    This guy really went and took an elaborate deep dive for 10 minutes, only to debunk the relevance of his own video at the end. Well, at least you're humble!

  •  Před 2 lety +237

    It should be said that these are CPython specific internals and other implementations like PyPy or even subsequent versions of CPython can behave completely differently.

    • @dhkatz_
      @dhkatz_ Před 2 lety +6

      Who is using anything but CPython out in the actual world? lol

    • @DanCojocaru2000
      @DanCojocaru2000 Před 2 lety +7

      @@dhkatz_ Anybody listening to Guido's advice.

    • @d3stinYwOw
      @d3stinYwOw Před 2 lety +1

      @@DanCojocaru2000 Can you link this advice somehow?

    • @vinos1629
      @vinos1629 Před 2 lety +18

      @@d3stinYwOw "If you want your code to run faster,
      you should probably just use PyPy."
      -- Guido van Rossum (creator of Python)

    • @pranavnyavanandi9710
      @pranavnyavanandi9710 Před 2 lety +1

      @@vinos1629 Python at the speed of C?

  • @Ba_Dashi
    @Ba_Dashi Před 2 lety +80

    I love these deep dives, but I feel like you could add on your description which OS and Python interpreter version showed this behavior. Both so that we can compare and for future reference

    • @bobuccman1424
      @bobuccman1424 Před 2 lety

      he knows exactly it takes you less than 100 keystrokes to test that on your machine and compare to future reference

  • @TimmyTam
    @TimmyTam Před 2 lety +26

    "List will remember that"

  • @callbettersaul
    @callbettersaul Před rokem +1

    So many people ask that question about the list size differences, but this king actually answered it.

  • @nio804
    @nio804 Před rokem +5

    I think understanding the internals at some level (even if your details aren't 100% accurate) is *extremely* useful because having that vague idea makes it easier to recognize situations where you do need to dig in deeper.
    It's good to always have an idea of what's happening, even if in most cases, to borrow from physics, your "cow" is perfectly round and frictionless.
    Basically, when you're using an abstraction, you should at least understand the limits of that abstraction.

  • @Vijwal
    @Vijwal Před 2 lety +6

    "python lists remember what you did to them" Sounds like a threat

  • @davewagler1092
    @davewagler1092 Před 2 lety +48

    With Python3.8.10 in Linux Mint, the sizes of the three original lists are, respectively, 80, 80, 88.

    • @xtremegamer20k89
      @xtremegamer20k89 Před 2 lety

      yes i got 88 for list comp and 80 for [0, 0, 0], [0]* n (i use 3.8.6)

    • @daskadse769
      @daskadse769 Před 2 lety +5

      For [0,0,0], x=0;[x,x,x], [0]*3 and [0 for _ in range(3)] respectively:
      python 3.6.10 on macOS: 88/88/88/96
      python 3.10.1 on macOS: 120/80/80/88
      I find the amount of fluctuation interesting honestly.

  • @hagen-p
    @hagen-p Před 2 lety +2

    Thanks for the insight! (And yes, in embedded systems, or slow platforms, small differences in memory consumption/runtime can add up, so knowing about this can be very relevant.)

  • @XCanG
    @XCanG Před 2 lety +1

    I just today speaking with my friend who code on Swift about python lists, generators and list comprehensions, what an ideal time to release a video!

  • @CyborusYT
    @CyborusYT Před 2 lety +5

    1:41 ascii encoding of "subscribe" concatenated into a single number

  • @nathanoy_
    @nathanoy_ Před 2 lety +1

    really enjoying your videos, thank you!

  • @andrewglick6279
    @andrewglick6279 Před 2 lety +42

    This was very interesting! Somewhat related, I have heard that Python dicts have faster lookup times than lists, but this doesn't really make sense to me. I would love to see your take on deconstructing the memory and speed differences between using lists and dicts. Great content as always!

    • @LettuceAttack176
      @LettuceAttack176 Před 2 lety +23

      Hi! Regardless of if a video is made for this just thought would help you out a bit. The reason the lookup in a dict is faster than a list is because looking up a value in a list requires you to iterate through each item in the list until you find the value of what you are looking for. This can be costly especially if it’s a big list. However a dictionary (hash map) essentially uses the value of the key you are looking for in the lookup (not the index as you would in a list) and it’s either in the hash map or not, there is no iteration through every item in the dictionary. This means it is faster. If you were using Big-O (notation for how time complex a lookup would be) a list would be o(n) as the speed of the lookup soley relies on how many items there are and worst case scenario the whole list has to be iterated through. However for a dictionary it’s o(1) as the lookup time is constant (as the thing you are looking for in the dictionary is either there or not there)
      Edit: Some python and low level people may pick apart some holes in what I said I.e. it depends on the hashing algorithm under the hood etc, just trying to generalise

    • @Marquesian
      @Marquesian Před 2 lety +34

      @@LettuceAttack176 That would be true if the Python list were implemented as a linked list but instead it's actually backed by an array and it does support random access in O(1) time. And just as a tip, be mindful about using a lower case o in Big-O analysis, as it has another meaning than an upper case O.
      As for why a lookup in a dictionary could be faster than in a list, I've got no clue. Both may require memory to be loaded from swap causing delay, and the dict may suffer from bad hashing so that seems to imply the opposite.
      Bonus: The backing array is implemented with a fixed size and if there's any room left, appending an element to the array takes Theta(1). If the array is full the entire array needs to be copied to another, larger block of allocated memory taking Theta(n), therefore technically appending an element is O(n). However, as this copying only occurs when the array is full, it's running time is sometimes averaged across the constant time operations into what's known as amortised worst case running time, which is ((n-1)*Theta(1) + Theta(n)) / n = Theta(1). Note that it still concerns the worst case, but averaged over multiple operations (as opposed to the average case over multiple different inputs).
      Edit: My bad, when you mentioned lookup I figured you meant random access. If you want to determine whether an element is present in a list then indeed the entire list would be traversed whereas with a dict it takes time depending on the hash function and the existing collisions, but O(1) in the average case. If you need an ordering like a list that supports constant time membership testing, starting from Python 3.7 the dict implementation preserves insertion order! Earlier versions may instead rely on OrderedDict.

    • @andrewglick6279
      @andrewglick6279 Před 2 lety +3

      @@LettuceAttack176 That's very interesting, thanks!
      Why do Python lists require iterating over the items? I always assumed Python lists (and probably also C++ std::vector) were more like C/C++ arrays which (as far as I know) are constant lookup instead of linked lists which require iterating through.

    • @michaelmoore8
      @michaelmoore8 Před 2 lety +14

      @@andrewglick6279 As far as I understand lookup to a specific index in a list is o(1), similar as it would be for lookup by key in a dict. Now if rather than simply looking up you were looking for a specific item in a list it's o(n) whereas for the dictionary it remains o(1).
      By contrast if python lists were backed by linked lists it'd be o(n) to both retrieve the ith item from the list, and search the list for a specific item that may or may not be in the list because with a linked list you gotta iterate in both cases.

    • @andrewglick6279
      @andrewglick6279 Před 2 lety

      @@michaelmoore8 Ah, that makes sense then

  • @thefekete
    @thefekete Před 2 lety +24

    As an embedded c programmer, well, I just statically allocate list sizes... Using malloc() is too much work. Not programming work, having to deal with code reviewers and their inherent fear of that function is a real pain.

    • @berylliosis5250
      @berylliosis5250 Před 2 lety +5

      I can see that in an embedded context, but I can't stand working with static arrays in C. No reentrance, actually fixed size, no nesting... no fun.

    • @thefekete
      @thefekete Před 2 lety +4

      @@berylliosis5250 I once needed a b-tree for a project, but malloc() and friends were expressly forbidden by coding guidelines - no exceptions. So I created my own heap from a statically allocated array and my own allocation and deallocation functions to "dynamically" create/destroy elements. Was actually kind of fun and a good learning exercise.. plus they couldn't say crap when the unit tests passed and I never touched malloc😋

    • @berylliosis5250
      @berylliosis5250 Před 2 lety +5

      @@thefekete I do not understand that restriction, heh. I get it when malloc isn't available, or when there's very little memory, but making your own heap is both necessary and basically just malloc with extra steps ( _more_ potential for bugs than with actual malloc)

    • @bskull3232
      @bskull3232 Před 2 lety +8

      @@berylliosis5250 Big organization 101. No one gets fired for buying IBM. The same goes here. Nobody cares if your code works, they care about your code being compliant, so they do not get blamed when it does f* up. Yes, your malloc is almost guaranteed to be worse than the library version, but the reviewers are not required to enforce that. And believe me, they also know this is stupid, but just no one wants their employment and pension gone. And no, bosses are not idiots. They choose to enforce this also for a good reason -- most people are dumb, so while it does impede innovation, staying with rules actually helps more average people than it hurts much fewer talents. This is sad, but this is human nature. That's why there is social science besides natural science, and there are people making tons of money on jobs like economist and government advisors. As engineers we often write their works off, but without them, we could descent into chaos very quickly if everyone thinks only about the local best solution without regarding to the bigger picture.

    • @jaxfrank
      @jaxfrank Před 2 lety +1

      @@berylliosis5250 It can be better to write your own allocator because you know exactly what the use case is. You can write and tune your allocator to solve exactly the problem that needs solving and not every type of allocation possible like with malloc. However, in my experience, dynamically allocating memory on the heap is basically never needed. You almost always know your worst case and can just statically allocate enough memory for that situation.
      Not being allowed to use malloc in an embedded system is not that much of a loss in my opinion.

  • @comedyclub333
    @comedyclub333 Před 2 lety +2

    Well, in the past I was always complaining about shit in javascript like "[ ] == [ ] is false but ![ ] == [ ] is true" and "NaN === NaN is false" and praised python for its predictability. You prove me wrong every time you upload a video.

  • @spenceabeen
    @spenceabeen Před 2 lety +4

    The code knows what you did
    The code knows what you did
    The code knows what you did

  • @JohnMichaelCagle12
    @JohnMichaelCagle12 Před 2 lety

    This is super interesting. Thank you.

  • @lohannasarah152
    @lohannasarah152 Před 2 lety +1

    Man, this channel is too good!

    • @mCoding
      @mCoding  Před 2 lety

      Thank you for the kind words!

  • @knut-olaihelgesen3608
    @knut-olaihelgesen3608 Před 2 lety +1

    Super good audio quality!

  • @williamowens2063
    @williamowens2063 Před 2 lety +10

    I thought you chose the number at 1:45 because it is the maximum value for an unsigned int but I quickly realized that doesn't make sense and it's way too big to be the max value of an unsigned int. Do you mind sharing why you chose that number?

    • @leo848
      @leo848 Před 2 lety

      please answer if anyone knows

    • @SophieJMore
      @SophieJMore Před 2 lety +26

      @@leo848 It's just an easter egg. Try converting the number to hex and then converting the hex into ascii

    •  Před 2 lety +12

      @@SophieJMore That's it. For anyone too lazy, doing so ends up giving "subscribe". How didyou figure it out, Sophie?

    • @SophieJMore
      @SophieJMore Před 2 lety +6

      @ At first I assumed it was something to do with integer ranges, but it turned out not to be. Then I thought it was some kind of hidden message and tried converting it to different bases and ascii and stumbled upon the solution. In theory it should also work in binary and octal, but I couldn't do it, maybe I was doing something wrong.

    • @ShalevWen
      @ShalevWen Před rokem +1

      @@SophieJMore it works in binary if you add 0 to the start, it doesn't work in octal because the bits don't align (1 character is 2 digits in hex, 8 digits in binary but 8/3 digits in octal)

  • @alessandrocalio1446
    @alessandrocalio1446 Před 2 lety +3

    I love these kind of videos! Keep it up! Great work!

  • @calmpuffin
    @calmpuffin Před 2 lety +8

    What a title

  • @xskerrittx1
    @xskerrittx1 Před 2 lety +10

    is that number when a python regular number becomes a big int?

  • @TuMadre8000
    @TuMadre8000 Před rokem +4

    y'know, the more i learn about python's inner workings, the more i begin to question it.
    i like the idea of code being as simple to read as python does it, but seeing what it takes to get there makes me have second thoughts.

    • @sykescalvin09
      @sykescalvin09 Před rokem +2

      Why? None of this is specific to python, these sorts of considerations are needed in any language that gives you a mutable list datatype. And this is all happening in the interpreter, so it's just as fast as writing your own C code to handle reallocations would be.

    • @TuMadre8000
      @TuMadre8000 Před rokem

      @@sykescalvin09 yeah, i get that, but... idk, part of me prefers having to deal with manual memory allocation on c++ over dealing with python's strange quirks

    • @sykescalvin09
      @sykescalvin09 Před rokem +1

      @@TuMadre8000 If you use STL containers like and friends, this is still going on behind the scenes. If you handle memory allocation yourself, quirks (segfaults!) are far more likely to creep in and unless you can optimise for the specific allocation patterns the roll-your-own solution is probably less performant as well.

    • @starvlingk5122
      @starvlingk5122 Před rokem

      Python makes extremely great pseudo c code. I find it much easier to write an algorithm in python first then rewrite that same algorithm in c next.

  • @kaninchengaming-inactive-6529

    "Python lists remember what you did to them" rather sounds like a threat

  • @Aditya-ne4lk
    @Aditya-ne4lk Před 2 lety +2

    how to do you come across things like this?

  • @iAmTheWagon
    @iAmTheWagon Před 2 lety +3

    I love this channel

  • @antonioberandas9610
    @antonioberandas9610 Před 2 lety +1

    So everytime I reach the capacity limit, the whole list gets reallocated to a new spot?

  • @robertbrummayer4908
    @robertbrummayer4908 Před 2 lety

    Excellent video

  • @Snifo
    @Snifo Před 11 měsíci

    That's why when we use async we need to lock it before editing and appending the list ? So there is a chance that the list could be edited while switching the a new memory size ?

    • @mCoding
      @mCoding  Před 11 měsíci +1

      This would be a potential issue with threading, but not with async since async only swaps control at expicit points like "yield". However, current implementations of python use a global lock to prevent this from being an issue even with threading.

  • @ren200758
    @ren200758 Před 2 lety +3

    thanks, and i can imagine the same(-ish) being applied to other python objects. perhaps another homework time.

  • @SkyFly19853
    @SkyFly19853 Před 2 lety

    Oh...
    I think I like that feature...
    Btw, have you made any videos on how to speed up loops and if statements?
    Thank you in advance!

    • @pianocrasher
      @pianocrasher Před 2 lety +1

      Yeah. He released a video about this topic some months ago.
      Just search for this title: 'The Fastest Way to Loop in Python - An Unfortunate Truth'

    • @SkyFly19853
      @SkyFly19853 Před 2 lety

      @@pianocrasher
      I understand. ✅

  • @Rudxain
    @Rudxain Před 2 lety +6

    My guess for the number-literal "quirk" is that it makes a deep copy of each arbitrary-precision int and therefore each one has a different address and unique pointers to be stored. That explains why when using variables it has the same capacity as when explicitly repeating the element. Maybe because it uses Run Length Encoding to internally repeat the same pointer without making copies of it

  • @geegrant865
    @geegrant865 Před 2 lety +4

    This is so much better than y’all just telling us. How I wish youtube instructors would debug in trials before telling us the answer. Print statements or it didn’t happen

    • @mCoding
      @mCoding  Před 2 lety +1

      Glad to hear you appreciate the style!

  • @b4ttlemast0r
    @b4ttlemast0r Před 2 měsíci

    Is there something akin to C++'s resize, reserve and shrink_to_fit methods? These would allow more direct control over the capacity.

  • @klausvonshnytke
    @klausvonshnytke Před rokem

    The analogy to renting building is funny. Most business owners would rather cram more people in the same building than rent a bigger one.

  • @johnnysun6495
    @johnnysun6495 Před rokem

    "lists remember what you did to them" sounds like a threat

  • @andrey2001v
    @andrey2001v Před 2 lety +1

    from now on I'll preallocate memory to lists instead of append() to empty list in loop.
    Gotta count those nanoseconds.

  • @gianclgar
    @gianclgar Před 2 lety +9

    But… Why does the [0,0,0] takes space for 8? 🙄

    • @rb1471
      @rb1471 Před 2 lety +1

      and why does [e, e, e] take 3? I didn't catch the reasoning for it

    • @gianclgar
      @gianclgar Před 2 lety +3

      @@raccoon1160 Yeah but... still need an explanation, weird things also have an explanation, specially if they repeat the same behavior (seems to be always 8, not just a random number)

    • @gianclgar
      @gianclgar Před 2 lety +1

      @@rb1471 3 makes sense because it takes spaces for 3 elements (3 e's) if i recall correctly

    • @0LoneTech
      @0LoneTech Před rokem

      This one seems to be subtle. Checking the bytecode for a couple of functions with dis.dis, [3,3,3] caused a list to be created then extended with a tuple (3,3,3). [e,e,e] on the other hand used BUILD_LIST with 3 arguments, and BUILD_LIST in turn created the list with reserved capacity using PyList_New() before populating it backwards. extend used list_resize to allocate space, and that added some margin.
      These two behaviours can be demonstrated using l=list(t) vs l=list();l.extend(t)
      That's my testing on CPython 3.9 on Linux x86_64. This is all implementation details that can and do vary between implementations.

  • @remrevo3944
    @remrevo3944 Před 2 lety +1

    I mainly use rust and there the capacity is actually important in some cases. For example when having 100 elements, you know you're going to add to the vector (rusts version of a list) it would be very wasteful if you had to reallocate the vec a few times before actually reaching the needed capacity. Therefore there is a with_capacity constructor.

    • @mCoding
      @mCoding  Před 2 lety +2

      Yes, most compiled languages expose the capacity api as it is very important in performance oriented situations!

    • @remrevo3944
      @remrevo3944 Před 2 lety +1

      @@mCoding Though there is no automatic resizing of rust vecs when they get smaller.

  • @imlovinit1232
    @imlovinit1232 Před 2 lety

    Is a python list contiguous? I don't see the reason it'd need to be to function, but I could see it for speed reasons to make it fast indexing like an array

    • @mCoding
      @mCoding  Před 2 lety

      Yes, python lists are contiguous. However, all python objects are pointers, so it is a contiguous list of pointers that point to allocated memory that is probably not contiguous.

    • @0LoneTech
      @0LoneTech Před rokem

      It's not only speed; discontiguous arrays (e.g. ropes or linked lists) need more memory to track where their subparts are.

  • @XxyehezkelxX
    @XxyehezkelxX Před 2 lety +4

    I know this was more on the informative side of videos, but honestly I was mostly laughing at your comments on screen lol.
    In terms of the vid itself, pretty much what I expected in terms of memory management, but awesome vid nevertheless!

  • @vmarzein
    @vmarzein Před 2 lety +2

    if I had to guess, 21298...4277 is the highest int value before int overflow?

    • @SophieJMore
      @SophieJMore Před 2 lety +3

      Python doesn't really have int overflow. It just converts the number to bigint

    • @mCoding
      @mCoding  Před 2 lety +8

      (Hint) Maybe you should look at it in binary...

    • @kloetikalalloet1245
      @kloetikalalloet1245 Před 2 lety

      @@mCoding At first I thought it is some prime number with a special history where group theorists do some math jokes with it. But google gave no answer to it. Then I though it might be already HEX code, but it wasn't directly apparent, after going from binary to ascii I already smelled the answer. Now I wonder if I subscribed because of the great content or the subliminial messages...

  • @meemkoo
    @meemkoo Před 2 lety

    the list may be gone, but the list always remembers...

  • @xelaxander
    @xelaxander Před 2 lety +2

    Oh wow! CZcams just deleted my comment, linking a Gist with Benchmarks of different types of list construction. Tl:dr was that both list comprehensions or preallocating a list of 10000 zeros is on my system about 28-44% faster than allocating an empty list, then appending each element separately and numpy is 5x faster than that. The gist is still up on my Github, same as my CZcams name.
    So there *is* actionable information in this video :).

  • @QUASAR098
    @QUASAR098 Před 2 lety +6

    good video

  • @kawaiimunism
    @kawaiimunism Před 2 lety +3

    Huh, interesting, I never realized that CPython lists were stored contiguously, I'd always assumed they were linked lists.

    • @mCoding
      @mCoding  Před 2 lety +3

      This is actually a common misconception! I also thought this at one point.

    • @igorswies5913
      @igorswies5913 Před 2 lety +1

      I guess they were called lists because it sounds simpler than arrays, and Python is supposed to be a simple language

    • @0LoneTech
      @0LoneTech Před rokem

      ​@@igorswies5913 True, but there's also an array type (in the array module). Python's array type stores fixed size numbers which are converted to objects when read. This can make it more compact than a list, which has to store references to all elements, where each reference is usually (on 64-bit systems) the largest size available in array. Users are generally directed to use numpy instead, which offers more useful high level operations.

  • @skilz8098
    @skilz8098 Před 2 lety +8

    The entire concept about dynamically allocated arrays as being a single structure or container is directly related to data structures and algorithms. The geometric progression is based on the observations of both minimizing the code or number of instructions while also increasing its performance and efficiency. In other words or in simple terms, it's based on Big O notation for both time and space complexity of an arbitrary algorithm in contrast to the desired container or data structure. Although Python is a High Level scripting or interpreted language compared to other languages such as C and C++ and abstracts away from these details, why would these concepts about memory layout, locality, access times, and geometrical expansion be any different or change? They don't! These concepts even apply in code bases such as JavaScript or any other language that can do computations on datasets. Even database and querying languages such as SQL or PHP even have these principles embedded within their internal frameworks. It's the nature of things as it always goes back to the mailbox problem with indexing into address spaces as well as the phonebook problem with all of the different searching and sorting algorithms as well as others!
    Nice video though! I really enjoyed it!

  • @mujeebishaque
    @mujeebishaque Před 2 lety +3

    Love from Pakistan. Thanks for sharing

    • @mCoding
      @mCoding  Před 2 lety +5

      Glad to reach you all the way in Pakistan! Hope you enjoy!

  • @joaopedrodonasolo680
    @joaopedrodonasolo680 Před 2 lety

    Great video, thank you. Still wondering what that big number means tho

  • @sharkinahat
    @sharkinahat Před 2 lety +4

    Memory? We don't need memory where we're going!
    import antigravity

  • @do0nv
    @do0nv Před 2 lety +1

    List knows what **you** did.

  • @sadhlife
    @sadhlife Před 2 lety +1

    it says subscribe in ascii doesn't it

  • @nO_d3N1AL
    @nO_d3N1AL Před 2 lety +2

    Would be good to get a video about "loadFactor", which is what Java's HashMaps use to determine when to add new buckets. I'm also curious as to how they reached this 9/8th's growth factor. I thought generally the capacity doubles every time (this is the case in Java's ArrayList I think), which is obviously more wasteful of memory, but understandable why they went with this approach.

    • @nulano
      @nulano Před 2 lety

      loadFactor is not used for lists, only dicts. PyPy keeps the dict size between 1/3 to 2/3 full, AFAIK CPython is the same.
      The list type grows with the following code in PyPy, which AFAIK was ported from CPython directly:
      if newsize < 9:
      some = 3
      else:
      some = 6
      some += newsize >> 3
      new_allocated = newsize + some

  • @non5309
    @non5309 Před 2 lety +3

    я все еще плохо знаю английский. но эти видео смотрю с удовольствием, допонимая что не понимаю по контексту. Спасибо за материал!

  • @tsalVlog
    @tsalVlog Před 2 lety

    "please comment now" I feel attacked

  • @Visleaf
    @Visleaf Před 2 lety

    top 10 threatening video titles

  • @AtotheKres
    @AtotheKres Před 2 lety

    Good god that's good!

  • @JB-fh1bb
    @JB-fh1bb Před 2 lety +1

    Since it triggers a copy to change capacity, and copying is expensive, can Python devs *choose* capacity at creation?

    • @mCoding
      @mCoding  Před 2 lety +4

      Not directly, capacity is an implementation detail of CPython so the best you can hope for is to hope that [0]*n gives a capacity of n for all n. But remember that Python also automatically shrinks the capacity so you can't even do [0]*n and then clear(). I've been reading the implementation and, because of the automatic shrinking, providing any kind of control over the capacity would be a big rewrite at the very least.

  • @unflexian
    @unflexian Před 2 lety +3

    if lists could talk they would say really bad things about this guy

  • @teamllr3137
    @teamllr3137 Před 2 lety +1

    Yes I'm an embedded python dev using micropython

  • @reisaki18
    @reisaki18 Před 2 lety

    variable e is 0 vs variable x with three 0...maybe that make sense??

  • @gwentarinokripperinolkjdsf683

    Just taking a guess. Because these are dynamically sized arrays, the capacity is counted towards the size

  • @ZarkWiffle
    @ZarkWiffle Před rokem +1

    Why is the title vaguely threatening

  • @JorgeLuis-ts6qp
    @JorgeLuis-ts6qp Před 2 lety

    This is the kind of stuff you only use in a job interview.

  • @Captain8771OnGithub
    @Captain8771OnGithub Před rokem

    "in C+ (cut) +'s vector"
    something tells me someone forgot a +

  • @unflexian
    @unflexian Před 2 lety +2

    "subscribe" as if i wasn't already🙄🙄

  • @doc0core
    @doc0core Před 2 lety

    Hmm... I hesitated before clicking the video. It turns out exactly as I expected, curious but useless. If one care about exact memory management you should use Python. For certain structured data we use NumPy and/or Pandas. Otherwise, switch back to C and do your own malloc() and free()

    • @mCoding
      @mCoding  Před 2 lety +2

      Is understanding useless? If you think the purpose of this video is to tell you how to save memory in Python then you've completely missed the point.

  • @SpeedingFlare
    @SpeedingFlare Před 2 lety +5

    Key takeaway: don't use list literals!

    • @SpeedingFlare
      @SpeedingFlare Před 2 lety +7

      Before you comment, I'm kidding

    • @mCoding
      @mCoding  Před 2 lety +9

      Several people stopped typing...

  • @avananana
    @avananana Před 2 lety +1

    I can see this knowledge being useful if you for some reason are using a ton of lists and loops and trying to optimize the code. But honestly, nobody codes in python for performance so it's probably, just as you said, good to know but not very useful.

  • @rotemlv
    @rotemlv Před 2 lety +1

    Unfortunately it gets messier when you put lists in your lists. I knew I had an issue with the second version, it hurt me when I saw it, I just remembered why.
    If this comment survives birth, you can run this piece of code and see the problem-output:
    aa = [[]]*3
    bb = [[] for _ in range(3)]
    print(aa)
    print(bb)
    aa[0].append(1)
    bb[0].append(1)
    print(aa)
    print(bb)

    • @TheArtikae
      @TheArtikae Před 2 lety +1

      Mutability moment.

    • @SugarBeetMC
      @SugarBeetMC Před 2 lety

      aa contains the same sublist object three times, bb contains three different sublist objects.

  • @mxbx307
    @mxbx307 Před 2 lety

    Python is just a weird language all round. In the year 2022 it still doesn't understand tabs vs. spaces, for one thing.
    Then when you import another file, any code in that file that is OUTSIDE of a function is just merrily run by the calling file? That quirk of the language has been the subject of quite a few security bootcamp exercises I've worked on.

    • @0LoneTech
      @0LoneTech Před rokem

      We understand that mixing tabs and spaces leads to confusion. And yes, running Python source runs statements. That's kind of a key thing. Are you comparing to programs that don't have top level statements like class and def?

  • @ballin32
    @ballin32 Před 2 lety

    Integer limit@

  • @NikolajLepka
    @NikolajLepka Před 2 lety

    8 bytes seems a bit extreme for a lot of these things.

    • @SugarBeetMC
      @SugarBeetMC Před 2 lety +2

      Most of these things are pointers to objects in memory. Hence, 64 bits or 8 bytes.
      And on 64b processors, smaller value sizes are usually not worth the hassle.

  • @iamkai101
    @iamkai101 Před rokem +1

    simple. getsizeof actually returns a random number

  • @chrisl9001
    @chrisl9001 Před 2 lety +2

    Lists know where you live...

  • @anda3487
    @anda3487 Před 2 lety

    I actually thought lists just doubled their size when hitting max capacity...

    • @emirs769
      @emirs769 Před rokem

      That's a method too. But imagine if you have a list that has a million elements and you want to add another one. Would you increase the capacity to 2 million?

  • @FranoGames
    @FranoGames Před 2 lety

    you chose it because of int limit

  • @TheChemicalWorkshop
    @TheChemicalWorkshop Před 2 lety

    high level language

  • @P4ExHzLRuuiFMg3X4U3v
    @P4ExHzLRuuiFMg3X4U3v Před 2 lety +1

    I was playing around with sys.getsizeof() on other builtin types and I stumbled on something surprising (for me). For strings of course the concept of capacity doesn't apply because they are an immutable type, so it makes sense that the exact amount of memory needed to store a given string is allocated, and no more. What surprised me is that for an empty string, sys.getsizeof returns 49. I expected a multiple of 8 bytes

    • @mCoding
      @mCoding  Před 2 lety +3

      Strings use multiple different in memory encodings. E.g. if your string is all ascii it uses 1 byte per character, but if you use an emoji it increases to 2 or more bytes per character. The string therefore needs to store this metadata about which scheme it is using inside the object, which is why you are getting this answer.

    • @0LoneTech
      @0LoneTech Před rokem

      ​@@mCoding While true, that doesn't quite reveal why it's an odd size. I believe the reason is the encoding chosen was UTF8, and in that case an extra byte is allocated for a C-style NUL terminator.

  • @AmanNidhi
    @AmanNidhi Před 2 lety

    If rust gets a library like STL in c++, i will learn learn Rust.
    i am confused if i should even think about learning python now.

    • @mCoding
      @mCoding  Před 2 lety

      I'm not familiar with rust very much (more to learn in the future i suppose!). Does rust not have a comprehensive standard library? I hear so much good stuff about it I'd be surprised to know it's stl was lacking.

  • @nonchip
    @nonchip Před rokem

    _sees title_
    pre-coffee-brain: "well yeah it's python, no wonder the memory management is hella inefficient as well"
    but joke aside, i wouldn't call that "remembering" or "what *you* did" (please *do not* rely on any of those effects for anything!), it's just a side effect of an allocator not wasting all your CPU all the time to save on 2 bytes of theoretically unused ram, what gives?

  • @MithicSpirit
    @MithicSpirit Před 2 lety +2

    Discord gang

  • @metgaming9545
    @metgaming9545 Před rokem

    *python list will remember that.

  • @CppExpedition
    @CppExpedition Před 2 lety

    Las week i strugle with this. Hate list multiplication aaa=['a']*3 neq ['a' for _ in range(3)]

    • @0LoneTech
      @0LoneTech Před rokem

      You seem confused. Those two are equal, because 'a' is the same immutable object either way. The difference is important if you replace 'a' with an expression that creates a mutable object, e.g. []. In this case they'll still be equal (as all the elements are empty) but different structure: [a,a,a] vs [a,b,c].

    • @CppExpedition
      @CppExpedition Před rokem

      @@0LoneTech oh yeah i've already read it in stackoverflow many month ago! Thx for your help!

  • @hanyanglee9018
    @hanyanglee9018 Před 2 lety

    If this matters, then probably you need c++ directly.

  • @mtstreaman1138
    @mtstreaman1138 Před 2 lety

    88, nice.

  • @MultiXGamer1
    @MultiXGamer1 Před 2 lety

    The title sounds very foreboding...

  • @hajajmaor
    @hajajmaor Před 2 lety

    If you're using python, memory is the thing that you don't care about

  • @therealpeter2267
    @therealpeter2267 Před 2 lety

    They will never find the bodies, though.

  • @SuperlativeCG
    @SuperlativeCG Před 2 lety +3

    Coding is hard because a computer will do EXACTLY what you tell it to do and it will not do EXACTLY what you didn't tell it to do.

    • @SophieJMore
      @SophieJMore Před 2 lety +1

      That's not true for high level languages like Python or JS.

  • @djtomoy
    @djtomoy Před rokem +1

    4:17 eww C, gross

  • @MarkMcDonald1
    @MarkMcDonald1 Před 2 lety +1

    I did what your magic number told me to do! I'll reply with a clue in case it helps others.

  • @mundusesttuum2536
    @mundusesttuum2536 Před 2 lety

    Lol multiply *3 it's a big mistake ..

  • @jhonnythejeccer6022
    @jhonnythejeccer6022 Před 2 lety

    While i like your face, i prefer a clean ide over an ide with a face on top

  • @whimsy5623
    @whimsy5623 Před 2 lety

    They had better

  • @levblit3721
    @levblit3721 Před 2 lety

    How are you getting 120? It's 80.
    Are any of your videos fact checked?

    • @mCoding
      @mCoding  Před 2 lety

      The numbers in the video are ouput from the source code linked in the description and shown on screen, so in that sense the video is self fact checking, though you are welcome to run the code for yourself. If you are getting a different answer you are most likely just using a different version of Python. One of the main points of the video is that the behavior is an implementation detail subject to change from version to version and not a fundamental property.