The Cost of Memory Fragmentation

  • Published 7 Jun 2024
  • Fragmentation is a very interesting topic to me, especially when it comes to memory.
    While virtual memory does solve external fragmentation (you can still allocate logically contiguous memory backed by non-contiguous physical memory), it introduces performance delays as we jump all over physical memory to read what appears to us as, for example, a contiguous array in virtual memory.
    You see, DDR RAM consists of banks, rows and columns. Each row has around 1024 columns and each column holds 64 bits, which makes a row around 8 KiB. The cost of accessing RAM is the cost of “opening” a row and all its columns (around 50-100 ns); once a row is opened, all of its columns are available and the 8 KiB is cached in the row buffer inside the RAM.
    The CPU can ask for an address and transfer 64 bytes at a time (called a burst), so if the CPU (or, more precisely, the memory controller) asks for the next 64 bytes right after them, they come at almost no extra cost because the entire row is cached in the RAM. However, if the CPU sends an address that falls in a different row, the old row must be closed and a new row opened, taking an additional ~50 ns hit. So spatially local access to bytes ensures efficiency.
    So fragmentation does hurt performance if the data you are accessing is not contiguous in physical memory (and of course it doesn’t matter that it is contiguous in virtual memory). This kind of reminds me of the old days of HDDs, where the disk head physically travels across the platter to read one file, which prompted the need for “defragmentation”, although RAM access (and SSD NAND, for that matter) isn’t nearly as bad.
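    To get a feel for how much that locality matters, here is a rough C sketch (assuming a Linux/glibc toolchain; the measured gap mixes cache, TLB and DRAM row-buffer effects rather than isolating the row-open cost) that walks the same buffer sequentially and then in a shuffled order and compares the time:

    /* locality.c - compare sequential vs shuffled reads over the same buffer.
       Build: gcc -O2 locality.c -o locality   (Linux/glibc assumed) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    #define N (64UL * 1024 * 1024 / sizeof(uint64_t))   /* 64 MiB of 8-byte slots */

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        uint64_t *buf = malloc(N * sizeof(uint64_t));
        size_t   *idx = malloc(N * sizeof(size_t));
        if (!buf || !idx) return 1;

        for (size_t i = 0; i < N; i++) { buf[i] = i; idx[i] = i; }
        /* Fisher-Yates shuffle to build a random visiting order. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }

        volatile uint64_t sink = 0;
        double t0 = now_sec();
        for (size_t i = 0; i < N; i++) sink += buf[i];        /* sequential walk */
        double t1 = now_sec();
        for (size_t i = 0; i < N; i++) sink += buf[idx[i]];   /* shuffled walk */
        double t2 = now_sec();

        printf("sequential: %.3f s   shuffled: %.3f s\n", t1 - t0, t2 - t1);
        return (int)(sink & 1);
    }

    On most machines the shuffled walk comes out several times slower, even though both loops touch exactly the same bytes.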
    Moreover, virtual memory introduces internal fragmentation because of the use of fixed-size blocks (called pages, often 4 KiB in size), which are mapped to frames in physical memory.
    So if you want to allocate a 32-bit integer (4 bytes), you get a whole 4 KiB of memory, leaving a whopping 4092 bytes allocated to the process but unused, and unavailable to the OS. These little pockets of memory can add up across many processes, which is another reason developers should take care when allocating memory for efficiency.
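    As a small sketch of that worst case (Linux assumed; real allocators like malloc pack many small objects into one page precisely to avoid this), the following program maps one anonymous page per 4-byte integer and prints how much of the mapped memory is actually used:

    /* internal_frag.c - one page per tiny object, the pathological case above.
       Build: gcc -O2 internal_frag.c -o internal_frag */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);     /* typically 4096 bytes */
        const int objs = 1000;
        size_t used = 0, mapped = 0;

        for (int i = 0; i < objs; i++) {
            /* One whole anonymous page just to hold a single int. */
            int *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) return 1;
            *p = i;                            /* only 4 bytes of it are used */
            used   += sizeof(int);
            mapped += (size_t)page;
        }

        printf("page size   : %ld bytes\n", page);
        printf("bytes used  : %zu\n", used);
        printf("bytes mapped: %zu (%.1f%% wasted)\n",
               mapped, 100.0 * (double)(mapped - used) / (double)mapped);
        return 0;
    }

    With 4 KiB pages it reports roughly 99.9% waste, which is exactly the 4092 unused bytes per page described above.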
    0:00 Intro
    2:40 Memory Allocation
    4:10 External fragmentation
    9:00 Internal Fragmentation
    11:30 Virtual Memory & Swap
    15:20 Page Size
    18:00 The genius of memcached
    19:50 How CPU reads data from RAM
    25:00 The cost of fragmentation in memory
    29:00 MySQL 8.x regression
    34:00 Summary
    Discovering Backend Bottlenecks: Unlocking Peak Performance
    performance.husseinnasser.com
    Resources
    bugs.mysql.com/bug.php?id=93734
    / 1730573712313741545
    pclt.sites.yale.edu/memory-an...
    Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
    backend.husseinnasser.com
    Fundamentals of Networking for Effective Backends udemy course (link redirects to udemy with coupon)
    network.husseinnasser.com
    Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
    database.husseinnasser.com
    Follow me on Medium
    / membership
    Introduction to NGINX (link redirects to udemy with coupon)
    nginx.husseinnasser.com
    Python on the Backend (link redirects to udemy with coupon)
    python.husseinnasser.com
    Become a Member on YouTube
    / @hnasr
    Buy me a coffee if you liked this
    www.buymeacoffee.com/hnasr
    Arabic Software Engineering Channel
    / @husseinnasser
    🔥 Members Only Content
    • Members-only videos
    🏭 Backend Engineering Videos in Order
    backend.husseinnasser.com
    💾 Database Engineering Videos
    • Database Engineering
    🎙️Listen to the Backend Engineering Podcast
    husseinnasser.com/podcast
    Gears and tools used on the Channel (affiliates)
    🖼️ Slides and Thumbnail Design
    Canva
    partner.canva.com/c/2766475/6...
    Stay Awesome,
    Hussein
  • Science & Technology

Comments • 14

  • @hnasr
    @hnasr  4 months ago +1

    backend.win

  • @jlejlahabib595
    @jlejlahabib595 4 months ago +2

    Fantastic course as always. I'm impatiently waiting for the OS course.

  • @engineeranonymous
    @engineeranonymous 4 months ago +4

    NUMA can kill your performance without you noticing it. If you are really serious about memory performance, you either go HBM or design it yourself with an FPGA, like they do in military and telco applications.

  • @user-wn5td2zb7o
    @user-wn5td2zb7o 3 months ago

    Amazing topic Hussein. Thanks for sharing the knowledge.

  • @pshaddel
    @pshaddel 4 months ago +1

    Thanks for the fantastic video! Is there a way to experiment with this stuff in our RAM? I mean, creating fragmented or contiguous memory and seeing the difference?

  • @105_saswata
    @105_saswata 4 months ago +2

    Question @ the 2:00 timestamp: no, there is no additional cost. That's the magic of virtual memory (and paging)! Processes assume a contiguous memory chunk is available, but internally the OS maps different pages to different physical addresses.
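    One way to actually watch that mapping is a minimal Linux-only sketch like the one below (it reads /proc/self/pagemap; recent kernels report the frame numbers as 0 unless it runs as root): it touches a few consecutive virtual pages and prints which physical frame backs each one, and the frames are often not consecutive.

    /* pagemap.c - print the physical frame behind consecutive virtual pages.
       Build: gcc -O2 pagemap.c -o pagemap   (run with sudo to see real PFNs) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void) {
        long   page  = sysconf(_SC_PAGESIZE);
        size_t pages = 8;
        unsigned char *buf = malloc(pages * (size_t)page);
        if (!buf) return 1;
        memset(buf, 1, pages * (size_t)page);    /* touch so the pages get mapped */

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0) { perror("open pagemap"); return 1; }

        for (size_t i = 0; i < pages; i++) {
            uintptr_t vaddr = (uintptr_t)buf + i * (size_t)page;
            uint64_t  entry = 0;
            off_t     off   = (off_t)(vaddr / (uintptr_t)page) * (off_t)sizeof(entry);
            if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) break;
            uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54 hold the frame number */
            printf("virtual page %zu -> physical frame %llu\n",
                   i, (unsigned long long)pfn);
        }
        close(fd);
        return 0;
    }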

  • @sameerakhatoon9508
    @sameerakhatoon9508 4 months ago +4

    a new course on os coming soon?🎉🎉

  • @mr.wwhite
    @mr.wwhite 4 months ago

    @hussein can you make a video on Oracle Exadata?

  • @jks234
    @jks234 4 months ago +1

    Hello, I’d like to confirm: is the referenced TLD actually TLS (thread-local storage)?
    I looked up TLD and got Top-Level Domain, from domain name structure.
    Thanks for your videos.

    • @d3crypted
      @d3crypted 3 months ago +1

      I'm pretty sure he talked about the TLB (translation lookaside buffer).

  • @sebastiansydow7505
    @sebastiansydow7505 4 months ago

    I may have an answer to your question on whether physical memory fragmentation causes a performance penalty, but I'm not an expert by any means on memory or backend applications, so expect potential mistakes in this calculation.
    You've said yourself that a memory page is typically 4 KiB in size, so the MMU can only cause physical memory fragmentation at 4 KiB boundaries, which implies that within a single page the physical memory must be contiguous. The Wikipedia article "Synchronous dynamic random-access memory" states:
    > For reference, a row of a 1 Gbit[6] DDR3 device is 2,048 bits wide [...]
    Assuming your system is equipped with that particular 1 Gbit DDR3 memory module, to read an entire 4 KiB page you would already need to activate 2 physical rows in your memory. And to read the next page you would need to activate a different row anyway, so physical fragmentation should not matter at all. What I'm unsure of is how to extrapolate this example onto DDR5, as with its introduction we switched from a single 64-bit memory channel per DIMM to two 32-bit memory channels, and from a prefetch (= row) buffer size of 8n for DDR3 to 8n or 16n; whatever that n means, I am unsure. If modern DDR5 chips do have a row size larger than 4 KiB, as you stated (on the order of 16 KiB), we could now be running into that particular problem, but I've had significant difficulties getting any information on row sizes.
    However, I could imagine physical memory fragmentation playing a role in which bank your data ends up in. From what I've understood, each bank group may operate independently of the others on activate, read and prefetch commands, but they share the same data and command bus. So if physical memory fragments in just the right way, most of your data could end up in the same bank, significantly degrading performance as bank switches would need to be serialized instead of being able to operate in parallel.

  • @jelliedfish6845
    @jelliedfish6845 4 months ago +1

    Doesn't memcached just use a memory pool? What's so genius about it? That's such a common thing

  • @roastyou666
    @roastyou666 3 months ago

    I literally ran into this issue (causing OOM) on my ESP8266 😢 and I had to debug for an hour

  • @deepanshusharma1619
    @deepanshusharma1619 4 months ago

    Love you bro