The Cost of Memory Fragmentation

  • Published 7 Jun 2024
  • Fragmentation is a very interesting topic to me, especially when it comes to memory.
    While virtual memory does solve external fragmentation (you can still allocate logically contiguous memory backed by non-contiguous physical memory), it introduces performance delays as we jump all over physical memory to read what appears to us as, for example, a contiguous array in virtual memory.
    You see, DDR RAM consists of banks, rows and columns. Each row has around 1024 columns and each column holds 64 bits, which makes a row around 8 KiB. The cost of accessing RAM is the cost of “opening” a row and all its columns (around 50-100 ns); once a row is opened, all of its columns are available and the 8 KiB is cached in the row buffer inside the RAM.
    The CPU can ask for an address and transfer 64 bytes at a time (called a burst), so if the CPU (or, more precisely, the memory controller) asks for the next 64 bytes right after them, they come at almost no extra cost because the entire row is cached in the RAM. However, if the CPU sends an address that falls in a different row, the old row must be closed and a new row opened, taking an additional ~50 ns hit. So spatially local access to bytes ensures efficiency.
    So fragmentation does hurt performance if the data you are accessing is not contiguous in physical memory (and of course it doesn’t matter that it is contiguous in virtual memory). This kind of reminds me of the old days of HDDs, where the disk head physically travels across the platter to read one file, which prompted the need for “defragmentation”, although RAM access (and SSD NAND, for that matter) isn’t nearly as bad.
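    To get a feel for how much that locality matters, here is a rough C sketch (assuming a Linux/glibc toolchain; the measured gap mixes cache, TLB and DRAM row-buffer effects rather than isolating the row-open cost) that walks the same buffer sequentially and then in a shuffled order and compares the time:

    /* locality.c - compare sequential vs shuffled reads over the same buffer.
       Build: gcc -O2 locality.c -o locality   (Linux/glibc assumed) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    #define N (64UL * 1024 * 1024 / sizeof(uint64_t))   /* 64 MiB of 8-byte slots */

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        uint64_t *buf = malloc(N * sizeof(uint64_t));
        size_t   *idx = malloc(N * sizeof(size_t));
        if (!buf || !idx) return 1;

        for (size_t i = 0; i < N; i++) { buf[i] = i; idx[i] = i; }
        /* Fisher-Yates shuffle to build a random visiting order. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }

        volatile uint64_t sink = 0;
        double t0 = now_sec();
        for (size_t i = 0; i < N; i++) sink += buf[i];        /* sequential walk */
        double t1 = now_sec();
        for (size_t i = 0; i < N; i++) sink += buf[idx[i]];   /* shuffled walk */
        double t2 = now_sec();

        printf("sequential: %.3f s   shuffled: %.3f s\n", t1 - t0, t2 - t1);
        return (int)(sink & 1);
    }

    On most machines the shuffled walk comes out several times slower, even though both loops touch exactly the same bytes.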
    Moreover, virtual memory introduces internal fragmentation because of the use of fixed-size blocks (called pages, often 4 KiB in size), which are mapped to frames in physical memory.
    So if you want to allocate a 32-bit integer (4 bytes), you get a whole 4 KiB of memory, leaving a whopping 4092 bytes allocated to the process but unused, and unavailable to the OS. These little pockets of memory can add up across many processes, which is another reason developers should take care when allocating memory for efficiency.
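    As a small sketch of that worst case (Linux assumed; real allocators like malloc pack many small objects into one page precisely to avoid this), the following program maps one anonymous page per 4-byte integer and prints how much of the mapped memory is actually used:

    /* internal_frag.c - one page per tiny object, the pathological case above.
       Build: gcc -O2 internal_frag.c -o internal_frag */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);     /* typically 4096 bytes */
        const int objs = 1000;
        size_t used = 0, mapped = 0;

        for (int i = 0; i < objs; i++) {
            /* One whole anonymous page just to hold a single int. */
            int *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) return 1;
            *p = i;                            /* only 4 bytes of it are used */
            used   += sizeof(int);
            mapped += (size_t)page;
        }

        printf("page size   : %ld bytes\n", page);
        printf("bytes used  : %zu\n", used);
        printf("bytes mapped: %zu (%.1f%% wasted)\n",
               mapped, 100.0 * (double)(mapped - used) / (double)mapped);
        return 0;
    }

    With 4 KiB pages it reports roughly 99.9% waste, which is exactly the 4092 unused bytes per page described above.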
    0:00 Intro
    2:40 Memory Allocation
    4:10 External fragmentation
    9:00 Internal Fragmentation
    11:30 Virtual Memory & Swap
    15:20 Page Size
    18:00 The genius of memcached
    19:50 How CPU reads data from RAM
    25:00 The cost of fragmentation in memory
    29:00 MySQL 8.x regression
    34:00 Summary
    Discovering Backend Bottlenecks: Unlocking Peak Performance
    performance.husseinnasser.com
    Resources
    bugs.mysql.com/bug.php?id=93734
    / 1730573712313741545
    pclt.sites.yale.edu/memory-an...
    Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon)
    backend.husseinnasser.com
    Fundamentals of Networking for Effective Backends udemy course (link redirects to udemy with coupon)
    network.husseinnasser.com
    Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
    database.husseinnasser.com
    Follow me on Medium
    / membership
    Introduction to NGINX (link redirects to udemy with coupon)
    nginx.husseinnasser.com
    Python on the Backend (link redirects to udemy with coupon)
    python.husseinnasser.com
    Become a Member on YouTube
    / @hnasr
    Buy me a coffee if you liked this
    www.buymeacoffee.com/hnasr
    Arabic Software Engineering Channel
    / @husseinnasser
    🔥 Members Only Content
    • Members-only videos
    🏭 Backend Engineering Videos in Order
    backend.husseinnasser.com
    💾 Database Engineering Videos
    • Database Engineering
    🎙️Listen to the Backend Engineering Podcast
    husseinnasser.com/podcast
    Gears and tools used on the Channel (affiliates)
    🖼️ Slides and Thumbnail Design
    Canva
    partner.canva.com/c/2766475/6...
    Stay Awesome,
    Hussein
  • Science & Technology

Comments • 14

  • @hnasr
    @hnasr  4 months ago +1

    backend.win

  • @jlejlahabib595
    @jlejlahabib595 4 months ago +2

    Fantastic course as always. I'm impatiently waiting for the OS course.

  • @engineeranonymous
    @engineeranonymous 4 months ago +4

    NUMA can kill your performance without you noticing it. If you are really serious about memory performance, you either go HBM or design it yourself with an FPGA, like they do in military and telco applications.

  • @user-wn5td2zb7o
    @user-wn5td2zb7o 3 months ago

    Amazing topic Hussein. Thanks for sharing the knowledge.

  • @pshaddel
    @pshaddel 4 months ago +1

    Thanks for the fantastic video! Is there a way to experiment with this stuff in our RAM? I mean, creating fragmented or contiguous memory and seeing the difference?

  • @105_saswata
    @105_saswata 4 months ago +2

    Question @ the 2:00 timestamp: no, there is no additional cost. That's the magic of virtual memory (and paging)! Processes assume a contiguous memory chunk is available, but internally the OS maps different pages to different physical addresses.
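    One way to actually watch that mapping is a minimal Linux-only sketch like the one below (it reads /proc/self/pagemap; recent kernels report the frame numbers as 0 unless it runs as root): it touches a few consecutive virtual pages and prints which physical frame backs each one, and the frames are often not consecutive.

    /* pagemap.c - print the physical frame behind consecutive virtual pages.
       Build: gcc -O2 pagemap.c -o pagemap   (run with sudo to see real PFNs) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(void) {
        long   page  = sysconf(_SC_PAGESIZE);
        size_t pages = 8;
        unsigned char *buf = malloc(pages * (size_t)page);
        if (!buf) return 1;
        memset(buf, 1, pages * (size_t)page);    /* touch so the pages get mapped */

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0) { perror("open pagemap"); return 1; }

        for (size_t i = 0; i < pages; i++) {
            uintptr_t vaddr = (uintptr_t)buf + i * (size_t)page;
            uint64_t  entry = 0;
            off_t     off   = (off_t)(vaddr / (uintptr_t)page) * (off_t)sizeof(entry);
            if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) break;
            uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54 hold the frame number */
            printf("virtual page %zu -> physical frame %llu\n",
                   i, (unsigned long long)pfn);
        }
        close(fd);
        return 0;
    }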

  • @sameerakhatoon9508
    @sameerakhatoon9508 4 months ago +4

    a new course on os coming soon?🎉🎉

  • @mr.wwhite
    @mr.wwhite 4 months ago

    @hussein can you make a video on Oracle Exadata?

  • @jks234
    @jks234 4 months ago +1

    Hello, I’d like to confirm: is the referenced TLD actually TLS (thread-local storage)?
    I looked up TLD and got Top-Level Domain, from domain name structure.
    Thanks for your videos.

    • @d3crypted
      @d3crypted 3 months ago +1

      I'm pretty sure he talked about the TLB (translation lookaside buffer).

  • @sebastiansydow7505
    @sebastiansydow7505 4 months ago

    I may have an answer to your question on whether physical memory fragmentation causes a performance penalty, but I'm not an expert by any means on memory or backend applications, so expect potential mistakes in this calculation.
    You've said yourself that a memory page is typically 4 KiB in size, so the MMU can only cause physical memory fragmentation at 4 KiB boundaries, which implies that within a single page the physical memory must be contiguous. The Wikipedia article "Synchronous dynamic random-access memory" states:
    > For reference, a row of a 1 Gbit[6] DDR3 device is 2,048 bits wide [...]
    Assuming your system is equipped with that particular 1 Gbit DDR3 memory module, to read an entire 4 KiB page you would already need to activate 2 physical rows in your memory. And to read the next page you would need to activate a different row anyway, so physical fragmentation should not matter at all. What I'm unsure of is how to extrapolate this example onto DDR5, as with its introduction we switched from a single 64-bit memory channel per DIMM to two 32-bit memory channels, and from a prefetch (= row) buffer size of 8n for DDR3 to 8n or 16n; whatever that n means, I am unsure. If modern DDR5 chips do have a row size larger than 4 KiB, as you stated (on the order of 16 KiB), we could now be running into that particular problem, but I've had significant difficulties getting any information on row sizes.
    However, I could imagine physical memory fragmentation playing a role in which bank your data ends up in. From what I've understood, each bank group may operate independently of the others on activate, read and prefetch commands, but they share the same data and command bus. So if physical memory fragments in just the right way, most of your data could end up in the same bank, significantly degrading performance as bank switches would need to be serialized instead of being able to operate in parallel.

  • @jelliedfish6845
    @jelliedfish6845 4 months ago +1

    Doesn't memcached just use a memory pool? What's so genius about it? That's such a common thing

  • @roastyou666
    @roastyou666 3 months ago

    I literally ran into this issue (causing OOM) on my ESP8266 😢 and I had to debug for an hour

  • @deepanshusharma1619
    @deepanshusharma1619 4 months ago

    Love you bro