How Memory Usage Slows Down Your Programs

  • Published: 22. 07. 2024
  • Patreon ➤ / jacobsorber
    Courses ➤ jacobsorber.thinkific.com
    Website ➤ www.jacobsorber.com
    ---
    How Memory Usage Slows Down Your Programs // We usually think about how much memory we're using, but not about how we're using that memory. This video shows a simple example of how changing access patterns (row-wise vs column-wise) in a matrix or two-dimensional array can significantly impact runtimes.
    Related Videos:
    2D Array/Matrix video: • Working with a Matrix/...
    ***
    Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics ranging from network programming, threads, processes, operating systems, embedded systems and others. My goal is to help you get under-the-hood and better understand how computers work and how you can use them to become stronger students and more capable professional developers.
    About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
    More about me and what I do:
    www.jacobsorber.com
    people.cs.clemson.edu/~jsorber/
    persist.cs.clemson.edu/
    To Support the Channel:
    + like, subscribe, spread the word
    + contribute via Patreon --- [ / jacobsorber ]
    Source code is also available to Patreon supporters. --- [jsorber-youtube-source.heroku...]

Comments • 68

  • @diconicabastion5790
    @diconicabastion5790 2 years ago +12

    It can make a hell of a difference. I created a program a while back that created roughly 40,000 rooms connected together in a maze. Initially it could take hours. By the time I was done it was under 1 second. That was on a single thread.

    • @internetsfinest8839
      @internetsfinest8839 2 years ago +1

      It’s amazing how fast you can make a program. I was writing a program that would seek out a certain line, but it was extremely slow. The solution was to put the entire file into an array, and the program went from a couple of minutes to a single second.

    • @JacobSorber
      @JacobSorber  2 years ago +6

      Yes indeed. The more you understand how the underlying layers work (and don't work), the more often you can see opportunities to improve things.

  • @hidden8990
    @hidden8990 2 years ago +29

    I like this one a lot. It moves in the direction of data-oriented design. I don’t know what your experience is with that, but I think an introduction to that vs object-oriented, and what use cases would make one more efficient than the other, could be a good one-off video.

    • @rogo7330
      @rogo7330 2 years ago +3

      I'm thinking about how to make DOD and OOP help each other. DOD is more about how to actually place data in memory, but it lacks readability in complex things. OOP, on the other hand, can be done in C with structs and pointers to real objects, but I'm curious how much it will cost if I use OOP-ish structs at runtime.

    • @badwolf8112
      @badwolf8112 2 years ago +3

      I feel like OOP doesn't necessarily stop you from optimizing. You can use OOP in many ways, and C++'s creator talks a lot about performance and reliability for systems, so I doubt using OOP makes things worse (many of C++'s abstractions have no overhead).

    • @sanderbos4243
      @sanderbos4243 a year ago

      @@badwolf8112 Using OOP doesn't stop you from optimizing, but it's just that things like for example virtual function calls can both bring the performance down and make it much harder to debug. Think of a line, where OOP is on the left and Data Oriented Design is on the right. The left is generalized to be very extensible, but can be slower and harder to debug. The right is more specialized to a specific task using knowledge of what you will *actually* use it for in practice, and can be faster and easier to debug. This principle of flexibility vs excellence also holds for a lot of other things, like AI: you either have an AI that is meh at a ton of different tasks, or have a state-of-the-art chess AI that excels at the one thing it was made for. If you try to design a real-life factory that is able to create *any* type of car, then your factory will be incredibly complicated and slow. The point is that the more you decide to constrain what your program is capable of doing, the more performant and simpler you will be able to write it. OOP applies fewer constraints, which comes at several costs. This is an overgeneralization, because pretty much every programming technique has its place, but look up "object oriented programming vs data oriented design" for countless articles and videos if you want to know more.

    • @RobBCactive
      @RobBCactive 8 months ago

      OOP is for long-term adaptability & encapsulation; when you're solving real problems, correctness is key. Requirements change with time, and you may need to add features and data types later, ones not even considered when the project began, without updating hundreds of files.
      A data-oriented design works well for things like games: the data tables might even be extracted and processed by build/edit tools that use objects to produce the level data that the runtime uses for high-efficiency, max-fps play, with specificity rather than flexibility.

  • @reptilicusrex4748
    @reptilicusrex4748 2 years ago +23

    This video was very informative and really gets the point across using a well-chosen simple example. Thanks for the effort.

  • @sparten1527
    @sparten1527 2 years ago +6

    I'm taking an advanced computer architecture course which deals with optimizing memory access times in the cache.
    This is really interesting how it ties in with the knowledge I've learned in class!

  • @cernejr
    @cernejr 2 years ago

    Helpful reminder. And good to see actual numbers to get a feel for how much the cache actually matters.

  • @umpoucosobreconhecimentos

    Thank you for this very useful explanation of how memory locality can affect program performance. Amazing explanation!

  • @pathayes2224
    @pathayes2224 2 years ago

    As always, an excellent presentation. Caching is key to performance. I once developed with a processor which had this option inadvertently switched off. As a result, it crawled along despite its fast clock speed. File I/O also affects speed, because it is slow. This is a big area.

  • @nunorodrigues5628
    @nunorodrigues5628 2 years ago

    OOOOOOH i get it! And yes, please do a video on C++ and vectors regarding this issue, i would love to learn about it.

  • @emreozdemir9358
    @emreozdemir9358 2 years ago +2

    So someone finally points out the importance of cache with an excellent example. Thank you so much, sir.

  • @iltrovatoremanrico
    @iltrovatoremanrico 2 years ago

    Very informative, thank you!

  • @realdragon
    @realdragon a year ago

    It makes sense, I would never think of that

  • @foadsf
    @foadsf 2 years ago

    I don't do C for work. I just watch your videos as meditation. you are amazing! 🖖

  • @gardenhouseNemurin
    @gardenhouseNemurin 2 years ago

    Nice one Jacob

  • @debanjanbarman7212
    @debanjanbarman7212 2 years ago +4

    Sir, please make a series on Linux kernel development.

  • @billowen3285
    @billowen3285 2 years ago +4

    I love your channel Jacob!

  • @matthewkott8863
    @matthewkott8863 2 years ago +2

    Very interesting. Intuitive, once (if) you think about it. By the way, Jacob, have you done a video on Big O already? That's a topic that really baffles me, as I simply have a hard time recognising the patterns that determine the most significant terms.

    • @JacobSorber
      @JacobSorber  2 years ago +2

      I haven't. I'll add it to the list and see what I can do.

  • @raghavsrivastava2910
    @raghavsrivastava2910 2 years ago

    Great video.

  • @happyTonakai
    @happyTonakai 2 years ago

    Good video, number 1000 thumb up!

  • @farikunaziz6504
    @farikunaziz6504 2 years ago +1

    Can you give an overview of the Rust language?

  • @dwaynestgeorge2558
    @dwaynestgeorge2558 a year ago

    Thanks

  • @matteolacki4533
    @matteolacki4533 2 years ago

    Just out of curiosity, how fast was it without allocating anything?

  • @TheVertical92
    @TheVertical92 2 years ago

    This always segfaults on my machine if my ROWS/COLS are too high.
    Seems like the max stack allocation makes trouble🤷‍♂
    Edit: Oh I see. When I declare the array globally it works; if I declare it within main() it segfaults.

  • @jenidu9642
    @jenidu9642 2 years ago

    Why don't you time it with clock()? That way you can make sure the initialization is measured.

  • @sman3424
    @sman3424 2 years ago

    Does spatial locality still hold with malloc'ed memory, since virtual memory is only guaranteed to be contiguous in the address space, not physically?

    • @user-lt9oc8vf9y
      @user-lt9oc8vf9y 2 years ago +1

      Yes, it does. Your cache line is usually 64 bytes long (it can differ, especially on embedded systems), meaning every time you access memory, 64 B around that address are loaded into the cache. If your malloc'ed data is larger than that, you will have a delay each time you enter a new section that wasn't already loaded into cache.
      Also note that whatever you load into cache can be evicted as new data comes in, but that behavior depends on your cache sizes and replacement strategy.

  • @dmitripogosian5084
    @dmitripogosian5084 a year ago

    Back in the '80s, anybody who had 2 hours of C instruction would never write the second loop. It was just common knowledge that C stores 2D arrays in row-major order while Fortran is column-major, so in C you iterate on the second index in the inner loop, and in Fortran on the first.

  • @benjaminshinar9509
    @benjaminshinar9509 2 years ago

    Could you show the hit rate of the two loops with cachegrind?

  • @badwolf8112
    @badwolf8112 2 years ago +1

    (How) do we know that the faster access pattern would be faster on all computers?
    Can you make a video about profiling and, in particular, how we know that profiling on one computer carries over to other computers?

    • @JacobSorber
      @JacobSorber  2 years ago +2

      Good topic idea. Thanks! In C, arrays are laid out in a specific way. So, this example is going to see pretty consistent performance across nearly all machines, since the same elements will be close to each other each time. But, yeah, there definitely are situations where performance will be more machine specific. I'll see what I can do.

    • @lawrencedoliveiro9104
      @lawrencedoliveiro9104 2 years ago

      I once took some MATLAB code written by a client, made a series of tweaks to it, and showed it running a lot faster.
      I did all my testing on a Linux box. Unfortunately, the client needed to deploy it on a Windows installation. And on that machine, all the versions ran at pretty much the same speed.
      ∗Sigh∗ ...

  • @tauqeerakhtar2601
    @tauqeerakhtar2601 2 years ago

    This is all about spatial locality, and it is implementation-dependent. The opposite might happen if the memory layout were the opposite.

  • @skilz8098
    @skilz8098 2 years ago

    It's not just memory; any kind of system resource, such as file access, threads, etc., can affect your application's performance...

  • @nngnnadas
    @nngnnadas 2 years ago

    cache invalidation

  • @leokiller123able
    @leokiller123able 2 years ago

    So all global and static variables are stored in the cache?

    • @JacobSorber
      @JacobSorber  2 years ago +2

      Caching happens at a lower level, the load/store level. It doesn't know anything about whether the variable is global, local, static... whatever. If your program accesses memory, it will be cached.

    • @leokiller123able
      @leokiller123able 2 years ago

      @@JacobSorber So when you store memory on the heap using malloc, that memory also gets cached? I don't quite understand when memory is stored in cache or not.

    • @JacobSorber
      @JacobSorber  2 years ago +1

      @@leokiller123able It does when you use it.

    • @leokiller123able
      @leokiller123able 2 years ago

      @@JacobSorber Okay, so if all memory gets cached when you use it, why do people say that accessing memory on the heap is slower than on the stack? If it goes into the cache anyway, shouldn't the access speed be the same?

    • @JacobSorber
      @JacobSorber  2 years ago +2

      @@leokiller123able This comes down to two things. First, there's the cost of allocating and freeing memory. That adds overhead. Second, the heap can get a bit fragmented, with bits of data scattered all over the place. The call stack is a simple linear data structure, and all of your local data for a function call will be colocated in its stack frame. So, you often get better locality (hence better cache performance).

  • @LoesserOf2Evils
    @LoesserOf2Evils 2 years ago

    Would using dynamic allocation address this issue? I fear -- I mean, 'think' -- not, because of stopping to allocate a new cell in twoarray every so often. I fear -- I mean, 'think' -- dynamic allocation would increase the execution time partly because of the access time.

    • @JacobSorber
      @JacobSorber  2 years ago +1

      Allocating the space would take extra time, but once you have the memory allocated, I don't know that it would be any slower. Maybe a little if having the accesses further apart affects cache use, but I would expect them to be nearly the same (except for the extra cost of the malloc/free calls). I guess I'll have to try it out.

    • @LoesserOf2Evils
      @LoesserOf2Evils 2 years ago

      Thank you, Dr. @@JacobSorber. In summary, execution would slow because of the allocation time for each cell but not because of memory access time, right?

    • @user-ph1yj9rr2p
      @user-ph1yj9rr2p 2 years ago

      It would be slower due to the additional indirection. And the allocator could give you any address, so locality goes out the window. That's a serious problem in node-based data structures like trees and such.

  • @jwbowen
    @jwbowen 2 years ago

    Now do the same thing in Fortran and see how the timings shake out

  • @CR3271
    @CR3271 11 months ago

    8:45 I know this wasn't the main point, but I just have to say STL vector -- at least the MS VS implementation -- is a scam. I regularly get 4x speed out of my home-spun dynamic array template. Maybe you could give the boys at MS a lesson in memory access 😂

  • @LinucNerd
    @LinucNerd 2 years ago

    'Mike Acton' made a presentation here on YouTube about Data-Oriented Design (czcams.com/video/rX0ItVEVjHc/video.html),
    and he touched a bit on the mentality that "the compiler can fix it"...
    Although the compiler can do a lot, there are many things it can't do.
    The compiler is not some magic software that knows everything; although built by intelligent people, it has limitations.
    There also seems to be a growing trend against OOP and C++ nowadays, and I'm not entirely sure why...
    I don't know C++; it's too hard for me, but I wonder why it's become so trendy to hate on it.

  • @47Mortuus
    @47Mortuus 2 years ago

    That's it? Matrix traversal? There's so much more to this...
    Let's just hint at the "Von Neumann Bottleneck", which suggests that ultimately, any program is limited by memory bandwidth when it comes to performance.

  • @HimanshuSharma-sd5gk
    @HimanshuSharma-sd5gk 2 years ago

    New video idea:
    arbitrary-precision arithmetic in C.

  • @irwainnornossa4605
    @irwainnornossa4605 2 years ago

    Opening braces belong on a new line!

  • @embeddedbastler6406
    @embeddedbastler6406 2 years ago +1

    As we are talking about cache locality, wouldn't a video about data-oriented programming be very interesting?

  • @ohwow2074
    @ohwow2074 2 years ago

    His second for loop probably increased the cache miss rate to 100% lol. That's why it was slow.

  • @adityavikramsinha408
    @adityavikramsinha408 2 years ago

    likeeee uuu smmm

  • @HydeFromT70s
    @HydeFromT70s 2 years ago

    I appreciate the effort but this video shows HOW to use the arrow operator and not WHEN to use it...