Memory, Cache Locality, and why Arrays are Fast (Data Structures and Optimization)

  • Published 22 Jul 2024
  • Why is the first loop 10x faster than the second, despite doing the exact same work?
    Follow me on:
    Twitter: / iced_coffee_dev
    Github: github.com/simondevyoutube/
    In this video we talk a bit about memory, the CPU caches (specifically the L1/L2/L3 cache), hardware prefetching, and how all this comes together for arrays. We'll be covering each of these topics in enough detail for you to get a solid understanding, working through real-world examples to help illustrate the point. We'll talk about the most fundamental data structure, arrays, how they work, what situations they're great in, and when they suck. We'll also touch on some algorithmic complexity. Finally, we'll be talking about why understanding this is important and how this leads into more advanced topics and data structures.
    What's covered:
    * How memory allocation works, memory addresses
    * Contiguous memory
    * CPU Caches, L1/L2/L3 cache
    * Hardware prefetching
    * Array operations, what's fast and what isn't
    * Closing thoughts, why this is important to understand, how this relates to more advanced data structures
  • Science & Technology

Comments • 383

  • @simondev758
    @simondev758  3 years ago +84

    If you enjoyed this, help support me: www.patreon.com/simondevyt
    As for the question at the beginning, as a first hint, look at how the indices for the arrays are generated in both loops. Write out a few, and think about the pattern in memory. Actual answer below.
    Both loops touch all memory exactly once.
    First loop generates indices sequentially (ie. 0, 1, 2, 3, 4, ...), meaning memory accesses will also be sequential.
    Second loop uses a large stride (0, 4000, 8000, 12000, ...), so cache misses will be common.
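A minimal sketch of the two access patterns (hypothetical dimensions; assuming a Float64Array like the JS snippet in the video):

```javascript
// Hypothetical 4000x4000 grid stored in one flat typed array.
const W = 4000, H = 4000;
const data = new Float64Array(W * H);

// Fast: indices go 0, 1, 2, 3, ... - sequential and prefetch-friendly.
let sum1 = 0;
for (let y = 0; y < H; y++) {
  for (let x = 0; x < W; x++) {
    sum1 += data[y * W + x];
  }
}

// Slow: indices go 0, 4000, 8000, ... - a large stride, so each access
// lands on a new cache line and misses are common.
let sum2 = 0;
for (let y = 0; y < H; y++) {
  for (let x = 0; x < W; x++) {
    sum2 += data[x * W + y];
  }
}
```

Both loops touch every element exactly once; only the order of the indices differs.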

    • @simondev758
      @simondev758  3 years ago +1

      @@heinzerbrew It's the indexing, just unroll the loop a few times to confirm. Sequential, predictable memory access vs jumping around, the main topic of the video. PHP? I don't know much about it, other than that it's an interpreted language. I'm guessing the overhead of that drowns out other things.

    • @simondev758
      @simondev758  3 years ago +1

      @@heinzerbrew Again, you can't use PHP to test this. And this has nothing to do with memory allocations. Additionally, please don't use the pinned comment for this.

    • @hww3136
      @hww3136  3 years ago +1

      Would a modern C/C++ compiler be smart enough to detect and optimize it?

    • @simondev758
      @simondev758  3 years ago +10

      This is a pretty contrived example. I tried to think of the absolute most basic thing that would illustrate the point without needing to build a lot of extra boilerplate code around it. But in this example, using "gcc -O3" doesn't optimize it, while gcc -Ofast does, because -Ofast disables strict standards compliance. So it does sorta catch it, but is held back because you have to explicitly tell the compiler that you don't care much about the order. Remember that floating point operations are NOT associative, so the compiler reordering them without explicit permission from the programmer is a no-no. V8 doesn't seem to catch it, but I can't say for sure why.
      So the difference in timings between the loops should still largely come down to the difference in memory access patterns.
      We could make the setup a little more complex with some structs with useless info to pad them out, or an array vs linked list, or something to that effect. Maybe it would have been a better choice? Would be more legit, but harder to parse for beginners. I dunno, I'm learning how to present this as I go heh.
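The associativity point above can be checked directly in JS (which uses the same IEEE 754 doubles as C); the values in the comments are the standard results:

```javascript
// Floating point addition is not associative, which is why a compiler
// won't reorder a float summation without being told it's allowed (-Ofast).
const left = (0.1 + 0.2) + 0.3;  // 0.6000000000000001
const right = 0.1 + (0.2 + 0.3); // 0.6
console.log(left === right);     // false
```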

    • @Victor_Marius
      @Victor_Marius  2 years ago

      @@hww3136 what if you intend to iterate that way? You'll get a bug

  • @WojtekKozowski
    @WojtekKozowski  3 years ago +484

    “why the code on the top runs faster? watch the video and find out” ...and no mention of the code again. Otherwise nice video.

    • @simondev758
      @simondev758  3 years ago +219

      Hah yeah, I feel kinda stupid for not looping around, live and learn. I'll continue the series and make sure I actually answer it next time!

    • @metagen77
      @metagen77  3 years ago +22

      @@simondev758 Nah you managed to explain without relying on code. If anything cut the intro.

    • @bonehelm
      @bonehelm  3 years ago +85

      One is accessing the array in row-major order, and the other is accessing it in column-major order. The top one is faster because the memory is being accessed in a contiguous way, which means you'll get a lot of cache hits (very fast, going to main memory much less). The bottom one is NOT accessing memory in a contiguous way, meaning every access is a cache MISS. en.wikipedia.org/wiki/Row-_and_column-major_order

    • @tagg1080
      @tagg1080  3 years ago +10

      @@bonehelm maybe I am blind but the only thing he switched was the + and *

    • @bonehelm
      @bonehelm  3 years ago +27

      @@tagg1080 That's exactly the only change. He's using a formula to treat a 1D array as if it were a 2D array. But the order of + and * determines whether it's accessed row by row, or column by column.

  • @User-ty2ml
    @User-ty2ml  2 years ago +21

    Best Director Award goes to Simon, I have never seen such a beautiful explanation without answering the question, amazing!!!!

  • @xeridea
    @xeridea  a year ago +48

    You didn't explain why the original code posted was 10x slower. The reason being that the method in which the array was effectively traversed was not sequential or easily predictable, causing a lot of cache misses.

  • @sukkrad7797
    @sukkrad7797  3 years ago +16

    This cleared up so many things that my professor failed to teach me (or even understand himself)
    We truly need more videos explaining how the cpu works when we code
    Thank you so much!

    • @simondev758
      @simondev758  3 years ago +2

      Nice, lemme know if there's other subjects I should cover too.

  • @Chadderbox
    @Chadderbox  3 years ago +195

    I feel like this is going to be an awesome series, especially for people like me who learnt programming by themselves and never really had a formal education teaching them about what your code does under the hood.

    • @simondev758
      @simondev758  3 years ago +93

      I did study computer science, but none of this was covered. This is mostly from years of experience learning at companies and applying these optimizations.

    • @carterearles9528
      @carterearles9528  3 years ago +1

      Me too!

    • @Ainigma
      @Ainigma  3 years ago +8

      @@simondev758 I share this experience. In computer science, we also never covered that. Only when I had to do projects and faced performance optimization problems was I motivated to tackle the fundamentals of data structures and how to squeeze more and more efficiency out of my code.

    • @barddz4646
      @barddz4646  3 years ago +6

      @@Ainigma wait really? I'm in the first semester of computer engineering and I've already learned about this. Idk if the way they teach is different here in Brazil or not

    • @simondev758
      @simondev758  3 years ago +18

      Different universities teach different curricula. I went to a university that's considered pretty good (ie. often included in top lists), the focus was much more on theory and mathematics. I mentored a few Stanford grads while I was at Google, came off with the same impression. Super strong bases in theory/math.
      Also, computer engineering != computer science. I can totally believe that an engineering-oriented curriculum would cover way more about the underlying hardware.

  • @MrLazini
    @MrLazini  a year ago +11

    This channel is pure gold man. Although I already know most of these concepts, you explain them pretty clearly and in a very understandable way! Well done friend, subbed!

  • @AllAboutCode
    @AllAboutCode  3 years ago +34

    4:20 you ain't cheap bruh, your content is way better than expensive internet courses.

    • @simondev758
      @simondev758  3 years ago +3

      ty! If you have suggestions for future vids, let me know.

    • @AllAboutCode
      @AllAboutCode  3 years ago +3

      @@simondev758 I'd love this series, just keep doing it

    • @valencefootball9740
      @valencefootball9740  3 years ago +1

      hahaha weed number

    • @masterofdizzzaster
      @masterofdizzzaster  3 years ago +1

      @@simondev758 I hope you will grow so much that you will be able to afford one of those macs without feeling its price. I've watched your videos for quite some time now and you deserve it.

  • @kosnk
    @kosnk  3 years ago +1

    An amazing series start, Simon! Thank you for this dive into memory allocation/access details, very helpful! Surely looking forward to a (contiguous?) continuation and your other quick projects & cool tips! Very kind of you to put in so much work and share it. Again, thanks!

  • @Rssks
    @Rssks  3 years ago +117

    Wow this video is almost 10 minutes but it felt like 1 minute! I really enjoyed it (Y) will definitely rewatch this later today!

  • @vitalino1981
    @vitalino1981  3 years ago +73

    "let's head to apple store. I don't actually own one, because they are super expensive and I'm super cheap, but I like to look at them" 😂🤣😆
    Man this was soooo relatable on so many levels 😂

  • @cuteswan
    @cuteswan  a year ago +2

    I dutifully paused the video at the beginning and took _way_ too long to spot the differences between the code samples. (I've never been much of a programmer.) After getting it I still learned a lot more from your discussion on the caches and all that. Thanks.

  • @tonnose
    @tonnose  3 years ago +1

    I didn't realize I'd watched this video before until I saw the L1/2/3 cache example. The drawings make the video so interesting. Now I've come back for the new videos after finishing my algorithms final exam!

  • @osambrojevanisam3575
    @osambrojevanisam3575  3 years ago +18

    This has got to be the best educational video on youtube, fucking amazing

  • @eddiemuller3157
    @eddiemuller3157  3 years ago +6

    Good Stuff, my man! The explanation is very clear and the dry humor is appreciated. I went into this video not knowing what a contiguous array was and now I have an idea of what it is.

    • @eddiemuller3157
      @eddiemuller3157  3 years ago

      Question: I understand that the L1, L2, and L3 caches reside on the CPU itself. Is there typically a controller on the CPU, like a northbridge? Or is it the CPU itself that accesses the caches, hence the low latency; no middle man?

    • @simondev758
      @simondev758  3 years ago +1

      That's awesome! That's right, caches typically reside on the CPU. They're also built with a different, faster kind of memory (SRAM). The northbridge is the interface between the CPU and main memory, and resides on the motherboard. That's my understanding anyway.

  • @Dionny
    @Dionny  3 years ago +48

    But why is the code example from the very start of the video faster than the other example? I feel like that wasn't covered or I missed the point.

    • @simondev758
      @simondev758  3 years ago +21

      Youness covered it well, but I left it open as sort of an "exercise for the viewer". Kinda wanted the video to be in depth enough that you could answer it yourself.

    • @mike.emery.
      @mike.emery.  3 years ago +42

      @@simondev758 Leaving as an exercise to the viewer is fine, but as a suggestion for the format it would be good to circle back to the question at the end of the video and call it out. The way it's structured now it seems like the question is just forgotten about.

    • @simondev758
      @simondev758  3 years ago +34

      Great feedback Mike, 100% agree! Tried it out but next time I'll make sure to close the circle.

    • @bonehelm
      @bonehelm  3 years ago +2

      One is accessing the array in row-major order, and the other is accessing it in column-major order. The top one is faster because the memory is being accessed in a contiguous way, which means you'll get a lot of cache hits (very fast, going to main memory much less). The bottom one is NOT accessing memory in a contiguous way, meaning every access is a cache MISS. en.wikipedia.org/wiki/Row-_and_column-major_order

    • @TiDuZzO
      @TiDuZzO  3 years ago

      @@bonehelm I still don't get it, both FOR loops iterate over x and then y, so both should have the same speed. The only difference here is that the first one does the multiplication before the addition, and the second one the other way around... And it doesn't seem to be a contiguity problem here...

  • @kik8bg
    @kik8bg  3 years ago

    A simple and interesting explanation of arrays. I came here to watch this video just to make sure I knew exactly how everything works. Having hardware classes along with programming courses really does help a lot in understanding computers, both at the hardware and software level.

  • @katarinazivkovic3102
    @katarinazivkovic3102  3 years ago +4

    This is explained so clearly, I'm now going to binge watch your videos 😃 Thanks to YT for recommending me your channel. Subscribed! 🌟

  • @Greatbubba747
    @Greatbubba747  3 years ago

    This is a great high-level view of arrays and CPU caching! I wish I had this video as an introduction when I was taking my Computer Systems class in college!

  • @thegibusguy4969
    @thegibusguy4969  a year ago +2

    Hey, thanks for giving me a good explanation on what cache hits/misses were. I've seen it being tossed around but never knew what it meant until now.

  • @nathanp6928
    @nathanp6928  3 years ago +3

    I have a final tomorrow over this stuff and this video just happened to pop into my recommended. Probably cause I was googling virtual memory, caching, etc. Super informational video, and I was able to study and be entertained at the same time. I’m a fan 👍🏼

  • @miteshkumar5557
    @miteshkumar5557  3 years ago +24

    Arrays are also slow when you have multiple threads, each bound to its own CPU, trying to access a global array. Suppose you have an array of 4 int32_t values and 4 threads labeled 0 to 3. The i-th thread accesses the i-th index in the array and modifies it repeatedly. This will cause major slowdowns, as each modification invalidates the cache line in the other 3 threads/CPUs, and each thread has to do extra work to update its cache, even though the value that the thread cares about isn't being modified. This essentially makes the parallel code sequential. A linked list would be much better to use here, since the memory locations will not be adjacent to each other; or each thread can modify a local variable and only write to the array at the end.

    • @xeridea
      @xeridea  a year ago +5

      I have code that uses multiple threads to generate large arrays using perlin noise, and get near perfect scaling with threads. If you were doing something that multiple threads had tons of modifications, the L3 cache would still be valid, as that is shared between cores, so it wouldn't be that big of an issue. The bigger issue would be multiple threads constantly updating the same data causing race conditions, or bizarre behavior due to how memory reads and writes work.

  • @danishfayaz8077
    @danishfayaz8077  3 years ago +1

    Absolutely loved it. Just the right way to explain a concept top to bottom. Keep it up

  • @aufkeinsten7883
    @aufkeinsten7883  3 years ago +2

    you are made for creating videos like these, I could watch this for hours, thanks so much for the content

  • @pranjalagnihotri6072
    @pranjalagnihotri6072  2 years ago +1

    I love how he goes into depth of things, please continue your great efforts sir 🔥

  • @EpsilonKnight7
    @EpsilonKnight7  5 months ago +1

    Posting this for anyone else out there like me.
    As someone who knows JUST enough code to be dangerous, I feel dense for not getting it sooner.
    My brain absolutely looked at the two for loops, saw them moving through the same numbers, and even though I recognized there was a difference in the last line, my thought process collapsed to "alright so you move through all the same array positions eventually," and I spent ten minutes trying to figure out, on my own, why changing the order of operations would make it faster while ignoring, y'know, the actual result of iteratively running the loops.
    The first comment I saw mentioned row vs column major ordering, and even then, I thought "that's silly, it's a 1D array!" and failed to make the connection.
    Took me seeing someone list the resulting indices in order for it to finally click. Despite so much of the video focusing on contiguousness.
    So for anyone else that did the same, we're a special kind of stupid, but at least not a unique kind.

    • @simondev758
      @simondev758  5 months ago

      It's hard, which is why optimization is often a specialty.

  • @pinekel8987
    @pinekel8987  a year ago +1

    Thank you so much for creating this series, this is what ( even now ) I think more programmers ( and sometimes myself ) should pay attention to.

  • @sva3338
    @sva3338  3 years ago +3

    Holy f. This is so good, youtube finally recommended me something that's worth my time. Please release more episodes!!!

    • @simondev758
      @simondev758  3 years ago

      Second video in the series is the linked list one.

  • @mahmoudhammmad8089
    @mahmoudhammmad8089  3 years ago +6

    AMAZING!!!
    Waiting to watch the rest of this series

  • @boxicool
    @boxicool  3 years ago +2

    Man. Your channel (even though I don't use JS so much) is freaking awesome!! I would buy your course or book without hesitation! Thank you!

    • @simondev758
      @simondev758  3 years ago

      No courses at the moment, I just put it all online for free :)

  • @honey661
    @honey661  3 years ago +2

    Even though I know this topic, your video was so interesting that time passed unnoticed and I watched the whole video. Wow

  • @pist5343
    @pist5343  3 years ago +15

    Awesome! Looking forward to this series :)
    It's really not easy teaching about memory and data structures without the audience falling asleep, but you've done a great job!
    Unfortunately JS arrays are messy and not exactly contiguous... :(

    • @simondev758
      @simondev758  3 years ago +7

      Hah yeah, I actually had a small section on JS arrays in there just to clarify about them, but cut it due to time (and laziness).
      You're totally right, they're non-contiguous "array-like" objects. The TypedArray is closer to what you'd get in C++ or something.

  • @mahima7638
    @mahima7638  2 years ago +1

    Loved this video! Will enjoy watching the whole series definitely.

  • @als5321
    @als5321  3 years ago +1

    This was amazing. THANK YOU. More like this for Javascript. Keep it coming and don't stop!

  • @mehmedkumali1135
    @mehmedkumali1135  3 years ago +1

    Very helpful video, you explained everything clearly.. Reminded me of my university teacher back in the day. Thank you for the upload!

    • @simondev758
      @simondev758  3 years ago

      Awesome, if you have suggestions on topics to cover too, let me know

  • @bakerfx4968
    @bakerfx4968  3 years ago +2

    I wasn’t ready for it to be over!

  • @shaurya9055
    @shaurya9055  3 years ago +3

    If you'd have been my tuition teacher I'd have been a topper in my class
    Your way of explaining is so cool man👌
    I'm inspired

  • @pauljordan3064
    @pauljordan3064  3 years ago +1

    Great explanation. Clear and concise. Thanks!

  • @legendary9145
    @legendary9145  3 years ago +1

    your videos deserve a way bigger audience

  • @RobertAlexanderParraCiro
    @RobertAlexanderParraCiro  3 years ago +1

    Probably the best video I have ever watched about how arrays work

  • @tomi.mocnik
    @tomi.mocnik  3 years ago +2

    Got it as a recommendation, instantly subscribed.

  • @JNunoCoast
    @JNunoCoast  3 years ago +1

    Already knew these concepts but still an amazing video, definitely helpful to cs students!

  • @nissieln
    @nissieln  3 years ago +1

    Superb series! thank you for the content!

  • @BipinOli90
    @BipinOli90  3 years ago +2

    Loving this series man. Please keep going. Would love to see more real-world optimization like you did previously. But yeah, fundamentals need to be cleared up first.
    Btw I remember there was one CPU-level hack which exploited this eager loading behavior. Some private part of memory was located by measuring access speed: if the access was faster, that meant the memory location just after the current one had already been cached by the CPU, even though actually using it was restricted due to other privacy constraints.

    • @simondev758
      @simondev758  3 years ago +1

      More real world optimization? Sure, I can probably swing that.
      Sounds a lot like Meltdown: en.wikipedia.org/wiki/Meltdown_(security_vulnerability)

    • @BipinOli90
      @BipinOli90  3 years ago

      @@simondev758 Ah yeah, it's Meltdown.
      Thanks man. Love your videos.

  • @djflugel79
    @djflugel79  3 years ago +3

    8:24 the animation made me smile

  • @yourfutureself4327
    @yourfutureself4327  a year ago +1

    6:02 snipe'ns a good job mate

  • @lilwage6910
    @lilwage6910  5 months ago

    How does the processor look up data in the L-caches? I thought it was looking not for specific data, but for whatever data is in a specific data address. Do the L-caches keep the data alongside the address of that data in memory, where it came from?
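To the question above: yes, a cache stores an address tag next to each line of data, and a lookup compares tags. A toy direct-mapped cache sketch (illustrative sizes only; real caches do this in parallel in hardware and are usually set-associative):

```javascript
const LINE_SIZE = 64; // bytes per cache line
const NUM_LINES = 4;  // tiny cache for illustration

// Each slot remembers which memory block ("tag") it currently holds.
const lines = Array.from({ length: NUM_LINES }, () => ({ valid: false, tag: -1 }));

function access(address) {
  const block = Math.floor(address / LINE_SIZE); // which 64-byte block of memory
  const index = block % NUM_LINES;               // which cache slot it maps to
  const tag = Math.floor(block / NUM_LINES);     // identifies the block in that slot
  const line = lines[index];
  if (line.valid && line.tag === tag) return "hit";
  line.valid = true; // miss: fetch the block from memory and remember its tag
  line.tag = tag;
  return "miss";
}
```

Sequential accesses within the same 64-byte block hit after the first miss; a second block that maps to the same slot evicts the first.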

  • @closery
    @closery  3 years ago +1

    Sir, your channel is a gem! thx for these videos

  • @shivankchopra8552
    @shivankchopra8552  2 years ago

    Thanks for sharing this amazing tutorial!

  • @piyushjaware
    @piyushjaware  2 years ago

    Hi Man. Very nice explanation. I was curious how you made the animations in your video.

  • @ezequielgarrido51
    @ezequielgarrido51  3 years ago +1

    Great series!

  • @stLmpp
    @stLmpp  3 years ago

    Your videos are amazing! Thank you for this content

  • @rewrittenbytes1616
    @rewrittenbytes1616  3 years ago +6

    Arrays are fast, except when they aren’t ;)
    Great video!

  • @afsarzan
    @afsarzan  3 years ago

    love your explanation and examples..

  • @gendalfgray7889
    @gendalfgray7889  2 months ago

    Does a program need to be designed in a certain way to get the cache boost? Does using arrays when possible automatically boost performance?

  • @Chapali9a
    @Chapali9a  3 years ago +1

    Learned a lot. Thank you

  • @_DD_15
    @_DD_15  3 years ago

    Nice video, thanks for that!

  • @gligoradrian784
    @gligoradrian784  3 years ago +1

    Took me a good minute to find the difference between the 2 codes.

  • @yac7571
    @yac7571  a year ago +1

    Great job, I like your style. Easy like and sub 💯

  • @codeit9502
    @codeit9502  3 years ago

    Could you tell me which recorder and editing software you use? Because mine are completely bonkers.

  • @xicheung7867
    @xicheung7867  3 years ago

    Your content PLUS the video production = 🙌 SUBSCRIBED

  • @binaryburnout3d
    @binaryburnout3d  a year ago +1

    Here king, you dropped this. . .

  • @amimulahsan1289
    @amimulahsan1289  3 years ago +6

    Wow, man. I mean you literally just made it simple; not like other people who go through so much technical stuff and use buzzwords.

    • @simondev758
      @simondev758  3 years ago +3

      I mentored a junior guy recently who did that, spoke entirely in jargon and buzzwords. I always wondered if he specifically practiced doing that.

  • @HoangNguyen-nz4xe
    @HoangNguyen-nz4xe  5 months ago

    I'm kinda new to C/C++, please tell me why the top code is faster than the bottom code :'(

  • @AviPars
    @AviPars  a year ago +1

    Nice video ! Thanks

  • @inconito7281
    @inconito7281  3 years ago +1

    This video was amazing, you have my subscription

  • @raimanish4843
    @raimanish4843  3 years ago +2

    this was very helpful. thank you

  • @jonasls
    @jonasls  3 years ago +1

    I want more!! Can't wait

  • @mfam98
    @mfam98  2 years ago +1

    Had a course on advanced computer architecture; this covered several weeks of it in a 10 min video. Wish I had this kind of resource back then. Also had to make a simulator with several caching sizes and strategies, good times 😂

    • @simondev758
      @simondev758  2 years ago +1

      Ooh that simulator sounds kinda fun to play with.

  • @HersonBagay
    @HersonBagay  3 years ago +5

    My only formal training in programming is an elective course in C in college. I've loved it so much I end up writing device drivers for baremetal code running on top of ARM Cortex M processors as a hobby. There is no substitute for learning the fundamentals. It pains me to see some universities starting Computer Science students with a course in Java...

    • @simondev758
      @simondev758  3 years ago +1

      My first exposure to programming was Java in my intro to computer science class heh :P

    • @HersonBagay
      @HersonBagay  3 years ago +1

      Some people are just naturally good at logic and programming. We can count ourselves in that group

  • @calebkrauter4027
    @calebkrauter4027  9 months ago

    Great video!

  • @IgorSadovskii
    @IgorSadovskii  3 years ago +1

    Omg this is very useful, thanks a lot!

  • @agsystems8220
    @agsystems8220  a year ago +1

    Great video. You may have forgotten to revisit and explain in detail the code at the start though...

    • @simondev758
      @simondev758  a year ago

      Yes... yes I did heh. Answer is pinned.

  • @IrwinRodriguez
    @IrwinRodriguez  3 years ago +1

    excellent explanation! new sub!

  • @bwanamaina
    @bwanamaina  3 years ago +1

    Well explained 👏👌👍

  • @giorgibatsiashvili4270
    @giorgibatsiashvili4270  3 years ago +1

    I found this channel and it's brilliant

  • @almazmusic
    @almazmusic  3 years ago +1

    Okay, you got me :) awesome videos!

  • @Nostrum84
    @Nostrum84  6 months ago

    Never even saw the difference in those highlighted for loops in the beginning, until I saw the answer :(

  • @NamLe-to2db
    @NamLe-to2db  3 years ago +1

    I love this. Thank you for sharing.

  • @Otakutaru
    @Otakutaru  2 years ago +3

    Why does the code on the top run 10 times faster? I thought you were going to answer it in this video. By looking at it I can deduce that it has to do with the nested loop. The first version goes through contiguous runs of 4000 data points, then moves on to the next batch of 4000, which should already be cached. The second skips between batches, grabbing the first element from each of them, then the second, etc. I don't see why it wouldn't be prefetched if it's repeatable.

    • @clearsky4049
      @clearsky4049  a year ago

      Yup, the bottom code block leapfrogs by 4000 positions in every iteration of the nested loop, but the top code block accesses the array sequentially. It's basically the difference between grabbing one sandwich from the plate and eating it completely before moving on to the next VS taking a bite from a different sandwich every time until your plate is clean.

  • @perelmanych
    @perelmanych  3 years ago +1

    For all those of you who are wondering why he accessed a 1d array as if it were 2d, here is a real world example. Assume that in C++ you write a class that represents a matrix. You can go with a 1d array or with something like vector< vector< int > >, which you can think of as a 2d array. So why the heck use a 1d array? There are many pros to this approach. One of them is much faster work for the constructor and destructor. Assume that you need to allocate memory for an NxM matrix m. With a 1d array it is one call to malloc (which asks the system to allocate a chunk of memory for you). With a 2d array you would call malloc N times, once for each row. The same logic applies when you want to free the memory. Now, once we have chosen to represent our matrix as a 1d array, assume that the user of our class wants to sum all of its elements. The user has no idea how we chose to represent the matrix under the hood; he only knows that he can access an element of the matrix using operator() as m(x, y). Your operator() in turn will have to calculate the position of an element by evaluating y * M + x, and in essence this will be exactly the same code as he showed in the video.

    • @simondev758
      @simondev758  3 years ago +1

      Awesome explanation, exactly what I had in mind when making the example.
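A sketch of that 1d-backed matrix idea in JS rather than C++ (hypothetical `Matrix` class; one contiguous allocation, row-major addressing as in the video's example):

```javascript
// A "2D" matrix backed by a single flat typed array: one allocation total,
// and all elements are contiguous in memory.
class Matrix {
  constructor(rows, cols) {
    this.rows = rows;
    this.cols = cols;
    this.data = new Float64Array(rows * cols);
  }
  // Row-major addressing: element (row, col) lives at row * cols + col.
  get(row, col) { return this.data[row * this.cols + col]; }
  set(row, col, v) { this.data[row * this.cols + col] = v; }
}

const m = new Matrix(3, 4);
m.set(1, 2, 42);
```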

  • @meto4545
    @meto4545  a year ago

    Great video! But I still don't understand the intro example: is the array in the bottom a dynamic array, or is just the top one a hash table?

  • @DARKCOP2011
    @DARKCOP2011  3 years ago

    This is super nice!

  • @hardwired89
    @hardwired89  3 years ago +2

    more js content sir love it❤️ thank you

  • @obsoletepowercorrupts
    @obsoletepowercorrupts  3 years ago

    There are pros and cons to this worth noting. It can be a valid form of redundancy to have an array as a 'look-up' to speed up data coming from a query in a relational database (rather than hitting the query over and over). That helps performance sometimes. However, the cache vulnerabilities can be exploited. An array does not have the same benefits of a linked-list in terms of being able to specify pointers for pass by address versus pass by reference. Also though, in some languages a linked list can be a problem as it needs objects, and that means garbage collection (Java) can happen at anytime, thereby making performance hard to predict and sometimes rather stuttering.

  • @user-gk4zw9sf4r
    @user-gk4zw9sf4r  3 years ago

    So please clarify this for me: the code at the top is faster because every array access in the loop is contiguous, so it can use the CPU's prefetcher and hit the L1/L2/L3 caches? But for the code at the bottom, since the index is jumping around, the CPU can't prefetch, so it has to access main memory every time?

  • @geekazodium
    @geekazodium  6 months ago +1

    If you aren't constrained by the requirement that the array must keep its ordering, you could also just take the value and swap in the last item of the array when deleting, right?
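The trick described above is often called "swap-remove" (or swap-and-pop): it makes deletion O(1) at the cost of ordering. A quick sketch with a hypothetical helper:

```javascript
// O(1) removal when element order doesn't matter: overwrite the removed
// slot with the last element, then shrink the array by one.
function swapRemove(arr, i) {
  const removed = arr[i];
  arr[i] = arr[arr.length - 1]; // move the last element into the hole
  arr.pop();                    // drop the now-duplicated last slot
  return removed;
}

const items = [10, 20, 30, 40];
swapRemove(items, 1); // items is now [10, 40, 30]
```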

  • @prodsefv
    @prodsefv  a year ago +1

    thanks simon

  • @ahn_buguei
    @ahn_buguei  3 years ago +1

    Thanks! I couldn't understand cache misses or contiguous arrays before your video. It felt like the 3Blue1Brown of Computer Science

    • @ahn_buguei
      @ahn_buguei  3 years ago +2

      I just applied this on my internship and I solved a serious bottleneck!!! Thanks Simon

    • @simondev758
      @simondev758  3 years ago

      That's really cool! What kind of problem were you solving?

    • @ahn_buguei
      @ahn_buguei  3 years ago +1

      I optimized some existent numerical C++ code (for signal processing) from 420ms per iteration to 45ms, but the goal was 30ms per iteration. There is a giant nested for loop that was taking 44ms to run. So I applied the change specified in the video to reduce cache misses. Now it's taking 22ms. Maybe I could further optimize it by writing a special data structure (probably just a linked list), but that's good enough for the manager by now and he has other priorities for me.

  • @trieutruong2410
    @trieutruong2410  3 years ago +1

    Great video. I’m your fan for now

  • @jarvispact
    @jarvispact  3 years ago +1

    Thank you simon! Does this CPU caching also apply to JavaScript? I created a test script in node and only get a 2x performance improvement. I guess this only applies to compiled / strongly typed languages? I would really appreciate an answer.

    • @melkileo
      @melkileo  3 years ago

      It also appears in JavaScript, since JS also uses the CPU and arrays are mostly contiguous in JS too.
      The performance boost can differ based on your CPU and what you do

    • @simondev758
      @simondev758  3 years ago +1

      Yeah it applies to JS too, that code snippet I had at the beginning was in JS :)
      To add to leomelki's response, which was great and covered this, remember that JavaScript arrays are more like "array-like" objects, see the docs: developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array
      They're sparse and potentially non-contiguous, so yes these optimizations will apply but they'll also be partially cancelled out by just overall noise from JavaScript. If you use a TypedArray, you're getting much closer to what a C++ array is, and that's actually what I used in the example.

    • @jarvispact
      @jarvispact  3 years ago +1

      @@simondev758 Thank you for answering my question. I know that JS also uses the CPU and has arrays :) but since JS is an interpreted language and not a compiled one, I was thinking that the JS engine may do some things in the background that could get in the way of efficient CPU caching. I'm pretty solid in JS but I have no idea what the engine does in the background. I tried using a normal array and not a typed one; that may be the reason I couldn't get the 10x improvement. Anyway, thanks a lot for all your great content.

    • @simondev758
      @simondev758  3 years ago +1

      Np, also keep in mind it'll be super dependent on what cpu you have, etc. You'll get a win with perfect caching vs all cache misses, the magnitude will vary from cpu to cpu.

    • @jarvispact
      @jarvispact  3 years ago +1

      For all who are interested in this: I did some more tests and could also get a 10x improvement in performance. My tests used a more real-world example for my ECS system (an array of structs/objects instead of numbers), and you can see very well how the performance changes simply by adding / removing properties. I can now add some performance improvements to my ECS system just by watching your video ☺️

  • @apoorvaranjan9921
    @apoorvaranjan9921  3 years ago +1

    This blew my mind

  • @FullThrottleTaco
    @FullThrottleTaco  a year ago +1

    This video kicks ass.

  • @irfanukani
    @irfanukani  3 years ago

    Awesome content ❤❤

  • @tuckerbeauchamp8192
    @tuckerbeauchamp8192  3 years ago

    Your videos are awesome

  • @nkululekomthimkulu1248

    I just subbed. I will be checking out your videos.

  • @abdelrahman5094
    @abdelrahman5094  3 years ago +1

    Great explanation, thanks for your effort
    I hope to see future videos on how JavaScript garbage collection works behind the scenes

    • @simondev758
      @simondev758  3 years ago +1

      Ooh that'd be an interesting subject.

    • @BipinOli90
      @BipinOli90  3 years ago +1

      I think it uses the reachability idea: if there is no way a part of memory can be reached, it will be garbage collected.
      They didn't use the reference counting approach because that doesn't handle two pieces that reference each other but are completely isolated and not reachable.

  • @NorppaCast
    @NorppaCast  3 years ago +1

    Great content once again. Thanks. I would like to see a three.js tutorial about creating model outlines / silhouettes. I know about outlineEffect and outlinePass but I'd like to see any other techniques (I've heard about stencil buffer but I have no idea if it's possible in three.js). My problem is that the outlineEffect is crappy for boxes and the outlinePass is too expensive (for my needs). I'm wondering if there is a cheaper method to draw an outline.

    • @simondev758
      @simondev758  3 years ago +1

      Great suggestion, I'll see what I can do!

  • @harshitjoshi3082
    @harshitjoshi3082  3 years ago +1

    Great video 🔥👍

  • @desertshadow72
    @desertshadow72  a year ago

    Was hoping to see some architectural variations in c++ on heap and stack that optimize cache hits.
    Great vid tho.