Are Vectors Slower than Arrays?

  • Added Nov 1, 2021
  • Patreon ➤ / jacobsorber
    Courses ➤ jacobsorber.thinkific.com
    Website ➤ www.jacobsorber.com
    ---
    Are Vectors Slower than Arrays? // A lot of programmers assume that they are significantly slower. In this video, I test it out to see how significant the difference actually is.
    Related Videos:
    Memory locality video: How Memory Usage Slows...
    ***
    Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics including network programming, threads, processes, operating systems, embedded systems, and others. My goal is to help you get under the hood and better understand how computers work and how you can use them to become a stronger student and a more capable professional developer.
    About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
    More about me and what I do:
    www.jacobsorber.com
    people.cs.clemson.edu/~jsorber/
    persist.cs.clemson.edu/
    To Support the Channel:
    + like, subscribe, spread the word
    + contribute via Patreon --- [ / jacobsorber ]
    Source code is also available to Patreon supporters. --- [jsorber-youtube-source.heroku...]

Comments • 85

  • @TonyXiang8787
    @TonyXiang8787 2 years ago +78

    Maybe you can also add -DNDEBUG to your compiler flags for the benchmark. Most implementations of std::vector check bounds in std::vector::operator[] in debug mode, which is an extra step compared to raw arrays. If you switch on -DNDEBUG, the difference might be even smaller.

    • @lordadamson
      @lordadamson 2 years ago +8

      I think they only check bounds on Windows. In my experience, they'll just overrun on Mac and Linux in debug mode all the same.

    • @bitw1se
      @bitw1se 2 years ago +8

      @@lordadamson GCC doesn't check bounds in std::vector::operator[].

    • @noop9k
      @noop9k 2 years ago +7

      In any case, his benchmarks include the cost of memory allocations and rand() calls. This is like a joke video or something.

    • @aakashs1806
      @aakashs1806 a year ago

      The [] operator never seems to check bounds; the .at() member function does, I think.
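
      (A minimal sketch of the bounds-checking difference being discussed; the vector's contents and the out-of-range index are just illustrative:)

      #include <iostream>
      #include <stdexcept>
      #include <vector>

      int main() {
          std::vector<int> v(10, 0);

          // operator[] does no bounds checking in a typical release build;
          // v[20] = 1; here would be undefined behavior, not a guaranteed error.

          // .at() always checks and throws std::out_of_range on a bad index.
          try {
              v.at(20) = 1;
          } catch (const std::out_of_range& e) {
              std::cerr << "caught: " << e.what() << '\n';
          }
          return 0;
      }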

    • @PEGuyMadison
      @PEGuyMadison 9 months ago

      @@noop9k Memory allocations are part of the usage model, while rand() is not. For heavy hitters we always use our own allocator in C to keep the performance in check.

  • @xeridea
    @xeridea 2 years ago +39

    I default to vector unless I know the size I need and that it will never change, but I try to be conscious of memory allocations in program design. Memory allocation is the only really slow part of vectors, so it helps to preallocate extra if you know you may need it, to minimize reallocations. With the non-contiguous containers you need to consider performance more, but of course they offer much more flexibility. The extra program size comes from including the header, and is a one-time cost.

  • @obinator9065
    @obinator9065 2 years ago +18

    FYI, do a .reserve(x) in this kind of case. This is probably the optimization that the compiler did.
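
    (A small sketch of what .reserve() buys you for repeated push_back; the element count and type are placeholders, not the video's code:)

    #include <cstddef>
    #include <vector>

    int main() {
        const std::size_t n = 1'000'000;

        std::vector<int> v;
        v.reserve(n);                              // one allocation up front
        for (std::size_t i = 0; i < n; ++i)
            v.push_back(static_cast<int>(i));      // no reallocations, no copies of old elements

        return 0;
    }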

  • @reubenfrench6288
    @reubenfrench6288 2 years ago +23

    Your code at 5:13 in the first loop is copying v into matrix[i]. This is slow because it requires allocating a new array and copying all the contents of v into it. Better options would be to either (1) resize and fill matrix[i] in place, (2) matrix[i] = std::move(v), or (3) matrix[i].swap(v).
    Moving or swapping just transfers ownership of the underlying array to the new vector. The advantage of swap is that it works in C++98, while move semantics were added in C++11.
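
    (A rough sketch of options (2) and (3) versus the copy; ROWS, COLS and the element type are made up, not the video's exact code:)

    #include <cstddef>
    #include <utility>
    #include <vector>

    int main() {
        const std::size_t ROWS = 1000, COLS = 1000;
        std::vector<std::vector<int>> matrix(ROWS);

        for (std::size_t i = 0; i < ROWS; ++i) {
            std::vector<int> v(COLS, 0);
            // matrix[i] = v;              // copies COLS elements into a fresh allocation
            matrix[i] = std::move(v);      // just steals v's buffer (C++11)
            // matrix[i].swap(v);          // same effect, and works even in C++98
        }
        return 0;
    }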

    • @herrdingenz6295
      @herrdingenz6295 2 years ago +2

      I agree. Having a vector outside of the loop and growing it with every loop iteration if it is too small might speed it up a little bit.

    • @JacobSorber
      @JacobSorber 2 years ago +5

      Good point. Thanks!

    • @aadilshabier
      @aadilshabier 2 years ago

      That should be optimized away at -O2.

  • @antonw8134
    @antonw8134 2 years ago +20

    Wouldn't it be a better approach to measure only the single operation you're trying to optimize (the matrix addition)? Right now you're measuring the total time for setup (allocation and initialization), the operation (the matrix addition), and teardown (deallocation and application termination). I find when optimizing for speed it's best to linearize any logic (unroll a 2D loop into a 1D array) and to put the most likely outcome of a conditional operation at the start of a code section. Additionally, in the case of arithmetic operations, vectorized instructions should squeeze out more performance.
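
    (One way to time only the operation, as suggested; the sizes, element type, and checksum are illustrative, not taken from the video:)

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t N = 1000;
        std::vector<int> a(N * N, 1), b(N * N, 2), c(N * N);       // setup: not timed

        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < N * N; ++i)                    // only the addition is timed
            c[i] = a[i] + b[i];
        auto stop = std::chrono::steady_clock::now();

        long long checksum = 0;                                    // keep the result observable so
        for (std::size_t i = 0; i < N * N; ++i) checksum += c[i];  // the optimizer can't drop the loop
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        std::printf("add took %lld us (checksum %lld)\n", (long long)us, checksum);
        return 0;
    }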

    • @noop9k
      @noop9k 2 years ago +4

      Yes, his tests are downright silly.

  • @shadamethyst1258
    @shadamethyst1258 2 years ago +2

    I'm glad you accepted the lack of a significant timing difference between the different methods. I've seen several other channels that stop at the first difference/result they can see and call it a win, leading to the comment section getting furious about it (there are, for instance, a dozen videos out there comparing C# with C++ and saying that C# is much faster because they forgot to turn on optimizations in C++). You've earned yourself a new subscriber :)

  • @MalamIbnMalam
    @MalamIbnMalam 2 years ago +11

    I remember from my Data Structures & Algorithms class a few years back that when you use a vector, you are essentially just resizing an array dynamically by doubling its size.
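
    (You can watch that growth happen by printing capacity(); a quick sketch below. The growth factor is implementation-defined, roughly 2x in libstdc++ and 1.5x in MSVC:)

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> v;
        std::size_t last_cap = 0;
        for (int i = 0; i < 100; ++i) {
            v.push_back(i);
            if (v.capacity() != last_cap) {          // print only when a reallocation happened
                std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
                last_cap = v.capacity();
            }
        }
        return 0;
    }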

    • @taragnor
      @taragnor 2 years ago +6

      Yes. Under the hood, a vector is just a pointer to a heap-allocated array. The advantage of a regular C array can be that it lives on the stack (so no dynamic memory allocation is needed). If you know the size of your vector beforehand, you can eliminate the need to resize by just reserving enough memory from the start.

    • @muadrico
      @muadrico 2 years ago +1

      @@taragnor Also, there is std::array, which is the C++ equivalent of a fixed-size raw C array.
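
      (For reference, a tiny illustration of std::array next to a raw array; the sizes and values are arbitrary:)

      #include <array>
      #include <numeric>

      int main() {
          int c_arr[4] = {1, 2, 3, 4};                 // raw C array, size fixed at compile time
          std::array<int, 4> cpp_arr = {1, 2, 3, 4};   // same storage and layout, but it knows its size,
                                                       // copies by value, and works with STL algorithms
          int sum = std::accumulate(cpp_arr.begin(), cpp_arr.end(), 0);
          return (sum == 10 && c_arr[0] == 1) ? 0 : 1;
      }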

    • @taragnor
      @taragnor 2 years ago

      ​@@muadrico Ah yeah. I forgot about that. I rarely use basic arrays for anything honestly.

  • @atillakayrak4293
    @atillakayrak4293 2 years ago +5

    Comparing a vector with an array doesn't make much sense; it's like comparing malloc with a static array. Of course the array is better in terms of performance, but the vector is more flexible.

  • @mario7501
    @mario7501 2 years ago

    A lot of the time, when you know you're not constrained on memory but you need the performance, it helps to preallocate space in a vector. You can do that using the reserve method, which allocates but does not initialize the memory.
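
    (A minimal sketch of that reserve-vs-resize distinction; the counts are arbitrary:)

    #include <cassert>
    #include <vector>

    int main() {
        std::vector<int> a;
        a.reserve(100);            // allocates room for 100 ints, constructs none of them
        assert(a.size() == 0 && a.capacity() >= 100);
        a.push_back(42);           // no reallocation until the capacity is exceeded

        std::vector<int> b;
        b.resize(100);             // allocates AND value-initializes 100 ints to 0
        assert(b.size() == 100);
        return 0;
    }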

  • @nikhilsathe5956
    @nikhilsathe5956 2 years ago +1

    Sir Jacob, for some reason when a word ends with the letter S, your mic makes a loud noise and it sounds annoying. Please try to fix this issue. Thanks for the great content!!

  • @manoharpanwar1265
    @manoharpanwar1265 2 years ago +1

    Thank you ❤️

  • @LoesserOf2Evils
    @LoesserOf2Evils 2 years ago +3

    Thanks!
    This goes to my question from that video: does all that space allocation cause a hit in execution speed? Now I know.
    Thanks again.

    • @taragnor
      @taragnor 2 years ago

      Really though, if you code it right you should only be allocating space once. It's possible to set the vector to reserve a specific amount of space when you create it (or even after). If you know the projected size of your vector, you shouldn't be resizing it. And if you don't know the maximum number of elements, then you can't use a regular array anyway; you must use a vector (or some other structure capable of dynamic growth).

    • @LoesserOf2Evils
      @LoesserOf2Evils 2 years ago

      @@taragnor, yes, that's true. But sometimes you can't precisely predict the amount of memory the application will consume during execution, and therefore you have to decide in development which data structure will be best. Dr. Sorber's example gives guidance on factors to consider.
      That's what I was really asking about.
      Thanks for your comments. They too help.

  • @JohnZakaria
    @JohnZakaria 2 years ago +6

    I heard that std::vector is a bad idea and instead we should do the indexing ourselves.
    What is your opinion on that?

    • @gvcallen
      @gvcallen 2 years ago +6

      This definitely is the case. If you look at the simple example of an int** (an array of pointers to rows), access is super slow due to the double dereference and the number of cache misses (because the memory each row pointer points to can be in a completely different location for each entry in the outer array). The same applies to a vector of vectors, which suggests it isn't a good idea either.

    • @ShivamJha00
      @ShivamJha00 2 years ago

      @@gvcallen What are cache misses?

    • @gvcallen
      @gvcallen 2 years ago +3

      @@ShivamJha00 The CPU automatically "caches" blocks of memory (in chunks of 64 bytes), meaning it stores them in the cache, which is much faster to access than normal RAM. So every time you read from memory, you are usually reading from the cache first. But the cache is obviously much smaller, so the CPU needs to be smart about which memory it keeps in the cache. This means that often you'll try to read from an address that isn't in the cache, resulting in a "cache miss", and the CPU then has to do the slower read from main memory. We want to write code whose data is stored close together so we avoid cache misses where possible.

    • @muadrico
      @muadrico 2 years ago +3

      If you need a matrix, use a matrix, not a vector of vectors. You can use a linearized vector as the base for your matrix implementation.
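
      (A minimal sketch of that linearized-matrix idea; the class and member names are made up:)

      #include <cstddef>
      #include <vector>

      // One contiguous allocation; element (r, c) lives at r * cols + c,
      // so row-major traversal stays cache-friendly.
      struct Matrix {
          std::size_t rows, cols;
          std::vector<double> data;

          Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

          double&       operator()(std::size_t r, std::size_t c)       { return data[r * cols + c]; }
          const double& operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
      };

      int main() {
          Matrix m(1000, 1000);
          m(2, 3) = 1.5;
          return m(2, 3) == 1.5 ? 0 : 1;
      }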

  • @randall172
    @randall172 a year ago +1

    All a vector is, is a pointer to some memory, the size of that memory, and the number of elements allocated into that memory, with behavior to grow the allocation (generally by powers of 2) and reclaim it.
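
    (Roughly the three members being described, in a grossly simplified sketch; this is illustrative, not a real std::vector implementation:)

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    // A pointer, a size, and a capacity, plus grow-on-demand behavior.
    struct ToyVecInt {
        int*        data = nullptr;
        std::size_t size = 0;
        std::size_t cap  = 0;

        void push_back(int x) {
            if (size == cap) {                                    // out of room: grow (here, by doubling)
                std::size_t new_cap = cap ? cap * 2 : 1;
                int* p = (int*)std::malloc(new_cap * sizeof(int));
                if (data) std::memcpy(p, data, size * sizeof(int));
                std::free(data);
                data = p;
                cap  = new_cap;
            }
            data[size++] = x;
        }
        ~ToyVecInt() { std::free(data); }
    };

    int main() {
        ToyVecInt v;
        for (int i = 0; i < 10; ++i) v.push_back(i);
        return v.data[9] == 9 ? 0 : 1;
    }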

  • @kebien6020
    @kebien6020 2 years ago +1

    std::vector is life. std::vector is love. Just don't forget to .reserve() if you know how many elements you are going to put into it; not reserving when you know the element count is basically a performance bug.
    std::array is love when you know the size at compile time, e.g. a list of compile-time constants or configuration values.
    I've heard that std::deque is cool: basically a bit slower than a vector, but with faster insertions at the beginning. Do profile before using it; default to vector and profile the performance-sensitive parts.
    std::map is bad; just default to std::unordered_map unless you know you need sorted keys, and look at third-party libraries for more specific implementations if you have a performance-sensitive use case. std::list is implemented fine, but it's just bad because linked lists are bad on modern computers.
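
    (A small sketch of that default choice between the two maps; the keys and values are placeholders:)

    #include <map>
    #include <string>
    #include <unordered_map>

    int main() {
        // Default: hash table, average O(1) lookup, keys come back in no particular order.
        std::unordered_map<std::string, int> counts;
        counts["apple"]  = 3;
        counts["banana"] = 5;

        // Only when ordered iteration over keys is needed: balanced tree, O(log n) lookup.
        std::map<std::string, int> sorted_counts(counts.begin(), counts.end());

        return sorted_counts.begin()->second == 3 ? 0 : 1;   // "apple" comes first in sorted order
    }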

  • @wChris_
    @wChris_ 2 years ago +3

    It would be nice if you used Google Benchmark for this! Or quick-bench, for that matter.

  • @noxagonal
    @noxagonal a year ago

    Well, the first example is a little silly, which I guess is kinda the point; it's something a C++ newbie could do. You really need to know what each type is meant for: vector for when you don't know how much space you need, std::array for when you always know the size, etc. The vector has a little more overhead overall, but when accessing items it should produce the same or very similar assembly to C. Overhead often comes from program size and from having interrupts enabled.

  • @voytechj
    @voytechj 2 years ago +1

    The CPP variable in a Makefile is for the C pre-processor. The C++ compiler should be specified in CXX and its flags in CXXFLAGS. Also, "clear", "all" and "timing" are better marked as .PHONY.
    My previous post was deleted; maybe too much code in a comment and some YouTube algorithm didn't like it? Any idea what's going on?

    • @noop9k
      @noop9k 2 years ago

      Channel owner can also delete comments ;)

    • @voytechj
      @voytechj 2 years ago

      @@noop9k It seems not, in this case. My posts about make that I wrote on a few different channels were deleted as well :-/. Could be something in the makefile syntax that triggers the algorithm, who knows.

    • @noop9k
      @noop9k 2 years ago

      @@voytechj I sometimes have this exact situation, with my comments silently deleted no matter what the content, until I change VPN server.

  • @tsunekakou1275
    @tsunekakou1275 2 years ago +1

    Is rand() good enough for this benchmark?

  • @31redorange08
    @31redorange08 a year ago

    Why would you compare the performance when they have different use cases?

  • @noop9k
    @noop9k 2 years ago +3

    I haven't watched your video completely yet, but I can guess you are going to get very similar results, or even results varying significantly as you change the total size, because this is not how you write stable benchmarks.
    The main bottleneck here is memory bandwidth and possibly cache aliasing. It is likely going to hide the performance differences between vectors, arrays, etc.
    Those differences would be visible with much smaller arrays and a large number of repetitions.
    And you don't really get stable measurements from single-shot runs in any case.
    Update: I hadn't noticed you are also measuring the speed of memory allocation, not just the performance of the actual algorithm.
    And the rand() calls are also part of the measurement.
    This is silly and a waste of time.

  • 2 years ago +1

    Why not allocate the same arrays and vectors in all versions and just not use the ones you don't want to check?

  • @jacko314
    @jacko314 2 years ago

    Theoretically, vectors should be marginally slower. To run this test properly you need to page in the memory first, i.e. overwrite the array/vector multiple times with random values and then run the test. And yes, compiler optimizations are super important; without them, the code that lets you debug the STL is a huge overhead. I'm a kernel engineer, so I don't use the STL or templates, but the whole idea behind them is that you get static checking, algorithm reuse, etc. for free. So in theory you could create a template for a 2D matrix that the compiler reduces down to C, and then you won't, or shouldn't, have performance differences. But yes, you would have to do the template coding to support this. For many specialized cases this is trivial.

  • @michalski9141
    @michalski9141 2 years ago

    Haven't seen you cover C++ before.

  • @draconicepic4124
    @draconicepic4124 2 years ago

    When I saw this video, the first thought that came to my head was, "Why is this even a question?" The allocation process for arrays will be faster; an array is either simply allocated on the stack or allocated in memory with the program image. There is no comparison between those allocation methods and dynamic memory allocation. In terms of performance, a vector will be slower because of whatever overhead the design might have. I would suspect that at a minimum the vector would at least check whether the index is in bounds. On the other hand, arrays are rather dumb objects with no built-in safety features in C/C++; they blindly trust the programmer not to destabilize the program. That isn't to say that vectors are useless: if you need to dynamically allocate an array of unknown size, they're perfectly fine. I personally never use them, since I prefer allocating and managing the memory myself.

  • @herrdingenz6295
    @herrdingenz6295 2 years ago +2

    Why don't you do a real comparison of your code with the different types (array vs. vector) on different compilers using Godbolt (like Jason Turner does)?

  • @mikeyangyang8816
    @mikeyangyang8816 2 years ago +1

    I implemented a 256-bit encryption algorithm. After trying arrays and vectors extensively on files from 6 MB to 9 GB, I have to say there is absolutely no need to use arrays whatsoever, because when I tried to ensure that the freeing and reallocation of array memory was secure, I had to write a class to handle the arrays. I watched one of the C++ talks by the founder; vector is really there to be a drop-in replacement for arrays that handles exactly what I wanted for arrays. Also, be sure to preallocate and preload things; asking the OS for memory can be slow, more so with vectors.

  • @anjangoswami1462
    @anjangoswami1462 2 years ago +1

    Hi Dr. Sorber, can you start a C++ series? We have benefited enormously from your C series.

    • @herrdingenz6295
      @herrdingenz6295 2 years ago +2

      Check out The Cherno: czcams.com/video/18c3MTX0PK0/video.html

    • @ShivamJha00
      @ShivamJha00 2 years ago +1

      @@herrdingenz6295 The Cherno is really the best.

    • @muadrico
      @muadrico 2 years ago

      Hopefully, not. 💁‍♂️

  • @paherbst524
    @paherbst524 2 years ago

    The interesting thing to look at for vector would be the time spent in realloc and memory fragmentation.

    • @mitya
      @mitya 2 years ago

      Vectors use contiguous memory allocation because they are implemented using arrays, of course.

    • @paherbst524
      @paherbst524 2 years ago

      @@mitya , right, but over time you'll be realloc'ing all over the place, making malloc slower, no?

    • @ohwow2074
      @ohwow2074 2 years ago

      @@paherbst524 If you know how many elements your vector will hold at most, then you can use the reserve() function right after declaring the vector to reserve enough space for it from the beginning. It will not reallocate unless that capacity is exhausted; then it will reallocate again, but your program will not die LOL. With stack-based arrays, though, once the array becomes full there is no extending its capacity, and the program will run into all kinds of problems.

    • @mitya
      @mitya 2 years ago

      @@paherbst524 The same is true for regular C arrays, I guess.

  • @filips7158
    @filips7158 2 years ago +11

    I can tell that C++ isn't your thing; that's not really how you should use vectors 😅

    • @JacobSorber
      @JacobSorber 2 years ago +4

      Please be more specific. Are you concerned that I'm using a vector of vectors? Critiques without explanations or recommendations aren't very helpful to other viewers.

    • @ohwow2074
      @ohwow2074 2 years ago +21

      @@JacobSorber A big mistake of yours:
      Instead of writing:
      std::vector<std::vector<int>> matrix(ROWS);
      you should write:
      std::vector<std::vector<int>> matrix(ROWS, std::vector<int>(COLS));
      That way the matrix vector and its internal elements (of type std::vector<int>) are all allocated at once. The program won't need to reallocate matrix every time you want to push another std::vector object into it; you won't even need to push anything. So much, much faster.
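
      (For illustration, the two construction strategies side by side; int is assumed as the element type here, and ROWS/COLS are placeholders:)

      #include <cstddef>
      #include <vector>

      int main() {
          const std::size_t ROWS = 1000, COLS = 1000;

          // Everything allocated up front, in one shot; nothing gets pushed or reallocated later.
          std::vector<std::vector<int>> m1(ROWS, std::vector<int>(COLS));

          // Only the outer vector is sized; each row still has to be built afterwards,
          // and growing the outer vector with push_back can trigger reallocations.
          std::vector<std::vector<int>> m2(ROWS);
          for (std::size_t i = 0; i < ROWS; ++i)
              m2[i].resize(COLS);

          return m1[0].size() == m2[0].size() ? 0 : 1;
      }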

    • @muadrico
      @muadrico 2 years ago +1

      @Jacob Sorber Really? Come on, be honest like L. Torvalds and tell us frankly that you hate C++. You surely know that vector is not intended to be used the way you used it. Furthermore, I am sure you know how to benchmark properly as well. However, your C-related content is great.

    • @PavitraGolchha
      @PavitraGolchha a year ago

      @@muadrico you are right about C++. It's 💩

    • @muadrico
      @muadrico a year ago

      @@PavitraGolchha Nope.

  • @chipcode5538
    @chipcode5538 2 years ago

    Jacob, your conclusions are a bit silly. The singlearr and singlevec tests both use static allocation, so why would you use a vector in this case? A vector uses more memory. Then you switch on optimization, the compiler optimizes the code, and it ends up at about the same speed. The conclusion should be: use the optimizer for faster code and smaller size. Comparing the code size instead of the data memory size is also silly; for a small program like this, the majority of the code size is the C runtime.

  • @dingo9696
    @dingo9696 2 years ago +1

    Dynamic allocation is slow in general

    • @knufyeinundzwanzig2004
      @knufyeinundzwanzig2004 2 years ago +3

      It's definitely slower than not doing an allocation. I don't get the point of this video.

  • @ahmadalastal5303
    @ahmadalastal5303 2 years ago +2

    Well, yes, slower, for the following reasons [I did not watch the video]:
    1. Arrays can live on the stack, while a vector's storage lives on the heap.
    2. Adding elements to a vector beyond its current capacity requires allocating a new heap block and copying all the elements from the old one into it. Capacity and size are usually not equal, so allocating extra capacity means a bigger heap request to the OS.
    3. Allocating on the heap requires an OS heap-allocation request. On Windows, this heap allocator is a kernel object, meaning that to allocate some heap space you have to jump from user space to kernel space and do lots of checking, like security checks and call-stack checks (to verify the request is indeed coming from user space and not from an application running in kernel mode).
    4. Windows does some tricks to reduce heap fragmentation, like virtual memory with Windows handles, sometimes moving allocated data to open room for the next allocation request, and this is costly.
    5. In Windows 10 and later, there is a new technique for handling the heap called the segment heap; if enabled, yet another trick is added to handling heap-allocation requests.
    So, don't use the heap unless you have to. If you want to allocate dynamic memory on the stack, use alloca() or an assembly function like the one below. The following code only works on Windows, because the x64 Windows calling convention is not the same as Linux's, where the System V ABI passes the first integer argument in rdi:
    AllocateHeap proc
        pop rax          ; save the return address before moving rsp
        sub rsp, rcx     ; reserve rcx bytes on the caller's stack
        push rax         ; put the return address back on top
        ret
    AllocateHeap endp
    Last warning: if you are using the stack, make sure you check for stack-overflow exceptions. This exception can easily be reached, especially in recursive functions [that's why I hate recursion, unless I have to use it, of course].

  • @creeperbros-dg9jr
    @creeperbros-dg9jr 2 years ago

    I don't know why this was recommended, but I only know vectors as something an accelerator controls.

    • @gregoryfenn1462
      @gregoryfenn1462 2 years ago +1

      I think you might be thinking of mathematical vectors, as in members of the set {(x,y,z) : x, y, and z are real numbers}.
      This video means vectors as lists of variable (finite) length, e.g. [683, -73, 74, …, -82] or ['g', '9', ' ', '€']. The first is a vector of integers and the second is a vector of characters.

    • @creeperbros-dg9jr
      @creeperbros-dg9jr 2 years ago

      @@gregoryfenn1462 Oh, then it makes even less sense that this got recommended.

  • @dortmall
    @dortmall 2 years ago

    answer: no

  • @ssnobaby
    @ssnobaby a year ago

    No.

  • @muadrico
    @muadrico 2 years ago +1

    You should not use a vector of vectors; use something appropriate. A C++ programmer should know that. Also, this comparison does not look like a representative, valid benchmark (for several reasons). It looks like you have been ranting about C++ lately 🤔. Really disappointing, because as far as C goes, I have enjoyed your content so far.

    • @JacobSorber
      @JacobSorber 2 years ago

      How is this a rant? It's a simple comparison (never meant to be a definitive or scientific comparison), and the take-away (if there is one) is that the cost is insignificant. It's not anti-C++ in any way.

    • @muadrico
      @muadrico 2 years ago

      @@JacobSorber You answered your own question in that very comment. Thanks. You made the comparison in an improper way, so you can't conclude anything from it, but you didn't explain that in the first place. Inexperienced programmers take your conclusion at face value and now think C is better than C++, that std::vector is bad, and that they should use plain C arrays instead. IMHO, I don't think that was your intention, but you need to be more sensitive about such things.
      With "ranting" I refer to your latest videos in general, and I really don't get what you are up to, especially because until recently I really liked (and still like) your content.

    • @coshvjicujmlqef6047
      @coshvjicujmlqef6047 a year ago +1

      Vector is extremely important and the correct answer for a graph.

  • @wesleyoliveira6570
    @wesleyoliveira6570 2 years ago +5

    UwU