Software Drag Racing: M1 vs ThreadRipper vs Pi
- Added 9. 06. 2024
- Dave pits a new Apple Silicon M1 vs an AMD ThreadRipper 3970X while a Pi 3B+ and a Pi 4 try to tag along! See the surprising results and the reasons behind them in this episode of Dave's Garage Software Drag Racing.
Code for this project is available here:
github.com/PlummersSoftwareLL...
0:00 Start
2:50 Single Core Workloads Defined
4:00 BT and BTR Instructions
5:50 Install C, C++ etc
6:20 PI3B+
8:00 PI4
8:35 PIs compared
9:00 M1
11:08 Spoiler results
11:20 Github details
13:00 Python Apologetics
14:30 Closing - Howto & Style
This channel makes me happy.
I am not even a programmer but somehow listening to Dave is interesting and calming at the same time
Totally! 🥰
Me too
He made me appreciate windows .... that says alooooot
It's so calming, entertaining, educational and just plain fun.
Thanks for giving your time freely to play with this sort of stuff.
CZcams is an amazing medium for us mortals to engage with interesting people like yourself.
Keep up the great work 👍
Mellow piano music, sparkly lights.. new Dave's Garage episode! ... It feels like Christmas! Dave thank you so much.. as always, top notch content.
- Tell me you're a Windows developer without saying "I'm a Windows developer"
- OK.exe
./no
@@mek101whatif7 imma need you to $rm -rf / right tf now
windefproc
Exactly what I came to the comments for
@@jabalahkhaldun3467 Useless unless you --no-preserve-root
Dave you talk in perfect speed. For once I don't have to speed up the video I'm watching 🤣🤣
That's funny ;-). Yup, I default to 1.25X I think!
Agreed
Zoomers
Hahah :D
I watch these at 2x speed. But then again, I watch most others at 3x.
That's the comparison that we needed but didn't know it!
So I changed the vector to std::array, and got ~13000 passes on my m1 air. Fyi, it was ~4500 passes with vector.
that probably would lead it to also be faster on the other implementations, since it becomes static memory
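For readers who want to try the container swap described above, here is a minimal sketch, not the code from the video or the repository. It assumes a plain sieve of Eratosthenes, and the names `count_primes_vector` and `count_primes_array` are made up for illustration. The first variant keeps its flags in a `std::vector<bool>`; the second uses a fixed-size `std::array<bool, N>` in static storage, which is the kind of change the comment reports as faster on an M1 Air.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Variant 1: flags live in a std::vector<bool>, sized at run time.
std::size_t count_primes_vector(std::size_t limit) {
    std::vector<bool> composite(limit + 1, false);
    std::size_t count = 0;
    for (std::size_t i = 2; i <= limit; ++i) {
        if (composite[i]) continue;   // already crossed out
        ++count;                      // i is prime
        if (i > limit / i) continue;  // i * i would exceed limit
        for (std::size_t j = i * i; j <= limit; j += i) composite[j] = true;
    }
    return count;
}

// Variant 2: flags live in a fixed-size std::array with static storage,
// one byte per flag. The size must be known at compile time.
template <std::size_t Limit>
std::size_t count_primes_array() {
    static std::array<bool, Limit + 1> composite;  // static, so not on the stack
    composite.fill(false);                         // reset between calls
    std::size_t count = 0;
    for (std::size_t i = 2; i <= Limit; ++i) {
        if (composite[i]) continue;
        ++count;
        if (i > Limit / i) continue;
        for (std::size_t j = i * i; j <= Limit; j += i) composite[j] = true;
    }
    return count;
}
```

Both variants should report 78498 primes up to 1,000,000, the same count the benchmark output lines in this thread show. Whether the array version is actually faster will depend on the compiler, the flags, and the machine.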
This has quickly become my favorite channel.
Hi, would be nice if the github url was mentioned in the description. Otherwise nice episode.
Looks like he fixed that.
This guy is what CZcams should be
Thanks for the kind words!
This is bloody brilliant.
Also, the fact that Nano was used as the editor made my day. Kudos to you sir!
nano is so nice :D
@@bobbydazzler6990 No.. masochists!
Great video, as always! Maybe another metric to consider: price per pass? :)
For example: the Pi 3B+, $35/305 ~ $0.11/pass
And Watts consumed per pass ;-)
@@donaldklopper yeah, outlay is usually nothing compared to power, in an industry. Outlay is usually only an issue for home and small businesses that let equipment sit idle 99.99% of the time, even while "working".
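The two metrics suggested in this thread are simple ratios. A small sketch follows, with hypothetical helper names `cost_per_pass` and `joules_per_pass`. The only real figures used are the $35 and 305 passes quoted above for the Pi 3B+; no power draw was given in the comments, so the energy helper takes it as a parameter rather than assuming a value.

```cpp
// Purchase price divided by benchmark passes; lower is better.
constexpr double cost_per_pass(double price_usd, double passes) {
    return price_usd / passes;
}

// Energy per pass: average power (watts) times run time (seconds)
// gives joules, divided by the number of passes in that run.
constexpr double joules_per_pass(double watts, double seconds, double passes) {
    return watts * seconds / passes;
}
```

With the numbers from the comment, `cost_per_pass(35.0, 305.0)` comes to roughly 0.115, which matches the "~ $0.11/pass" estimate.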
This will be of no interest to anyone but a Pi 1 Model B (from 2012) achieves a score of 97
It makes me happy to know! Thank you for sharing!
I don't usually notice background music without hating it but I think you found the right balance of musical complexity and intrusiveness
I'm a simple man. I see Dave drop a video, I watch it. It's really not complicated. You're a legend dude 👏
I appreciate that!
Thanks for making these, as a constantly learning programmer these are invaluable.
Thanks Dave I am a Software Engineer, just graduated from college and am starting out. I love your content. I once had a professor who said "Programming is wizardry, and programmers are wizards." Someday I hope to be as great a wizard as you buddy.
All the best 👍 Per (DK)
Great work once again, Dave!
The subtitles are helpful, especially because I watch at 2x speed. There were a couple places where they were missing. I remember one when you were talking about BTR in the beginning, and one when you were talking about the bugs found in your code.
Edit: and the entire Python apologetics chapter
Thanks for this episode. Looking forwards to see how different compilers perform.
Thanks for the quality content. This is both entertaining and educational.
Oh damn this is gonna get wild
I really appreciate you and your channel. This is a great example of a proper benchmark
I just ran across your channel a week ago, and I'm really enjoying hearing your take on different programming issues! I used to work out the details of an algorithm using whatever scripting language was available on the platform, and once i had a solid plan, I would go back and rewrite it using C or FORTRAN or whatever else. This proved an effective way to cook up some great code that could do the job. Thanks for all of the great comments during your videos!
I love that we are mathing it up on different systems.
Dave, I am really enjoying your videos! I am currently studying Computer Science in school and hope to pursue a career in programming and your videos are inspiring me to continue my pursuits!
As a car/drag racing enthusiast and hardware engineer learning to code this was an excellent episode. Just subbed!
99.2 K Subs as I type this! You found your groove and your channel is growing nicely! I remember (as it was not so long ago) joining when your sub count measured in the hundreds. I do hope that you will continue to feature automotive content and tech projects as well. Well done, Dave!
Really entertaining - the right balance of tech with humor i enjoy - and always stay for the outtakes - Thanks Dave
Glad you enjoyed it!
This is so detailed and nerdy. I love it!
I love that your terminal window is blue with light grey text.
Congrats, you are the first youtuber who convinced me to click on the Like button upfront.
ikr
i'm aspiring to take my interest in tech further, and this channel is a reason for that!
I wrote a multithreaded solution to prime number generation in C++ a few months ago, it's actually not too hard to implement. Would be interesting to see how much the threadripper outpaces the M1 when you use all the cores lmao and would perhaps be a good next-step up from this.
Single thread performance is still super important. So much software is single threaded.
@@tommcintosh4705 Sure, it's important, but it's not more important than multithreaded performance. Things that tend to take a long time (e.g. compilation, 3d rendering, encoding video files, etc.) also tend to benefit from multiple threads, plus with more threads you can run more software concurrently (e.g. even if most software _was_ single threaded, being able to run more of it simultaneously could be a huge benefit).
Also, all current implementations of x86 have SMT: an optimization around the weakness that it has in purely single-threaded workloads by allowing a single core to do a bit more than one thread's worth of tasks at once (essentially, a lot of the core's resources are left idle by its design, and that idle portion can be used to execute another thread at the same time). The M1 specifically has a relatively large advantage in that _one_ aspect, but essentially you're handicapping x86 by not letting it use its benefits as well.
Based on that, it's pretty misleading to show off single-threaded performance and act as if it's _that_ important of a metric.
Edit: to be clear, I'm not saying Dave is being misleading here, but that Apple's sudden surge of "hey, check out the single-threaded performance of our M1 part and see how powerful it is, also do benchmarks with single threads plz thx bye" is misleading and the fact it's worked: many people are suddenly trying to come up with super synthetic benchmarks that show off this weakness of x86 and push it as a huge problem, when it is typically _not_ that huge of a deal in practical usage.
@@tommcintosh4705 Well yeah
@@tommcintosh4705 I find that very little software is still single-threaded nowadays. Even games which are often very intensive on a particular single thread are usually multithreaded.
@@nephatrine Yup, no matter how much optimization you do on a single threaded code, it'll be hard to beat just spawning a crap ton of threads, even with bad optimization (if you can that is).
I recently had a .Net code run on a single thread for almost 50 minutes (that was optimized), but running it on 12 threads got it below 5 minutes. Try doing that on 1 core I dare you. (Also, later I got it running on my GPU using OpenCL, run the same task in under 10 seconds XD)
I LOVE the fact you talk at a nice, normal pace. There are some channels I watch at 1.5x speed just to get them to talk at a normal pace.
The showdown of the decade
This was unexpected. I ran the CPP code on a WSL 2 terminal running Ubuntu. The CPU on the box is an AMD Ryzen 3800X running at stock speeds. And still, it outpaced the Threadripper. The first run turned in a score of 9622!
Passes: 9622, Time: 5.000000, Avg: 0.000520, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
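Several viewers paste result lines in this format, so a small parser can help when comparing them. This is a sketch under two assumptions: the layout is taken from the lines quoted in these comments, not from the benchmark source, and the `SieveResult` struct and `parse_result_line` function are invented names. It requires C++17 for `std::optional`.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Fields as they appear in the pasted result lines.
struct SieveResult {
    long passes;
    double seconds;
    double avg;     // printed average seconds per pass
    long limit;
    long count1;
    long count2;
    int valid;
};

// Returns the parsed fields, or std::nullopt if the line does not match.
std::optional<SieveResult> parse_result_line(const std::string& line) {
    SieveResult r{};
    const int matched = std::sscanf(
        line.c_str(),
        "Passes: %ld, Time: %lf, Avg: %lf, Limit: %ld, Count1: %ld, Count2: %ld, Valid: %d",
        &r.passes, &r.seconds, &r.avg, &r.limit, &r.count1, &r.count2, &r.valid);
    if (matched != 7) return std::nullopt;
    return r;
}
```

In the lines posted here, the Avg field is consistent with Time divided by Passes (for example, 5.0 / 9622 is about 0.000520), so the pass count alone is enough to rank machines when every run lasts 5 seconds.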
Would be cool to see an optimized version of a wasm and Node benchmark in addition to the vector optimizations you made to the CPP benchmark!
I actually like the speed you talk at. Yours are the only videos I can watch at regular speed, instead of 2x like most others and 1.5x for everything else.
1:16 Hell yes! Thumbs up and subscribed right away. You manage time very well in all videos i have seen so far.
Watching you never gets old.
10:23 Nice of you to have mentioned the std::vector thing, that was discussed in some comments of the previous video.
It would be interesting to see whether its template specialization in your STL implementation was done actually with bitfields (and if so, what are the differences compared to your bitfield manipulation), or using actual 1-byte bools (that would be then byte-aligned)...
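One part of this question can be checked at compile time. In standard C++, `std::vector<bool>` is a specialization whose `operator[]` returns a proxy object, not a `bool&`; whether the storage is actually bit-packed is left to the implementation. The sketch below shows that check next to a hand-rolled bit array for comparison. The `BitArray` class is a made-up illustration, not the bitfield code from the video.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// vector<bool> hands out a proxy class, not a plain reference to bool.
static_assert(!std::is_same_v<std::vector<bool>::reference, bool&>,
              "std::vector<bool> uses a proxy reference type");

// Hand-rolled alternative: 64 flags per word, all initially set.
class BitArray {
public:
    explicit BitArray(std::size_t size)
        : words_((size + 63) / 64, ~std::uint64_t{0}) {}

    // True if bit i is still set.
    bool test(std::size_t i) const {
        return (words_[i / 64] >> (i % 64)) & 1u;
    }

    // Clear bit i, leaving its neighbours untouched.
    void clear(std::size_t i) {
        words_[i / 64] &= ~(std::uint64_t{1} << (i % 64));
    }

private:
    std::vector<std::uint64_t> words_;
};
```

To see what a particular STL does with `std::vector<bool>`, inspecting the generated assembly or the library headers is more reliable than guessing from timings.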
Thanks @DavePL, there goes a few hours on my long weekend playing with this :) Great content BTW now one of my favourite channels.
I was about to go and write GoLang, PHP, Pascal implementations, then I saw all the existing implementations and now I'm not sure it's worth just being another "me too" :)
Interestingly the CPP versions of this achieve 4820 on my super old i7-870. FYI I achieved 8221 on my i9-9900K
I would love to see a drag race between C++ and Rust!
great video, very interesting comparison and i love the jazz in the background
Glad you liked it!
Dave you rock! I love your channel!!
Thank you Dave for sharing this video.
Love this follow-up to the first SW drag race video...and we get bloopers! Great work Dave (and production staff?) :)
Just me and a couple of shop dogs! Maybe at 200K I can hire a student editor :-)
the Threadrippers and zen2 in general are such beasts man.
M1 is still very impressive for a very new product in its first life cycle. Also factoring in the power consumption makes it look even more impressive.
also cost makes it impressive for its performance you could get almost 3 Mac minis for the cost of just the threadripper chip
@@michaelhenecke the threadripper is a server chip, no person needs that many cores
@@jan-lukas Yep, and we can get a decent gaming laptop with mac price
It'll be interesting to plot the same chart but divide by Watts used by the CPU.... Surprising results...
And you mentioned Turbo Pascal! I like you.
Dave, I love the content and the upvote is worth it just because you bothered to make chapter markers in this video!
Sorry about your stroke, Dave. Rapid recovery! 😁
Juhu don't know why a Video like that makes me that Happy
Great channel Dave, lots of great info. Hope you can help folks porting Windows to the raspberry with your knowledge.
Love your videos Dave all the way from uk
Mr. Dave you're one of the best content creators that I had the pleasure to find on CZcams
becoming one of my favorite channels.
Got ~10k on an old 6600k and was sort of surprised, but in the end it makes sense as it's a single core workload. Great video.
You and curious Marc are my favorite CZcamsrs right now
Enjoy the channel. Good stories and random bloopers. Cheers!
The bloopers got me! Whole ep of gag reel please lololololol
Dude, well done.
Hey, thanks!
I'm really getting a lot out of your content, Dave. Many thanks.
Thanks!
My time feels valued
Dang! That's just peachy, a (former) Microsoft employee has forced me to upgrade once again. I just upgraded to a subscriber.😁 Thank you for the great content.
The .exe extension at 6:36 does reveal your Windows roots..
Well presented and articulated though, as always.
Great job!
I've watched so many of your videos that I was amused that I was not already subbed. Well I fixed that bug. Speaking of bugs, could you do a video about all the rare bugs you know about? Always found that fun.
3 haters who don’t have any clue what he’s talking about. I mean I know what he’s talking about but don’t know how to do it...but I don’t hate Thanks for entertaining content!
Great video and test!!
Thanks for sharing :-)
Love your content 👏👏👏👏
Great video! I'd like to see a CPP vs Rust vs Go showdown
Good stuff.
At 10:03, your testing of index % 2 == 0 and index & 1 == 0 only makes a difference if you are running in debug, not in release mode, as release mode will always compile SomeVariable % 2 == 0 to the more optimized version (i.e. not use modulo explicitly, as it is a very costly operation in relative terms).
For the record, gcc and clang won't use modulo explicitly in debug builds if index is unsigned, msvc will. However if index is signed, msvc and gcc won't use modulo but clang will.
@@pikachulovesketchup666 of course all compilers do it, my observation was simply about debug vs release builds, and as Nathan showed that's not the entire story today.
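A side note on the two expressions as typed in this thread: in C++, `==` binds tighter than `&`, so `index & 1 == 0` parses as `index & (1 == 0)`, which is always zero. The bitwise test needs parentheses to match the modulo test. The sketch below uses made-up function names and spells the unparenthesized form out explicitly; it makes no claim about how the code in the video was written.

```cpp
// Even test using modulo.
constexpr bool is_even_mod(unsigned index) { return index % 2 == 0; }

// Even test using a bit mask; the parentheses are required.
constexpr bool is_even_mask(unsigned index) { return (index & 1) == 0; }

// How `index & 1 == 0` actually parses: `index & (1 == 0)`.
// The comparison yields false (0), so the result is always false.
constexpr bool as_typed_in_comment(unsigned index) { return index & (1 == 0); }

// The two correct forms agree, and the unparenthesized one never fires.
static_assert(is_even_mod(10) && is_even_mask(10), "10 is even");
static_assert(!is_even_mod(7) && !is_even_mask(7), "7 is odd");
static_assert(!as_typed_in_comment(10) && !as_typed_in_comment(7),
              "unparenthesized form is always false");
```

With optimization enabled, mainstream compilers typically emit the same machine code for `is_even_mod` and `is_even_mask`, which is the point the commenters above are making about release builds.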
love it, was curious about the M1...don't have one...not in a hurry to get one...but curious where Apple is headed with it. Looking forward to your compiler comparison. Also something I don't get to look at much...in my world it's visual studio...and you live with it. But I know from prior experience that is not the only game out there.
I gave that feedback about talking speed, and he kept that in mind 😀. Hats off sir.
Glad you liked it! I'm always paying attention and trying :-)
Even though I am current swinging in a hammock, in front of a volcano in Costa Rica, I could not miss a Dave's Garage premiere.
Living the dream!
I may be joining you, Liberal Lunatic Free Zone...
Nice information, glad you brought up that Python isn't the answer to all code. Lately, with all the "do it in Python" rants in a lot of the developer areas, it's nice to hear "use the language that makes sense for the task at hand". Thanks again!
Some coders want everything available in the language they already know. That's how we got the do it all in Python crowd and do it all in JavaScript crowd as well.
I do heaps of programming with deep learning, sometimes Web server logic, etc. A lot also includes prototyping, so my calculations of "speed" always include how long I need to code.
Sure, had I written my code in pure C/C++/etc., it probably would have been 100 times faster than it is now. But I need to get stuff done instead of obsessing on how low-level I can get. Had I done that, I would probably have finished 10% of my work shortly before retirement in a couple of decades.
It's perfectly sensible that there's languages on so many levels (no pun intended). No point on starting a war over _that_, too.
Except for R. This just sucks. ;)
LOVE YOUR VIDEOS ❤️❤️
super interesting shtuff
I smashed the thumbs-up button. I couldn't argue with your logic.
You smashed it? Do I sound like Peter McKinnon? You can just lightly click it. But I thank you nonetheless!
We do like charts!
You have no idea how apt the drag racing analogy is. I've been working on my own cars for more than 40 years. I know my way around an engine. But the idea of tearing down and rebuilding a 10,000 HP engine in 45 minutes is basically sci-fi to me. Similarly, I've been playing with computers since my folks bought us an Apple IIe back in the mid '80s. But what you do here is basically voodoo. Sure, I understand the concepts. It's the depth and breadth of the minutia that impresses me. Fun stuff.
I love this!
For giggles I run Dave's code on my computers here. The Windows boxes (Ryzen 5 and Intel i5) run the g++ in Debian under WSL2, the other machines run Debian or Raspberry Pi OS on bare metal. To be honest, I'm very impressed with the Ryzen.
AMD Ryzen 5 3600X 6-Core Processor => Passes: 9605
AMD Athlon(tm) II X3 460 Processor => Passes: 3642
Intel(R) Core(TM) i3-4005U CPU => Passes: 2551
Intel(R) Atom(TM) CPU N270 => Passes: 911
Intel(R) Atom(TM) CPU N450 => Passes: 871
Raspberry Pi 3 => Passes: 764
Naming the output .exe is well played ,)
I run a Ryzen 1600 (14 nm version no OC 3.2-3.6 GHz clock speeds). And I got this result with g++ -Ofast
Passes: 8427, Time: 5.000000, Avg: 0.000593, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
I would expect it to be a lot lower.
Got a similar result on my AMD Ryzen 7 4800H with Radeon Graphics, no OC in a Laptop.
Passes: 9840, Time: 5.000000, Avg: 0.000508, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
I got 8200 passes on Ryzen 3600X but compiled with MSVC. WTF?
Using clang in Ubuntu 21.04, my Ryzen 4750GE w/o overclock:
Passes: 10777, Time: 5.000000, Avg: 0.000464, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
I was wondering what was going on; glad to see I'm not alone. 3900X @ 4.2GHz all-core OC -> Windows 10 -> VirtualBox VM running Mint 20.1 = 9384 passes.
Running PrimeCPP on my iMac with a 10700K CPU results in:
Passes: 8607, Time: 5.000000, Avg: 0.000581, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
It would be interesting to include x64/Rosetta vs. arm64/native on the M1...
I agree the M1 is definitely doing something interesting for x86 emulation, though it appears to be just adding hardware support for strong memory ordering when running code intended for the x86, which given the cache heavy nature of this benchmark probably wouldn’t have much effect.
This was the straw that broke the camel's back in favour of me buying an M1 Mac after a decade of netbooks and secondhand business laptops from Japan. The high performance with long battery life and low heat output got me close, but not close enough to fork out the $$$ until I saw even the x86 emulation was sometimes faster than on x86 hardware.
@@andrewdunbar828 what makes Dave’s tests here interesting is that the M1 is a Laptop CPU... the Threadripper is a Desktop CPU. It will be fun to see what Apple do in the Desktop space with their ARM implementation!
@@blooddude I might be mistaken, but from what I know the ARM based architectures don't scale that well.
@@blooddude if anything
Dave, I can't program anything more advanced than a PLC, but whenever a page with your videos loads, I hit the thumbs up regardless, as you always increase my understanding of the stuff I have no knowledge in. Thank you!
Code a prime calculator in ladder logic ;)
@@stonent I do most of the stuff in FB, but point taken lol
If you had been the professor in college, I would probably have paid attention more 🤣
Hi Dave,
thanks for producing this channel! Very enjoyable!
I ran PrimeCPP on my 5950X in WSL2:
Passes: 11267, Time: 5.000000, Avg: 0.000444, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
Passes: 11327, Time: 5.000000, Avg: 0.000441, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
Passes: 11346, Time: 5.000000, Avg: 0.000441, Limit: 1000000, Count1: 78498, Count2: 78498, Valid: 1
Cool! I've seen a 12000 as well from another viewer, but I think he was overclocked!
@@DavesGarage , User_Overclocked_Error - Only Machines Should Be Overclocked (0xB00B1377)
Appreciate your effort to include subtitles in an informative video like this. You talk like a C program running on the newest CPU, while my brain is a Pentium 3 running Java that is constantly overheating
Closin in on 100k!
This beats watching mindless TV. I am learning something about some thing I truly enjoy, computers.
Neat. Somehow, _all_ the results were actually impressive.
The lowly Pi 3 is impressive for how narrow the delta actually is between cheapest possible self-contained computer and a TOTL desktop CPU.
The Pi 4 for how much tighter that gap.
The M1 for being a brand new product with the slider pegged dead in the middle between "optimized for low power" and "optimized for high performance."
And, of course, the Threadripper for having the biggest 🥜 of just about any CPU available. haha
perfect. tks
Hey Dave ;) So bloody sweet to see a new upload,
Regarding the M1 did you compile with gcc or clang or etc?
gcc and xcode (which I think uses clang) but gcc was faster of the two
@@DavesGarage interesting
I'm not a native English speaker, but I'm mostly OK with your speed, except in those rare circumstances where you talk native slurred American without much emphasis on words :D That's really hard to get for me. And yeah... a programmer that's been on the dark side now spilling the beans.. how can I not subscribe
Where do I get the shiny lights that you have in the background? I enjoy them. Also love your channel, find your content enjoyable, your presentation soothing and engaging.
First of all I really enjoy the content you produce.
An idea for the topic
on y cruncher program
multi-threaded Pi calculation