CPU? GPU? This new ARM chip is BOTH
- added Mar 3, 2020
- Hostinger Deal:
Go to www.hostinger.com/coreteks and use code CORETEKS to get up to 91% OFF yearly web hosting plans. Succeed faster!
Support me on Patreon: / coreteks
Buy a mug or a t-shirt: teespring.com/stores/coreteks
Visit: coreteks.tech/
I now stream at:
/ coreteks_youtube
Follow me on Twitter: / coreteks
Follow me on Instagram: / hellocoreteks
Sources & Further Reading:
• "Opening Address: From...
• Striving towards high-...
• A64fx and Fugaku - A G...
www.titech.ac.jp/english/grad...
www.fujitsu.com/downloads/SUP...
www.fujitsu.com/downloads/SUP...
www.fujitsu.com/jp/solutions/...
Some footage from Charbax:
• Fujitsu A64FX Post-K S...
• Fujitsu at CEATEC 2019...
• Fujitsu Post-K ARM Sup...
- Videos from the AMD official YouTube channel were used for illustrative purposes, all copyrights belong to the respective owners, used here under Fair Use.
- Videos from the Intel official YouTube channel were used for illustrative purposes, all copyrights belong to the respective owners, used here under Fair Use.
- Videos from the NVidia official YouTube channel were used for illustrative purposes, all copyrights belong to the respective owners, used here under Fair Use.
- A few seconds from several other sources on youtube (*including other youtubers*) are used with a transformative nature, for educational and illustrative purposes. If you haven't been credited please CONTACT ME directly and I will credit your work. Thanks!!
#A64FX #FUGAKU #FUJITSU - Science & Technology
A64FX.....Why have I heard that name before?
Oh yeah! Athlon 64 FX!
Yeah, the name reminds me of that too XDD
So that's where the déjà vu feeling came from
What goes round...
I knew I'd seen that somewhere before but I couldn't put my finger on it ! Thanks for the reminder ! :)
FX you...
RIP SATORU IWATA.
A BRILLIANT AND UNIQUE MIND.
His father never wanted him to pursue a games career.
There's only one video that comes to my mind at this point.
czcams.com/video/j2dxX5DIEMQ/video.html
R.I.P. to both.
Intel is better
And I'm glad he didn't listen.
@@masternobody1896 intel is nr.2.
Nice Haiku
Dedicated my life's 20 MINUTES.. Worth it as always..
Absolutely!
Only dedicated 10 minutes. x2 speed is great.
20:36 actually for me..cause I wanted to see my name on the Credits... 🤣😝
Always guarantee when you watch a (Jim) #AdoredTV video ❎
@@johnnyxp64 Being a Coreteks patreon means having big pp
This video sounds more like a nonfiction crime TV show than something about processors.
can u share us ur pic?
You make me laugh but at the same time I'm annoyed. This dude has a wealth of knowledge and insights, but he's HORRIBLE to listen to.
17:28 Sure, streaming today's data would be instant with tomorrow's technology, but what about tomorrow's data? The extinction of load times is far away. More powerful computers? That will just be an excuse to use more detailed textures :'D
loading nowadays is no longer limited by file size, it is limited by bad code. NVMe SSDs do many GiB/s, no game asset needs more than the blink of an eye to load. Sadly, developers have some of the fastest possible hardware available (especially in big-budget games and programs), so they have no need to optimize. Running the same code on an average PC then makes it unusable.
The biggest issue is poor telecoms infrastructure. Even in the UK it varies massively in speed, they're already trying to save cost and not put in full fibre.
@jayViant Talking of holograms:
czcams.com/video/V7V05T4DhrU/video.html
@@Mil-Keeway Compression and bad structuring of data make for terrible load times even on high-end NVMes. Games before the next gen weren't optimized for this, maybe except Star Citizen and Arkham Knight.
Just compare it to just 10 years ago when some websites would take ages to load half the time, or 10 years before that when printing a JPEG was faster than viewing it on a webpage. We're really stretching conventional processor capabilities thin, but there will definitely be some fundamental shift in the industry that keeps the performance train chugging along. Could be a beefed-up ARM chip, desktop chips made from different materials (silicon ain't the most performant, it's the most flexible) or something completely different if synthetic neurons or quantum computers have an early breakthrough. Internet bandwidth is also constantly improving.
Honestly the only thing slowing us down is companies milking their current technologies like crazy. Let's all thank AMD's Threadripper for shoving 32 inefficient cores into pro-sumer PCs and speeding up global warming lol. And let's not forget Intel's tiny generational improvements. There are certain solutions which could be implemented pretty soon, but who has time to research other options when they have to pump out 3 useful and 7 useless chips a year?
TL;DR : Tech seems to be improving faster than consumer needs because it never improves fast enough for professional needs, driving researchers to find new and better solutions. But capitalism's a bit of a bitch sometimes and is getting in the way.
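A quick sanity check on the NVMe "blink of an eye" claim earlier in this thread; the 3 GiB/s throughput figure is an illustrative assumption (a conservative Gen3 NVMe number), not a measurement:

```python
# Time to read a game asset sequentially from NVMe at an assumed 3 GiB/s.
GIB = 1024 ** 3
NVME_BPS = 3 * GIB  # bytes/second, assumed conservative NVMe throughput

def load_time_seconds(asset_bytes: int) -> float:
    return asset_bytes / NVME_BPS

# A 500 MiB texture pack streams in well under a second.
print(round(load_time_seconds(500 * 1024 ** 2), 3))  # prints 0.163
```

Which is why slow loads today tend to point at decompression and poor data layout rather than raw drive speed.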
7:55 Not quite. Supercomputing applications actually have limits to their parallelism. There is also a need for heavy communication traffic between cores. Hence the fast interconnect, which is a major component of the build cost of a super.
For an example of a massively parallel application which doesn’t need such heavy interprocessor communication, consider rendering a 3D animation. The renderfarms that are deployed for such an application are somewhat cheaper than supercomputers.
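The parallelism limit described above is usually quantified with Amdahl's law; a minimal sketch, with an arbitrary illustrative serial fraction:

```python
# Amdahl's law: speedup on n cores when a fraction s of the work is serial.
def amdahl_speedup(n_cores: int, serial_fraction: float) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# Even a 1% serial fraction caps speedup at 100x, no matter the core count -
# which is why supers spend so much on interconnect to shrink that fraction.
for n in (48, 1024, 1_000_000):
    print(n, round(amdahl_speedup(n, 0.01), 1))
```

A renderfarm job with an almost-zero serial fraction scales nearly linearly, which matches the comment's point about cheaper interconnects sufficing there.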
Can't imagine a japanese chip without TOFU interface
iExplorer has SHAKRA engine (as shown in TaskMgr)
and if it was coded in Germany it would have a SAUSAGE Cache pipe... LOL
@@glasser2819 No, it would have a Bierfass (beer keg) pipeline ;) I am German, I should know. ^.^
The driver of AE86 was a tofu delivery man
SUSHI coming up next.
They make the best capacitors.
Thank you for doing what you do. I learn a lot from your videos.
8:53 That doesn’t make sense. “Teraflops” is a unit of computation (“flops” = “floating-point operations per second”), not of data transfer. Data transfer rates would be measured in units of bits or bytes per second.
Yeah, A64FX has 1TB/s theoretical bandwidth and 840GB/s of actual bandwidth.
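The units distinction matters for roofline analysis, where flops and bytes/s meet. A toy sketch using roughly the numbers quoted here (~1 TB/s theoretical bandwidth; the ~2.7 TFLOPS FP64 peak is an assumption close to Fujitsu's published A64FX figures):

```python
# Roofline model: attainable FLOPS = min(peak_flops, bandwidth * intensity),
# where intensity is the kernel's flops performed per byte transferred.
PEAK_FLOPS = 2.7e12  # assumed FP64 peak, flops/s
BANDWIDTH = 1.0e12   # assumed theoretical HBM2 bandwidth, bytes/s

def attainable(intensity_flops_per_byte: float) -> float:
    return min(PEAK_FLOPS, BANDWIDTH * intensity_flops_per_byte)

# A streaming kernel (~0.25 flops/byte) is bandwidth-bound;
# dense matrix multiply (~30 flops/byte) hits the compute peak.
print(attainable(0.25) / 1e12)  # 0.25 TFLOPS
print(attainable(30.0) / 1e12)  # 2.7 TFLOPS
```

So "TFLOPS" bounds the top of the roofline and "TB/s" its slope; quoting one in the other's units mixes the two axes.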
So, ARM, AMD and Fujitsu teamed up for a super APU that's in some ways more epic than EPYC... I will call this collab FARMeD!
Proof?
@@absoluterainbow it was a tongue-in-cheek summary of this video with a pun at the end
your video presentations are so well done. I always look forward to watching them! such an interesting product. thank you for covering this!
Great Video! Thanks!
I like those progress bar ads
Woot !! Coreteks is back, feels like its been forever...
Don't you just love their Masonic logo? The honeycomb hexagon also known as the Cube, a reference to Saturn and the system we live in. Just so happens to also be in the beehive colours.
Loved this video so much, watched it twice in a row.
I once heard that HAL got its name by grabbing IBM's and ticking the characters because they saw themselves as "one step ahead of IBM". Seeing this, I truly believe it.
ticking them back, not forwards
If you haven't already, you may want to look into RISC-V's upcoming Vector extension. It does all that SVE does, but better.
Better how?
@@Toothily There are a couple of independent things. For one thing, there's no architectural upper limit to the number of vector lanes. Another thing is that the dynamic configuration of the vector registers allows better utilization of the register file (for example, if only a couple of vector registers are used, they can subsume the register storage of the other registers to get much, much wider vectors). Also, while that part of the specification is still a bit up in the air, there is an aim to provide for polymorphic instructions based on said dynamic configurations, which means that it's far easier for it to adopt new data types with very small architectural changes. They also aim to provide not only 1D vector operations, but even 2D or 3D matrix operations, which could provide functionality similar to eg. nVidia's tensor cores, except in a more modular fashion.
There are more examples too, but I think this post is running long enough as it is. I recommend reading the specification.
@@chuuni6924 that sounds really cool spec wise, but do they have working silicon yet?
@@Toothily The spec isn't even finalized yet, so no, there's definitely no silicon yet. However, the Hwacha research project is being carried out in parallel and I know there's a very strong connection between it and RV-V, and I believe they have working silicon in some sense of the word. It's a research project rather than a product, however, so not in the ordinary sense of the word.
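For intuition, the vector-length-agnostic pattern both SVE and RVV rely on can be modelled in a few lines (the 8-lane width is just an illustrative assumption mirroring A64FX's 512-bit SVE; as noted above, RVV imposes no such architectural cap):

```python
# Model of a vector-length-agnostic loop: the code never hardcodes the
# hardware vector length, it asks for it each iteration (RVV's vsetvli;
# SVE's predicate-driven loops achieve the same effect).
HW_LANES = 512 // 64  # assumed: 512-bit vectors of 64-bit elements = 8 lanes

def setvl(remaining: int) -> int:
    """How many lanes the hardware grants for this iteration."""
    return min(remaining, HW_LANES)

def vec_add(a, b):
    out, i = [], 0
    while i < len(a):
        vl = setvl(len(a) - i)  # dynamic per-iteration vector length
        out += [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return out

print(vec_add(list(range(19)), [1] * 19))  # tail of 3 elements handled automatically
```

The same binary runs unchanged on hardware with 2 lanes or 64; only `setvl`'s answer differs, which is the portability point made in this thread.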
Really wanted to know what you guys think about this computer compared to the Nvidia DGX A100. Does it have equal performance or something? I'm really excited to know. Thx :)
You're finally back. Thanks again for the amazing work.
Congratulations on 100.000 subscribers !! I love your videos and I came a long way in computer knowledge because of you, I hope you have a great year! Love you from EU Si ♥️😊
17:01 Reseat that RAM!
Kenta Aoki lol I noticed that too
Oof
What you describe sounds like a modern version of the PS3's cell chip.
Kind of, yes! The PS3 used several DSP-like processors connected onto a ring bus. Rings, like other pure bus-like topologies, are the simplest way to interconnect multiple regions on a chip, but they have inherent limits that restrict this kind of topology to a limited number of locally adjacent cells - which is why the kind of processor presented here has not just one ring, but a hierarchy-of-rings topology:
See this paper as an example examining & describing different hierarchical ring topology variants as on-chip interconnection networks, also called NoCs ("network on chip"):
"Design and Evaluation of Hierarchical Rings with Deflection Routing": pages.cs.wisc.edu/~yxy/pubs/hring.pdf
This has been a hot research topic in HPC & scientific computer engineering for several years now.
Another really old, formerly rejected but increasingly interesting & related research topic is "computing-in-memory", also "processing-in-memory" or "near-memory processing", because the cost to transfer data between processing units & memory is, as mentioned in this video, increasingly becoming a limiting factor, see
"Computing In-Memory, Revisited": ieeexplore.ieee.org/document/8416393 but also semiengineering.com/in-memory-vs-near-memory-computing/
& while the recent emergence of array processors like Google's tensor cores & other forms of neuromorphic processing units is clearly at least partly due to that, this problem isn't limited to applications using AI but applies to a much broader category of problems - the "bandwidth wall" is a thing.
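To see why a flat ring stops scaling and a hierarchy of rings helps, here's a toy hop-count model (illustrative only; real NoCs add router and bridge latencies this ignores):

```python
# Average shortest-path hop count between two distinct nodes on a ring.
# A flat ring's average distance grows linearly with node count, whereas
# a two-level hierarchy keeps each local ring small (e.g. 8 nodes each).
def flat_ring_avg_hops(n: int) -> float:
    dists = [min(d, n - d) for d in range(1, n)]  # clockwise vs anticlockwise
    return sum(dists) / len(dists)

print(flat_ring_avg_hops(64))  # one big ring: average hops grow with N
print(flat_ring_avg_hops(8))   # a small local ring in a hierarchy stays short
```

Most traffic in a well-placed workload stays on a short local ring and only occasionally crosses the global ring, which is the scaling argument behind hierarchical-ring NoCs.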
@@FrankHarwald One of the biggest headaches of working with the Cell BE was the relatively tiny amount of accessible memory each SPU had (256 KB IIRC). This meant you couldn't use a lot of general-purpose algorithms and instead had to modify them to be streamable with high locality of reference - for some algorithms it just isn't possible to optimise in such a way.
@@SerBallister indeed, but modifying algorithms so that they run with a high amount of locality is something that you'll have to do for all data-intensive algorithms anyway - no matter how much of it is done automatically, profiler-assisted or by hand - regardless of what the underlying architecture is, because while all shared memory architectures will start hitting the bandwidth wall at some point, distributed memory architectures will be the only way to circumvent these limitations. & yes, this also means that algorithms that access a lot of memory from the same chunk in a purely serial way will either have to be modified to access data in parallel from multiple chunks (if possible) or remain bandwidth limited (if this is acceptable or if the algorithm is inherently serial).
@@FrankHarwald You should aim for that, yeah. The SPU local memory presented an addressing barrier instead of a cache miss like on multicores; all data has to be present in that block. Take a PS3 game for example. For some systems like physics and pathfinding it can be hard to compress your game world into 256 KB, so the PPU had to work on that stuff and you then had the headache of pipelining the output of that into the SPU (e.g. animation) if you want to avoid stalls. Interesting chip but it can be hard work; task scheduling and synchronisation is also not straightforward. I would prefer working with modern desktop multicores with shared memory.
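The streaming-with-high-locality pattern described in this thread can be sketched as a tiled reduction; the 256 KiB tile size mirrors the SPU local store, and the DMA double-buffering real Cell code would use to overlap transfer with compute is omitted:

```python
# Streaming a large dataset through a small local store, Cell-SPU style:
# only one tile is "resident" at a time (256 KiB of float64 = 32768 values).
LOCAL_STORE_BYTES = 256 * 1024
TILE = LOCAL_STORE_BYTES // 8  # float64 elements per tile

def streamed_sum(data):
    total = 0.0
    for start in range(0, len(data), TILE):
        tile = data[start:start + TILE]  # stand-in for a DMA transfer
        total += sum(tile)               # compute only ever touches one tile
    return total

print(streamed_sum([1.0] * 100_000))
```

Reductions tile trivially; the thread's point is that pointer-chasing workloads like pathfinding over a large world graph don't, because no 256 KB window contains the data the next step needs.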
of course it is
Man your videos always inspire me to read more computer architecture.
I have computer architecture as a subject in my bachelor's and I don't like it, but your videos always inspire me to read it more.
Awesome video bud, some very interesting info there. Thank you
And ARM already has announced the SVE2 extension which is a replacement for their NEON instruction set (for home/multimedia usage instead of SVE1 which is tuned for HPC workloads). Interesting times are ahead and can't wait for ARM storming the PC desktop...
Thank you for yet more fantastic content!
I read you make (at least some?) of your own background music
(WOW!)
Thank you for your educated entertaining info!
Simply mind blown! Wow!
Thank you, amazingly educational video!!!
7:15 Finally the memory bottleneck is being somewhat addressed.
Thank you again for your insight, and all the info.
I am a fan. Good content man. Thanks for your research and sharing the knowledge.
Glad to see your videos again!
Damn this video has aged well... so good. Wish more videos like this were made and popular on YouTube.
Finally the first video since I subscribed! I watched all your previous videos lol
Fascinating documentary, you clearly put a lot of work into this
I stumbled upon your channel when viewing your interview of Jon Masters and have binge watched 3 episodes losing sleep. Kudos. I am learning a lot! Thank you. I havent binge watched in a long time.
Awesome video man! Ty for the great content
Impressive as always, sir. Thank you.
Those chips are cool and all, but did you see THIS? 18:04 That truck has FOUR WHEEL STEERING, now THAT is innovation
They have been on the roads for years...
there is also all wheel steering at the wheels at the back too, look at this tatra video czcams.com/video/U-ujpvOeydk/video.html
lots of 3-axle garbage trucks in europe have frontmost and rearmost steering, pivoting around the middle axle basically.
@@onebreh pog realy ?
You call that innovation? Get back to me after you google 'Spork'
Was worrying you disappeared. Glad we got a new video.
Very excited for the video on neuroscience and computing!
I've been seeing this coming for years now.
Hey, I was about to take a nap. ;D Thanks for the content!
gratzz for 100k...
Nice vid dude. Any idea on power figures between V100 and the ARM chip?
"The future is Fusion", the slogan was just 12 years ahead of the technology
Great video as always! Def a top fav tech channel! Keep em coming! ❤️👌🔥
Thanks for the fascinating video!
Amazing and interesting. Thanks for the video
Just got here. After watching this it's an instant sub. I can't wait for more.
Blew my mind, as every time.
If they utilise the newest HBM version instead of traditional DRAM for cache, it would vastly increase processing speed and reliability but also dramatically increase production costs
I am no engineer in any shape, but with Coreteks videos I am getting such a digestible form of explanation that teaches me, even though I am 37yo) Thank you so much!
37 is not too late. God willing you will be learning well past 37 and even at 73.
I'm 101 years old and still learning!
@@Seskoi in base ten?
@@IARRCSim they opened schools on Mars - finally)
@@Seskoi
I'm 1,009,843,000 seconds old and I push myself every nanosecond to learn more and more
Thanks for another fantastic video!
Informative af
The shape of things to come ...
Your channel is a gold mine for computer engineers. I really like your analysis and getting into details more than other channels do.
On another note, I really want to see a video about RISC-V and its future in personal computing and IoT. I'm currently learning RISC-V assembly and planning on building a small RISC-V CPU on an FPGA, but I'm very curious about its future and whether it's worth the effort.
The quality of production and quality content are next level. Always worth the wait. Remember guys quality takes time.
@coreteks Would love to see a video on the topics from this video as they relate to the Nvidia ARM acquisition.
congrats for your 100k subs!
YES YOURE BACK
Very interesting video and I believe you are spot on in your predictions.
I've been waiting to see hbm used on a processor! Awesome job, it was exactly what I was predicting. As always great video Coreteks.
Well, this is Absolutely Amazing.. Thank You Very Much for Sharing..
Greetings from México... !!!
The comparison with the dual Intel Xeons is a little silly now that they have already been blown out of the water by EPYC.. still an interesting CPU tho..
I think people are going to get surprised when AMD announces Milan this year.
Also, the Frontier 1.5 exaFLOPS supercomputer will use a CPU chiplet + 4 GPU chiplets + memory in the same AMD chip.
The question is, what is more EPYC?
I agree. With how many problems Intel has been having for the last 4-5 years stagnating them on 14nm, comparing anything besides other x86 CPUs to Intel feels disingenuous.
If they compared this ARM chip to the actual current x86 performance leader (a 2U Epyc Rome server with 128 cores) it would be beaten by at least 2-3X. Maybe performance per watt would be better on the ARM chip, but the Epyc's performance density would almost definitely be unbeaten.
@@BrianCroweAcolyte This isn't the first time ARM was expected to be dominate. It happened int he 90's as well. In fact Microsoft made Windows NT compatible with ARM back then. There was big promise that RiSC cpus would take over the world. Well, that didnt happen, and i still dont think it will happen today or in the future.
@@aminorityofone ARM will probably continue to dominate the market where chips are designed for purpose (unless RISC-V takes that market), mostly because x86 isn't licensed to anyone new.
Great video as always! ;)
I used to live next to the 京コンピューター Kei Supercomputer in Kobe. Very cool to see these new advancements. 👍🏻
Consumer processors will probably use HBM as sort of an L4 cache, or as base memory with a tiering system, and then still have traditional memory channels, though maybe fewer channels
I didn't think it was humanly possible for your voice to get any lower... You proved me wrong :)
The end sounds like Cygnus X - Positron but a bit different. Has that anything to do with the presumed change in computing you laid out in this video? If so, that is a masterful match!
5:36 Actually, the plural of “die” is “dice”.
Yes, those dice. As in the phrase “the die is cast”, which means instead of throwing several dice, you have thrown just one, and must stand by whatever it shows.
The "die is cast" comes from the middle high German/English Gutenberg printing. The printed page came from a single die cast, which is why it was slow and expensive (though cheaper than the Monks drawing each page by hand). This allowed Bibles to be printed, helped people learn how to read, and bring education to the people.
@@ehp3189 That can’t have been right. Gutenberg’s innovation was the invention of movable-type printing, as in having separate pieces for each letter that were assembled to make up a page. Printing an entire page from a single block was a technique that had been invented by the Chinese centuries earlier.
@@lawrencedoliveiro9104 Granted, but the expression goes more towards the assembled type set being cast together in a block, and any changes to that during a printing run were not to be allowed. It was difficult enough that breaking apart the group and then reassembling it for one letter change was more expensive than it was worth. At least that is my understanding. I liked philology in college but they only offered one class ...
I always thought that this might be possible but I never imagined it coming so soon. Just... wow.
A certain amount of lust for that dual socket water cooled board with the dual a64fx chips good video and thanks
Damn great video dude
A great video bro. I am a novice; I am into processors, but really confused about RISC and CISC and how they work.
When I heard "CPU and GPU" I was thinking AVX turned to 11
13:17 A64FX in the cloud.
And a few months later we see many design choices here (especially the on-chip memory) in Apple Silicon M line.
Cool, thanks.
One of your best, Mr. Soares.
The future with ARM processors looks great with this one.
Interesting that Nintendo has an indirect connection to this too. One could imagine their NSO servers running on top of an A64FX processor, maybe a future console? 🤔
Congratulations on 100K!!!!!!!!!!!!!!!!!!!!!!!
i really hope this future is not far away. it is great
Well, what were the APU ranges?
another great video!
I wonder why everyone isn't talking about this. This is fascinating and exciting.
So that is where Arnold Schwarzenegger's Terminator got (or will be getting) its CPU from.
The ending music is cool. What's the name of it?
I was looking at this video and instantly fell asleep.
I am at work.
Fuck
Time flies on youtube
Great video :)
@8:53 - 3 TFlops of peak bandwidth?
Floating-point operations x bandwidth?
Wouldn't it be 3 TB/s of bandwidth or 3 TFlops of throughput?
Great ep worth sharing, thank you!👏👍
A64FX is truly something very special (GO ARM)!
This advanced variation of Fujitsu's A64FX chip will be in a future Nintendo Switch revision as a complement to the ARM CPU and Nvidia GPU
Where can I read about that?
I wonder if Coreteks has any more material for a follow up video. For example, how does this architecture compare to Xeon Phi (Knights Landing) - and how do these differences matter?
Incredible specs. All the computing power of a GPU with SIMD intrinsics and all the software support it already has available. I really look forward to programming on these chips.
Another great video.
I didn't understand a thing in this video but I am curious to learn..where to start ?
Amazing video
Great video Celso, sounds like the A64FX chip could make for a great games console processor too??
When do you think we will see on-chip memory on desktop CPUs? Zen 4? Would this be more like another cache level, and will we keep regular DDR system memory? I just can't see 32GB fitting on a CPU die.
By embedding the hbm on the soc, the processor has a direct link to it and can therefore shift data at much faster rates. It eliminates a huge bottleneck by shortening the bridge between the two.
With all of these extra cores, specific tasks can be assigned, and change more dynamically than ever. Think of it as a logic function: the various cores can change form based on whatever task they are doing, and without it bogging down the system like emulation, for example. All computing is algorithmic, this is just a more advanced form of it.
@@redrumtm3435 Sure, I get that, and that is a great feature/asset to have, which is one of the reasons I've been super excited about this chip and following it since mid-2018. But what I am saying is that 32GB of HBM per die seems kind of small for data centre workloads, so when you have a data set or even a single file that is more than 32GB you will have to split it and send it to different dies, which will surely be much slower than having the whole of the data sitting in normal system memory where any free CPU can work on it?
@@Speak_Out_and_Remove_All_Doubt The entire system works so efficiently that data can be transferred much faster. It can achieve much more, but with a lower buffer. The cores are designed to work as one, and vary when and where necessary to suit a particular task.
Basically, the entire thing is the beginning of ARM's transition into a hybridised cloud system that works across all devices. This is how we will cheat Moore's law, and continue to push the envelope so to speak.
@@redrumtm3435 I know Coreteks is a firm believer that ARM is coming for x86, and I agree it will take over certain areas, but I think the PC space is so heavily entrenched in x86 I really can't see it happening, though it's great that it is happening on the server side if it's faster and uses less energy. Coreteks also loves RISC-V; I'm not sure which has the better chance of challenging x86's dominance.
100K subs soon, nice
I'm intensely curious how it handles multimedia workloads. (My bias is towards things like digital music production, where the overall description of what this chip is and does would be ideal)