CXL in Next-gen Servers Will Make Today's Servers Obsolete

  • Published Jul 4, 2024
  • STH Main Site Article: www.servethehome.com/compute-...
    STH Merch on Spring: the-sth-merch-shop.myteesprin...
    STH Top 5 Weekly Newsletter: eepurl.com/dryM09
    In this video, we discuss CXL, what it is, and some of the high-level conceptual models (using limes and tacos). We also discuss why you need to be ready for CXL as we move into the Intel Xeon Sapphire Rapids, AMD Genoa, and future PCIe Gen5/DDR5 platforms.
    ----------------------------------------------------------------------
    Timestamps
    ----------------------------------------------------------------------
    00:00 Introduction
    01:19 The Limes
    02:02 What is CXL and PCIe 5.0
    05:47 Why has it taken so long?
    07:39 CXL Protocol Trifecta CXL.io CXL.cache CXL.mem
    11:52 CXL 2.0 Adds Switching and Pooling
    12:53 CXL Type Examples
    18:23 A Game-Changing CXL Example Exercise
    22:53 Why you need to plan today for CXL
    24:27 Wrap-up
    ----------------------------------------------------------------------
    Other STH Content Mentioned in this Video
    ----------------------------------------------------------------------
    - What is a DPU? • What is a DPU - A Quic...
    - Samsung CXL Memory Expander www.servethehome.com/samsung-...
    - Phison S10DC SSD www.servethehome.com/testing-...
    - CXL on STH www.servethehome.com/?s=cxl
  • Science & Technology

Comments • 273

  • @beauregardslim1914
    @beauregardslim1914 3 years ago +128

    This is, by far, the most effort I have ever seen someone put into expensing lunch.

  • @dustinphillips605
    @dustinphillips605 3 years ago +131

    I don't think the taco, lime, "soda" mechanism made it easier to understand. But I do fully support this if it was used as a justification for getting delicious tacos.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +9

      Great analysis :-)

    • @dktol56
      @dktol56 3 years ago +3

      Sadly, Rubios succumbed to the pandemic in my town :-(

    • @cdoublejj
      @cdoublejj 3 years ago +2

      I thought that too and got lost, BUT at the end, the part where you only get one lime plus the beer made so much more sense. Basically it's just resource sharing, whether the resources are on the host or not.

    • @DivaAnnFisher
      @DivaAnnFisher 6 months ago

      What junk. How does this guy make a living with such gibberish?

  • @wmopp9100
    @wmopp9100 3 years ago +33

    I understood CXL up until the lime analogies started

  • @rem9882
    @rem9882 3 years ago +86

    From your explanation, you've pretty much said I need to get some tacos

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +22

      I did this a few days ago... and I feel like I need tacos again.

  • @michalpbielawski
    @michalpbielawski 3 years ago +41

    When life gives you limes, make some weird server hardware analogies

  • @ReQuiem_2099
    @ReQuiem_2099 3 years ago +60

    I'm really loving the extra effort put into the STH channel and think that you guys are criminally underappreciated. There is so much half-assed crap out there playing the YouTube algorithm game with little to no added value to the community. Hoping for bigger view counts, etc., but mostly just do you and know you're appreciated!

    • @Warrigt
      @Warrigt 3 years ago +3

      criminally?! good luck with that suit.

    • @jolness1
      @jolness1 3 years ago +3

      @@Warrigt It's an expression. You must be fun at parties.

  • @HydraulicDesign
    @HydraulicDesign 3 years ago +15

    This comically tortured analogy is awesome.

    • @PoeLemic
      @PoeLemic 3 years ago

      Yeah, it was a little of a stretch, but he expenses off a big lunch ... I bet ... he he he ...

    • @ttb1513
      @ttb1513 1 year ago

      If he had used lemons more instead of limes, I would have soured on the analogy.

  • @vld
    @vld 3 years ago +10

    Technical content is great. Drop the limes. Cute, but gets in the way.

  • @stevetaylorftw
    @stevetaylorftw 3 years ago +12

    For me, the lime metaphor made it harder to understand. I applaud the effort, but would appreciate a non-lime version.

  • @firworks
    @firworks 3 years ago +21

    "So what if we take that lime that's in our *host taco* and utilize that lime juice for our beverage."

  • @colonelangus7535
    @colonelangus7535 3 years ago +6

    I get a kick out of the range of people doing wonderfully thought out and informative youtube videos.
    Craft Computing, "Let's mix a double strong cocktail during the video"
    Serve The Home, "The lime goes in the soda"
    :)
    Looking forward to 2028 when I can afford servers from 2022 with this tech.

  • @andrewwong2000
    @andrewwong2000 3 years ago +5

    Now I get where I've been going wrong all this time. Lime goes in the Taco, not the Soda

  • @VraccasVII
    @VraccasVII 3 years ago +8

    What does the security side of this look like? This sounds amazing, but the idea of every device sharing the same memory also sounds like an attack vector.
    Excellent explanations by the way

    • @creker1
      @creker1 3 years ago +1

      The standard supports encryption and data integrity. All communication between devices can be AES encrypted with integrity checks. The threat model mainly covers sniffing on the bus and injecting something. It also mentions replacing devices with malicious ones, but it doesn't cover how exactly it protects against that. Public keys are not in the scope of the standard and the key exchange looks to be dynamic.
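The encrypt-plus-integrity model described above can be sketched with the standard library. Real CXL IDE uses AES-GCM with a negotiated key, so the HMAC-SHA256 tag and the locally generated key below are illustrative stand-ins only:

```python
import hashlib
import hmac
import os

def protect(key: bytes, payload: bytes) -> bytes:
    # Append an integrity tag so the receiver can detect in-flight tampering.
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag

def verify(key: bytes, frame: bytes) -> bytes:
    payload, tag = frame[:-32], frame[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: frame modified in flight")
    return payload

# In real link encryption the key comes from a dynamic key exchange;
# here we just generate one locally for the demo.
key = os.urandom(32)
frame = protect(key, b"CXL.mem write, addr=0x1000")
assert verify(key, frame) == b"CXL.mem write, addr=0x1000"
```

Flipping any bit of the frame makes `verify` raise, which is the "injecting something" case the threat model covers.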

    • @AndrewFrink
      @AndrewFrink 3 years ago +1

      @@creker1 I think the bigger issue here is how you control what memory can be accessed by each device. I'm sure the idea here is to limit copies between devices as well. CPU to NIC: "Here's a memory address, send XYZ number of bits down the network cable please." NIC: "Sure thing, but don't pay any attention to me reading XYZ+256 bits..." CPU: ???

    • @creker1
      @creker1 3 years ago +1

      ​@@AndrewFrink that's outside the scope of the current standard. I don't think they will try to cover it. It explicitly says that all the connected devices are in the trust boundary, and it doesn't cover badly implemented devices. The only things it covers are sniffing and injecting data while it travels between devices, and swapping devices with malicious ones. You can always use other techniques to further protect the communication, like an IOMMU. It allows the CPU to map a part of its memory to a certain virtual address space that a device can access. That way a device can't read some random address like it can with regular DMA. A device can't just go and read CPU memory; it has to go through the memory controller.

    • @creker1
      @creker1 3 years ago

      @@BrianStewart126 what else can they do? CXL is an interconnect. It can only protect what goes over it. Like HTTPS protects you from eavesdropping but can't protect you from the site making attacks against you. CXL mainly reuses what PCIe IDE provides, and it also doesn't cover anything beyond in-flight data encryption and integrity. They can't do anything beyond that either. If you can't trust devices, then maybe don't put them into your servers? You have to have a trust boundary somewhere.

    • @Waitwhat469
      @Waitwhat469 2 years ago

      @@creker1 Are the interconnects between devices managed by the host, or is it more like an open switch?
      If controllable, then you could limit connections between different devices. (The big reason I see this being useful is getting closer to trustlessness in VMs and containers, where you could adjust what can use certain devices based on what is currently using them.)
      So say processes A and B are run by one tenant, but process C is run by another.
      You may trust the hardware, but when tenant 1 is running on the hardware and given some control of it, this breaks the trust boundary of tenant 2. You don't want to limit the number of processes that can be run on the hardware to one, though, as that would mean doubling the hardware to run tenant 1's workload. With proper tagging you could taint hardware and only allow some processes to schedule to it based on acceptable taints, like you can with entire nodes in Kubernetes.

  • @MarcDoughty
    @MarcDoughty 2 years ago +3

    I can see CXL being very useful in consumer hardware. Imagine a CPU with 8GB of very fast RAM on the package and a CXL bus to expandable memory. Imagine rethinking 'vram' 'swap' and 'hibernate' entirely by tiering memory between the on-package, expandable, and storage-backed options over a CXL bus.
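The tiering idea above can be sketched as a toy allocator. The tier names, sizes, and spill order are assumptions for illustration, not a real memory manager:

```python
# Tiers ordered fastest-first; an allocation spills to the next tier only
# when the faster one is full (a real OS would also migrate pages, etc.).
TIERS = [
    {"name": "on-package", "free": 8},        # GB of fast on-package RAM
    {"name": "cxl-expander", "free": 64},     # CXL-attached expandable memory
    {"name": "storage-backed", "free": 512},  # swap/hibernate-style backing
]

def allocate(gb):
    for tier in TIERS:
        if tier["free"] >= gb:
            tier["free"] -= gb
            return tier["name"]
    raise MemoryError("all tiers exhausted")

assert allocate(6) == "on-package"
assert allocate(6) == "cxl-expander"  # only 2 GB left on-package, so spill
```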

    • @axe863
      @axe863 2 years ago

      Machine learning applications are insane. There's a new field of ML using semi-randomized sparse ReLU feature expansion with sparse linear regularization. The semi-randomized sparse ReLU feature expansion approaches Deep Neural Networks, with the sparse linear regularization weeding out shitty features in a very fast manner. It's insanely RAM-intense but, by circumventing costly nonlinear optimization, far more computationally inexpensive.

  • @Sirus20x6
    @Sirus20x6 1 year ago +2

    sometimes the best analogy is not to use any analogy, but just explain the technology straight up without abstractions

  • @MrBikeagraman
    @MrBikeagraman 3 years ago +6

    "Soda" has food value, but food has no "soda" value.

  • @tinfever
    @tinfever 3 years ago +5

    In order to understand CXL, we will be using a hypothetical perfectly spherical lime. Implementation using non-spherical limes or other fruits will be left as an exercise for the reader.

  • @ArchaeanDragon
    @ArchaeanDragon 3 years ago +5

    CXL.io is essentially the out-of-band signaling for set-up, configuration, discovery, and tear-down of CXL devices, while CXL.cache/CXL.mem are the in-band transfers and communication of actual data.
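That three-way split can be made concrete with a small sketch. The transaction names are hypothetical; only the three protocol roles come from the comment above:

```python
from enum import Enum

class CxlProtocol(Enum):
    IO = "CXL.io"        # discovery/configuration/tear-down (PCIe-like)
    CACHE = "CXL.cache"  # device coherently caches host memory
    MEM = "CXL.mem"      # host loads/stores to device-attached memory

def classify(transaction):
    # Hypothetical transaction names, just to make the split tangible.
    table = {
        "enumerate_device": CxlProtocol.IO,
        "read_config_space": CxlProtocol.IO,
        "device_caches_host_line": CxlProtocol.CACHE,
        "host_store_to_device_memory": CxlProtocol.MEM,
    }
    return table[transaction]

assert classify("enumerate_device") is CxlProtocol.IO
assert classify("host_store_to_device_memory") is CxlProtocol.MEM
```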

  • @vtr8427
    @vtr8427 3 years ago +2

    Wonderful talk, man. You paced yourself well. 👍

  • @3dduff
    @3dduff 3 years ago +4

    I am dealing with servers as part of my render farm, for render nodes as well as storage. The HUGE revelation I had listening to your lime example is video RAM: we are sometimes limited in what we can render all in one frame. A centralized CXL pool changes all of that, because it would all be SYSTEM memory and the GPUs would draw out what they need. And speaking of memory pools, this would be huge for the simulation side too. Simulations can take up hundreds of gigs of space, capping a simulation at what the CPU or GPU can handle on one task; CXL could lift that limit. And speaking of limits, keeping that rendering and simulation data LIVING in a centralized location, accessible to multiple computing devices, would have a huge impact on speed. I'm sure it's going to take a few years for the hardware to trickle down to studios like mine, but I would not be surprised to see a whole range of new software made to exploit this new technology. But this sounds familiar... how does this compare to HP's original vision for their ProLiant servers?

  • @davidtolley1374
    @davidtolley1374 3 years ago +1

    Crushed (squeezed?) it with the lime analogy. Made sense to me. Great video.

  • @Banner1986
    @Banner1986 3 years ago +4

    I'd try making analogies to networking for explanations like this, where you're trying to explain a protocol/communication methodology. For instance, how does something know that something else has CXL available? Broadcast advertisement. Etc.
    Most folks in the IT world, I'd think, would have enough understanding of networking to make quick and easy sense of such an analogy, and it's easy to lose track when using something completely unrelated.
    Especially when hungry. I feel like I understand CXL pretty well, but I'm starving and had to watch it a few times to make sure I could follow where you were going with it... kept thinking "man... chicken tacos sound soooo good right now..." 😅

  • @Night_A
    @Night_A 3 years ago +8

    This sounds a lot like those "Server on a Chip" (or module) concepts from several years back.
    Do you think this might lead to a form of mainframes making a comeback in the general market?

  • @NTmatter
    @NTmatter 3 years ago +2

    20:55 Following the metaphor, the DPU would encapsulate the lime in a coconut for permanent storage. Given the latency involved, would this process end with a callback telling the processor what to do?

  • @Nobe_Oddy
    @Nobe_Oddy 3 years ago +1

    That was a pretty good analogy... I think lol - I'm pretty sure I understand what CXL is, well, the basics of it...
    When I first saw the limes on top of the camera, where the light would be, I thought you were gonna say "It's time for CXL to be in the LIMELIGHT!" lol :D

  • @stalbaum
    @stalbaum 1 year ago +1

    Btw, after parallelization and virtualization, this is the next level of how we mitigate (I almost said defeat) Von Neumann bottlenecks and even Moore's law.

  • @conquerordie230
    @conquerordie230 3 years ago +1

    Thank you for making this explanation approachable for the layman. How do you see this technology affecting Resizable BAR (is that even a good analogy?)?

  • @mlaubenthal
    @mlaubenthal 3 years ago +1

    I love the tacos and lime analogy!

  • @chromerims
    @chromerims 1 year ago +1

    I'll play . . . after peak deployment of CXL in the future, there well could be a reactionary trend to revive and mix in a sprinkling or more of onboard, "un-disaggregated" memory . . . to achieve greater performance and even more robust (possibly battery-backed) persistency.
    Excellent video 👍
    Kindest regards, friends and neighbours.

  • @d-fan
    @d-fan 3 years ago +2

    brb, plugging a lime wedge into my RAM slot

  • @CrazyLogic
    @CrazyLogic 3 years ago +3

    Super - does this mean that in the future, with multiple CXL-connected processors, RAM, and other devices, we'll be heading back towards 'mainframes' where everything is essentially run on one self-consistent host, rather than, say, a cluster of hosts with a few NUMA nodes each? Is it similar to how IBM does its Z mainframe?
    Can you imagine how much compute power a cluster/swarm/mainframe would have if you took any number of CS-2s from Cerebras and CXLed them with, say, 1000 EPYC 64-cores... Just mind-boggling.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +1

      Mainframes are a bit of a different game, but there are key concepts especially as next-gen servers scale (a lot)

    • @creker1
      @creker1 3 years ago +2

      CXL is about the node level, connecting devices inside a single server. What you're thinking of is Gen-Z, which is about taking that idea and stretching it between nodes. But even then, no, I really don't think we will have the illusion of a rack or multiple racks working like a single host. Maybe we will have NUMA-style division where the scheduler is very careful about migrating workloads between nodes. It's simple physics: inter-node latencies are too big to simply treat it as a single box. You will always have to think about it and make the necessary optimizations to keep data local to compute and go over the fabric only when absolutely necessary.

    • @CrazyLogic
      @CrazyLogic 3 years ago

      @@creker1 totally agree with the latency issue, but even if they are split time domains/NUMA nodes, CXL looks to be lower overhead for node-to-node than IP or InfiniBand etc. The future looks fast :)

    • @creker1
      @creker1 3 years ago +1

      @@CrazyLogic CXL probably can't scale beyond one server simply due to protocols. If we look at Gen-Z, they don't use ethernet or infiniband but they did implement something similar to them. To build a node-to-node fabric you need to be able to route packets, address many devices, handle failures (optics can and will fail). That's why Gen-Z comes with a whole bunch of infrastructure hardware - bridges, switches, gateways.

  • @ElijahPerrin80
    @ElijahPerrin80 3 years ago +1

    I wondered if this would ever happen, thank you.

  • @1funnygame
    @1funnygame 3 years ago +4

    I hope this fixes the large amounts of RAM assigned to you when you use high-end CPUs in the cloud. Will hopefully lower costs for compute-intensive workloads

  • @PrestonBannister
    @PrestonBannister 3 years ago +2

    The local Rubios thanks you for your product placement. :)

  • @thomasbonse
    @thomasbonse 3 years ago +1

    I think a hotpot would've been a more appropriate metaphor for CXL.

  • @zrodger2296
    @zrodger2296 2 years ago +1

    The problem with this is that I think the lime transferred some beer into your taco! (Though maybe that's not a bad thing.) 😏

  • @youtubecommenter4069
    @youtubecommenter4069 3 years ago

    Hey Patrick, nice explanation, 6:29. Do you see NVMe M.2 SSDs staying at PCIe x4, or will these do PCIe x2 with Gen 5.0/CXL "to free up CPU lanes", per the dead cool fact you dropped at 6:44?

    • @creker1
      @creker1 3 years ago +1

      M.2 is being phased out of the server market. EDSFF is going to replace it, and there x4 would be the minimum. It doesn't really make sense to reduce the lane count. PCIe drives scale with new PCIe versions and people need all the performance they can get. And it's just simpler to do it that way for compatibility.

  • @stalbaum
    @stalbaum 1 year ago +1

    Thinking in tacos and lime is part of the reason California keeps inventing today the worlds you will live in tomorrow. I'm serious. And Siracha, curry, bol-ko-kee... Even milk toast. Together we invent new worlds.

  • @PanduPoluan
    @PanduPoluan 3 years ago +1

    That "soda" looks mighty sus... 🧐
    I think that's an Impostor!

  • @scheimong
    @scheimong 3 years ago +2

    I'm honestly super eager to see this technology mature and propagate down to the consumer market. Combined with the recent push for right to repair, it will surely make PC upgrading a much more practical and appealing option.

    • @creker1
      @creker1 2 years ago

      CXL brings nothing that would benefit right to repair. PCs are already extremely practical and upgradable. There's nothing to gain there.

  • @amateurwizard
    @amateurwizard 3 years ago +1

    This has the potential to be HILARIOUS: skip to halfway in without the context of the first half. I don't mind watching it at 1.5x... You're welcome! 👍

  • @MourningLobster
    @MourningLobster 3 years ago +1

    Great explanation 🤠. Subbed 👍.

  • @tanmaypanadi1414
    @tanmaypanadi1414 3 years ago +3

    I wish we had a wild Wendell dropping by in the videos 😉

  • @rjdp3
    @rjdp3 2 years ago

    Next- ServeTheFood opens!
    Seriously, thanks for the food stretch

  • @Fee.1
    @Fee.1 3 years ago

    Sorry if I missed this, but will it be a game changer for / affect personal PCs at all? Or not really?

  • @Openspeedtest
    @Openspeedtest 3 years ago +1

    So my next server needs some lime, soda, and tacos to run.

  •  3 years ago +1

    Have you kept the C70 alongside this nice FX3?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +2

      This was filmed on the C70. I wanted to use the FX3 as a lighter mobile camera.

  • @KenHihihi
    @KenHihihi 3 years ago +2

    Holy shit, you explained it perfectly! Hope this new technology will be implemented very, very soon.

  • @IAMSolaara
    @IAMSolaara 3 years ago +3

    I recently rewatched the IBM POWER10 video and I remember hearing something similar to this. Am I correct?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +3

      Yes. There is a quick mention of OMI in here. The big difference is that this is what the industry is doing, IBM just was in front doing its own thing.

    • @creker1
      @creker1 3 years ago +1

      Yep, CXL is very similar to PowerAXON/OpenCAPI

  • @SinisterPuppy
    @SinisterPuppy 3 years ago +1

    And with one video my home lab feels inadequate. Thanks for the video; can't wait to see what performance improvements CXL will provide.

    • @p3chv0gel22
      @p3chv0gel22 3 years ago

      Ha. My homelab can't feel that way,
      because it doesn't exist (cries in salary of an IT trainee) 😅

  • @capability-snob
    @capability-snob 3 years ago +1

    I need to find out if layering over PCIe has kept the protocol capability-secure. That's going to be a fun fishing exercise.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago

      Check out the CXL 2.0 security features including encryption.

  • @gfeie2
    @gfeie2 2 years ago

    This channel makes me so happy :)

  • @robertharker
    @robertharker 3 years ago +2

    Great video. I liked the limes. But what about habanero peppers and avocados? Or fajitas and burritos? How will CXL integrate them?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago

      I have never tried habanero peppers in a "soda" but I imagine someone has.

    • @robertharker
      @robertharker 3 years ago +1

      @@ServeTheHomeVideo As with all things, some people think about the sodas, some people think about the food. Each to their own and better yet when they share.

  • @MoraFermi
    @MoraFermi 3 years ago +1

    So it's essentially SAN 2.0, with the difference that devices may simultaneously provide and consume resources.

  • @edouard1580
    @edouard1580 3 years ago +3

    Isn't this going to put more strain on memory bandwidth? Since CPUs spend a lot of time waiting for host memory data to arrive, it seems like CXL could make this problem worse.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +5

      In some ways. Something many folks underestimate is just how much effort is being spent on next-gen I/O.

  • @nobodyspecial7097
    @nobodyspecial7097 2 years ago +1

    I’m drinking a Dogfish Head 120 Minute “soda” watching this

  • @kickbul
    @kickbul 3 years ago +2

    Came to learn about a cool new technology, left hungry!

  • @BoomChockolaca
    @BoomChockolaca 3 years ago +3

    Great example with limes and "sodas", I really love the personal touch you've been adding in the latest videos!
    Aaaand, yeah, now I want some "soda" too :)

  • @PpVolto
    @PpVolto 3 years ago +2

    My CXL analogy: you have a connector board, a 64GB DDR5 RAM board, an x86 CPU board, and an Arm CPU board; the CPU boards only have 1GB locally to function. Now your x86 board needs 16GB of RAM and requests it from the memory board, while at the same time the Arm board uses 32GB of RAM on the same memory board. Next you add a GPU board with 8GB of RAM, but the GPU needs more, so it requests an additional 16GB from the memory board. When the GPU needs still more, you add a second 64GB RAM board; now the GPU can either move its full 32GB to the new memory board or request 16GB of additional memory from the new board. The second scenario: you add a GPU board with 128GB of RAM while your GPU only needs 12GB, so your x86 CPU board can request 64GB of RAM from that GPU board. Then you add a second GPU board with 256GB of RAM whose GPU needs 300GB: the first 256GB are on board and the remaining 44GB come from the first GPU board.
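The scenario above can be sketched as a toy pooled-memory model. The board names and sizes mirror the analogy; nothing here reflects real CXL pooling APIs:

```python
class MemoryBoard:
    def __init__(self, name, capacity_gb):
        self.name, self.free = name, capacity_gb

class Pool:
    """Toy pooled memory: compute boards borrow capacity from whichever
    memory boards can satisfy the request, splitting across boards if needed."""

    def __init__(self, boards):
        self.boards = boards
        self.grants = []  # (requester, board name, GB granted)

    def request(self, requester, gb):
        if sum(b.free for b in self.boards) < gb:
            return False  # cannot satisfy even by splitting across boards
        remaining = gb
        for board in self.boards:
            take = min(board.free, remaining)
            if take:
                board.free -= take
                self.grants.append((requester, board.name, take))
                remaining -= take
        return True

pool = Pool([MemoryBoard("ram0", 64), MemoryBoard("ram1", 64)])
assert pool.request("x86-cpu", 16)
assert pool.request("arm-cpu", 32)
assert pool.request("gpu", 40)  # 16 GB left on ram0 plus 24 GB from ram1
```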

  • @robcannon1215
    @robcannon1215 3 years ago

    Is it fair to say that this only seems realistic if running Kubernetes? And if already running k8s, how dramatic a shift would using CXL be?

    • @creker1
      @creker1 3 years ago

      CXL is limited to running inside a server. Kubernetes or any other distributed computing platform doesn't apply here.

  • @MisterRorschach90
    @MisterRorschach90 3 years ago +2

    It's crazy to think that in 1-2 years they will have high-end consumer motherboards with PCIe Gen5; they will probably have built-in 25, 40, or even 100GbE NICs on the prosumer stuff. They will have bandwidth for multiple USB4 40Gbps ports, or USB5 if that comes out. And we will potentially have NVMe drives that push 14GB/s reads and writes. I hope Intel does something with Optane on the prosumer and consumer side for Gen5. It seems they only have super expensive server stuff for Gen4. It is stupid fast and reliable though.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +1

      Probably not 100GbE NICs onboard on the prosumer side. The challenge there is really just the fact one would need to go fiber and most prosumers do not have the network switches/ devices for 100GbE. Power per bit goes down, but overall power goes up with 100GbE as well.

  • @AI-xi4jk
    @AI-xi4jk 3 years ago +3

    Those “sodas” are waiting impatiently for the filming to be over.

  • @C3Cooper
    @C3Cooper 2 years ago +1

    I wonder how long it will take before someone designs a cache/side-channel attack.

  • @jdl3408
    @jdl3408 3 years ago +1

    Now I have to go to Rubio’s this weekend…

  • @JustBiTurbo
    @JustBiTurbo 3 years ago +1

    Is this similar to the current Apple silicon implementation?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago

      Apple does not have to deal with this with the M1. This is mostly a technology for much larger systems.

  • @PoeLemic
    @PoeLemic 3 years ago +1

    I really loved this overview. Helps me understand this in an overview fashion. Also, I love using analogies, and I laughed at how you used it. Very good work. You're a Teacher at heart.

  • @bw_merlin
    @bw_merlin 3 years ago

    So will CXL only be used for sharing memory? For things like composable infrastructure, which uses PCIe, will PCIe continue to be used or will CXL replace it?

    • @creker1
      @creker1 2 years ago

      CXL is PCIe. It's an extension based on the same protocol and physical layer. CXL is not meant for composable infrastructure. CXL only works inside a server. For composable infrastructure you need different technologies like Gen-Z.

  • @seylaw
    @seylaw 3 years ago

    I'd like to see the Gen-Z connector sooner or later making it to the market. While compatibility with older gear is great, it comes at a cost.

  • @suntzu1409
    @suntzu1409 3 years ago +2

    This thing holds great potential

  • @creker1
    @creker1 3 years ago +2

    I really doubt that SSD example will actually be implemented. I just don't see the point. SSDs need DRAM (if we're talking about something fast, and not slow QLC or something for cold storage) and they need it to be very close due to latencies. CXL is simply gonna be too slow and cause too much overhead. We already saw this with the Linux IO scheduler, which only causes bottlenecks with NVMe. And that's a CPU with its own RAM, without any CXL fabrics. It's the same reason we have multiple levels of cache. It's a necessary optimization.
    What's actually a much more relevant example, I think, is CXL being a cache-coherent interconnect for various compute resources. Servers already have ML processors, GPUs, CPUs, and FPGAs, and usually all of them work on the same task. They need to communicate and share data. Right now everyone uses different home-grown solutions to tackle this - Google TPUs use DMA engines and HBM with some weird interconnect; NVIDIA uses NVLink; AMD uses Infinity Fabric. FPGAs don't have anything at all, I think. All of these will be replaced by a standard CXL fabric that will actually make everything interoperable. Intel is already planning to use CXL as an interconnect between their Xe GPUs, just like NVLink. But even then we will have the same problem - it's too slow to go over CXL to read or write someone else's memory. As always, software and hardware will try to keep everything in their local RAM as much as possible and utilize the CXL fabric only to transfer necessary data.
    I also think we can't talk about CXL without mentioning Gen-Z. With the recent partnership it completes the whole picture and stretches that coherent interconnect to datacenter scale. The dream of having boxes of CPUs, GPUs, FPGAs, and NVMe all in separate racks and being able to provision them independently will finally come true.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago

      The write cache on SSDs that is power-loss protected is designed to enable fast/ safe sync writes. Writing to SCM/ Optane provides that while allowing time to flush to NAND at a slower pace, but also multiple NAND devices potentially beyond just a single machine. That model is already in production using Optane so it is a stretch, but one that is already deployed using existing technology.
      You are totally right about GenZ as well as accelerators. Accelerators are going to be the leading edge in CXL 1.1 devices next year.

    • @creker1
      @creker1 3 years ago +1

      @@ServeTheHomeVideo that's not the only reason SSDs need DRAM. Probably the main one is RAM for the controller to store mapping tables. Without that, and going over CXL, they're going to be severely crippled. As for the write cache, yes, that works and is already in production. But it feels like moving the problem to a different place, not solving anything fundamental. You get cheaper SSDs, yes, but you then pay enormous money for Optane. Not much of a gain IMO. What Optane brings to storage solutions is a very-low-latency write cache, as you said, that you simply can't get with any NAND SSD. That's the value, I think. But then again, we don't need CXL for that.

  • @hgbugalou
    @hgbugalou 2 years ago

    Instructions unclear. I now have tortillas jammed in my DIMM slots and a lime skewered on a heat pipe.

  • @mamdouh-Tawadros
    @mamdouh-Tawadros 3 years ago +1

    Thank you for the clarification. But my concern is that not all memory is made equal, so the presumed pool would be a mixture of varying memory speeds, latencies, and generations.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +2

      You have to think of next-gen servers as much larger. Core counts, TDP, and I/O go way up. So even if CXL has more latency than a host DDR5 controller, it is still useful if it gets you a larger pool of memory in a system.

    • @creker1
      @creker1 3 years ago

      Yes, CXL does cover that. There's a whole QoS, back-pressure, and credit system, among other things. Also, different memory pools can expose their exact latency and bandwidth through ACPI tables so that the system can appropriately schedule and balance workloads between them. For any performance-sensitive workload it just doesn't work as a simple unified memory pool.
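The ACPI-reported latency/bandwidth idea above can be sketched as a toy placement helper. The pool names and numbers are illustrative, not measured values:

```python
# Firmware-reported attributes per memory pool (ACPI HMAT-style);
# numbers are illustrative only.
pools = [
    {"name": "local-ddr5", "latency_ns": 90, "bandwidth_gbs": 300},
    {"name": "cxl-expander", "latency_ns": 250, "bandwidth_gbs": 60},
]

def place(pools, max_latency_ns):
    """Return the pools acceptable for a workload's latency budget, best first."""
    ok = [p for p in pools if p["latency_ns"] <= max_latency_ns]
    return [p["name"] for p in sorted(ok, key=lambda p: p["latency_ns"])]

assert place(pools, 100) == ["local-ddr5"]  # tight budget: local DRAM only
assert place(pools, 400) == ["local-ddr5", "cxl-expander"]
```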

    • @creker1
      @creker1 3 years ago

      @AstroCat devices will always have their own caches. It just doesn't work any other way. What CXL allows is keeping these caches coherent with various memory pools. For complex devices it will be done completely in hardware and will be transparent to the software. Like GPUs. They will always have multiple levels of cache and private memory pool. CXL can't and will not change that.

  • @maxhammick948
    @maxhammick948 3 years ago +2

    So hypothetically: we could have some CXL-enabled device that accepts a few m.2 ssds; then plug an m.2 to standard PCIe slot adapter into that; then plug a PCIe to usb expansion card in that; which will finally enable us to use the hard drive on a first gen ipod as system RAM? This calls for celebratory tacos 🌮🌮🌮🍋

    • @ServeTheHomeVideo
      @ServeTheHomeVideo 3 years ago +2

      One of the next videos will be on how in the next-gen servers U.2, U.3, and M.2 are going away

  • @bronekkozicki6356
    @bronekkozicki6356 3 years ago +1

    Here is a man who likes his tacos with lime juice.

  • @playdoh1975
    @playdoh1975 3 years ago +1

    Good I’ve been looking for a deal on eBay🥳

  • @gravesclay
    @gravesclay 2 years ago +1

    So DMA but in the physical layer. You could have opened with that, and not wasted that precious margarita fuel.

  • @swyftty2
    @swyftty2 3 years ago +1

    Let's hope our 6x RAM devices expand to more server devices for CXL than just GPUs

  • @JGoodwin
    @JGoodwin 3 years ago

    What I heard: CXL = tacos + limes + soda. Key features: juicing, slicing, sharing.
    Easier for my brain: CXL = a shared high-speed memory bus. Key features: less copying, more flexibility (aka pooling).
    Do I understand correctly that it essentially context-switches with the PCIe bus?

  • @bluespeck9119
    @bluespeck9119 Před 3 lety +3

    I'm just here for the tacos.

  • @Ian_Carolan
    @Ian_Carolan Před 3 lety

    OK, so dynamic resource sharing?

  • @computersales
    @computersales Před 3 lety +1

    I didn't care for the comparison since it just left me confused. Is this basically a new memory allocation/utilization standard over PCIe 5?
    Also, there are easier ways to get tacos as a tax deduction ;)

    • @markhahn0
      @markhahn0 Před 2 lety

      Yes. The main point of CXL is coherent memory access.

  • @semosesam
    @semosesam Před 3 lety +3

    Damn, those tacos looked amazing...

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  Před 3 lety +2

      They were very good. It was also very hard to let them just sit there while taking the photos/b-roll.

  • @paulmichaelfreedman8334

    Will CXL also be implemented in consumer Boards/CPUs?

    • @youtubecommenter4069
      @youtubecommenter4069 Před 3 lety

      It gets passed down the line after a few cycles of use in big data, a few server generations later.

  • @kikeaMoldova
    @kikeaMoldova Před 3 lety +1

    Looks like that this video was sponsored by Taco! ;)

  • @iham1313
    @iham1313 Před 3 lety +3

    Now I want shrimp-taco-CXLs!

  • @shaunlunney7551
    @shaunlunney7551 Před rokem

    Looking forward to this eventually landing on AM5. Give the (non-HEDT) CPU quad-channel DDR5, or even GDDR6/7 access for the CPU!

  • @Entity8473
    @Entity8473 Před 3 lety +1

    My local limes are way too acidic to call delicious, but great for colds and flu.

  • @Felix-ve9hs
    @Felix-ve9hs Před 3 lety +3

    _limes_

  • @jms019
    @jms019 Před 3 lety +2

    What about oranges and grapefruit ? I like those too

  • @youtubecommenter4069
    @youtubecommenter4069 Před 3 lety

    "That DPU could manage the flashing of the persistent memory," 21:05? If the PMEM is already CXL, why do that? Wouldn't it be better to build an on-chip controller on server-class CXL PMEM instead of sending flash back to NAND?

  • @TurboVisBits
    @TurboVisBits Před 3 lety +1

    Just ordered food .. thanks

  • @GGBeyond
    @GGBeyond Před 3 lety +1

    So if I'm understanding this correctly, I can eat my servers next year?

  • @OVERKILL_PINBALL
    @OVERKILL_PINBALL Před 3 lety +2

    So you don't put the lime in the coconut anymore?
    :P

  • @hariranormal5584
    @hariranormal5584 Před 3 lety +1

    What is the camera with the fruits on top xD

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  Před 3 lety

      Sony FX3 with a 24-70mm f2.8 Sony G Master lens. That is what I used for the TinyPilot Raspberry Pi video czcams.com/video/l7i-hm_E2ls/video.html

    • @hariranormal5584
      @hariranormal5584 Před 3 lety

      @@ServeTheHomeVideo
      Thank you sir :p

  • @MurchyMurch
    @MurchyMurch Před 3 lety +1

    What is it, Tuesday or something??

  • @filovirus1
    @filovirus1 Před 3 lety

    I imagine a CXL implementation for PCs: need more CPU? Buy cores incrementally. Need more memory? Buy more and hot-swap them in. Need more GPU? Same story. Plus no more HDD, because everything is persistent. And run multiple different OSes simultaneously on the same physical system. In effect, I become my own hyperscaler and open for business!

  • @lost4468yt
    @lost4468yt Před 3 lety +3

    Is this why PCIe 5.0 is coming out so quickly after PCIe 4.0? The time gap seems much smaller than between previous generations.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  Před 3 lety +2

      CXL is one reason. The bigger reason, though, is that I/O was severely bottlenecked with PCIe Gen3. A hyperscaler can also do a 400GbE NIC in a PCIe Gen5 x16 slot, which will help with next-gen systems. PCIe Gen6 will be a quicker transition as well.

    • @creker1
      @creker1 Před 3 lety +1

      @@ServeTheHomeVideo I always wondered, why are we still limited to 16 lanes on PCIe? Yes, there are some proprietary wider ports, and the standard does support x32, but actual hardware is still stuck at x16. Looking at how NVLink scales to ridiculous speeds, it feels like PCIe is intentionally crippled for no good reason.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  Před 3 lety

      NVLink uses a TON of power. We did the 8x P100 but then saw a big jump on the 8x V100 machine. On the 8x A100 machine we have in the lab now, same story.

    • @lost4468yt
      @lost4468yt Před 3 lety +1

      @@ServeTheHomeVideo Oh right, very interesting.
      Do you know what tech is used for the submarine cables? E.g., the Marea cable can transfer 26.2 terabits/s, with 8 pairs of fibres inside it. That's 3,275 gigabits/second per fibre pair.
      What do they use for those? FPGAs? ASICs?
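
The PCIe Gen5 bandwidth claim a few replies up (a 400GbE NIC fitting in one x16 slot) checks out with quick arithmetic. The sketch below uses only raw per-lane rates and 128b/130b line coding; real-world TLP/DLLP protocol overheads shave off a bit more.

```python
# Per-direction usable bandwidth of a PCIe slot, raw line rate only.
# Gen3/4/5 all use 128b/130b encoding; packet overheads are ignored.
GEN_RATE_GTPS = {3: 8, 4: 16, 5: 32}
ENCODING = 128 / 130

def slot_gbps(gen: int, lanes: int) -> float:
    return GEN_RATE_GTPS[gen] * ENCODING * lanes

for gen in (3, 4, 5):
    bw = slot_gbps(gen, 16)
    verdict = "fits" if bw >= 400 else "does NOT fit"
    print(f"Gen{gen} x16: {bw:.0f} Gb/s per direction -> 400GbE {verdict}")
```

So a Gen5 x16 slot carries roughly 504 Gb/s per direction, comfortably above 400 Gb/s, while a Gen4 x16 slot (about 252 Gb/s) cannot feed a 400GbE NIC at line rate.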

  • @LiraeNoir
    @LiraeNoir Před 3 lety +6

    I had some basic understanding of what CXL was. Then I listened to 5 minutes of lime analogy, and now I'm thoroughly confused...

    • @CheapSushi
      @CheapSushi Před 3 lety +1

      Gotta go eat a taco right after to truly know.

  • @MatthewHill
    @MatthewHill Před 3 lety +1

    So... only five or six years before this lands in my homelab and starts accelerating my Plex server and my NAS.

    • @tommihommi1
      @tommihommi1 Před 3 lety

      idk, the whole Resizable BAR hype we just had for consumer GPUs is basically a predecessor to what this technology could bring. The next gen of consumer CPUs and GPUs implements PCIe 5, after all. The DRAM-less SSD argument also works for consumers.

  • @berndeckenfels
    @berndeckenfels Před 3 lety +4

    Nobody will give me back the time I wasted watching this. But now I want a limed taco