Unboxing 100Gb NIC | How to setup Mellanox CX455A in CentOS 8

  • Date added: 22 May 2024
  • This video introduces a 100Gb NIC combo kit that includes 2 HP-branded Mellanox CX455A single-port 100Gb network cards, and a DAC cable to connect them point-to-point. I'll do a short unboxing of the CX455A network card and then show you how I install it in my Dell PowerEdge R630. I'll then show how to set up the drivers and utilities in CentOS 8, and update the firmware along the way. Finally, we'll wrap up with an iperf benchmark across the 100Gb link. (A rough sketch of the Infiniband-to-Ethernet conversion step is included after this description.)
    Video Index:
    0:00 - Introductions
    1:07 - Unboxing 100Gb NIC
    2:34 - Installing the Mellanox CX455A NIC in a PowerEdge R630
    5:07 - Downloading the Mellanox drivers/utilities and the HP firmware for this CX455A.
    9:37 - Setup Mellanox drivers and utilities (and update firmware)
    19:12 - How to convert from Infiniband to Ethernet mode.
    25:19 - Running iperf benchmark on 100Gb link
    27:58 - Final wrap up
    100Gb NIC combo kit: ebay.to/3cvbIxU
    If you'd like to support this channel, please consider shopping at my eBay store: ebay.to/2ZKBFDM
    eBay Partner Affiliate disclosure:
    The eBay links in this video description are eBay partner affiliate links. By using these links to shop on eBay, you support my channel at no additional cost to you. Even if you do not buy from the ART OF SERVER eBay store, any purchases you make on eBay via these links will help support my channel. Please consider using them for your eBay shopping. Thank you for all your support! :-)
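
    One common way to do the Infiniband-to-Ethernet conversion covered around 19:12 is with the mlxconfig utility. The sketch below is a hypothetical Python wrapper around those commands, not necessarily the exact procedure shown in the video; it assumes the Mellanox MST/mlxconfig tools are installed, and the MST device path is only an example (run mst status on your own host to find yours).

        #!/usr/bin/env python3
        # Hypothetical helper: flip a ConnectX-4 (CX455A) port from InfiniBand
        # to Ethernet. Assumes the Mellanox MST/mlxconfig tools are installed.
        import subprocess

        MST_DEVICE = "/dev/mst/mt4115_pciconf0"   # example name; varies per system
        LINK_TYPE_ETH = "2"                       # mlxconfig convention: 1 = IB, 2 = ETH

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)       # stop if any step fails

        if __name__ == "__main__":
            run(["mst", "start"])                           # load the MST access modules
            run(["mlxconfig", "-d", MST_DEVICE, "query"])   # show current settings first
            # Set port 1 to Ethernet mode; -y answers the confirmation prompt.
            run(["mlxconfig", "-y", "-d", MST_DEVICE,
                 "set", "LINK_TYPE_P1=" + LINK_TYPE_ETH])
            print("Done. Reboot the host for the new port mode to take effect.")
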
  • Science & Technology

Comments • 50

  • @HAKA...
    @HAKA... 3 years ago +24

    For anyone thinking about 10/40/100Gbps: you need to think about how fast your storage runs. A lot of people install 10+Gbps and then complain that the speeds are not what they see in iperf. (A quick back-of-the-envelope comparison follows this thread.)

    • @ArtofServer
      @ArtofServer  3 years ago +2

      Good point!

    • @drtweak87
      @drtweak87 3 years ago +1

      Was about to say the same thing! Unless you've got storage that can do 100+ Gb/s, and to get that you probably need some massive array of SSDs, there's no point in doing 100Gb.

    • @fiskfisk33
      @fiskfisk33 2 years ago +2

      I upgraded to 10 at home because the 1G was really easy to saturate, even with my spinning rust. I'm not close to saturating the 10, but it was still a worthwhile upgrade.
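
    The back-of-the-envelope comparison mentioned in the top comment (the drive throughputs are rough, illustrative numbers of my own, not figures from the video):

        # Rough sequential throughputs for common drives vs. common link rates.
        GBIT = 1e9 / 8                     # bytes per second in 1 gigabit per second

        drives_bytes_per_s = {
            "7200rpm HDD":   250e6,        # ~250 MB/s
            "SATA SSD":      550e6,        # ~550 MB/s
            "PCIe 3.0 NVMe": 3.5e9,        # ~3.5 GB/s
            "PCIe 4.0 NVMe": 7.0e9,        # ~7 GB/s
        }

        for link_gbps in (10, 40, 100):
            link_bytes = link_gbps * GBIT
            print(f"--- {link_gbps} Gb/s link (~{link_bytes / 1e9:.1f} GB/s) ---")
            for name, speed in drives_bytes_per_s.items():
                print(f"  {name:14s} ~{speed * 8 / 1e9:5.1f} Gb/s "
                      f"-> ~{link_bytes / speed:4.1f} drives to saturate the link")
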

  • @andrewkaplan149
    @andrewkaplan149 2 years ago +3

    That little kitty background is amazing. Thanks!

  • @RootSwitch
    @RootSwitch 3 years ago +6

    Great video. Was going to say "the OFED driver install offers to update the firmware", but then you covered that using the HP-specific firmware. The built-in FW update does work pretty slick for OEM Mellanox ConnectX cards in my experience.
    While admittedly 100Gb/s direct-attach IB isn't terribly practical for most uses, it is a solid way to simulate a 2-node HPC or machine learning environment that leverages RDMA. With IB cards, OFED, CUDA, NCCL, OpenSM, and an Nvidia GPU on both servers, you basically have the basis for a mini compute cluster. The 40Gb/s ConnectX-3 cards are cheap. Helped me a ton when I was trying to wrap my mind around IB and a 3-node Nvidia DGX-2 cluster I assisted in deploying.

    • @ArtofServer
      @ArtofServer  3 years ago

      Did you have that in a ring topology or with a switch in a hub and spoke topology?
      Glad you enjoyed it. Thanks for watching!

    • @RootSwitch
      @RootSwitch 3 years ago +1

      @@ArtofServer So I wound up only using 2 nodes, so they were directly attached with just one IB port. Nvidia's NCCL for testing GPU-to-GPU communication assumes that every node can talk to each other over all active IB ports, so I never got a ring of 3 nodes to work. I thought about picking up a Mellanox-branded 40Gb IB switch to emulate the 100Gb switch that the DGX deployment had, but ultimately decided against it.

  • @501Bakky
    @501Bakky 3 years ago +2

    Very cool!

  • @kaagyk3386
    @kaagyk3386 3 years ago

    Waiting for RoCE... hope that's the next video.

  • @andriitarykin9567
    @andriitarykin9567 3 years ago +1

    Thank you!

  • @muhammadalimilah
    @muhammadalimilah 3 years ago +1

    :-) Omg thanks for sharing

  • @drtweak87
    @drtweak87 3 years ago +2

    One other thing to point out, from what I remember on LTT's channel, is CPU bottlenecking. I remember when they were doing either 10 or 40Gb networking they had a LOT of issues getting there, and it came down to multiple things that needed to be configured, like enabling jumbo frames and other tweaking, and also that the CPUs couldn't even handle it and had to be upgraded as well. Not sure what CPUs you've got, but you've got two of them; wouldn't all that bandwidth be tied to the CPU that the card is attached to via the PCIe lanes? At least one would think so.

    • @ArtofServer
      @ArtofServer  3 years ago +1

      Yes, as you saw in this video, a single-threaded iperf run could not come close to 100Gbps. If you want to push more throughput per thread, then you definitely need to do some more tuning: things such as NUMA/core affinity (to avoid QPI latency), interrupt affinity, jumbo frames, increased socket buffers (to allow large TCP window sizes), disabling power management functions to sustain high clock speeds, etc.
      The benchmark run in this video was done without any of that tuning, but it obviously takes advantage of using multiple threads to transfer data in parallel, and it was able to hit the numbers I showed.
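
      To make the "multiple threads" point concrete, here is a rough, self-contained Python sketch of a multi-stream TCP throughput test. It is not a substitute for iperf, and the port number and the 16 MiB socket buffer size are arbitrary choices for illustration:

        #!/usr/bin/env python3
        # Rough multi-stream TCP throughput probe: one sender thread per stream,
        # with enlarged socket buffers. Run "server" on one node, then
        # "client <server-ip> <streams>" on the other. Illustrative only; iperf3
        # is still the proper tool for real benchmarking.
        import socket
        import sys
        import threading
        import time

        PORT = 5201            # arbitrary choice (happens to match iperf3's default)
        CHUNK = 1 << 20        # 1 MiB per send
        DURATION = 10          # seconds per client run
        SOCK_BUF = 16 << 20    # 16 MiB socket buffers (arbitrary; tune as needed)

        def drain(conn):
            while conn.recv(CHUNK):              # discard whatever arrives
                pass

        def serve():
            srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, SOCK_BUF)
            srv.bind(("", PORT))
            srv.listen()
            while True:
                conn, _ = srv.accept()
                threading.Thread(target=drain, args=(conn,), daemon=True).start()

        def send_stream(host, totals, idx):
            s = socket.create_connection((host, PORT))
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SOCK_BUF)
            payload, sent = b"\0" * CHUNK, 0
            deadline = time.time() + DURATION
            while time.time() < deadline:
                s.sendall(payload)
                sent += len(payload)
            s.close()
            totals[idx] = sent

        def client(host, streams):
            totals = [0] * streams
            threads = [threading.Thread(target=send_stream, args=(host, totals, i))
                       for i in range(streams)]
            start = time.time()
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            gbits = sum(totals) * 8 / (time.time() - start) / 1e9
            print(f"{streams} streams: ~{gbits:.1f} Gbit/s aggregate")

        if __name__ == "__main__":
            if sys.argv[1] == "server":
                serve()
            else:
                client(sys.argv[2], int(sys.argv[3]))
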

  • @guillepunx
    @guillepunx 3 years ago +1

    I don't know why, but YouTube stopped showing me when you publish a new video. I had to open your channel to see that you have been uploading videos for the last month. And I'm subscribed to your channel. :|

    • @ArtofServer
      @ArtofServer  3 years ago +1

      Well, thanks for checking my channel to see new videos! Make sure you hit the notification bell next to the subscribe button and select "all". If you already did that, it could just be a glitch on YouTube.

    • @ArtofServer
      @ArtofServer  3 years ago +1

      By the way, I try to release 1 video every week, on Fridays at 6:30am Pacific time.

  • @fengchouli4895
    @fengchouli4895 3 years ago +1

    Sigh, 100Gb NICs are fast but expensive. 40/56Gb might be more affordable. BTW, I noticed you used HP NICs/HBAs in your videos. Though I am using Dell machines too, is there any chance we may see videos about HP servers in the future?

    • @ArtofServer
      @ArtofServer  3 years ago +2

      If I happen to acquire some HPE servers from a decommissioned DC, maybe. But I'm not a big fan of HPE servers, so I wouldn't go out of my way to get some.

    • @ewenchan1239
      @ewenchan1239 2 years ago +2

      On a $/Gbps basis, 100G actually isn't that bad.
      I think that it's actually a better deal than 10GbE.
      But that will depend on whether you actually need (or want) 100 Gbps.
      I have a micro HPC cluster at home, in my basement, and at least one of my applications WILL use around 80% of the 100 Gbps capacity when it is working on solving a problem for me. So unlike probably most people, who might only use the capacity ONCE in a while, I will use the capacity every single time I run a simulation, and for a while my micro cluster was busy enough that I had simulations scheduled out 4 months in advance.
      So it is really going to depend on whether you will be able to make use of it.
      The most painful and expensive part, I think, is actually the switch, depending on whether you're going to be getting an Infiniband switch or an Ethernet switch. For some stupid reason, Mellanox decided to charge customers an extra 50% for 100GbE vs. 100 Gbps Infiniband, despite the fact that they own the technology that lets you switch the ports from IB to ETH; they could have just included that feature on their switches so that you can run both, but they want more money from you, so they DIDN'T do that.
      And then on top of that, if you are getting an Infiniband switch from Mellanox, you have to decide on whether you are going to be getting a managed switch or an externally managed switch. I initially bought a used, managed switch, but it was having a problem where it would reboot itself every hour, on the hour, so I returned that and bought a used externally managed switch instead.
      The upside of the externally managed switch was that it was cheaper than the managed switch. The only downside was that I had to have a Linux system run the subnet manager (OpenSM) and THAT ONLY runs in Linux.
      So, if you want to run an all-Windows lab, you'll either need to deploy a Linux system (like this) JUST to run the subnet manager, OR you'll have to pay the premium to get the 100GbE Ethernet switch instead, if you aren't going to be running Infiniband on Windows.
      The second most expensive thing that you will pay for will be the cables: if you have short distances, then the DAC that is shown will work, but if you need to run longer distances, then you need at least optical fibre cabling, which costs more than the DAC cables.

  • @ritzmannaxie284
    @ritzmannaxie284 2 years ago

    Hi, is there any cheap 40Gb Infiniband card, like a Mellanox, that runs on Ubuntu 20.04?

  • @richardwu8225
    @richardwu8225 1 year ago

    Can I connect two nodes in Infiniband mode, or does it need an Infiniband switch? Thanks!

    • @ArtofServer
      @ArtofServer  1 year ago +1

      I think so, but I don't know much about infiniband.

  • @RepaireroftheBreach
    @RepaireroftheBreach 11 months ago

    I am thinking of moving to Ubuntu to achieve the 100G speed of my network. Windows is only giving me 50 Gb/s write and 14 Gb/s read, and I need the read speeds the most. I think the SMB protocol may be at fault here. What network protocol are you using in CentOS?

  • @II_superluminal_II
    @II_superluminal_II 2 years ago +1

    Hey, is this Mellanox CX455A compatible with the PowerEdge R820 by any chance? I have 1 R820 and 1 R920 and wanted to interconnect them through a 100Gb/s Mellanox link and then connect the master to a 10Gb switch for an HPC cluster in my basement to run some simulations. I was wondering if this would be at all possible; I couldn't find whether the CX455A would work in the R820's PCIe 3.0 slot. Thanks for making these videos, you helped me rescue my R820 :) You should make a Discord channel or something.

    • @ArtofServer
      @ArtofServer  2 years ago +1

      What makes you think it would not be compatible? It's just a PCIe device...

    • @II_superluminal_II
      @II_superluminal_II 2 years ago

      @@ArtofServer Idk, especially with Dell servers everything seems finicky. I've never owned a server before; I wanted to use it for a home security and homelab setup.

    • @ArtofServer
      @ArtofServer  2 years ago +1

      @@II_superluminal_II I've never had a PCIe card that didn't work in a Dell server. I've had issues like that in HPE stuff, but not Dell.

    • @Wolgorboy
      @Wolgorboy 2 years ago

      @@ArtofServer Well, I just managed to buy an HPE 620QSFP28 that only works in ProLiant servers :( I hope the Mellanox will work.

  • @andyhello23
    @andyhello23 3 years ago +2

    Kind of moot at the moment, unless you are running loads of SSDs or NVMe drives in some sort of RAID. Your hard disks simply will not take advantage of these speeds.
    Neat to know this tech is around though.
    Thanks for the video.

    • @ArtofServer
      @ArtofServer  3 years ago +1

      Good point about storage I/O being matched to the network I/O. NVMe is getting fairly cheap now, so it's not too hard to get fast storage that can handle 90-100Gbps I/O.

    • @ewenchan1239
      @ewenchan1239 2 years ago +2

      Your point is well taken, but that isn't the only thing that this is used for.
      For me, I use it to solve distributed sparse matrices where the problem is split between my nodes and the matrix resides in RAM (128 GB per node, 512 GB total), so in order to "link" the memory together fast enough, you need something like this.
      My finite element analysis solutions can use about 80% of the 100 Gbps bandwidth capacity when running a simulation/working on solving a problem for me, and the results can be relatively small compared to the total volume of data that gets pushed through my Infiniband network during the course of the solution process.

  • @merlin3717
    @merlin3717 3 years ago

    Hi, I don't suppose you know how to get the Dell Repository Manager on Ubuntu?

    • @ArtofServer
      @ArtofServer  3 years ago

      I don't use Ubuntu, but I have used DRM in Fedora... it should work the same in any Linux OS, as it's a monstrous Java application.

  • @pcb7377
    @pcb7377 6 months ago

    What happens if you write information to the DAC cable saying it is 2 times longer than it is (the cable is 1 m; we reprogram it to report 2 m)?
    Will it work?

    • @ArtofServer
      @ArtofServer  6 months ago

      What do you mean by "add information to the DAC cable"?

    • @pcb7377
      @pcb7377 6 months ago

      @@ArtofServer Thank you for responding!
      Each DAC cable has an EEPROM chip at each end. It holds information about the cable, including its length! I want to change the value in the cable-length field (I want to double it). The question is whether such a cable will work, i.e. a cable whose reported length has been programmatically doubled. Is there anyone who could try it?

  • @FaizanAli-gg5qu
    @FaizanAli-gg5qu 3 years ago +1

    Dear sir,
    Art of Server, this is a good thing you're doing for us. We are waiting for your complete networking series; please make all the videos in sequence, and bring more complete series such as Windows Server, Cisco, Huawei, virtualization, FreeNAS, firewalls, and so on. I mean, please share the IT knowledge you have on your YouTube channel; many people want to learn a complete networking series, built step by step. If you follow my idea you will soon get more subscribers. Also add social media links for asking questions & getting help.

  • @learnwot4131
    @learnwot4131 3 years ago +4

    Sadly I can give you only one thumb... Tnx

    • @ArtofServer
      @ArtofServer  3 years ago

      Thanks! Well, as long as you hit the like button an odd number of times, I'm ok with it! ;-)

  • @guywhoknows
    @guywhoknows 3 years ago

    Well, there goes high-speed storage and the system bus...
    Tbh, as file sizes have grown and codecs haven't gotten much better, mass data migration is somewhat of a pain.
    I think 4 x 10GbE would deal with a lot of usage in common computing and in enterprise.
    It was funny between my server and workstation...
    I just installed DAS on the workstation, and echo to it for offline backup. But sure, TCP/UDP is a good way to transfer files.
    What is the range?

    • @ArtofServer
      @ArtofServer  3 years ago

      SFP28 DAC cables can go up to 5 meters, I believe. If you use optical transceivers, obviously the range can be much longer.

    • @guywhoknows
      @guywhoknows 3 years ago

      @@ArtofServer I assume that the standard cable and DAC would make the range lower, as higher data rates usually mean shorter distances.
      Passive vs. active... would make a difference.