Non-Uniform Memory Architecture (NUMA): A Nearly Unfathomable Morass of Arcana - Fedor Pikus CppNow

  • Added on 2 June 2024
  • www.cppnow.org
    / cppnow
    ---
    Slides: github.com/boostcon
    ---
    Non-Uniform Memory Architecture (NUMA) systems are common in enterprise computing today: almost all high-end large-memory systems are NUMA machines, and even the most common mid-range servers (32 to 40 cores, under 500 GB of memory) are usually NUMA systems. For something so widely used, one would expect NUMA and its impact on program performance to be well understood. Sadly, it’s not. NUMA systems present a Non-Universal Menagerie of Attributes, and their behavior is devilishly complex. Practical consequences range from “ignore the fact that it’s NUMA and you’re fine” to “the program runs much faster on 16 CPUs than on 32.” To make matters worse, reliable measurements are hard to collect, the act of measuring often influences the behavior, the standard profiling tools are at best inadequate and at worst misleading, and concrete knowledge usually has to be inferred from somewhat opaque behavior.
    In this talk, I present my experiences and lessons learned from working with NUMA systems. If you have never asked yourself, “How did they stuff so much memory into this box?”, worry not: we’ll get you up to speed with an introduction to NUMA. We will then discuss the performance restrictions and problems unique to NUMA systems, and learn how to identify and troubleshoot NUMA-related issues. I will show how the standard and often relied-on profiling tools can mislead you when working on a NUMA system and how to recognize the danger signs. I will also show the solutions we came up with for several very different types of problems: poor scaling, large overhead, and low memory performance. Overall, if you ever work on a NUMA system, this talk just might save you days or weeks of debugging, profiling, and experimentation.
    ---
    Fedor Pikus
    Fedor G. Pikus is a Technical Fellow and head of the Advanced Projects Team at Siemens Digital Industries Software. His responsibilities include planning the long-term technical direction of Calibre products, directing and training the engineers who work on these products, designing and architecting the software, and researching new design and software technologies.
    His earlier positions include Chief Scientist at Mentor Graphics (acquired by Siemens Software), Senior Software Engineer at Google, and Chief Software Architect for Calibre PERC, LVS, and DFM at Mentor Graphics. He joined Mentor Graphics in 1998, when he switched from academic research in computational physics to the software industry.
    Fedor is a recognized expert in high-performance computing and C++. He is the author of two books on C++ and software design, has presented his work at CPPNow, CPPCon, SD West, and DesignCon and in software development journals, and is also an O'Reilly author. Fedor has over 30 patents and over 100 papers and conference presentations on physics, EDA, software design, and the C++ language.
    ---
    Video Sponsors: think-cell and Bloomberg Engineering
    Audience Audio Sponsors: Innoplex and Maryland Research Institute
    ---
    Videos Filmed & Edited By Bash Films: bashfilms.com/
    CZcams Channel Managed & Optimized By Digital Medium Ltd: events.digital-medium.co.uk
    ---
    CppNow 2024
    ---
    #boost #cpp #numa
  • Science and Technology

Comments • 11

  • @riffshyperion
    10 months ago +4

    Great presentation. Would love to see the rest!

  • @guiorgy
    10 months ago +4

    Great talk, shame can't see the rest

  • @DmytroDukov
    10 months ago +3

    Awesome talk! Lots of intricate details and valuable insights. Thank you!

    • @BoostCon
      10 months ago +1

      Very glad to hear that you enjoyed it! Thank you for your comments.

  • @LeDabe
    10 months ago +1

    NUMA seemed like a reasonable solution for continuing to provide higher memory throughput, assuming the software is NUMA-aware.

  • @philmarsh3859
    1 month ago

    I'd like to add NUMA awareness to the EM solver openEMS, which is severely memory-bandwidth bound.

  • @rgarciaf071
    9 months ago

    Great talk, sad it got cut off

  • @user-yc5fq9bv3u
    6 months ago

    43:34 Are the threads accessing memory at a constant speed? Because if they go out of sync, then there is an additional performance penalty, I guess.
    You even admit that 5 minutes later, without mentioning that it will affect total bandwidth as well.
    I guarantee that if you implement the same read pattern with the same thread timing using only one socket, you will get terrible bandwidth as well, because the limiting factor is not inter-socket interaction but the DDR structure itself.

  • @user-yc5fq9bv3u
    6 months ago

    24:20 this graph would definitely look better in totals, not per thread

  • @killacrad
    3 months ago

    What is precisely meant by no interaction between NUMA nodes if only talking to L3 cache at czcams.com/video/f0ZKBusa4CI/video.htmlfeature=shared&t=2005, in terms of effect on memory bandwidth?

  • @user-yc5fq9bv3u
    6 months ago

    36:25 "the ratio here is about 50"
    How in the hell could it be fifty? It's a fraction of the cell height, which is 10x.
    The Y axis is logarithmic.