Unlocking Developer Productivity across CPU and GPU with MAX: Chris Lattner

SdĂ­let
VloĆŸit
  • čas pƙidĂĄn 11. 09. 2024
  ‱ Today's leading generative AI applications have workloads that span high-performance GPU compute, CPU preprocessing, data loading, and orchestration, often spread across a combination of Python, C++/Rust, and CUDA C++, which increases complexity and slows the cycle of innovation. This talk explores the capabilities and power of the Modular Mojo programming language and the Modular Accelerated Xecution (MAX) platform, which unify CPU and GPU programming into a single Pythonic programming model that is simple and extensible. The result is reduced complexity, improved developer productivity, and faster innovation. We'll walk through CPU and GPU support with real-world examples, showing how AI application developers can use MAX and Mojo to define an end-to-end AI pipeline and overcome these complexities (a minimal code sketch follows the speaker bio below).
    Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at www.ai.enginee... & join us at the AI Engineer World's Fair in 2025! Get your tickets today at ai.engineer/2025
    About Chris
    Chris Lattner is a co-founder and the CEO of Modular, which is building an innovative new developer platform for AI and accelerated compute. Modular provides an AI engine that accelerates PyTorch and TensorFlow inference, as well as the MojođŸ”„ language, which extends Python into the systems and accelerator programming domains. He also co-founded the LLVM compiler infrastructure project, the Clang C++ compiler, the Swift programming language, the MLIR compiler infrastructure, and the CIRCT project, and has contributed to many other commercial and open-source projects at Apple, Tesla, Google, and SiFive.
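
    For a concrete picture of the "single Pythonic programming model" the abstract describes, here is a minimal sketch of serving a model through the MAX Engine Python API roughly as it existed at the time of this talk. Treat the names as assumptions: the max.engine module, InferenceSession, load, and execute are taken from Modular's 2024 documentation but have evolved across releases, and "model.onnx" and the input name "input_ids" are hypothetical.

      # Minimal sketch: one Python process covers loading, inference, and
      # pre/post-processing, instead of spanning Python, C++/Rust, and CUDA C++.
      # Assumed API names: max.engine.InferenceSession, .load(), .execute().
      import numpy as np
      from max import engine

      # Compile and load the model once; MAX targets the hardware it finds,
      # so the same script covers the CPU and GPU cases the talk contrasts.
      session = engine.InferenceSession()
      model = session.load("model.onnx")  # hypothetical model file

      # Inputs are plain NumPy arrays keyed by the model's input names.
      tokens = np.zeros((1, 128), dtype=np.int64)  # hypothetical input
      outputs = model.execute(input_ids=tokens)
      print(outputs)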

Komentáƙe • 6

  ‱ @jianghong6444 ‱ 1 month ago

    At 8:16 the presenter compares MAX against llama.cpp using CPU inference. The main contributor of llamafile claims that llama.cpp mainly focuses on the GPU stack (which sort of makes sense, since CPU can be comparatively slower), so I'm not sure how big of an impact that would be.

  ‱ @haichengwu799 ‱ 1 month ago

    Do you turn on split-K or stream-K in CUTLASS? Your measurement of CUTLASS does not look correct. Haicheng @ NVIDIA

  ‱ @JL-1735 ‱ 1 month ago ‱ +16

    I have zero interest in Modular or in MAX as long as it's not fully open source. They have the right to make it closed, but "we are making some things open" without any clarity or guarantee that the rest of the stack will eventually become open amounts to it just being closed source. I would consider it a rug pull, as Chris has been teasing the community and earning positive press as if it were an open-source project.

    ‱ @LisaSamaritan ‱ 1 month ago ‱ +5

      He explains it on Lex Fridman's podcast #381 (you can jump to 02:21:57). Basically, he had a bad experience making Swift, where everyone wanted new functionality at the same time as the core parts were being developed, which led to a bunch of bugs and rewrites, and he doesn't want to make that mistake again.
      He will release parts as they become stable enough, so that this will not happen.

    ‱ @LisaSamaritan ‱ 1 month ago

      Besides, all of his other projects* are open source, so why do you think he wouldn't do it again?
      * The LLVM/MLIR compiler infrastructure
      The Clang compiler
      The Swift programming language
      The biggest question was surrounding MAX. MAX is written in Mojo but isn't part of the language. It now has a free license for local/on-prem use; you have to pay for using it in the cloud and for commercial support.
      [Also, nothing prevents you from writing your own MAX-like solution in Mojo... Modular has to make money somehow, and the license seems fair. Most people get it for free, and the ones that can afford to pay will pay.]
      But even without MAX, you will have Mojo, which is as simple to use as Python and can run any Python program with an expected 2-10x speed improvement (compared to CPython, without any optimization).
      A 10-100x improvement if you use the Mojo-specific, low-level parts (basically like writing that part in Rust).
      And on rare occasions you can get a greater improvement; there is one algorithm that has shown something like a 36,000x speedup (if I remember correctly).
      As with everything, whatever extra speed you get depends on many factors.
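
      The large multipliers in this comment come from tight numeric loops. As a rough illustration (not Modular's benchmark code; the kernel and sizes here are made up for the example), this is the kind of pure-Python baseline such comparisons start from; a Mojo port that adds static types and vectorization to the inner loop is where the big speedups are claimed:

        # Pure-Python Mandelbrot-style escape-time kernel: an all-Python
        # inner loop of float math, the classic target for compiled ports.
        import time

        def escape_count(cx: float, cy: float, max_iter: int = 200) -> int:
            """Iterations before the point escapes |z| > 2."""
            x = y = 0.0
            for i in range(max_iter):
                if x * x + y * y > 4.0:
                    return i
                x, y = x * x - y * y + cx, 2.0 * x * y + cy
            return max_iter

        def render(width: int = 200, height: int = 200) -> list:
            """Evaluate the kernel over a small grid of complex points."""
            return [
                escape_count(-2.0 + 3.0 * i / width, -1.5 + 3.0 * j / height)
                for j in range(height)
                for i in range(width)
            ]

        start = time.perf_counter()
        grid = render()
        print(f"{len(grid)} points in {time.perf_counter() - start:.3f}s")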

  ‱ @RickySupriyadi ‱ 1 month ago

    OMG, if there is a new standard API for communicating with LLMs, that would really change the world if they all used this standard: automation in simple steps! Uh, but not really... what about security? Like rogue LLMs roaming around and exploiting those APIs. Wow, more talks like these please.
    Oh, and if it's open source, maybe LLMs communicating with those APIs might be more secure? Maybe.