Core C++ 2019 :: Nimrod Sapir :: High Frequency Trading and Ultra Low Latency development techniques

  • Added 9 June 2024
  • When developing for a high-frequency environment, every transaction is a race against the clock and against other players in the market. Therefore, your critical flow can never be “fast enough” as long as someone else may be faster. In this lecture I will cover some of the techniques qSpark’s trading infrastructure uses to survive in the HFT jungle, and will discuss in detail one of our main techniques: using dummy operations to warm up the instruction cache for critical operations while avoiding branch mispredictions.
    -- -- -- -- -- --
    Presented at the Core C++ 2019 conference. The slides can be found at bit.ly/cpp19le
  • Science & Technology
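The description above mentions running dummy operations to warm the instruction cache without introducing branch mispredictions. The actual qSpark scheme is in the talk and slides; what follows is only a minimal sketch of one way to get that effect, under our own assumptions: the hypothetical `OrderSender` always executes the full encoding path, and the `warmup` flag picks the destination (the real outbound buffer or a scratch buffer that is never transmitted) by indexing an array of pointers rather than with an `if`. `Order`, `OrderSender` and the buffer layout are invented for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical order type, for illustration only.
struct Order {
    std::uint64_t id;
    std::int64_t price;
    std::uint32_t qty;
};

// Hypothetical sender: real and warm-up sends execute the same instructions.
class OrderSender {
public:
    OrderSender() = default;
    // Non-copyable: targets_ holds pointers into this object.
    OrderSender(const OrderSender&) = delete;
    OrderSender& operator=(const OrderSender&) = delete;

    // Encode an order into the real buffer (warmup == false) or the scratch
    // buffer (warmup == true). The destination is picked by indexing, so
    // there is no conditional branch on the warm-up flag here.
    void send(const Order& o, bool warmup) {
        Buffer& dst = *targets_[static_cast<std::size_t>(warmup)];
        std::memcpy(dst.bytes.data(), &o, sizeof(Order));
        dst.length = sizeof(Order);
    }

    // Only the real buffer would ever be handed to the (hypothetical) NIC.
    std::size_t real_length() const { return real_.length; }
    std::size_t scratch_length() const { return scratch_.length; }

private:
    struct Buffer {
        std::array<unsigned char, 64> bytes{};
        std::size_t length = 0;
    };
    static_assert(sizeof(Order) <= 64, "Order must fit in a buffer");

    Buffer real_;
    Buffer scratch_;
    std::array<Buffer*, 2> targets_{&real_, &scratch_};  // [0] real, [1] scratch
};
```

A warm-up loop would call `send(order, true)` periodically while the market is quiet, so the encoding instructions and the data they touch stay hot; a real `send(order, false)` then runs the same instructions. It is still worth checking the generated assembly to confirm the compiler kept the path branch-free.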

Comments • 50

  • @CTPATEX · 4 years ago · +9

    43:51 The copy is also plain wrong with memcpy if your key/value types aren't trivially copyable. For trivial types, replacing the copy-constructor body with "= default" should generate the same code.

    • @MultiNimrods · 4 years ago

      You are right about the triviality requirement (I didn't mention it, but I should have). Regarding the quick copy: I am talking about copying the entire map, which for this data structure can be done with a single memcpy call (again, for trivial types). That is not the same as calling the copy constructor for each object (even for trivial objects).

    • @mohammedj2941 · 3 years ago

      @MultiNimrods Under the assumption of type triviality, doesn't the compiler optimize the individual copies away using memcpy in this case?
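The thread above is about copying a whole container with a single `memcpy`, which is only valid when the key and value types are trivially copyable. The map from the talk is in the slides; below is a minimal sketch using a hypothetical fixed-capacity, open-addressing `FlatMap` (linear probing, integer keys, no erase) whose storage is one inline array, so the object itself is trivially copyable and `static_assert`s reject non-trivial key/value types at compile time.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical fixed-capacity open-addressing map (linear probing, no erase).
// All storage is inline, so the whole map is one contiguous block of bytes.
template <typename K, typename V, std::size_t Capacity>
class FlatMap {
    static_assert(std::is_trivially_copyable_v<K> && std::is_trivially_copyable_v<V>,
                  "memcpy-based copy requires trivially copyable key/value types");
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Insert or overwrite; returns false if the table is full.
    bool put(const K& key, const V& value) {
        std::size_t i = hash(key);
        for (std::size_t n = 0; n < Capacity; ++n, i = (i + 1) & (Capacity - 1)) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                size_ += s.used ? 0 : 1;
                s = Slot{key, value, true};
                return true;
            }
        }
        return false;
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* get(const K& key) const {
        std::size_t i = hash(key);
        for (std::size_t n = 0; n < Capacity; ++n, i = (i + 1) & (Capacity - 1)) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;
            if (s.key == key) return &s.value;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }

    // Copy the entire map with one memcpy of its bytes.
    void copy_from(const FlatMap& other) {
        if (this != &other) {
            std::memcpy(static_cast<void*>(this), &other, sizeof(FlatMap));
        }
    }

private:
    struct Slot {
        K key;
        V value;
        bool used;
    };
    // Identity hash masked to the table size: integer keys only.
    static std::size_t hash(const K& key) {
        return static_cast<std::size_t>(key) & (Capacity - 1);
    }
    std::array<Slot, Capacity> slots_{};
    std::size_t size_ = 0;
};

static_assert(std::is_trivially_copyable_v<FlatMap<std::uint32_t, double, 16>>);
```

On the question just above: for a trivially copyable type like this, optimizing compilers usually lower a defaulted copy constructor to a bulk copy too, but that is an optimizer decision rather than a guarantee; the explicit `memcpy` simply makes the intent visible.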

  • @rajatverma3205 · 3 years ago · +5

    I’m new here. I’m currently studying computer architecture before writing HFT low-latency software, because hardware is important.
    I know C++ and am working heavily on that too.
    How would you guide a beginner who wants to contribute to low-latency coding?

    • @nimrodsapir3256 · 3 years ago · +16

      I would learn as much as possible about advanced C++ features and paradigms (such as CRTP, which is very useful). Understand memory, caching (memory and CPU), branch prediction, efficient memory allocation and pipelining: all of those things become extremely important when writing low-latency code.
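CRTP, recommended in the reply above, lets a base class template call into its derived class through a `static_cast`, so dispatch is resolved at compile time: no virtual call on the hot path, and the compiler can inline the derived handler. A minimal sketch; `MarketDataHandler`, `BestPriceTracker` and their methods are hypothetical names, not taken from the talk.

```cpp
#include <cstdint>
#include <limits>

// CRTP base: Derived must provide handle_quote(price, qty). No virtual functions.
template <typename Derived>
class MarketDataHandler {
public:
    // Entry point a (hypothetical) feed loop would call for every quote.
    void on_quote(std::int64_t price, std::uint32_t qty) {
        ++quotes_seen_;
        // Static dispatch: resolved at compile time, so it can be inlined.
        static_cast<Derived*>(this)->handle_quote(price, qty);
    }

    std::uint64_t quotes_seen() const { return quotes_seen_; }

private:
    std::uint64_t quotes_seen_ = 0;
};

// Hypothetical handler that records the highest price with non-zero quantity.
class BestPriceTracker : public MarketDataHandler<BestPriceTracker> {
public:
    void handle_quote(std::int64_t price, std::uint32_t qty) {
        if (qty > 0 && price > best_) best_ = price;
    }

    std::int64_t best() const { return best_; }

private:
    std::int64_t best_ = std::numeric_limits<std::int64_t>::min();
};
```

The trade-off versus a virtual interface is that each handler type produces its own instantiation, so handlers cannot be swapped at run time through a common base pointer.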

    • @rajatverma3205 · 3 years ago

      @nimrodsapir3256 It feels really good that those are exactly the things I’m focusing on. I learned basic C++ from “C++ Primer Plus” by Stephen Prata and am taking an ongoing course on computer architecture.
      For more advanced C++, I’ll turn to Bjarne Stroustrup’s C++ book.

  • @LordNezghul · 4 years ago · +4

    Wouldn't it be better to use a custom version of the compiler that generates code to warm up your cache, instead of constantly fighting with "leaky abstractions"?

    • @ShalomCraimer · 4 years ago · +2

      Probably not. The extra cost of building your own compiler would exceed the benefit. It would require not merely compiler-oriented programmers, but programmers who are also highly versed in CPU-specific optimizations. These sorts of developers are harder to find. More code to maintain and more developers mean higher costs, for a hard-to-measure benefit.
      Also, this is a good moment to say that I'd love to go further: have a compiler that simulates the internal state of the CPU (which isn't always well documented), optimizes the uops and evens out the port pressure (i.e. how the work is distributed among the ALUs in each core). It might be tricky to provide compile-time core-associativity to allow for such optimizations, but it could be done!

    • @LordNezghul · 4 years ago · +1

      @ShalomCraimer I think there's no need to build an entirely new compiler from scratch; maybe just a few extensions to existing compilers.

    • @ShalomCraimer · 4 years ago

      @LordNezghul I *was* only talking about the work of building a new backend for the compiler: the part of the compiler that decides how to convert the IR into the binary (e.g. x86 machine code for the specific x86 CPU you want to optimize for). It's still no minor undertaking, not just to do it, but to prove that the optimizations actually bring an improvement. Even discovering the optimizations would become a full-time job, especially while trying to keep up with new Intel hardware.

    • @nimrodsapir3256 · 4 years ago · +2

      Thanks for your comment, and I have to ask: this custom compiler you describe would have to detect (at compile time) the flows that are rarely executed but business-critical (you don't want to warm up all your code, just those specific flows), which I don't think can be deduced automatically. Also, the generated code should run without side effects, which is a very tricky definition (some counters may be harmless if accessed by the warm-up code, while others must be replaced with a mock-up). Again, it is very likely I am missing something here...

  • @cutyboi8630 · 3 years ago · +1

    Hi, thanks, it's a good one. I have a question: why use C++ instead of your own OS drivers and assembly language? Why use the Linux kernel and C++?

    • @JMRC · 3 years ago · +1

      Just a guess, but besides the maintenance being a nightmare, compilers are often better at optimization than people writing assembly by hand.

    • @nimrodsapir3256 · 3 years ago · +1

      Just to comment: these days we have ways to run our logic end to end in user space (we use specialized network cards and drivers). So as long as the kernel is configured to allocate the resources we need, we don't need to write any kernel code.

    • @zoasis7805 · 2 years ago

      @JMRC To add to this: compilers are better at optimising than humans, but you can always look at the disassembly produced from compiled C++ and try adding different optimisations that way, which is much easier than writing assembly from scratch.

    • @cppdeveloper · 1 month ago

      I think C++ in HFT is just the implementation of the algorithm: the tested and successful algorithms are moving to FPGA, so they will be 100 times faster than assembly or drivers. So C++ is used just because it's one of the best languages in terms of out-of-the-box performance and speed of implementing an algo's MVP.

  • @denispriyomov6086 · 4 years ago · +9

    The first video I've watched at 2x speed; I also skipped the initial 20 minutes, used as cache warming... Should have been applying HFT algos ;)

  • @blazkowicz666 · 2 years ago · +2

    Why C++ over C, if performance is of the utmost importance?
    Also, what about Rust vs C++?

  • @gurugamer8632 · 2 years ago

    Which programming languages are best to learn today for high frequency trading?

    • @insafidris2366 · 2 years ago

      C++ for speed, Python for ease, but to answer directly, it's C++

    • @Space_math.engineer · 1 year ago

      C++, but maybe start with Python if you're new to programming, IMO

    • @draked8953 · 1 year ago

      Rust is gaining big traction right now in more agile firms

    • @recursion. · 10 months ago

      @draked8953 Could you name those agile firms? Really curious

    • @HowDoYouUseSpaceBar · 7 months ago

      @draked8953 High Frequency Trading, Low Frequency Development

  • @paulmccumber9291 · 2 years ago

    Why not use an RTOS? Or even bare-metal code running application-specific code?

    • @bibekkoirala8802 · 2 years ago · +1

      They use multicore, high-end, state-of-the-art processors, not microcontrollers

    • @paulmccumber9291 · 2 years ago

      @bibekkoirala8802 An RTOS runs just fine on a modern Intel space heater. I'm saying you'd have complete control over what is in the ISRs and be better suited to manage latency. Heck, you could even write a thread that NEVER leaves context.

    • @bibekkoirala8802 · 2 years ago · +2

      @paulmccumber9291 space heater lmaooo. AFAIK they cut all the fluff from the Linux kernel, modify the networking layers (kernel bypass) and make other performance modifications and shit. So they do get RTOS-like benefits from Linux, maybe not hard real-time but close. IMO a pure RTOS is better suited for something like sampling audio signals in real time, where you don't need networking protocols and shit like that. Just my views, I don't work in audio or HFT.

    • @joewu7092 · 1 year ago · +1

      Indeed, some HFT firms do what you say: bare-metal code, but usually on a SoC (FPGA network stack + FPGA or ARM algorithm implementation, depending on the complexity). I think they are just keen on moving the implementation/logic into hardware as much as possible.

    • @nimrodsapir3256 · 1 year ago · +1

      Basically, the idea is to bypass OS services on the real-time path altogether (ideally, all memory is pre-allocated, networking uses kernel bypass, and the real-time threads are pinned and spinning). So the OS scheduler only handles the administrative tasks of the system. Beyond that, an FPGA can indeed give even higher performance, but of course adds a lot of limitations.
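The "all memory is pre-allocated" point in the reply above usually means the real-time path never calls `malloc` or `new`. One common way to get there is a fixed-capacity object pool, reserved at startup, whose acquire and release only pop and push a free list. A minimal single-threaded sketch; `ObjectPool` is a hypothetical name, and thread pinning (on Linux, e.g. via `pthread_setaffinity_np`), busy-polling and kernel bypass are deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity pool: storage for N objects of T is reserved up
// front, and acquire()/release() only pop/push an index-based free list.
// Objects still acquired when the pool is destroyed are NOT destroyed.
template <typename T, std::size_t N>
class ObjectPool {
    static_assert(N > 0, "pool must hold at least one object");

public:
    ObjectPool() {
        // Thread every slot onto the free list once, at startup.
        for (std::size_t i = 0; i < N; ++i) next_[i] = i + 1;
        free_head_ = 0;
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Construct a T in a free slot; returns nullptr when the pool is
    // exhausted (there is no fallback to the heap).
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (free_head_ == N) return nullptr;
        std::size_t i = free_head_;
        free_head_ = next_[i];
        ++in_use_;
        return ::new (slot(i)) T(std::forward<Args>(args)...);
    }

    // Destroy the object and push its slot back onto the free list.
    void release(T* p) {
        p->~T();
        auto offset = reinterpret_cast<unsigned char*>(p) - storage_;
        std::size_t i = static_cast<std::size_t>(offset) / sizeof(T);
        next_[i] = free_head_;
        free_head_ = i;
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    void* slot(std::size_t i) { return storage_ + i * sizeof(T); }

    alignas(T) unsigned char storage_[N * sizeof(T)];
    std::array<std::size_t, N> next_{};  // next_[i]: next free slot after i
    std::size_t free_head_ = N;          // N means "no free slot"
    std::size_t in_use_ = 0;
};
```

On the hot path a handler would `acquire` an object instead of calling `new` and `release` it when done. Returning `nullptr` on exhaustion, rather than silently falling back to the heap, is a design choice the caller has to handle explicitly.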

  • @rickydeldo8596 · 3 years ago

    Thx

  • @cortexauth4094 · 5 months ago · +1

    I am surprised that trading volume is just 50% lmaooo

  • @paulmccumber9291 · 2 years ago · +2

    How about C? I love C++ but I feel like C is right next to the hardware.

  • @turdwarbler · 3 years ago · +1

    Interesting video, thanks for making it. One tip: stand still. You shift from side to side, and since you're on screen it's very distracting. :-)

    • @comitcrafter · 2 years ago · +5

      Have you ever lectured before?

    • @turdwarbler · 2 years ago · +1

      @comitcrafter Yes, quite a lot, and I have been videoed doing it.

    • @nimrodsapir3256 · 1 year ago · +4

      Thanks for the tip (really) - this was my first time doing such a long lecture so I was quite nervous...

  • @totenkopf30 · 4 years ago · +1

    If you don't speak English, then choose your own language, because it is very annoying listening to someone trying hard to find the proper words to express himself in a foreign language.

    • @pcb1962 · 4 years ago · +44

      Stupid comment, nothing wrong with his English.

    • @spicetard249 · 4 years ago · +36

      at least he is trying to help others

    • @totenkopf30 · 4 years ago

      @spicetard249 Fuck that shit, all we need to do in life is take care of ourselves. I say, if some asshole is wasting his time helping others, just take advantage of him.

    • @peterhooper2643 · 3 years ago · +25

      @totenkopf30 you must be fun at parties

    • @totenkopf30 · 3 years ago

      @peterhooper2643 what parties asshole, I fucking hate human beings