KEYNOTE: What Everyone Should Know About How Amazing Compilers Are - Matt Godbolt [C++ on Sea 2019]

  • Added 4 Jun 2024
  • cpponsea.uk
    We use them every day, but how often do we stop to think about the kinds of amazing things our compilers do for us? Modern compilers are a feat of engineering and in this talk Matt will demonstrate just a few of the very cunning things they do for you.
    Matt will concentrate on the output of the compiler: the tricks they use to generate efficient, optimized assembler code.
    Writing clear, readable code that's also efficient hinges on being able to trust your compiler's code generator. By the end of this talk, you'll be able to read assembly well enough to appreciate your compiler, and have an understanding of what it can - and can't - optimize for you.
    ---
    Matt is a C++ programmer and occasional verb. He loves writing efficient code and sharing his passion about how computers work under the hood. An engineer at Coinbase, he has previously worked at a trading firm, on mobile apps at Google, run a C++ tools company and spent more than a decade making console games. When not tinkering on Compiler Explorer, Matt enjoys working on emulators for old 8-bit computer hardware.
    cpponsea.uk/2019/sessions/key...
    Filmed and Edited by Digital Medium Ltd: events.digital-medium.co.uk
    Enquiries: events@digital-medium.co.uk
  • Science & Technology

Comments • 61

  • @Wyvernnnn
    @Wyvernnnn 3 years ago +17

    Go on, funny verb man, blow my mind.

  • @seditt5146
    @seditt5146 3 years ago +13

    "Oh golly, I'm running out of time" - Dude, there's never enough time for you. Great speaker, no doubt, up there with Herb, Chandler and Andrei, among others.

  • @BryonLape
    @BryonLape 3 years ago +17

    I learned enough in my compiler class in university 30 years ago to know I didn't ever want to make one.

    • @aoeu256
      @aoeu256 3 years ago +3

      The easiest compilers to make are Lisp macros if you think about it...

  • @KabelkowyJoe
    @KabelkowyJoe 3 years ago +11

    17:00 It's not just that it's one instruction: LEA uses an AGU instead of an ALU for its calculation, so when an LEA and an ADD sit close together they can both execute in the same cycle, as if each cost half an instruction. Of course, if the CPU has 3-4 ALUs (and this is common), 3-4 independent ADDs or SUBs will also run at the same time, but by using the AGUs as well you can push that parallelism up to 8 (see the sketch below).
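
    A minimal sketch of the single-LEA case discussed around 17:00 (typical GCC/Clang output at -O2; the exact registers depend on the calling convention):

        int scale_and_add(int x, int y) {
            // usually a single instruction on x86-64:
            //   lea eax, [rsi + rdi*4]
            return x * 4 + y;
        }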

    • @KabelkowyJoe
      @KabelkowyJoe 3 years ago +3

      18:30 The reason GCC still uses LEA even for a multiply is that it completes in a single cycle. Even without LEA it would still be faster to use SHL than a multiply, which usually takes about 4 cycles, or SHR instead of a division, which usually takes around 12 cycles (see the sketch below).
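
      A hedged sketch of both cases (typical GCC output at -O2; cycle counts vary by microarchitecture):

          int times_five(int x) {
              // usually: lea eax, [rdi + rdi*4]   (no imul needed)
              return x * 5;
          }

          unsigned div_by_eight(unsigned x) {
              // usually: shr eax, 3               (no div needed)
              return x / 8;
          }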

  • @colinmaharaj
    @colinmaharaj 4 years ago +11

    I am, in fact, using this to help write an interpreter.

  • @logiciananimal
    @logiciananimal 4 years ago +7

    What a wonderful website and introduction! Thanks for the resource.

  • @movax20h
    @movax20h 4 years ago +19

    Actually, at 58:36 the compiler is perfectly free to replace this implementation of bogo-sort with a call to qsort or something similar. Programs that don't terminate are ill-formed, so the compiler can assume there will be progress and that the function will finish. There is no other observable state (mt is a local, and no other global state is touched), so the loop must terminate to ensure progress, there are no other side effects, and on exit the table is sorted. So just sort it. :) (A minimal sketch of the forward-progress rule is below.)
    It is an allowed optimization in that sense, but no sane compiler will actually do it, because it is essentially impossible to deduce the above logic by looking at this code, including the stability of the sort used, or which version is actually faster (that might depend on the size of the array, and possibly even on the data values in the array itself).
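
    A minimal sketch of that forward-progress rule (not the talk's bogo-sort; found() is a made-up, side-effect-free predicate). Because the loop does no I/O and touches no volatile or atomic state, the compiler may assume it terminates and optimize on that basis:

        #include <cstdio>

        // made-up, side-effect-free predicate
        static bool found(unsigned n) { return n == 0xDEADBEEFu; }

        static unsigned search() {
            // no side effects in the loop body: the compiler may assume
            // some n satisfies found(n) and that the loop finishes
            for (unsigned n = 1; ; ++n) {
                if (found(n)) return n;
            }
        }

        int main() { std::printf("%u\n", search()); }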

    • @Alexander-yk
      @Alexander-yk 3 years ago

      This is partly why we need (more-)high-level concepts.
      They would allow distinguishing an RNG from a PRNG (which is handy for crypto) and specifying what it means for an array to pass `is_sorted`. With that, replacing bogosort with `std::sort` is reasonable.
      So, yeah. We can make a lot of fantastic decisions and optimizations in the compiler front end.

    • @EllipticGeometry
      @EllipticGeometry 3 years ago +3

      Programs that don’t terminate are perfectly legal. But suppose for the sake of argument that not terminating is undefined. That would just lead to the principle of explosion, of which the desired outcome is an infinitesimal subset. To be more explicit in a way that still incorporates some of the usual logic, why wouldn’t is_sorted be considered the culprit? Forcing it to be true in a case where the MT shuffle would never hit the right permutation is no more ridiculous than magically sorting the list (or setting your aunt’s hair on fire while still not terminating) in that same case. A theoretical super smart compiler might recognize that the defined case always results in a sorted list, and a simple way to omit the undefined case is to run the sorting algorithm there as well. Maybe an even smarter compiler could optimize the sort based on some property of bogosortable lists, and make even a charitable interpretation of undefined behavior break things again. Just don’t ever go there.

    • @styleisaweapon
      @styleisaweapon 6 months ago

      @@EllipticGeometry That highlights the two principled kinds of optimization: those that transform, and those that calculate (or deduce), cache, and leverage runtime results at compile time. The second kind is always problematic at the edge cases, because the compiler is dictating that it has perfect information when in practice it doesn't (it is assuming the state of runtime memory within an abstract machine that has multiple actors that can change it). There are entire keywords dedicated to making the second type of optimization less problematic, but they don't solve the problem, they only disable those problematic optimizations (e.g. volatile; see the sketch below).
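
      A small sketch of the volatile point (poll_twice is just an illustrative name): without volatile the compiler may merge the two loads, because it assumes nothing else changes that memory between them; with volatile both loads must actually happen.

          #include <cstdint>

          int poll_twice(volatile std::uint32_t* flag) {
              std::uint32_t a = *flag;  // first real load
              std::uint32_t b = *flag;  // second real load, not folded into the first
              return static_cast<int>(a + b);
          }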

  • @bzboii
    @bzboii a year ago +1

    52:34 “GCC has something called speculative devirtualization…” **crowd member giggles** “yeah!” (A rough, hand-written sketch of what that optimization does is below.)
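
    Roughly what speculative devirtualization does, written out by hand (illustrative only; the compiler performs this on its intermediate representation using type or profile information, not with dynamic_cast):

        struct Animal {
            virtual int legs() const { return 4; }
            virtual ~Animal() = default;
        };
        struct Dog final : Animal {
            int legs() const override { return 4; }
        };

        int count_legs(const Animal& a) {
            if (auto* d = dynamic_cast<const Dog*>(&a))  // guess the likely type
                return d->Dog::legs();                   // direct call, easy to inline
            return a.legs();                             // virtual call as fallback
        }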

  • @apojoga
    @apojoga 4 years ago +3

    Great talk.

  • @Badcrow7713
    @Badcrow7713 4 years ago +11

    His name is GODBOLT

  • @jonathangriffin3486
    @jonathangriffin3486 4 years ago +4

    Can one do fixed-point arithmetic with 256-bit registers? I ended up using fixed point for a ray-trace sqrt algorithm in 32-bit MASM code, probably around 2007. (A small AVX2 sketch is below.)
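
    For what it's worth, yes: a hedged AVX2 sketch of a Q15 fixed-point multiply, sixteen lanes per 256-bit register (needs -mavx2; q15_mul is just an illustrative name):

        #include <immintrin.h>

        // _mm256_mulhrs_epi16 computes a rounded (a*b) >> 15 per 16-bit lane,
        // i.e. a Q15 * Q15 -> Q15 fixed-point product.
        __m256i q15_mul(__m256i a, __m256i b) {
            return _mm256_mulhrs_epi16(a, b);
        }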

  • @jamesflames6987
    @jamesflames6987 4 years ago +1

    Wow!

  • @mockingbird3809
    @mockingbird3809 3 years ago +4

    3:21 is really a good joke :D

  • @horowitzhill6480
    @horowitzhill6480 3 years ago +4

    28:45 He meant to say 256-bit-wide registers, right? 32 bytes are 256 bits.

  • @xrafter
    @xrafter 3 years ago +28

    Assembly is scary?
    Nah, x86 is.

    • @ominous-omnipresent-they
      @ominous-omnipresent-they 3 years ago +3

      Your avatar reminded me of how much I've been wanting to try Arch Linux on my desktop. Currently running openSUSE Tumbleweed on my laptop, and honestly, I freaking love it.

    • @xrafter
      @xrafter 3 years ago +3

      @@ominous-omnipresent-they
      Happy to hear that.

  • @proxy1035
    @proxy1035 4 years ago +16

    "Compilers are awesome"
    me writing mostly with the 65C02:
    :(

  • @adrianomachado5629
    @adrianomachado5629 3 years ago +2

    We should have a T-shirt: "Compilers are awesome".

  • @RachelMant
    @RachelMant 3 years ago +2

    github.com/bad-alloc-heavy-industries/substrate/blob/master/substrate/fd#L192 - my answer to the insanity that is "just use the netcode library to conditionally 'fix' the endianness of your data once it's already too late". The great thing is, this compiles out entirely on a CPU with matching endianness, and compiles to an efficient swap instruction otherwise. (A generic sketch of the same idea is below.)
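
    Not the substrate code linked above, just a generic sketch of the same idea (C++20 for std::endian): swap with plain shifts only when the host byte order differs, and let the compiler fold it into a single bswap, or into nothing at all.

        #include <bit>
        #include <cstdint>

        constexpr std::uint32_t to_big_endian(std::uint32_t v) {
            if constexpr (std::endian::native == std::endian::big)
                return v;                       // already in wire order: compiles away
            else
                return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                       ((v << 8) & 0x00FF0000u) | (v << 24);  // recognised as bswap
        }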

  • @BryonLape
    @BryonLape 3 years ago +3

    All this time I thought int was defined as the machine word size. I guess not. (See the quick check below.)
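
    A quick check of that on a typical 64-bit desktop (LP64 Linux or LLP64 Windows): int stays 4 bytes even though pointers, and the machine word, are 8.

        #include <cstdio>

        int main() {
            std::printf("int: %zu  long: %zu  void*: %zu\n",
                        sizeof(int), sizeof(long), sizeof(void*));
        }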

  • @Yupppi
    @Yupppi 6 months ago

    I've been listening to these talks by Kevlin while trying to learn Rust, and I'm just thinking that trying to write Rust is like playing that crazy create-a-password online game where every time you satisfy a condition you get a new requirement, until it agrees to compile. So basically at the end you're putting out fires, protecting the egg, and losing the workings of your already existing code, and it never compiles. Especially if you're trying to combine features.

  • @Ryan-xq3kl
    @Ryan-xq3kl 4 years ago +6

    Terry Davis has entered the chat.

  • @ThePlacehole
    @ThePlacehole 3 years ago +3

    Yeah, compile-time speculative inlining of virtual functions sounds cool and all... Until you compare it to any JIT ever.

    • @loli42
      @loli42 3 years ago +8

      yeah, compiling all your code on the user's machine at program startup under extreme time constraints using a 100 megabyte runtime that ships with your program just so you can inline virtual functions 25% more frequently sounds cool and all... until you compare it with something that generates better code in every other case, once, ahead of time, and doesn't require a runtime.

    • @ThePlacehole
      @ThePlacehole 3 years ago

      That does not sound cool to me, but to each their own, I guess...
      But just talk me through your reasoning here. So you have a program which needs to respond quickly, you've picked a JIT which compiles on startup, and then you have also decided to start it up every time you need it? Why?
      Either way, you can't fanboy over C++ in those "extreme" situations either, because that is C territory.

    • @loli42
      @loli42 3 years ago

      @@ThePlacehole What are you even saying? My entire comment was sarcastic. I *don't* like JIT; I think it's a stupid idea. How does it follow that if you want your program to "respond quickly" you would use a JIT? Also yes, every time the program itself starts, the JIT has to recompile it, unless .NET or whoever has some bytecode cache thing; I wouldn't know, and I'm most certainly not "fanboying" over sepples. My profile picture is literally Yuki Nagato holding up the K&R book.

    • @ThePlacehole
      @ThePlacehole 3 years ago

      @@loli42 I am trying to say that your hypothetical situation does not really detract from JITs, because using a JIT that way is absurd. Also, the only place where technologies have no negatives is Magical Fanboy-land.

    • @loli42
      @loli42 3 years ago

      @@ThePlacehole Using a JIT in what way? The only way to use a JIT is to compile just in time, i.e. when the program starts up! In your hypothetical situation, does the JIT just never compile the program? You know it takes time to compile a program, right? It's like I'm talking to a serial killer.

  • @aoeu256
    @aoeu256 3 years ago +1

    Lol Haskell is actually closer to what the CPU does than C with out-of-order execution.

  • @raymundhofmann7661
    @raymundhofmann7661 5 years ago +6

    Wouldn't it be more appropriate for the CPU itself to do most of the vectorization? With ever-changing architectures and instruction-set inflation, it becomes increasingly pointless to target a specific subset.
    Also, we have been stuck for over 10 years with saturating single-core performance; it is time for the CPU itself to be able to auto-vectorize and auto-parallelize to give more performance.
    Maybe this job of mapping the code to the available hardware of GPU, CPU cores, co-processors, etc. will be pioneered in a virtual machine like WASM before a CPU manufacturer manages to deliver something.

    • @compuholic82
      @compuholic82 4 years ago +15

      I am not sure that is the way to go. (1) It is really hard to do in hardware, and it would mean devoting lots of transistors on the chip to this kind of optimization. That would make CPUs even more inefficient than they are now: even today only a tiny fraction of the transistors on a CPU are dedicated to the actual computation. Pretty much all of them are used to keep the processor pipeline moving - caches and cache management, out-of-order execution, speculative execution and the like.
      (2) Such optimizations cross the boundary into algorithms. IMHO software should not rely on the hardware to fix stupid algorithmic design.
      There is another way to stay machine-independent while still using machine-dependent optimizations: acceleration libraries, such as Intel IPP and MKL. You write architecture-independent code, and at program startup the library automatically detects the CPU's capabilities and selects the appropriate implementation (a minimal sketch of that dispatch is below).
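
      A minimal sketch of that runtime-dispatch idea using GCC/Clang's __builtin_cpu_supports (the kernel names here are placeholders, not a real library API):

          #include <cstdio>

          static void scalar_kernel() { std::puts("scalar path"); }
          static void avx2_kernel()   { std::puts("AVX2 path"); }

          void run_kernel() {
              // libraries like IPP/MKL do a similar check once and cache the choice
              if (__builtin_cpu_supports("avx2"))
                  avx2_kernel();
              else
                  scalar_kernel();
          }

          int main() { run_kernel(); }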

    • @edcatchpole4737
      @edcatchpole4737 4 years ago

      @@compuholic82 Don't forget about ISPC!

    • @ped7g
      @ped7g 4 years ago +4

      One of the obstacles is that the original C++ source targets the C++ abstract machine. Once you compile it to the machine code of some target architecture like x64, you throw away all the original constraints and contracts defined by the C++ source and introduce a new set of constraints and contracts defined by the target architecture. The machine-code constraints are of course result-wise compatible with the original ones, to ensure the result of the machine code is correct under the C++ abstract machine, but you can't reconstruct the original set of constraints from the machine code; you would end up with far more complex (more strict) requirements.
      But the CPU sees only the machine code, so it would have to vectorize against that much harsher contract, often ending up with less optimal code than the compiler could produce, because the compiler better understands the original intent of the programmer (I mean the intent encoded in the C++ source, not the true intent of the programmer, which will only be revealed over multiple debugging sessions once the programmer realizes how to write their original intent correctly in C++). A tiny example of such a source-level contract is below.
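
      One such contract, as a hedged sketch: at the C++ level an int array and a float array cannot alias (strict aliasing), so the compiler may vectorize this loop without runtime overlap checks; that guarantee is invisible in the resulting machine code.

          void scale(float* dst, const int* src, int n) {
              for (int i = 0; i < n; ++i)
                  dst[i] = src[i] * 0.5f;
          }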

    • @raymundhofmann7661
      @raymundhofmann7661 4 years ago +1

      @@compuholic82 I didn't say it has to be done with specific CPU logic circuitry, just by the CPU from the code's perspective.
      Doing it with hardwired logic would be a rather stupid idea for such a complicated and constantly changing problem.
      By mentioning virtual machines I implied it might be just that: a virtual machine running software that does the machine-specific mapping of an abstract machine to the actual computing resources.
      The CPU hardware only has to allow for this virtualization, as it already does to some degree; the rest is software.
      And the computation for this can be partly pooled, put into a cloud, as there is significant homogeneity in computational resource architectures.

  • @mattr2238
    @mattr2238 3 years ago +15

    Friends don't let friends use Windows for serious work.

  • @_DATA_EXPUNGED_
    @_DATA_EXPUNGED_ 4 years ago +9

    The laughter is super creepy

    • 4 years ago +14

      -funsafe-laughter

    • @mohammedj2941
      @mohammedj2941 3 years ago +5

      it's not, WTH

  • @trisinogy
    @trisinogy 4 years ago +5

    Big congrats on the hideous audio level. The guy may be a C++ guru, but he doesn't know how to speak properly.

    • @trisinogy
      @trisinogy 4 years ago +4

      Peter Mortensen First and foremost: either the mic or the audio editing of the video is far from okay. The “volume” (i.e. the level) is far too low. Secondly, the speaker occasionally bursts into loud sentences or laughter and then all of a sudden starts murmuring something to himself, as if he weren't addressing an audience. That's extremely disturbing and spoils the otherwise enjoyable and insightful content. I spent most of the time riding the volume control.

    • @Dorumin
      @Dorumin 4 years ago +11

      I listened to most of it with half my attention, but honestly the sound is fine for me. Granted, I usually listen to things at a comfortable, mild level, never loud, so the only time I complain about sound is when something is too quiet to actually be discernible.

    • @trisinogy
      @trisinogy 4 years ago +1

      Doru Min Very glad it works for you. You clearly have different standards when it comes to audio quality.

    • @magfal
      @magfal 4 years ago +11

      @@Dorumin The audio was fine for me as well.

    • @alexismandelias
      @alexismandelias 3 years ago

      I see what you mean; I wouldn't necessarily complain, but it is indeed an inconvenience.