The Art of SIMD Programming by Sergey Slotin
- Added on 8 Sep 2022
- Modern hardware is highly parallel, but not only in terms of multiprocessing. There are many other forms of parallelism that, if used correctly, can greatly boost program efficiency without requiring more CPU cores. One such type of parallelism actively adopted by CPUs is "Single Instruction, Multiple Data" (SIMD): a class of instructions that perform the same operation on a block of 16, 32, or 64 bytes of data at once, yielding up to a proportional speedup over scalar code.
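To make "the same operation on a block of bytes" concrete, here is a minimal sketch in C. It assumes GCC or Clang and their vector extensions (the `vector_size` attribute), which are a compiler feature rather than something taken from the talk; the names `v8si` and `add_arrays` are hypothetical. One 32-byte vector holds eight 32-bit ints, so each `+` on it stands in for eight scalar additions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight 32-bit ints = 32 bytes, the width of one AVX2 register.
   The compiler lowers arithmetic on this type to SIMD instructions
   available on the target (hypothetical type name). */
typedef int v8si __attribute__((vector_size(32)));

/* Hypothetical helper: c[i] = a[i] + b[i], eight elements per step.
   Assumes n is a multiple of 8 to keep the sketch short. */
void add_arrays(const int *a, const int *b, int *c, size_t n) {
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        v8si va, vb, vc;
        /* memcpy avoids any alignment assumption on the pointers */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb; /* one vector add replaces eight scalar adds */
        memcpy(c + i, &vc, sizeof vc);
    }
}
```

Without `-mavx2` or a suitable `-march`, the compiler typically splits each 32-byte addition into two 16-byte SSE2 instructions; the result is the same, only the width changes.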
While SIMD shares many similarities with classic multiprocessor computing, it is quite different and often requires creative use of the instruction set. In this talk, we will give a general introduction to the technology (focusing on x86/AVX2), derive and implement several state-of-the-art SIMD algorithms, and discuss their use in impactful open-source projects.
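Since the talk focuses on x86/AVX2, here is a hedged sketch of the same idea with Intel intrinsics from `<immintrin.h>`: summing an `int32_t` array eight lanes at a time. The names `sum`, `sum_avx2` and `sum_scalar` are hypothetical. The sketch assumes GCC or Clang, uses the `target("avx2")` function attribute so the file compiles without `-mavx2`, and checks `__builtin_cpu_supports("avx2")` at run time before taking the AVX2 path; on other architectures it falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Compiled for AVX2 via the target attribute; only called after the
   run-time CPU check in sum() below. */
static __attribute__((target("avx2")))
int32_t sum_avx2(const int32_t *a, size_t n) {
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* unaligned 32-byte load, then eight lane-wise 32-bit adds */
        __m256i v = _mm256_loadu_si256((const __m256i *)(a + i));
        acc = _mm256_add_epi32(acc, v);
    }
    /* horizontal reduction: spill the eight lanes and add them */
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int32_t s = 0;
    for (int k = 0; k < 8; k++) s += lanes[k];
    for (; i < n; i++) s += a[i]; /* scalar tail for leftover elements */
    return s;
}
#endif

static int32_t sum_scalar(const int32_t *a, size_t n) {
    int32_t s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* Hypothetical entry point: uses AVX2 when the CPU supports it. */
int32_t sum(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(a, n);
#endif
    return sum_scalar(a, n);
}
```

The final reduction spills the accumulator to memory for readability; a tuned version would usually reduce with shuffles instead.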
skillsmatter.com/skillscasts/... (category: Science & Technology)
Great video. Thank you very much for your enlightening examples and insightful explanations!
Thanks, very much appreciated, especially the examples in C. Are these directly usable from Cython? The intrinsics, I mean.
I don't understand why these architecture-specific instructions aren't used automatically by GCC at -O3.
They are, when you pass the -march= flag (e.g. -march=native); otherwise the compiler doesn't know which instruction sets it may use and falls back to a default target (usually baseline x86-64, which includes SSE2 but not AVX).
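The reply can be checked with a plain loop. This is a minimal sketch assuming GCC or Clang; `saxpy` and `compiled_with_avx2` are hypothetical names. No intrinsics are used, so which SIMD instructions the auto-vectorizer emits depends only on the flags, and the `__AVX2__` macro reports what the compiler was allowed to assume.

```c
#include <assert.h>
#include <stddef.h>

/* Plain scalar source. At -O3 the compiler typically vectorizes it
   for whatever target the flags select:
     gcc -O3 file.c                 # baseline x86-64: SSE2 at most
     gcc -O3 -march=native file.c   # every extension the host CPU has
   `restrict` promises that x and y do not overlap, which helps the
   vectorizer. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* GCC and Clang define __AVX2__ only when -mavx2 or a -march value
   that includes AVX2 is in effect for this translation unit. */
int compiled_with_avx2(void) {
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}
```

On an AVX2-capable machine, comparing the output of `gcc -O3 -S` with and without `-march=native` should typically show `xmm` registers in the vectorized loop in the first case and `ymm` registers in the second.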