FFmpeg: A 94x speed improvement demonstrated using handwritten assembly

not_your_vase · on Nov 3, 2024

Okay... compared to what? If a compiler generates code that makes Electron look like Speedy Gonzales, please don't forget to file a bug.

AlbertoGP · on Nov 3, 2024

Yes, it would be more interesting with some details. From the file names in the screenshot it seems to be AVX512 versus plain C. AVX2 is already 67x, SSSE3 is 40x.

jsheard · on Nov 3, 2024

That code is from the dav1d AV1 decoder, and AFAICT they are comparing scalar C code to hand-vectorized assembly. They're not using vector intrinsics in C, which usually get you most of the way to optimal speed with a lot less pain than writing asm by hand.

kragen · on Nov 3, 2024

GCC also has generic SIMD vector types, which get you the same kind of vectorization without tying your code to one architecture. They look like this:

    typedef uint8_t vec16 __attribute__((vector_size(16)));

I haven't used this seriously, but a simple test I did six years ago is http://canonical.org/~kragen/sw/dev3/vecalpha.c, which I compiled for AMD64 with SSE and for ARM with NEON. I imagine you can do better using intrinsics or assembly, but those are architecture-specific.
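As a rough sketch of how such a type gets used (this is not the code from vecalpha.c; `avg16`, `avg16_bytes` and `avg16_scalar` are made-up names for illustration), here is a per-byte average of two 16-byte blocks written once with the vector type and once as a plain scalar loop. It assumes GCC or a compatible compiler such as Clang:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 unsigned bytes handled as a single value (GCC/Clang vector extension). */
typedef uint8_t vec16 __attribute__((vector_size(16)));
static_assert(sizeof(vec16) == 16, "vec16 should be exactly 16 bytes");

/* Hypothetical helper: per-lane average of a and b, rounding down.
 * (a & b) + ((a ^ b) >> 1) equals (a + b) / 2 without needing a 9th bit. */
static vec16 avg16(vec16 a, vec16 b)
{
    return (a & b) + ((a ^ b) >> 1);
}

/* Wrapper for plain byte arrays; memcpy sidesteps alignment and aliasing
 * concerns and is normally compiled down to a single vector load/store. */
static void avg16_bytes(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    vec16 va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = avg16(va, vb);
    memcpy(out, &vr, sizeof vr);
}

/* Scalar reference: the same computation, one byte at a time. */
static void avg16_scalar(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((a[i] + b[i]) / 2);
}
```

The same source should build for AMD64 and ARM alike, with the compiler choosing SSE or NEON instructions for `avg16`; whether it matches hand-tuned intrinsics or assembly depends on the operation and the compiler version.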

AV1's poor performance in ffmpeg has been a major reason I haven't been using AV1. It does seem to provide slightly better bandwidth/quality tradeoffs than H.264 or H.265, but if it's 30× slower to encode, it's usually not worth it. Add to that the possibility of patents.