For those who aren't tempted to click through, the buried lede here (and why I'm glad it's being linked to again today) is that "99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1", as conducted by Xuan-Son Nguyen.
>99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1
Yes, but:
"For the qX_K it's more complicated, I would say most of the time I need to re-prompt it 4 to 8 more times.
The most difficult was q6_K, the code never works until I ask it to only optimize one specific part, while leaving the rest intact (so it does not mess up everything)" [0]
And also there:
"You must start your code with #elif defined(__wasm_simd128__)
To think about it, you need to take into account both the reference code from ARM NEON and AVX implementation."
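For context, this is roughly the shape of what that prompt is steering toward (an illustrative sketch, not the actual ggml source): ggml's quantized kernels select an implementation with one preprocessor branch per instruction set, so the WASM SIMD port slots in as another branch next to the existing NEON and AVX ones, using those two as references.

    // Illustrative only, not the actual ggml source.
    #if defined(__ARM_NEON)
        // hand-written NEON version of the kernel
    #elif defined(__AVX2__)
        // hand-written AVX2 version
    #elif defined(__wasm_simd128__)
        // new WASM SIMD version goes here, ported from the two above
    #else
        // scalar fallback
    #endif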
Interesting that both de novo writing and porting seem to have worked.
I do not understand why GGML is written this way, though: so much duplication, one variant per instruction set. Our Gemma.cpp requires only a single backend written using Highway's portable intrinsics, and last I checked it is also faster for decode on SKX+Zen4.
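For a sense of the contrast, here is a minimal sketch of the "single backend" style (assuming a recent Highway release; this is not taken from Gemma.cpp): the same kernel body compiles to NEON, AVX2/AVX-512, or WASM SIMD depending on the build target, so there is one source variant instead of one per instruction set.

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // One portable dot-product kernel; the vector width and instruction set
    // are chosen by the Highway target, not written out by hand per ISA.
    float DotProduct(const float* HWY_RESTRICT a, const float* HWY_RESTRICT b,
                     size_t n) {
      const hn::ScalableTag<float> d;  // widest vector this target offers
      auto acc = hn::Zero(d);
      size_t i = 0;
      for (; i + hn::Lanes(d) <= n; i += hn::Lanes(d)) {
        // acc += a[i..] * b[i..], lane-wise
        acc = hn::MulAdd(hn::LoadU(d, a + i), hn::LoadU(d, b + i), acc);
      }
      float sum = hn::ReduceSum(d, acc);      // horizontal sum of the accumulator
      for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail
      return sum;
    }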
Reading through the PR makes me glad I got off GitHub - not for anything AI-related, but because it has become a social media platform, where what should be a focused and technical discussion gets derailed by strangers waging the same flame wars you can find anywhere else.
And that applies to any platform with any level of public interaction. Also, people can restrict opening issues, leaving comments, etc. to collaborators only on their repo if they want to.
That seems like a notable milestone.