For those who aren't tempted to click through, the buried lede here (and why I'm glad it's being linked to again today) is that "99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1", as conducted by Xuan-Son Nguyen.
>99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1
Yes, but:
"For the qX_K it's more complicated, I would say most of the time I need to re-prompt it 4 to 8 more times.
The most difficult was q6_K, the code never works until I ask it to only optimize one specific part, while leaving the rest intact (so it does not mess up everything)" [0]
And also there:
"You must start your code with #elif defined(__wasm_simd128__)
To think about it, you need to take into account both the reference code from ARM NEON and AVX implementation."
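For context, this is roughly the shape of what that prompt is steering toward (an illustrative sketch, not the actual ggml source): ggml's quantized kernels select an implementation with one preprocessor branch per instruction set, so the WASM SIMD port slots in as another branch next to the existing NEON and AVX ones, using those two as references.

    // Illustrative only, not the actual ggml source.
    #if defined(__ARM_NEON)
        // hand-written NEON version of the kernel
    #elif defined(__AVX2__)
        // hand-written AVX2 version
    #elif defined(__wasm_simd128__)
        // new WASM SIMD version goes here, ported from the two above
    #else
        // scalar fallback
    #endif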
Interesting that both de novo writing and porting seem to have worked.
I do not understand why GGML is written this way, though: so much duplication, one variant per instruction set. Our Gemma.cpp requires only a single backend written using Highway's portable intrinsics, and last I checked it is also faster for decode on SKX+Zen4.
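For a sense of the contrast, here is a minimal sketch of the "single backend" style (assuming a recent Highway release; this is not taken from Gemma.cpp): the same kernel body compiles to NEON, AVX2/AVX-512, or WASM SIMD depending on the build target, so there is one source variant instead of one per instruction set.

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // One portable dot-product kernel; the vector width and instruction set
    // are chosen by the Highway target, not written out by hand per ISA.
    float DotProduct(const float* HWY_RESTRICT a, const float* HWY_RESTRICT b,
                     size_t n) {
      const hn::ScalableTag<float> d;  // widest vector this target offers
      auto acc = hn::Zero(d);
      size_t i = 0;
      for (; i + hn::Lanes(d) <= n; i += hn::Lanes(d)) {
        // acc += a[i..] * b[i..], lane-wise
        acc = hn::MulAdd(hn::LoadU(d, a + i), hn::LoadU(d, b + i), acc);
      }
      float sum = hn::ReduceSum(d, acc);      // horizontal sum of the accumulator
      for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail
      return sum;
    }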
Reading through the PR makes me glad I got off GitHub - not for anything AI-related, but because it has become a social media platform, where what should be a focused and technical discussion gets derailed by strangers waging the same flame wars you can find anywhere else.
And that applies to any platform with any level of public interaction. Also, people can restrict opening issues, leaving comments, etc. to collaborators only on their repo if they want to.
That seems like a notable milestone.