
For those who aren't tempted to click through, the buried lede for this (and why I'm glad it's being linked to again today) is that "99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1" as conducted by Xuan-Son Nguyen.

That seems like a notable milestone.



>99% of the code in this PR [for llama.cpp] is written by DeepSeek-R1

Yes, but:

"For the qX_K it's more complicated, I would say most of the time I need to re-prompt it 4 to 8 more times.

The most difficult was q6_K; the code never worked until I asked it to optimize only one specific part while leaving the rest intact (so it did not mess up everything)" [0]

And also there:

"You must start your code with #elif defined(__wasm_simd128__)

To think about it, you need to take into account the reference code from both the ARM NEON and AVX implementations."

[0] https://gist.github.com/ngxson/307140d24d80748bd683b396ba13b...
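(For context: the constraint in that prompt mirrors how ggml dispatches SIMD code, with one implementation per instruction set selected by a preprocessor chain and the WASM path guarded by __wasm_simd128__. Below is a minimal sketch of that pattern; the function name and the plain f32 dot product are illustrative, not the actual qX_K kernel from the PR.)

    #include <stddef.h>

    #if defined(__ARM_NEON)
    // ... ARM NEON reference implementation ...
    #elif defined(__AVX2__)
    // ... AVX2 implementation ...
    #elif defined(__wasm_simd128__)
    #include <wasm_simd128.h>

    // 4-lane f32 dot product using WASM SIMD128 intrinsics.
    static float vec_dot_f32(const float * x, const float * y, size_t n) {
        v128_t acc = wasm_f32x4_splat(0.0f);
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            acc = wasm_f32x4_add(acc,
                  wasm_f32x4_mul(wasm_v128_load(x + i), wasm_v128_load(y + i)));
        }
        float sum = wasm_f32x4_extract_lane(acc, 0)
                  + wasm_f32x4_extract_lane(acc, 1)
                  + wasm_f32x4_extract_lane(acc, 2)
                  + wasm_f32x4_extract_lane(acc, 3);
        for (; i < n; ++i) sum += x[i] * y[i]; // scalar tail
        return sum;
    }
    #else
    // ... scalar fallback ...
    #endif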


Interesting that both de novo generation and porting seem to have worked.

I do not understand why GGML is written this way, though. There is so much duplication: one variant per instruction set. Our Gemma.cpp requires only a single backend written with Highway's portable intrinsics, and last I checked, decode on SKX+Zen4 was also faster.
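For contrast, a Highway version of the same kind of kernel is written once against portable intrinsics and compiled per target. This is a minimal static-dispatch sketch (the function name is hypothetical, and production code would normally add foreach_target.h for runtime dispatch, omitted here):

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // One portable dot-product kernel; Highway lowers these ops to
    // NEON, AVX2/AVX-512, WASM SIMD128, etc. at compile time.
    float DotF32(const float* x, const float* y, size_t n) {
      const hn::ScalableTag<float> d;  // widest vector for the target
      const size_t N = hn::Lanes(d);
      auto acc = hn::Zero(d);
      size_t i = 0;
      for (; i + N <= n; i += N) {
        acc = hn::MulAdd(hn::LoadU(d, x + i), hn::LoadU(d, y + i), acc);
      }
      float sum = hn::ReduceSum(d, acc);
      for (; i < n; ++i) sum += x[i] * y[i];  // scalar tail
      return sum;
    }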


Reading through the PR makes me glad I got off GitHub - not for anything AI-related, but because it has become a social media platform, where what should be a focused and technical discussion gets derailed by strangers waging the same flame wars you can find anywhere else.


This depends pretty heavily on the repo.


And it applies to any platform with some level of public interaction. Also, people can restrict opening issues, leaving comments, etc. to collaborators only on their repo if they want to.



