
Thank you!

One of the worst is TikTok. Even as a developer, when someone sends me a TikTok link and I have to visit it, I get stuck in the browser (same with the app, but I uninstalled it). The way they trap you in feels almost device-breaking.


TikTok is actually very adamant about booting me out of the browser.

Initially I thought this was about their B2 file versions/backups, where they keep older versions of your files.

B2 is not a backup service. It’s an object storage service.

Weird, because in the Reddit thread linked above they call themselves a backup service.

I guess you were as confused as me, as I only associate Backblaze with B2; I haven't used any of their other services.

It just describes what's in the photo and then some completely wrong/random facts about self-esteem, income, religion, etc.

I guess writing code is now like creating punch cards for old computers. Or, more recently, like writing ASM instead of using a higher-level language like C. Now we simply write our "code" in a higher language, natural language, and the LLM is the compiler.

> Now we simply write our "code" in a higher language, natural language, and the LLM is the compiler.

No, we don't, and we never should; compilers need to be deterministic.


It needs to be something stronger than just deterministic.

With the right settings, an LLM is deterministic. But even then, small variations in input can cause very unforeseen changes in output, sometimes drastic, sometimes minor. Knowing that I'm likely misusing the vocabulary, I'd say this counts as the output being chaotic, so we need compilers to be non-chaotic (and deterministic; I think you could have something that is non-deterministic yet non-chaotic). I'm not sure a non-chaotic LLM could ever exist.

(Thinking on it a bit more, there are some esoteric languages that might be chaotic, so this might be more difficult to pin down than I thought.)
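As a loose illustration of "chaotic" in this sense (an analogy only, not a claim about LLM internals): a cryptographic hash is perfectly deterministic, yet a one-character change to the input flips roughly half of the output bits. A compiler needs the opposite property, where a small, local change to the source produces a small, local change in the output.

```python
import hashlib

def bits(s: str) -> str:
    """SHA-256 digest of s as a 256-character bit string."""
    return format(int(hashlib.sha256(s.encode()).hexdigest(), 16), "0256b")

def bit_diff(a: str, b: str) -> int:
    """Count of output bits that differ between two inputs."""
    return sum(x != y for x, y in zip(bits(a), bits(b)))

# Deterministic, but chaotic: a one-character typo in the "source"
# scrambles roughly half of the 256 output bits.
print(bit_diff("print('hello')", "print('hellp')"))
```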


Why?

Also, give the same programming task to 2 devs and you end up with 2 different solutions. Heck, have the same dev do the same thing twice and you will have 2 different ones.

Determinism seems like this big gotcha, but in itself, is it really?


> Heck, have the same dev do the same thing twice and you will have 2 different ones

"Do the same thing": I need to be pedantic here, because if they truly do the same thing, the exact same solution will be produced.

The compiler needs to guarantee that across multiple systems. How would QA know they're testing the version that is staged to be pushed to prod if you can't guarantee it's the same?
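In practice, that guarantee is what reproducible builds plus artifact checksums give you: if the compiler is deterministic, QA can verify they tested the exact bits staged for prod by comparing digests. A minimal sketch (the file paths are hypothetical):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# QA signs off only if the tested artifact and the staged artifact
# are byte-for-byte identical (hypothetical paths):
# assert sha256_of("qa-build/app.bin") == sha256_of("staging/app.bin")
```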


This is not what a compiler is in any sense.

I cringe every time I read this "punch card" narrative. We are not at this stage at all. You are comparing deterministic tools with LLMs, which are not deterministic and may or may not give you what you want. In fact, I personally barely use autonomous agents in my brownfield codebase because they generate so much unmaintainable slop.

Except that this compiler is a non-deterministic pull of a slot-machine handle. No thanks, I'll keep my programming skills; COBOL programmers command a huge salary in 2026, and soon all competent programmers will.

Releasing version 9.0 of my self-hosted analytics app[0]. I will finally add an in-app cron job editor, so you can easily schedule clean-up jobs, data retention settings, newsletters/summaries, etc.

[0]: https://www.uxwizz.com


General intelligence (not coding) comparison: https://aibenchy.com/compare/z-ai-glm-5-medium/z-ai-glm-5-1-...

Is there really no rule that discourages 99% of your interactions with HN from being peddling some useless slop benchmark?

If it's relevant to the discussion, I hope not.

I've spent probably over 100 hours working on this benchmarking site/platform, and all tests are manually written. The benchmarks aren't useless for me (or for the many others who have reached out to me) either. I use it myself regularly when choosing and comparing new models. I honestly believe it provides value to the conversation.

Let me know if you know of a better platform for comparing models; I built this one because I didn't find any with good enough UX.


It's a great benchmark. Don't listen to the haters. This one is especially interesting.

https://aibenchy.com/compare/anthropic-claude-sonnet-4-6-med...


This one's even more interesting

https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

Who knew Anthropic was this far behind???


Yeah, but actually that's not a good look. Anyone who's used Gemini knows how random it is at getting anything serious done, compared to the rock-solid Opus experience.

Their benchmark is chock-full of things like that: it's deeply flawed and essentially rates how LLMs perform if you go out of your way to hold them entirely the wrong way.

GLM 5.1 does worse than GLM 5 in my tests[0] (both with medium reasoning and with no reasoning).

I think the model is now tuned more towards agentic use/coding than general intelligence.

[0]: https://aibenchy.com/compare/z-ai-glm-5-medium/z-ai-glm-5-1-...


The (none) version especially shows considerable degradation.

Gemma 4 is great: https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...

I assume it is the 26B A4B one, if it runs locally?


No, only E2B and E4B.

I tried using Astro for https://aibenchy.com; initially it went great, but then I hit static-site limitations (such as dynamically generating all comparison pages, which would have meant generating N^4 pages, where N is the number of tested models).

I ended up switching to plain PHP, and it has worked great. The site is still mostly "static", but I can dynamically include the same content on multiple pages without having to duplicate or rebuild it every time.
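A minimal sketch of that pattern (in Python rather than PHP, with made-up model names and markup): instead of pre-building a page for every model combination, each comparison page is assembled on demand from shared per-model fragments, the same idea as PHP's include.

```python
def model_fragment(model: str) -> str:
    """Shared per-model content, written once and reused on every page."""
    return f"<section>scores for {model}</section>"

def comparison_page(*models: str) -> str:
    """Assemble a comparison page for any model combination at request time,
    so no combination ever has to be pre-generated at build time."""
    body = "\n".join(model_fragment(m) for m in models)
    return f"<h1>{' vs '.join(models)}</h1>\n{body}"

print(comparison_page("glm-5", "glm-5.1"))
```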


It does quite well on my limited/not-so-scientific private tests (note the tests don't include coding tests): https://aibenchy.com/compare/google-gemma-4-31b-it-medium/go...

