i dunno, Opus is losing its edge imo. i regularly use a mix of models, including Opus, GLM 5.1, Kimi 2.6, etc., and i find that all of them are pretty much equally good at "average" coding, while on difficult stuff they're nearly equally bad. i can't deny that Opus has an edge, but it's not a huge one.
I won't quibble, even though I likely should. Have to remember this is HN, and companies need to shill their work, otherwise ... Yes.
I will play along and assume this is sound. 10-40% +/- 10% is along the lines of "sort of", in a completely unreliable, unguaranteed, and unproven way, sure.
i don't buy this. distilled how? you don't get access to logprobs, and the thinking traces are fake and compressed. it's an expensive way to get potentially substandard training data.
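For context on why logprob access matters to the distillation question above: classic distillation trains the student against the teacher's full output distribution, which API-only access to a frontier model doesn't expose. Below is a minimal sketch, assuming a PyTorch-style setup; the function names and tensors are illustrative, not any lab's actual pipeline.

    import torch
    import torch.nn.functional as F

    # Soft-label distillation (Hinton-style): needs the teacher's full
    # logits/logprobs over the vocabulary. An API that only returns
    # sampled text never gives you these, so this loss is unavailable.
    def soft_distill_loss(student_logits, teacher_logits, T=2.0):
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        # KL between softened distributions, scaled by T^2 as usual
        return F.kl_div(log_p_student, p_teacher,
                        reduction="batchmean") * T * T

    # Hard-label "distillation": all an API gives you is sampled tokens,
    # so you're reduced to plain cross-entropy on the teacher's outputs,
    # i.e. ordinary supervised fine-tuning on generated text.
    def hard_distill_loss(student_logits, teacher_token_ids):
        vocab = student_logits.size(-1)
        return F.cross_entropy(student_logits.view(-1, vocab),
                               teacher_token_ids.view(-1))

Without teacher logits, the first loss is off the table and you're stuck with the second, which is the parent's point: sampled text (plus possibly compressed or synthetic thinking traces) is a much weaker and more expensive training signal.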
better than Opus? not even close. after struggling through server overload for the past couple of hours, i finally put 5.1 through its paces and it's... okay. it failed some simple stuff that Sonnet/Opus/Gemini didn't, and failed it badly and repeatedly, actually. this was in TypeScript, btw. not sure if i'll keep the subscription or not.
after you go from millions of params to billions+, models start to get weird (depending on training). just look at any number of interpretability research papers; Anthropic has some good ones.