Hacker News | mi_lk's comments

most certainly in the west, but not sure about the rest of the world

Same. If you're already using a proprietary model, you might as well just double down.

But you don't have to be restricted to one model either? Codex being open source means you can choose to use Claude models, or Gemini, or...

It's fair enough to decide you want to just stick with a single provider for both the tool and the models, but surely still better to have an easy change possible even if not expecting to use it.


Codex CLI with Opus, or Gemini CLI with 5.2-codex, because they're open-sourced agents? Go ahead if you want, but show me where that actually happens and delivers practical value.

until Microsoft buys it and enshits it.

This is a fun thought experiment. I believe that we are now at the $5 Uber (2014) phase of LLMs. Where will it go from here?

How much will a synthetic mid-level dev (Opus 4.5) cost in 2028, after the VC subsidies are gone? I would imagine as much as possible? Dynamic pricing?

Will the SOTA model labs even sell API keys to anyone other than partners/whales? Why even that? They are the personalized app devs and hosts!

Man, this is the golden age of building. Not everyone can do it yet, and every project you can imagine is greatly subsidized. How long will that last?


While I remember $5 Ubers fondly, I think this situation is significantly more complex:

- Models will get cheaper, maybe way cheaper

- Model harnesses will get more complex, maybe way more complex

- Local models may become competitive

- Capital-backed access to more tokens may become absurdly advantaged, or not

The only thing I think you can count on is that more money buys more tokens, so the more money you have, the more power you will have ... as always.

But whether some version of the current subsidy, which levels the playing field, will persist seems really hard to model.

All I can say is, the bad scenarios I can imagine are pretty bad indeed—much worse than that it's now cheaper for me to own a car, while it wasn't 10 years ago.


If the electric grid cannot keep up with the additional demand, inference may not get cheaper. The cost of electricity would go up for LLM providers, and VCs would have to subsidize them more until the price of electricity goes down, which may take longer than they can wait if they have been expecting LLMs to replace many more workers within the next few years.

The real question is how long it'll take for Z.ai to clone it at 80% quality and offer it at cost. The answer appears to be "like 3 months".

This is a super interesting dynamic! The CCP is really good at subsidizing and flooding global markets, but in the end, it takes power to generate tokens.

In my Uber comparison, it was physical hardware on location... taxis, but this is not the case with token delivery.

This is such a complex situation in that regard. However, once the market settles and monopolies are created, the price will eventually be whatever the market can bear. Will that actually create an increase in gross planet product, or will the SOTA token providers just eat up the existing gross planet product, with no increase?

I suppose whoever has the cheapest electricity will win this race to the bottom? But... will that ever increase global product?

___

Upon reflection, the comment above was likely influenced by this truly amazing quote from Satya Nadella's interview on the Dwarkesh podcast. This might be one of the most enlightened things that I have ever heard in regard to modern times:

> Us self-claiming some AGI milestone, that's just nonsensical benchmark hacking to me. The real benchmark is: the world growing at 10%.

https://www.dwarkesh.com/p/satya-nadella#:~:text=Us%20self%2...


With optimizations and new hardware, power is almost a negligible cost, contrary to popular belief: $5/month would be sufficient for all users. You can get 5.5M tokens/s/MW[1] for Kimi K2 (about 20M tokens/kWh, or roughly 181M tokens/$), which is 400x cheaper than current pricing even if you exclude architecture/model improvements. The thing is, Nvidia is currently swallowing up a massive share of the revenue, which China could possibly solve by investing in R&D.

[1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
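
As a sanity check on those numbers, here is a back-of-the-envelope calculation; the throughput figure is the one cited from the linked post, while the electricity price is my own assumption (the ~181M tokens/$ above works out to roughly $0.11/kWh):

    # Back-of-the-envelope check of the figures above.
    # tokens_per_s_per_mw comes from the linked post; price_per_kwh is an
    # assumed grid price, chosen so the result matches the ~181M tokens/$ quoted.
    tokens_per_s_per_mw = 5.5e6
    price_per_kwh = 0.11  # USD, assumption

    tokens_per_kwh = tokens_per_s_per_mw * 3600 / 1000  # MW -> kW, seconds -> hours
    tokens_per_dollar = tokens_per_kwh / price_per_kwh

    print(f"{tokens_per_kwh:,.0f} tokens/kWh")   # ~19,800,000
    print(f"{tokens_per_dollar:,.0f} tokens/$")  # ~180,000,000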


I can run Minimax-m2.1 on my M4 MacBook Pro at ~26 tokens/second. It's not Opus, but it can definitely do useful work when kept on a tight leash. If models improve at anything like the rate we have seen over the last 2 years, I would imagine something as good as Opus 4.5 will run on similarly specced new hardware by then.

I appreciate this, however, as a ChatGPT, Claude.ai, Claude Code, and Windsurf user... who has tried nearly every single variation of Claude, GPT, and Gemini in those harnesses, and has tested all of those models via API for LLM integrations into my own apps... I just want SOTA, 99% of the time, for myself, and my users.

I have never seen a use case where a "lower" model was useful, for me, and especially my users.

I am about to get almost the exact MacBook that you have, but I still don't want to inflict non-SOTA models on my code, or my users.

This is not a judgement against you, or the downloadable weights, I just don't know when it would be appropriate to use those models.

BTW, I very much wish that I could run Opus 4.5 locally. The best that I can do for my users is the Azure agreement that they will not train on their data. I also have that setting set on my claude.ai sub, but I trust them far less.

Disclaimer: No model is even close to Opus 4.5 for agentic tasks. In my own apps, I process a lot of text/complex context and I use Azure GPT 4.1 for limited LLM tasks... but for my "chat with the data" UX, it's Opus 4.5 all day long. It has tested as far superior.


Is Azure's pricing competitive with OpenAI's offerings through the API? Thanks!

The last I checked, it is exactly equivalent per token to direct OpenAI model inference.

The one thing I wish for is that Azure Opus 4.5 had JSON structured output. Last I checked, that was in "beta" and only allowed via the direct Anthropic API. However, after many thousands of Opus 4.5 Azure API calls with the correct system and user prompts, not even one API call has returned invalid JSON.
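
Until structured output is available there, one workaround is to validate the model's reply yourself and retry on bad JSON. A minimal sketch, assuming a generic call_model(system, user) helper rather than any particular Azure or Anthropic SDK:

    import json

    def call_model(system: str, user: str) -> str:
        # Placeholder for whatever SDK call you actually use (Azure, Anthropic, etc.).
        raise NotImplementedError

    def get_json(system: str, user: str, retries: int = 2) -> dict:
        # Ask for JSON in the prompt, check that it parses, and retry with an
        # explicit reminder if it doesn't.
        prompt = user
        for _ in range(retries + 1):
            text = call_model(system, prompt)
            try:
                return json.loads(text)
            except json.JSONDecodeError:
                prompt = user + "\n\nReturn ONLY valid JSON, with no prose or code fences."
        raise ValueError("model never returned valid JSON")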


I'm guessing that's ~26 decode tokens/s for 2-bit or 3-bit quantized Minimax-m2.1 at 0 context, and it only gets worse as the context grows.

I'm also sure your prefill is slow enough to make the model mostly unusable even at smallish context windows, and entirely unusable at mid-to-large contexts.
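
For a rough sense of what that means in practice, here is a toy per-turn latency estimate; the ~26 tokens/s decode rate is the figure quoted above, while the prefill rate is purely an assumed number for illustration:

    # Toy per-turn latency estimate for a local model.
    # decode_tps is the ~26 tokens/s quoted above; prefill_tps is an assumed
    # prompt-processing rate -- substitute measurements from your own machine.
    def turn_latency_s(context_tokens, output_tokens, prefill_tps=150.0, decode_tps=26.0):
        return context_tokens / prefill_tps + output_tokens / decode_tps

    for ctx in (2_000, 30_000, 100_000):
        print(f"{ctx:>7} context tokens -> ~{turn_latency_s(ctx, 500):.0f}s per turn")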


Looks like it's trivial to you, but I don't know how to do it.

If you're curious to play around with it, you can use Clancy [1] which intercepts the network traffic of AI agents. Quite useful for figuring out what's actually being sent to Anthropic.

[1] https://github.com/bazumo/clancy
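
If you just want the gist of the technique, here is a minimal sketch of the same idea (not Clancy itself): a local logging reverse proxy that prints each request body before forwarding it, assuming the agent honors ANTHROPIC_BASE_URL. Streamed responses are buffered rather than relayed live, so this is for inspection only:

    # Minimal logging reverse proxy -- an illustration of the idea, not Clancy.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import Request, urlopen

    UPSTREAM = "https://api.anthropic.com"
    SKIP = {"host", "accept-encoding", "content-length"}

    class LoggingProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", "0")))
            print(json.dumps(json.loads(body), indent=2))  # what the agent sends
            headers = {k: v for k, v in self.headers.items() if k.lower() not in SKIP}
            with urlopen(Request(UPSTREAM + self.path, data=body, headers=headers)) as resp:
                data = resp.read()
                status = resp.status
                ctype = resp.headers.get("Content-Type", "application/json")
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("127.0.0.1", 8080), LoggingProxy).serve_forever()
    # e.g.: ANTHROPIC_BASE_URL=http://127.0.0.1:8080 <your agent command>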


If only there were some sort of artificial intelligence that could be asked about asking it to look at the minified source code of some application.

Sometimes prompt engineering is too ridiculous a term for me to believe there's anything to it, other times it does seem there is something to knowing how to ask the AI juuuust the right questions.


Something I try to explain to people I'm getting up to speed on talking to an LLM is that specific word choices matter. Mostly it matters that you use the right jargon to orient the model. Sure, it's good at getting the semantics of what you said, but if you adjust and use the correct jargon the model gets closer faster. I also explain that they can learn the right jargon from the LLM, and that sometimes it's better to start over once you've adjusted your vocabulary.

curious: can anyone use Ghibli's movie scenes on a random website just like that?

The site doesn't stream the movies, references still frames and the original works, and links directly back to the official site - there's no exploitation or arbitrage taking anything away from the studio.

Kelly v. Arriba Soft Corp. (2003) and Perfect 10 v. Amazon (2007) are precedents for image search engines displaying thumbnails - they were found to be fair use. The function is transformative, the site is for a completely different use case than watching media, and doesn't harm the market.

If they've purchased the movies legitimately, and have the receipts, they have an incredibly strong fair use case. Because it's beneficial to Studio Ghibli, I'd say they are best served by allowing it and not trying to exploit DMCA mechanisms to get them taken down.

This is one of those areas where copyright holders can be assholes and abuse the system for petty wins, but the big tech companies have fought and won explicit precedent demonstrating the legitimacy of fair use cases for tools exactly like this.

Awesome tool!


> If they've purchased the movies legitimately, and have the receipts, they have an incredibly strong fair use case.

While I'd also argue that this could be covered under a fair use defense, I thought it worth pointing out that buying a copy of a work and having receipts would have no bearing on the right to distribute copies of that work to others.

Obviously, if someone pirated these movies they could get in trouble for that as well, but that'd be an entirely different matter from the use of copyrighted images on their website.


Well, if you're distributing a piece of media, you need to have legal access to the piece that you distribute. You can take a 5-second clip of a movie that hasn't been released to Netflix yet, broadcast it on X or YouTube, meet all the requisites of fair use, and it's still not legal speech; you had no legal access to the media you're redistributing. The speech itself is a criminal violation of copyright, because of the lack of legal rights to the media in the first place, secondary to any piracy concerns.

If Studio Ghibli were to take them to court, they'd have to show that they had legal access to the media they're redistributing, namely the frames from the various movies. I believe that in this case they're using frames directly from the official Ghibli site, so there's no ambiguity, but if they purchased each and every movie they index, they'd have an extraordinarily strong case for fair use even without linking back to the studio site.


Not sure why this ad (access needs paid membership) is the top comment


It’s only GS then JPM though?


Not buying this. Have you seen studies that support this line of arguments?


What kind of rendering engine, and why does Zed have to build it?


Not sure; they do something with the GPU for sure, and how are you going to draw anything on a monitor or screen without a rendering engine? Surely their code base has multiple levels of abstraction for rendering, drawing, and layout. You can see these kinds of things in other comments here and on their GitHub without reading the code.

The browser engine is itself an abstraction point that many people on both sides find agreeable, at least for those of us who don't have a problem with chromium/codium/electron as a technology and see it more as useful and enabling.

In my mind, sharing a common engine across chromium/codium/electron is like how so many things use the Linux kernel. To me, the more eyes, devs, and consumers of the code, the better it gets in the long run.


Yes, the thing is, the browser is an extremely expensive abstraction layer. It's like having a car factory where everything is built by general purpose robots - it's very versatile, but obviously if you build an assembly line using dedicated machinery, it's going to run much faster.


But you also have to build your own factory and assembly line, which isn't faster to begin with and takes a lot of effort to get there. Zed still has issues with basics like font rendering and GPU usage from excessive redraws / repaints.

Meanwhile, Chromium works reasonably well on billions of devices of all shapes and kinds.


This is why Electron is so popular. Building an entire factory is very expensive.


What are GH and GB servers?


Grace-Hopper and Grace-Blackwell. "Grace" is Nvidia's Arm CPU, which these parts pair with Hopper or Blackwell GPUs in an integrated CPU+GPU package. DGX Spark is GB10, and it's allegedly like a small version of the server GB200.


GH200/GB200, Nvidia’s server hardware


I’m sure they would appreciate a report as it doesn’t seem that it can be reproduced yet

