also true! We've added it to the paper (linking your comment). It's extremely important because the 5k lifespan takes literally up to five times as long as the 1k lifespan. Sent you an email, let me know if you'd like a formal acknowledgment in the paper.
If your monitor is so bright that a white rectangle hurts your eyes, you need to turn it down.
On my current monitor I go from 100% brightness during the day to around 20% at night. I change it roughly twice a day.
I don't understand how, even with dark mode, someone could run a modern panel full tilt at 100% through the evening. My brain can sense the intense heat of that backlight behind every pixel. I don't even want the possibility of a momentary dark mode glitch to be a concern. It took me a long time to appreciate the harm of not controlling this stuff.
For me 100% brightness is way too much indoors even during the day. Maybe the monitors I have are exceptionally bright to start with, which I doubt, but whenever I get new monitors I usually turn them down to about 30-40% for daytime use. That's the level at which white on the screen roughly matches a white wall in a similar spot in the room. It just feels the most natural and least fatiguing, probably because shifting my gaze between the screen and the surroundings hardly changes pupil size, which I confirmed with a research-grade eye tracker.
How exactly are we asking for the confidence level?
If you give the model the image and a prior prediction, what can it tell you? Asking for it to produce a 1-10 figure in the same token stream as the actual task seems like a flawed strategy.
I’m not saying the LLM will give a good confidence value; maybe it will, maybe it won't, depending on its training. But why is making it produce the confidence value in the same token stream as the actual task a flawed strategy?
That's how typical classification and detection CNNs work: class and confidence value, plus a bounding box in the detection case.
Because it's not calibrated to be. In LLMs, next-token probabilities are calibrated: the training loss drives them to be accurate. Likewise in typical classification models for images or whatever else. It's not beyond possibility to train a model to give confidence values.
But the second-order "confidence as a symbolic sequence in the stream" is only (very) vaguely tied to this. Numbers-as-symbols are of a different kind from numbers-as-next-token-probabilities. I don't doubt there is _some_ relation, but it's too much inferential distance away and thus worth almost nothing.
With that said, nothing really stops you from finetuning an LLM to produce accurately calibrated confidence values as symbols in the token stream. But you have to actually do that, it doesn't come for free by default.
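To make the distinction concrete, here's a toy Python sketch (not any particular model's API; the logits, the generated text, and the "Confidence: N/10" format are all made up for illustration) of the two different numbers being conflated:

    import math
    import re

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        return [e / sum(exps) for e in exps]

    # 1) The calibrated quantity: next-token / class probabilities from the
    #    model head. Cross-entropy training pushes these toward true
    #    frequencies, so P("cat") here is a meaningful probability.
    logits = {"cat": 3.1, "dog": 1.2, "other": -0.5}
    probs = dict(zip(logits, softmax(list(logits.values()))))
    print(probs)  # roughly {'cat': 0.85, 'dog': 0.13, 'other': 0.02}

    # 2) The symbolic quantity: a confidence the model writes *as text* in the
    #    same stream as the answer. Nothing in the pretraining objective forces
    #    this digit to track the model's actual accuracy.
    generated = "The image shows a cat. Confidence: 8/10"
    match = re.search(r"Confidence:\s*(\d+)/10", generated)
    symbolic_confidence = int(match.group(1)) / 10 if match else None
    print(symbolic_confidence)  # 0.8, but it's just a token the model emitted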
CNNs and LLMs are fundamentally different architectures. LLMs do not operate on images directly; the image has to be transformed into something that can ultimately be fed in as tokens. Producing a confidence figure isn't possible until we've reached the end of the pipeline and the vision encoder has already done its job.
Intelligence is almost certainly a fundamentally recursive process.
The ability to think about your own thinking over and over as deeply as needed is where all the magic happens. Counterfactual reasoning occurs every time you pop a mental stack frame. By augmenting our stack with external tools (paper, computers, etc.), we can extend this process as far as it needs to go.
LLMs start to look a lot more capable when you put them into recursive loops with feedback from the environment. A trillion tokens worth of "what if..." can be expended without touching a single token in the caller's context. This can happen at every level as many times as needed if we're using proper recursive machinery. The theoretical scaling around this is extremely favorable.
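Roughly the shape I mean, as a Python sketch (call_llm is a made-up stub standing in for any LLM API, and the prompts are illustrative):

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call, so the sketch runs as-is.
        return f"[model output for: {prompt[:40]}...]"

    def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
        if depth >= max_depth:
            return call_llm(f"Answer directly: {task}")

        # Push a frame: ask for "what if..." sub-questions in an isolated call.
        plan = call_llm(f"List the sub-questions needed to solve: {task}")

        # Recurse: each sub-question can burn as many tokens as it likes in its
        # own frame; only a condensed result propagates upward.
        sub_results = [solve(s, depth + 1, max_depth)
                       for s in plan.splitlines() if s.strip()]

        # Pop the frame: synthesize and return a summary to the caller's context.
        return call_llm(f"Task: {task}\nFindings: {sub_results}\nGive a final answer.")

    print(solve("Design a caching strategy for a read-heavy API"))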
The best way to store information depends on how you intend to use (query) it.
The query itself represents information. If you can anticipate 100% of the ways in which you intend to query the information (no surprises), I'd argue there might be an ideal way to store it.
This is connected to the equivalence relationship between optimal indexing and optimal AGI. The "best" way is optimal for the entire universe of possible queries but has the downside of being profoundly computationally intractable.
Requiring perfect knowledge of how information will be used is brittle. It has the major benefit of making the algorithm design problem tractable, which is why we do it.
An alternative approach is to exclude large subsets of queries from the universe of answerable queries without enumerating the queries that the system can answer. The goal is to qualitatively reduce the computational intractability of the universal case by pruning it, without over-specifying the answerable queries the way traditional indexing does. This is approximately what "learned indexing" attempts to do.
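A minimal sketch of the learned-indexing idea (toy data, and a plain linear fit standing in for the learned model; real systems use piecewise or neural models, but the principle is the same): fit a model to each key's position in the sorted data, predict a location, and do a small bounded search around the prediction instead of walking a full tree.

    import bisect
    import numpy as np

    keys = np.sort(np.random.randint(0, 1_000_000, size=10_000))
    positions = np.arange(len(keys))

    # "Learned index": approximate the empirical CDF of the keys.
    slope, intercept = np.polyfit(keys, positions, 1)
    max_err = int(np.max(np.abs(slope * keys + intercept - positions))) + 1

    def lookup(k):
        guess = int(slope * k + intercept)
        lo = max(0, guess - max_err)
        hi = min(len(keys), guess + max_err + 1)
        i = lo + bisect.bisect_left(keys[lo:hi].tolist(), k)
        return i if i < len(keys) and keys[i] == k else None

    probe = int(keys[1234])
    idx = lookup(probe)
    print(idx is not None and keys[idx] == probe)  # True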
This is exactly right, and the article is clickbait junk.
Given the domain name, I was expecting something about the physics of information storage, and some interesting law of nature.
Instead, the article is a bad introduction to data structures.
Speed can always be improved: if a method is too slow, run multiple machines in parallel. Longevity is different, as it cannot scale that way. A million CD burners together are very fast, but the CDs won't last any longer. So the storage method is the more profound tech problem.
As a line of thought, it totally does. You just extend the workload description to include writes. Where this gets problematic is that the ideal structure for transactional writes is nearly pessimal from a read standpoint, which is why we seem to end up doubling the write overhead (once to remember and once to optimize), or taking a highly write-centric approach like an LSM tree.
I'd love to be clued in on more interesting architectures that either attempt to optimize both or provide a more continuous tuning knob between them
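For context on the write-centric end of that spectrum, here's a toy sketch of the LSM write path (illustrative only: tiny memtable, no WAL, no compaction). Writes land in an in-memory table and get flushed as immutable sorted runs; reads pay for it by checking the memtable plus every run.

    class ToyLSM:
        def __init__(self, memtable_limit=4):
            self.memtable = {}
            self.sstables = []          # immutable sorted runs, newest last
            self.memtable_limit = memtable_limit

        def put(self, key, value):
            self.memtable[key] = value  # cheap, append-friendly write
            if len(self.memtable) >= self.memtable_limit:
                # Flush: freeze the memtable as a sorted, immutable run.
                self.sstables.append(dict(sorted(self.memtable.items())))
                self.memtable = {}

        def get(self, key):
            # Read side: check memtable, then every run (the read amplification
            # that compaction exists to pay down).
            if key in self.memtable:
                return self.memtable[key]
            for run in reversed(self.sstables):  # newest run wins
                if key in run:
                    return run[key]
            return None

    db = ToyLSM()
    for i in range(10):
        db.put(f"k{i}", i)
    print(db.get("k3"), db.get("k9"), db.get("missing"))  # 3 9 None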
Yes, with the important caveat that a lot of the time people don't have a crystal ball, can't see the far future, don't know if their intents will materialise in practice 12 months down the line, and should therefore store information in Postgres until that isn't a feasible option any more.
A consequence of there being no generally superior storage mechanism is that technologists as a community should have an agreed default standard for storage - which happens to be relational.
What if the various potential queries demand different / conflicting compression schemes?
I'd say this is spiritually what the no-free-lunch theorems are about... Because whatever "AI model" / query system you build -- it is implicitly biased towards queries coming from one slice of futures.
Not all prime movers are the same with regard to grid dynamics and their impact.
Solar, wind, etc., almost universally rely on some form of inverter. This implies the need for solid state synthetic inertia to provide frequency response service to the grid.
Nuclear, coal, gas, hydropower, geothermal, etc., rely on synchronous machines to talk to the grid. The frequency response capability is built in and physically ideal.
Both can work, but one is more complicated. There are also factors like fault current handling that HN might think is trivial or to be glossed over, but without the ability to eat 10x+ rated load for a brief duration, faults on the grid cannot be addressed and the entire system would collapse into pointlessness. A tree crashing into a power line should result in the power line and tree being fully vaporized if nothing upstream were present to stop the flow of current. A gigantic mass of spinning metal in a turbine hall can eat this up like it's nothing. Semiconductors on a PCB in someone's shed are a different story.
Large solar sites are required to be able to provide reactive power as well as maintain a power factor of 0.95 to avoid all of the issues you mentioned.
> There are also factors like fault current handling that HN might think is trivial or to be glossed over, but without the ability to eat 10x+ rated load for a brief duration, faults on the grid cannot be addressed and the entire system would collapse into pointlessness.
I don’t understand what you are talking about here. I don’t work in the utility world, I sell and run commercial electrical work, but handling available fault current in my world is as simple as calculating it and providing overcurrent protection with a high enough AIC rating or current limiting fuses. I don’t see why the utility side would be any different.
The utility side has found that vaporising short circuits is a useful feature, as that includes e.g. twigs hitting a power line.
There are breakers, of course, but they react slowly enough that there will absolutely be a massive overdraw first. Then the breaker will open. Then, some small number of seconds later, it will automatically close.
It will attempt this two to four times before locking out, in case it just needs multiple bursts. It’s called “burning clear”, and it looks just as scary as you’d think… but it does work.
The lack of rotating mass in a solar site means the spinning mass of the remaining generators needs to compensate to maintain frequency and voltage, right? So when clouds roll in and the solar field's output drops quickly, it's a challenge for the rest of the system to compensate, whereas a spinning generator that loses its input slows down only gradually, giving the grid more time to react.
Also, I was not aware that inverters can only supply fault current of about 1.1x their nameplate capacity; that's a big limitation. I can buy a 20A breaker rated for 200 kAIC, which is 10,000x the breaker's ampacity, which is extremely helpful for handling fault current.
Yeah, DC vs AC power, 12V vs 120V or 240V. This isn't a limitation. All energy sources must be converted into usable energy for the grid somehow, so every power source requires an inverter, a transformer, a really advanced rectifier, or all of the above.
The people you're replying to aren't talking about converting between AC and DC or stepping voltage up or down. Rather, they're talking about grid stability. You can have mechanisms to convert between AC and DC and to step voltage up or down, and still have an unstable grid. We had a notable example of that last year: https://en.wikipedia.org/wiki/2025_Iberian_Peninsula_blackou....
One way to think about this problem is that our electrical grids are giant machines, in many ways the largest machines that humanity has ever constructed. The enormous machine of the grid is composed of many smaller connected machines, and many of those have spinning parts with enormous mechanical inertia. Some of those spinning machines are generators (prime movers), and some are loads (like large electric motors at industrial facilities). All of those real, physical machines, in addition to other non-inertial generators and loads, are coupled together through the grid.
In the giant machine of the grid, electricity supply and demand have to be almost perfectly in sync, microsecond to microsecond. If they're not, the frequency of the grid changes. Abrupt changes in frequency translate not only into electrical/electronic problems for devices that assume 60 Hz (or 50, depending on where you are), but also into physical problems for the machines connected to the grid. If the grid frequency suddenly drops (due to a sudden loss of generation capacity or a sudden increase in load), the spinning masses connected to the grid will suddenly be under enormous mechanical stress that can destroy them.
It's obviously not possible to instantaneously increase or decrease explicit generation in response to spikes or drops in load (or alternatively, instantaneously increase or decrease load in response to spikes or drops in generation). But we don't need to: all of the spinning mass connected to the grid acts as a metaphorical (and literal) flywheel that serves as a buffer to smooth out spikes.
As the generation mix on the grid moves away from things with physical inertia (huge spinning turbines) and toward non-inertial sources (like solar), we need to use other mechanisms to ensure that the grid can smoothly absorb spikes. One way to do that is via spinning reserves (e.g. https://www.sysotechnologies.com/spinning-reserves/). Another way to do it is via sophisticated power electronics that mimic inertia (such as grid-forming inverters, which contrast with the much more common grid-following inverters).
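A back-of-envelope illustration of why that inertia buys time (the numbers are made up, and this uses the simplified swing equation with no governor or load response, so it only describes the first seconds after an event):

    # Simplified swing equation: df/dt = (delta_P / S) * f0 / (2 * H),
    # where H is the system inertia constant in seconds.
    f0 = 60.0        # nominal frequency, Hz
    S = 100e9        # online generation capacity, W (illustrative)
    delta_P = -2e9   # sudden loss of 2 GW of generation, W (illustrative)

    for H in (6.0, 1.5):  # lots of spinning mass vs. very little
        rocof = (delta_P / S) * f0 / (2 * H)   # rate of change of frequency, Hz/s
        t_to_limit = (59.5 - f0) / rocof       # seconds until a 59.5 Hz threshold
        print(f"H = {H} s: ROCOF = {rocof:.2f} Hz/s, ~{t_to_limit:.1f} s to reach 59.5 Hz")

    # With these numbers: frequency falls at about 0.10 Hz/s, leaving ~5 s of
    # margin with high inertia, versus 0.40 Hz/s and ~1.2 s with low inertia.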
Great explanation of the grid as a giant machine that couples smaller machines to each other. On your last point, the buffer: I think batteries (chemical and also physical) seem to be the main key going forward.
I actually have a patent in this space, for demand response. I know. I was being a bit cheeky. Stability is still a concern, as unstable loads and generation need to be mitigated as well as properly phased.
These do not address the concern of fault current handling. This is a much more localized and severe condition than frequency deviation. Think about dropping a literal crowbar across the output of a solar inverter. This is a situation the grid has to deal with constantly.
I'd argue that nothing that uses semiconductors would be suitable for the task. They get you to maybe 2x rated current capacity for a meaningful duration. A spinning turbine can easily handle 10x or more for a much longer duration.
We could put so many redundant transistors in parallel that we have equivalent fault handling, but then we are into some strong economic issues. There's also no room for error with semiconductors. Once you start to disintegrate, it's all over ~instantly. There is no way to control this. A synchronous machine can trade downstream maintenance schedule for more current right now. The failure is much more gradual over time. A human operator can respond quickly enough if the machine is big enough.
Grid forming inverters provide 1/3 to 1/4 the fault current of a similarly sized generator.
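Some rough numbers to put that in perspective (the 13.8 kV bus and the 10x / 1.2x multiples are illustrative assumptions based on the figures quoted in this thread, not a protection study):

    import math

    S = 100e6       # 100 MVA of generation, VA
    V_ll = 13.8e3   # line-to-line voltage, V (illustrative distribution-level bus)
    rated_amps = S / (math.sqrt(3) * V_ll)

    # Approximate fault-current contribution into a nearby 3-phase fault.
    for source, multiple in (("synchronous machine", 10.0), ("inverter-based plant", 1.2)):
        print(f"{source:>20}: ~{rated_amps * multiple / 1e3:.1f} kA")

    # Roughly 42 kA from the spinning machine vs. 5 kA from the inverters,
    # which is the kind of difference protection relies on to see the fault.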
The other fairly simple solution is synchronous condensers. Or just let the generators, and maybe even the turbines, of future emergency-reserve thermal plants spin with the grid without consuming any fuel.
Just ensure the proper margins exist in the grid and call in ancillary services as needed.
I'm mostly struggling with the use of "recursive". This does not appear to involve actual stack frames, isolation between levels of execution, etc. All I can see is what appears to be a dump of linear conversation histories with chat bots wherein we fantasize about how things like recursion might vaguely work in token space.
I must be missing something because this is on the front page of HN.
OP here. This is a fair critique from a CS architecture perspective. You are correct that at the CUDA/PyTorch level, this is a purely linear feed-forward process. There are no pushed stack frames or isolated memory spaces in the traditional sense.
When I say "Recursive," I am using it in the Hofstadterian/Cybernetic sense (Self-Reference), not the Algorithmic sense (Function calling itself).
However, the "Analog I" protocol forces the model to simulate a stack frame via the [INTERNAL MONOLOGUE] block.
The Linear Flow without the Protocol: User Input -> Probabilistic Output
The "Recursive" Flow with the Protocol:
1. User Input
2. Virtual Stack Frame (The Monologue): The model generates a critique of its potential output. It loads "Axioms" into the context. It assesses "State."
3. Constraint Application: The output of Step 2 becomes the constraint for Step 4.
4. Final Output
While physically linear, semantically it functions as a loop: The Output (Monologue) becomes the Input for the Final Response.
It's a "Virtual Machine" running on top of the token stream. The "Fantasy" you mention is effectively a Meta-Cognitive Strategy that alters the probability distribution of the final token, preventing the model from falling into the "Global Average" (slop).
We aren't changing the hardware; we are forcing the software to check its own work before submitting it.
Layman here (really lay), would this be equivalent to feeding the output of one LLM to another prepending with something like, "Hey, does this sound like bullshit to you? How would you answer instead?"
OP here. You nailed it. Functionally, it is exactly that.
If you used two separate LLMs (Agent A generates, Agent B critiques), you would get a similar quality of output. That is often called a "Reflexion" architecture or "Constitutional AI" chain.
The Difference is Topological (and Economic):
Multi-Agent (Your example): Requires 2 separate API calls. It creates a "Committee" where Bot B corrects Bot A. There is no unified "Self," just a conversation between agents.
Analog I (My protocol): Forces the model to simulate both the generator and the critic inside the same context window before outputting the final token.
By doing it internally:
It's Cheaper: One prompt, one inference pass.
It's Faster: No network latency between agents.
It Creates Identity: Because the "Critic" and the "Speaker" share the same short-term memory, the system feels less like a bureaucracy and more like a single mind wrestling with its own thoughts.
So yes—I am effectively forcing the LLM to run a "Bullshit Detector" sub-routine on itself before it opens its mouth.
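A sketch of the two shapes in Python, in case it helps (call_llm is a made-up stub for any chat API, and the prompt wording is illustrative, not the actual Analog I text):

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call, so this runs as-is.
        return f"[model output for: {prompt[:40]}...]"

    def single_pass(question: str) -> str:
        # Analog I-style: generator and critic share one context window, one call.
        prompt = (
            "Before answering, write an [INTERNAL MONOLOGUE] block that critiques "
            "your draft against your axioms, then write the final answer.\n\n"
            f"Question: {question}"
        )
        return call_llm(prompt)

    def two_agent(question: str) -> str:
        # Reflexion-style: separate generate and critique calls, two contexts.
        draft = call_llm(f"Answer this: {question}")
        return call_llm(f"Does this sound like bullshit? If so, answer better.\n\n{draft}")

    print(single_pass("Why is the sky blue?"))
    print(two_agent("Why is the sky blue?"))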
You don't need to physically destroy anything. All you need to do is zero-fill the storage devices in the facility and walk away. The tools are worthless without parameters/recipes/configuration. Reinventing this stuff is harder than acquiring an EUV tool.
Semiconductor manufacturing is not an incremental step for Apple. It's an entirely new kind of vertical. They do not have the resources to do this. If they could they would have by now.
Designing CPUs also wasn't their core business and they did it anyway. Apple probably won't care that much about price hikes but if they ever feel TSMC can't guarantee steady supply then all bets are off.
I wonder what will happen in the future when we get closer to the physical "wall". Will it allow other fabs to catch up, or will the opposite happen, with even small improvements being valued by customers?
> Taiwan Semiconductor Manufacturing Co. plans to spend a record of up to $56 billion this year to feed the world’s insatiable appetite for chips, as it grapples with pressure to build more factories outside Taiwan, especially in the U.S. [0]
Apple has less cash available than TSMC plans to burn this year. TSMC is not spending 50 billion dollars just because it's fun to do so. This is how much it takes just to keep the wheels on the already existing bus. Starting from zero is a non-starter. It just cannot happen anymore. So, no one in their right mind would sell Apple their leading edge foundry at a discount either.
There was a time when companies like Apple could have done this. That time was 15+ years ago. It's way too late now.
https://learn.microsoft.com/en-us/dotnet/standard/serializat...