
> Weirdly, the blog announcement completely omits the actual new context window size which is 400,000: https://platform.openai.com/docs/models/gpt-5.2

As @lopuhin points out, they already claimed that context window for previous iterations of GPT-5.

The funny thing is though, I'm on the business plan, and none of their models, not GPT-5, GPT-5.1, GPT-5.2, GPT-5.2 Extended Thinking, GPT-5.2 Pro, etc., can really handle inputs beyond ~50k tokens.

I know because, when working with a really long Python file (>5k LoC), it often claims there's a bug because, somewhere close to the end of the file, the input gets cut off and reads as '...'.

Gemini 3 Pro, by contrast, can genuinely handle long contexts.


Why would you put that whole Python file in the context at all? Doesn't Codex work like Claude Code in this regard and use tools to find the correct parts of a larger file to read into context?


Here's a copy of the only known recording of a prank call to the Queen: https://www.youtube.com/watch?v=-YFFhc3XZDw


After having read the article in its entirety, I’m still not sure what Cybersyn is…


You might think of it as a nation-scale business intelligence system. It’s part of the case study at the end of Stafford Beer’s Brain of the Firm, an (over?)ambitious project cut short by the fall of Allende in the Chilean coup.


The name of a bunch of linked, geographically distributed telex machines. Like a phone network but for text? Uh, actually, I guess what they built is something like email? And the central hub had a simulator that could help with taking decisions based on the input from that data?

They also had a very swell control room.


Personally, I like it. However, I like being able to comment and upvote more. At the same time, I'd be reluctant to say the least to hand over my login credentials. It could be quite cool to see this turned into a FOSS RES-style browser extension. Or maybe even a commercial product. I already paid for the HACK app.


We were unfortunately disappointed to discover that, yes, Voyage, Cohere, and Jina all train on the data of their API customers by default.

Voyage's terms say:

> you grant Voyage AI (and its successors and assigns) a worldwide, irrevocable, perpetual, royalty-free, fully paid-up, right and license to use, copy, reproduce, distribute, prepare derivative works of, display and perform the Customer Content: ... (iii) to train, improve, and otherwise further develop the Service (such as by training the artificial intelligence models we use).

Cohere's terms say:

> YOU GRANT US A ... RIGHT TO ... USE ... ANY DATA ... TO ... IMPROVE AND ENHANCE THE COHERE SOLUTION AND OUR OTHER OFFERINGS AND BENCHMARK THE FOREGOING, INCLUDING BY SHARING API DATA AND FINETUNING DATA WITH THIRD PARTIES ...

Jina's terms say:

> Jina AI shall, subject to applicable mandatory data protection requirements, be entitled to retain data uploaded to the Jina AI Systems or otherwise provided by the Customer or collected by Jina AI in the course of providing the Services and to use such data in anonymized/pseudonymized format for its business purposes including to improve its artificial intelligence applications.


This is the most interesting part of this article.


In my experience, maintaining a very popular software library, supporting open source, and blogging have all contributed to my success. What's more, as someone who is now a founder seeking like-minded, highly skilled engineers, I see those as key signals of an attractive hire.

I can understand, though, that in a work environment where management is unlikely to be able to retain highly skilled talent, you may want 'low-profile' workers who aren't going to have as many competitors chasing after them...


Further to @dust42, BERT is an encoder, GPT is a decoder, and T5 is an encoder-decoder.

Encoder-decoders are not in vogue.

Encoders are favored for classification, extraction (eg, NER and extractive QA) and information retrieval.

Decoders are favored for text generation, summarization and translation.

Recent research (see, eg, the Ettin paper: https://arxiv.org/html/2507.11412v1) seems to confirm the previous understanding that encoders are indeed better for “encoder tasks” and vice versa.

Fundamentally, both are transformers and so an encoder could be turned into a decoder or a decoder could be turned into an encoder.

The design difference comes down to bidirectional (ie, all tokens can attend to all other tokens) versus autoregressive attention (ie, the current token can only attend to the previous tokens).
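A minimal NumPy sketch of that design difference (the names and the 4-token sequence are mine, not from any particular library): the two variants differ only in which (query, key) pairs the attention mask allows.

```python
import numpy as np

seq_len = 4

# Bidirectional (encoder-style): every token may attend to every other token.
bidirectional_mask = np.ones((seq_len, seq_len), dtype=bool)

# Autoregressive/causal (decoder-style): token i may only attend to tokens 0..i,
# i.e. a lower-triangular mask over (query, key) positions.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(causal_mask.astype(int))
# [[1 0 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

Everything else in the transformer block is the same; only this mask decides whether you get a BERT-style or a GPT-style attention pattern.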


You can use an encoder-style architecture with decoder-style output heads on top for denoising-diffusion-mode mask/blank filling. These seem to be somewhat more expensive on short sequences than GPT-style decoder-only models when you batch them: diffusion needs fewer (but full-sequence) passes over the content, and those fewer passes only become cheaper once sequence length blows up your KV-cache throughput cost. But for situations that don't get request batching, or where the context length is so heavy that you'd prefer to exploit memory locality in the attention computation, you'd benefit from diffusion-mode decoding.

A nice side effect of the diffusion mode is that its natural reliance on the bidirectional attention of the encoder layers provides much more flexible (and, critically, context-aware) understanding. As mentioned, later words can easily modulate earlier words, as with "bank [of the river]" / "bank [in the park]" / "bank [got robbed]", or the classic of these days: telling an agent it did something wrong and expecting it to learn in-context from the mistake. (In practice, decoder-only models basically just get polluted by that, so you have to rewind the conversation, because the later correction has literally no way of backwards-affecting the problematic tokens.)

That said, the recent surge in training "reasoning" models to utilize thinking tokens (which often get cut out of further conversation context), via a reinforcement-learning process that's not merely RLHF/preference conditioning, is actually quite related: discrete denoising diffusion models can be trained with an RL scheme during pretraining, where the training step is given the outcome goal plus a masked version as the input query, and is then trained to manage the work done in the individual steps on its own until it eventually produces the outcome goal, crucially without prescribing any order for filling in the masked tokens or how many to fill in each step.

A recent paper on the matter: https://openreview.net/forum?id=MJNywBdSDy


Until we got highly optimized decoder implementations, decoder prefill was often even implemented by reusing an encoder implementation and logit-masking the attention scores with a causal mask before the attention softmax, so that tokens could not attend to future tokens.
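A small self-contained sketch of that trick (plain NumPy, function names are mine): the attention routine is encoder-style, and causal behavior is obtained purely by setting future-position logits to -inf before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    # Encoder-style single-head attention; the ONLY change for decoder
    # behavior is the logit mask applied before the softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Set logits for future key positions to -inf, so after the
        # softmax each token attends only to itself and earlier tokens.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(5, 8))
out = attention(q, k, v, causal=True)
# With causal=True, output row 0 depends only on token 0: perturbing a
# later token leaves it unchanged, unlike in the bidirectional case.
```

This is wasteful (the full score matrix is computed and then partly discarded), which is exactly why dedicated decoder kernels with KV caching replaced it.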


> Actually it reflects the idea of Unicode code points correctly. They are meant to represent graphs, not semantics.

Why do we then have lots of invisible characters that are intended essentially as semantic markers (eg, zero-width space)?


Hey HN,

Over the past couple of months, we, a team of Aussie legal and AI experts, have been working on building a new type of legal AI company — a company that, instead of trying to automate legal jobs, is trying to automate legal tasks.

We want to make lawyers’ lives easier, not replace them.

We’ve been laser-focused on building small and efficient yet still highly accurate, specialized models for some of the most time-consuming and mundane legal tasks lawyers have to perform. Stuff like running through a thousand contracts just to locate any clauses that would allow you to get out early.

We just finished training our first set of models, focused on document and clause classification, probably the most common problem we see come up. Our benchmarks show our models to be far more accurate and also more efficient than their closest general-purpose competitors.

Today, we’re making those models publicly available via the Isaacus API, the world’s first legal AI API.

Our models don’t require any finetuning because they’re zero-shot classifiers — you give them a description of what you’re looking for (for example, “This is a confidentiality clause.”) and they pop out a classification score.

Because our models are so small (which they have to be to process reams of legal data at scale), they can sometimes be a bit sensitive to prompts. To help with that, we’ve preoptimized an entire library of prompts, including what we call universal templates, which let you plug in your own arbitrary descriptions of what you’re looking for.

We’ve made our prompt library available via the Isaacus Query Language or IQL. Another world first — it’s a brand-new AI query language designed specifically for using AI models to analyze documents.

You can invoke query templates using the format “{IS <query_template_name>}”. You can also chain queries together using Boolean and mathematical operators, like so: “{This is a confidentiality clause.} AND {IS unilateral clause}”.

We think our API is pretty neat and we hope you will too.

This is just the beginning for us — over the course of this year, we’re planning on releasing text extraction and embedding models as well as a second generation of our Kanon legal foundational model.

Below are some links for your convenience.

- Quickstart guide: https://docs.isaacus.com/quickstart

- Announcement: https://isaacus.com/blog/the-worlds-first-legal-ai-api

- Docs: https://docs.isaacus.com/home

- Sign up: https://platform.isaacus.com/accounts/signup/

- LinkedIn: https://www.linkedin.com/company/isaacus/

- Email: hello@isaacus.com


Kagi Professional is $10/m and comes with unlimited searches.


...and, if you append a question mark to the end of your query, you get a one-shot LLM answer (currently powered by Claude).

