"Good enough" bridges still last 50+ years. We could design a bridge to last 200 years but we won't even know if the design we have today will even be needed in 200 years. Maybe by then we all use trains in underground tunnels.
I don't think that's true. Engineers would largely want to build the best bridge, costs be damned. But they would end up undercut by anyone who cuts corners, with the result that the only companies getting contracts are the ones who cut the most corners. Even if no one wants to build bridges that collapse, preventing it would be impossible without the counterweight of laws and accountability.
Microsoft has had a lot of naming blunders in the past, but this has to be their worst. Copilot is currently a tool for reviewing PRs on GitHub, the new name for Windows Cortana, the new name for Microsoft Office, a line of Windows laptops/PCs, a plugin for VS Code that can use many models, and probably a number of other things. None of these products/features have any relation to each other.
So if someone says they use Copilot, that could mean anything from using Word to running Claude in VS Code.
>Microsoft has had a lot of naming blunders in the past but this has to be their worst.
Nah, I still rate "Windows App": the Windows app that lets you remotely access Windows apps. I hate it to death; it's like a black hole that sucks all meaning out of conversations about it.
This feels like an AI-generated comment, but I'll reply anyway. AI has been a massive negative for open source, since every project is now drowning in AI-generated PRs that don't work, reports for issues that don't exist, and a general mountain of time-wasting automated slop.
We are getting to the point where many projects may have to close submissions from the general public, since those submissions waste far more time than they save.
And then you get a new hire who already knows the common SaaS products but has to relearn your vibe-coded version that no one else uses and about which no information exists online.
There is a reason large proprietary products remain prevalent even when cheaper, better alternatives exist. Being "industry standard" matters more than being the best.
It will. By "translation" I mean something like a front-end client that translates the API into a user interface they prefer. They will build something localized to their own workflow, and if it doesn't end well, the damage is localized to them alone. A sketch of what I mean is below.
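For illustration, a minimal hypothetical sketch of such a personal translation layer: a few lines that wrap a generic issue-tracker REST API into the one-line CLI output one particular user prefers. The endpoint URL and field names here are made up, not any real product's API.

    # Hypothetical sketch: a personal "translation layer" that reshapes a
    # generic issue-tracker REST API into the tiny CLI one user prefers.
    # The endpoint and JSON field names are invented for illustration.
    import json
    import urllib.request

    API = "https://tracker.example.com/api/issues"  # hypothetical endpoint

    def my_issues(assignee: str) -> None:
        """Fetch issues and print them in the one-line format I like."""
        with urllib.request.urlopen(f"{API}?assignee={assignee}") as resp:
            issues = json.load(resp)
        for issue in issues:
            print(f"[{issue['priority']}] #{issue['id']} {issue['title']}")

    if __name__ == "__main__":
        my_issues("me")  # localized to my workflow; nobody else sees this UI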
Maybe, but I don't really believe users can or want to start designing software, even if it were possible, which today it isn't unless you already have software dev skills.
That would basically make users product managers and UX designers, roles they aren't really capable of filling currently. At most they will discover that what they think they want isn't what they actually want.
Benchmarks on public tests are too easy to game. The model owners can just incorporate the answers into the dataset. Only the private problems actually matter.
The harness seems extremely benchmark-specific, which gives them a huge advantage over what most models can use. For that reason, this isn't a qualifying score.
I agree it's not cheating in that restricted sense. But I'm not really convinced that it can't be cheating in a more general sense. You can try something like 10^10 variations of harnesses and select the one that performs best. And if you then look at it, it probably won't look like cheating. But you have biased the estimator by selecting the harness according to the value it produces.
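To make that bias concrete, here's a minimal simulation (all numbers hypothetical, and far fewer than 10^10 variations): even when every harness has exactly the same true pass rate, reporting the best-scoring one of many inflates the estimate well above the truth.

    # Selection bias sketch: benchmark scores are noisy estimates of a
    # fixed true pass rate; picking the max over many identical harnesses
    # still biases the reported number upward.
    import random

    random.seed(0)

    TRUE_PASS_RATE = 0.30   # assumed true capability, identical for all harnesses
    NUM_PROBLEMS = 100      # benchmark size
    NUM_HARNESSES = 1000    # harness variations tried

    def benchmark_score() -> float:
        """One noisy estimate: fraction of problems passed at the true rate."""
        passed = sum(random.random() < TRUE_PASS_RATE for _ in range(NUM_PROBLEMS))
        return passed / NUM_PROBLEMS

    scores = [benchmark_score() for _ in range(NUM_HARNESSES)]
    print(f"average harness score: {sum(scores) / len(scores):.2f}")  # ~0.30, unbiased
    print(f"best harness score:    {max(scores):.2f}")                # ~0.45, biased upward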
Once the model has seen the questions and answers in the training stage, the questions are worthless. Only a test using previously unseen questions has merit.
All traffic is monitored, all signal sources are eventually incorporated into the training set in one way or another. The person you're responding to is correct, even a single API call to any AI provider is sufficient to discount future results from the same provider.
OK! So if someone uses an existing, checkpointed, open-source model, then the answer is yes: the results are valid, and it doesn't matter that the tests are public.
You live in a conspiracy world. Those AI providers don't update their models that fast. You can ask them to solve ARC-AGI-3 without a harness yourself and watch them struggle just like they did yesterday.
Where do you see that? I only skimmed the prompts, but I don't see any aspects of any of the games explained in there. There are a few hints that are legitimate prior knowledge about games in general, though some look too inflexible to me. Prior knowledge ("Core priors") is a critical requirement of the ARC series; read the reports.
What is the use in keeping it open when no one will ever look at it again after it goes stale? It still exists in the system if you ever want to find it again, or if someone reports the same issue again. But after a certain time without reconfirming the bug exists, there is no point investigating, because you will never know whether you just haven't found it yet or it was fixed already.
See my reply to eminence32 - bug tracking serves as a list of known defects, not as a list of work the engineers are going to do this [day/month/year].
The primary purpose is not usually a list of known defects, and many "bugs" are not actually bugs but feature requests or misunderstandings from users (e.g., the RFC disallows the data you want my HTML parser to allow).
The people who filed them would disagree, and many would vehemently argue that their bug is in fact a bug, is the most important bug, and how dare you close it.