Hacker News

Not necessarily, not all tactics can be used symmetrically like that. Many of the sites they scrape feel the need to support search engine crawlers and RSS crawlers, but OpenAI feels no such need to grant automated anonymous access to ChatGPT users.

And at the end of the day, they can always look at the responses coming in and make decisions like “95% of users said these responses were wrong, 5% said these responses were right, let’s go with the 95%”. As long as the vast majority of their data is good (and it will be) they have a lot of statistical tools they can use to weed out the poison.
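A toy sketch of the kind of majority-vote filtering described above. The data format and function name are hypothetical, purely to illustrate the idea of siding with the dominant feedback signal:

```python
from collections import Counter

def filter_by_majority(feedback):
    """Keep a response only if most user votes call it right.

    feedback: dict mapping response_id -> list of votes,
    each vote being "right" or "wrong" (hypothetical format).
    """
    kept = {}
    for response_id, votes in feedback.items():
        counts = Counter(votes)
        # Side with the majority: drop responses most users flagged wrong.
        if counts["right"] > counts["wrong"]:
            kept[response_id] = votes
    return kept

feedback = {
    "a": ["right"] * 95 + ["wrong"] * 5,
    "b": ["wrong"] * 95 + ["right"] * 5,
}
print(sorted(filter_by_majority(feedback)))  # -> ['a']
```

Real pipelines would presumably use far more sophisticated signals than raw vote counts, but the principle is the same: a small poisoned minority gets averaged away.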



> As long as the vast majority of their data is good (and it will be)

So expert answers are out of scope? Nice, looking forward to that quality data!


If you want to pick apart my hastily concocted examples, well, have fun I guess. My overall point is that ensuring data quality is something OpenAI is probably very good at. They likely have many clever techniques, some of which we could guess at, some of which would surprise us, all of which they’ve validated through extensive testing including with adversarial data.

If people want to keep playing pretend that their data poisoning efforts are causing real pain to OpenAI, they’re free to do so. I suppose it makes people feel good, and no one’s getting hurt here.


I'm interested in why you think OpenAI is probably very good at ensuring data quality. Also interested if you are trying to troll the resistance into revealing their working techniques.


They buy it through Scale AI.


What makes people think companies like OpenAI can't just pay experts for verified true data? Why do all these "gotcha" replies always revolve around the idea that everyone developing AI models is credulous and stupid?


Because paying experts for verified true data in the quantities they need isn't possible. Ilya himself said we've reached peak data (https://www.theverge.com/2024/12/13/24320811/what-ilya-sutsk...).

Why do you think we are stupid? We work at places developing these models and have a peek into how they're built...


You see a rowboat, and you need to cross the river.

Ask a dozen experts to decide what that boat needs to fit your need.

That is the specification problem, add on the frame problem and it becomes intractable.

Add in domain specific terms and conflicts and it becomes even more difficult.

Any nontrivial semantic properties, those without a clear T/F, are undecidable.

OpenAI will have to do what they can, but the problem is neither trivial nor fully solvable.

It doesn't matter how smart they are, generalized solutions are hard.


Sure, not necessarily the same tactics, but as with any hacking exercise, there are ways. We can become the 95% :)



