The breathtaking audacity of calling distilling GPT4 'stealing' when GPT4 traine...

microtherion · on May 26, 2023

"We ignore what created us; we adore what we create." — Aleister Crowley, The Book of Lies

RobotToaster · on May 26, 2023

"You are trying to kidnap what I've rightfully stolen, and I think it quite ungentlemanly."

wilg · on May 26, 2023

They put "stealing" in scare quotes, so it's probably not worth getting fired up about.

fasterik · on May 26, 2023

Was GPT-4 trained on data that was acquired illegally? Or was it trained on data acquired legally that OpenAI didn't have the rights to redistribute? There is a difference. In the latter case, whether it counts as "stealing" would come down to whether or not GPT-4 counts as a derivative work, or some similar legal concept.

svaha1728 · on May 26, 2023

https://www.washingtonpost.com/technology/interactive/2023/a...

Scribd has lots of PDFs of copyrighted books. The Washington Post article mentions several other sites from which PDFs of copyrighted textbooks, etc. were downloaded and scraped.

fasterik · on May 27, 2023

That's interesting to know, but that doesn't by itself imply that it's illegal. For example, Google Books, which has massive amounts of scanned PDFs of copyrighted works, is considered fair use under US copyright law.

cyanydeez · on May 28, 2023

There's no good-faith world where OpenAI trained only on legally available works.

The only valid argument is whether their model or its output is itself legally protected.

still_grokking · on May 27, 2023

As long as you don't try to scrape all the book's content…

It's only fair use for search purposes.

fasterik · on May 27, 2023

It's fair use if the work is "transformative". OpenAI isn't publishing the content of the books; it's publishing a model derived from the entire corpus. I'm not a lawyer, but I think there's an argument that it is transformative.

still_grokking · on May 28, 2023

IMHO as transformative as encoding a DVD as DivX…

It's correct that OpenAI isn't publishing any of the "stolen" content directly. But they "stole" it to make their service possible in the first place. Not distributing it themselves doesn't make much difference then.

kordlessagain · on May 26, 2023

Just because someone can convert text to numbers doesn’t mean they have a right to the numbers. That’s like trying to own the emotional effect a book has on someone, or the things they see in their mind when they read it.

blazespin · on May 26, 2023

What I find rather amusing is that they spend the whole paper dismissing it as ineffective, yet still feel the need to worry about the 'ethics' and 'legality'. They don't cite any discussion or evidence of either, of course, and looking at the authorship list I don't believe any of them are lawyers or ethics experts.

colordrops · on May 26, 2023

No one should have "rights" to any data, information, bits, or whatever. It's not physical, and any attempt to apply artificial scarcity to replicate the physical world is a crime against humanity. The lines around which data is protected and which is copyable are arbitrary bullshit. You aren't stealing a fire when you light one candle with another. It's my storage device, and I'm not breaking the law all of a sudden because the gates are holding a different set of charges.

looping__lui · on May 26, 2023

By that logic, you also need to accept that no one should ever need to pay you for creating artifacts that are not bound solely to the physical world. I assume you work for free for your employer, or in a space that is not “dealing” with data, information, bits, or anything of the kind.

henry2023 · on May 26, 2023

Hairdressers charge for a service and none of them will assume that they "own" your hair.

looping__lui · on May 26, 2023

They are transforming physical objects in very much the same way a carpenter does. The service industry is not the same as tech/digital. A hairdresser does not create bits or data. I would also argue that in this particular case you are wrong. You hand over the hair on the floor to them, which they then dispose of or maybe resell (maybe without explicit consent, but at least with implicit consent). If that weren’t the case, they would be committing theft when they dispose of your hair…

saurik · on May 26, 2023

I am honestly shocked that this hasn't happened, what with how the world has been going in recent decades.

looping__lui · on May 26, 2023

Well, they probably can, since you give them consent to keep your hair when you leave the shop… Disposing of it would otherwise be considered theft, no?

quickthrower2 · on May 26, 2023

Like a torrent of the last GoT season then?

… with compression.

croes · on May 26, 2023

Imagine if the GoT producers had used GRRM's books without a license and then claimed copyright on the series.

Does OpenAI have the rights to all the texts they used to train their GPTs?

quickthrower2 · on May 26, 2023

I think we agree.

politician · on May 26, 2023

I would like the big players to argue that they have some right to the numbers, since that would have important implications for BitTorrent, and for cryptography too for that matter.

runsWphotons · on May 26, 2023

yeah this is insane thinking haha

layer8 · on May 26, 2023

Stolen twice is still stolen.