Hacker News

That's interesting to know, but that doesn't by itself imply that it's illegal. For example, Google Books, which has massive amounts of scanned PDFs of copyrighted works, is considered fair use under US copyright law.


There's no good-faith world where OpenAI trained only on legally available works.

The only valid argument is whether their model or its output is itself legally protected.


As long as you don't try to scrape all the book's content…

It's only fair use for search purposes.


It's fair use if the work is "transformative". GPT-4 isn't publishing the content of the books; it's publishing a model derived from the entire corpus. I'm not a lawyer, but I think there's an argument that it is transformative.


IMHO, about as transformative as encoding a DVD as DivX…

It's correct that OpenAI isn't publishing any of the "stolen" content directly. But they "stole" it to make their service possible in the first place. Not distributing it themselves doesn't make much difference then.



