Won't hold in court. GPT is a platform mainly providing answers to private individuals who ask. It's like asking a professor a question and having him answer you word for word with whatever copyrighted material he has memorized (due to a photographic memory). Now, if you take this answer and write a book or publish it en masse on blogs, for example, then you are the one who should be sued by the NYT. If GPT used the exact same wording and published it to everyone visiting their page, then that would be on OpenAI.
I hope people start calling out the "well, it's fine if a human does it" arguments for the rat fuck thinking they are. These are computational systems operating at very large scales, run by some of the wealthiest companies in the world.
If I go fishing, the regulations I have to comply with are very light because the effect I have on the environment is minimal. The regulations for an industrial fishing barge are rightfully very different, even if the end result is the same fish on your plate.
GPT is like a fleet of small fishing boats, each user steering their boat in a different direction, not a fishing barge. For every token written by the model there must be a human who prompted it and then consumed it. It is manual, personal, and deliberate.
In fact, all the demonstrations in the lawsuit PDF were intentionally angling to reproduce copyrighted content. They had to push the model to do it. That won't happen unless users deliberately ask for it. It won't happen en masse.
GPT is operated by one company. If a million people eat your fish, you're still a barge.
Boo hoo they had to push it. That was never the problem with these bullshit nozzles. The issue is they put that stuff in the training set in the first place. If you can't be honest about that then I have no interest in debating this with you.
Unfortunately, that's not the crowd of people here. 80% of the comments under this thread (right now, 2:52 EST) are making similar arguments and *continue* to act like LLMs are doing something unique/creative, instead of just generating sentences from algorithms, from virtually pirated content in the form of data mining.
The professor, having been trained in academia, would state the sources of the verbatim quotes. In writing papers he would use references and explicit quotes. There's nothing hidden going on with the professor.
If said professor offered a service where anyone could ask them for information that is behind a paywall, and they provided it without significant transformation, this would certainly be copyright infringement that the copyright holder would have every right and motivation to take action against.
I think only the scale matters here (probably), because I would find it hard to believe that a teacher/professor would not be allowed to set up a service where they teach and provide their knowledge to others. That is basically the concept of teaching. Of course, until LLMs, we never had this scale before: millions of potential learners vs. the usual hundreds in a classroom session. So that makes the new case interesting.
"Teaching" by copying source books word for word, would be copyright infringement; see, for example, the well-known issues around photocopying books or even excerpts.
Also, lying about source materials (e.g. telling students that some respected historian denies the Holocaust happened, when that's obviously not the case) is not "teaching"; it's defamation, and the NYT is absolutely right to pursue that angle too.
Using LLMs as general-purpose search engines is a minefield; I would not be surprised if the practice disappeared in the next 20 years. Obviously the tech is here to stay, and there is no problem when it's applied to augmenting niche work, but as a Google replacement it has so many issues.
> "Teaching" by copying source books word for word, would be copyright infringement; see, for example, the well-known issues around photocopying books or even excerpts.
Incorrect. Educational use helps satisfy one of the tests for fair use. Teachers can, in many cases, photocopy copyrighted work without infringing on that copyright.
Educational use is just one of the many factors used to determine whether an instance of copyright infringement is fair use or not, but it is not carte blanche for educators to ignore IP laws just because they're educating.
Teachers can in some very limited cases photocopy very small chunks of copyrighted work. This also varies significantly from country to country; the starting position is that they cannot reproduce works in their entirety.
Scale is important here. Maybe a better analogy is setting up a paid Spotify clone with all the music sourced from torrents, with some slight distortion effect added.