It certainly raises some interesting questions. After a model has been trained, is there really any surefire way to prove that it is profiting from your individual code? How is this different from, say, search indexing at Google? Imagine Wikipedia wants to sue Google for stealing its content. Google essentially keeps a mirror of Wikipedia and uses that data to serve up better search results (sometimes). But is there any legal ground to stand on in such a situation? It seems hard to prove that Company X made Y dollars off your individual code or text specifically, and therefore owes you money.
In any case, it raises some other questions about intellectual property as a whole. If you can sue an AI model for profiting off your intellectual property, why can't you sue a human for the same? Say you read a book one day and are so inspired that you go on to write a new book of your own. Imagine you publish that new book and sell millions of copies. Are you obligated to pay royalties to the author who inspired you? It seems to me that unless you're plagiarizing large chunks of the original work verbatim, you probably shouldn't owe the original author much of anything. LLMs do plagiarize, but they do so somewhat inconsistently due to their non-deterministic output (just like humans!).