They're teaching a computer as you would a student or child.
The minute you see Linux kernel sources, your future output is not forever tainted by the GPL.
It's one thing if they've overfit. But we're still so early in this. It would suck to let the Luddites kill a technological development perhaps on par with or even greater than the Internet itself.
Where are these same voices when artists' output serves as the training set? Or novels? Or forum posts?
If this lawsuit succeeds, we'll cut ourselves off at the knees too. We won't be able to develop models either due to the mishmash of licenses.
I don't understand: what does training a machine learning algorithm like Copilot have in common with teaching a student or child? Teaching them programming? Don't you teach a student or child with lectures and exercises and practice, not by simply having them read terabytes of existing code of varying sophistication and figuring they'll know how to program on the other end? What am I missing?
In what ways do you believe the way you learned to program is similar to training a machine-learning model? Just that they both consist of "looking at examples", in the broadest sense?
I learned to program by doing programming exercises (some self-devised, like making simple text games). To accomplish these exercises, I read programming manuals (in, say, BASIC or Logo); I also looked at simple examples in the languages I was working in.
I did not start by reading huge quantities of source code, including sophisticated programs well beyond my current understanding, in all manner of languages and platforms. This seems very different to training a machine-learning model to me.
> I did not start by reading huge quantities of source code, including sophisticated programs well beyond my current understanding, in all manner of languages and platforms. This seems very different to training a machine-learning model to me.
I don't really see how the complexity (which is subjective) of the source code one reads makes the training different.
I don't want to stop it at all! I encourage them to develop Copilot. But it should be a toggle on each repository and should require explicit user action to opt in.
I think we should be able to use any data as fair use for the purpose of training models. Otherwise data will be owned by the privileged few, and most of us will never be able to compete.
By hoping Microsoft loses this case, you're hoping they gain the permanent upper hand.
Well then, I will just never upload my code if you want to use it without permission. Git is distributed, thankfully, and does not require a centralized service.
Then shouldn't this greater-than-internet technological evolution happen in the public domain, or under some sort of open licensing, instead of in a proprietary silo?
At least the web had RFCs, and came from a "research" institution. Here there's nothing but a "cloud", and perhaps some day they'll let you pay $8 for a blue checkmark to commit code.
Hrm, it's a $10M venture capital fund, plus a $400k fellows/mentorship program. I thought at first that it was a $10M sponsorship fund like we recently did at Sentry, I was gonna be excited about that. :^)
Funny how it comes right after a major ruckus over their ethically murky use of open source code.