The amount of data (not just code) that we would need to get sign off on is prohibitively large. If you account for all the stakeholders, then this won't be easy at all.
Meanwhile, the institutions will leap ahead of us. Models and annotated data sets will forever be out of reach. Open source equivalents will be severely behind the status quo.
Strongly disagree, institutions have an advantage in having the data, but they do not have the capability of creating more.
An intentional open source dataset could target new domains that there was no institutional will to pursue. I strongly believe that the open source community's capabilities far exceed that of any single large corporation.
Meanwhile, the institutions will leap ahead of us. Models and annotated data sets will forever be out of reach. Open source equivalents will be severely behind the status quo.