
The big one, OPT-175B, isn't an open model. The word "open" in technology means that everyone has equal access (cf. "open source software" and "open source hardware"). The article says that research access will be granted upon request only to "academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories."

Don't assume any good intent from Facebook. This is obviously the same strategy large proprietary software companies have used for a long time to reinforce their monopolies/oligopolies. They want to embed themselves in the so-called "public sector" (academia and state institutions), so that they get free advertising paid for with taxpayer money. Ordinary people like most of us here won't be able to use it despite paying the taxes that fund it.

Some of the primary mechanisms of this advertising strategy:

1. Schools and universities frequently use their discounted or gratis access to teach courses, often leaving students specialized only in the monopolist's proprietary software/services.

2. State institutions require applicants to be well-versed in the monopolist's proprietary software/services because those institutions use it themselves.

3. Academic papers that reference the software/services attract more people to use them.

Some examples of companies utilizing this strategy:

Microsoft - Gives Microsoft Office 365 access for "free" to schools and universities.

Mathworks - Gives discounts to schools and universities.

Autodesk (CAD software) - Gives gratis limited-time "student" (noncommercial) licenses.

Altium (EDA software) - Gives gratis limited-time licenses to university students.

Cadence (EDA software) - Gives a discount for its EDA software to universities.

EDIT: My first sentence previously stated that the models aren't open. In fact, only OPT-175B is not, though the other ones are much smaller.



The other ones are smaller but not much worse according to their tests (oddly, on the Winograd Schema Challenge and CommitmentBank tasks, the largest model actually appears to be worse than much smaller ones).

30B-parameter models are already large enough to exhibit some of the more interesting emergent phenomena of LLMs. Quantized to 8 bits, one might just be squeezed into two, or better three, 3090s. But the models also seem undercooked, slightly to strongly under-performing GPT-3 on a lot of tasks. Further training the same model means >100 GB, possibly 200 GB, of VRAM. Point being, this is no small thing they're offering, and certainly preferable to being put on a waiting list for a paid API. The 6.7B and 13B parameter models seem the best bang for your buck as an individual.
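For a rough sense of where those numbers come from, here's some back-of-the-envelope Python (my own assumptions about precision and optimizer state, not figures from the OPT paper; activations, KV cache, and framework overhead are ignored, so treat these as floors):

    params = 30e9  # 30B-parameter model
    GB = 1e9

    # Inference with 8-bit quantized weights: 1 byte per parameter.
    print(f"8-bit inference:    ~{params * 1 / GB:.0f} GB")          # ~30 GB

    # Further training in fp16 with plain SGD: 2-byte weights + 2-byte grads.
    print(f"fp16 SGD training:  ~{params * (2 + 2) / GB:.0f} GB")    # ~120 GB

    # Adam adds two fp32 moment buffers (8 bytes) per parameter on top.
    print(f"fp16 Adam training: ~{params * (2 + 2 + 8) / GB:.0f} GB")  # ~360 GB

Which is roughly consistent with the >100 GB figure above if you skip or offload the optimizer state.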


Can you actually stack multiple 3090s arbitrarily like that?

That is, use multiple 3090s to load a single model for inference.

I thought that at most you could use two 3090s via NVlink.

Stacking multiple cards would open up some really cheap options.

Like, a real budget option would be a few ancient K80s (24GB version); eBay price was around $200-300 last I checked.
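From what I understand, you don't need NVLink for this: libraries can shard a model's layers across GPUs pipeline-style, so only activations cross the PCIe bus. A minimal sketch with Hugging Face transformers plus accelerate (assuming the facebook/opt-* checkpoints on the Hub; device_map="auto" spreads layers over whatever GPUs are visible):

    # pip install torch transformers accelerate
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "facebook/opt-13b"  # pick a size that fits your total VRAM
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",   # shard layers across all available GPUs
        torch_dtype="auto",  # keep the checkpoint's native precision
    )

    inputs = tokenizer("Deep learning is", return_tensors="pt").to(0)
    out = model.generate(**inputs, max_new_tokens=30)
    print(tokenizer.decode(out[0], skip_special_tokens=True))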


Add Mathematica to that list, too. It's pretty cool to play with, and I would have bought a license if I'd had a good excuse to; the tactic works.


Mathematica has been on my mind since high school because we got it for free. I went through the free-trial process recently and tried a couple of things I'd been too lazy to code up manually (some video analysis). It was too slow to be useful. My notebooks analyzing videos just locked up while processing was going on, and Mathematica bogged down too much to even save the notebook with its "I'm crashing, try and save stuff" mode. I ultimately found it a waste of time for general-purpose programming; the library functions, as documented, were much better than what I could get in a free language, but they just wouldn't run while keeping the UI responsive.

So basically all their advertising money ended up being wasted because they can't fork off ffmpeg or whatever. Still very good at symbolic calculus and things like that, though.


I'm afraid of companies pushing large-scale models as the end-all for anything text-related. Large language models are revolutionary, but the last thing I want to see is everything being run through an API. I'm more interested in things like knowledge distillation or prompt tuning. The hope is that a medium-sized model with some training can match a large one using zero-shot approaches.
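The core of knowledge distillation is just an extra loss term pushing a small student toward a big teacher's output distribution. A minimal PyTorch sketch of the standard recipe (Hinton et al., 2015; the temperature and alpha values here are illustrative, not tuned):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # Soft targets: KL divergence between temperature-softened
        # distributions; the T^2 factor keeps gradient magnitudes
        # comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: ordinary cross-entropy against the true labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard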



