The other models are smaller but, according to their own benchmarks, not much worse (oddly, on the Winograd Schema Challenge and CommitmentBank tasks, the largest model actually appears to do worse than much smaller ones).
30B parameter models are already large enough to exhibit some of the more interesting emergent phenomena of LLMs. Quantized to 8 bits, one might be squeezed into two, better three, 3090s. But the models also seem undercooked, slightly to strongly under-performing GPT-3 on a lot of tasks. Further training the same model, however, is looking at more than 100 GB, possibly 200 GB, of VRAM. Point being, this is no small thing they're offering, and it's certainly preferable to being put on a waiting list for a paid API. For an individual, the 6.7B and 13B parameter models seem like the best bang for your buck.
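For a rough sense of where those numbers come from, here's a back-of-envelope sketch. The bytes-per-parameter figures are common rules of thumb, not measured values, and real usage also depends on activations, sequence length, and framework overhead:

```python
# Rough VRAM estimates for a 30B-parameter model.
# Assumed costs per parameter (rules of thumb, not exact):
#   int8 weights: 1 byte; fp16 weights/gradients: 2 bytes each;
#   8-bit optimizer states: ~2 bytes.

PARAMS = 30e9        # 30B parameters
GIB = 1024**3

def gib(n_bytes: float) -> float:
    return n_bytes / GIB

# Inference with 8-bit quantized weights: ~1 byte per parameter,
# plus headroom for activations and the KV cache.
int8_weights = gib(PARAMS * 1)
print(f"int8 weights:             ~{int8_weights:.0f} GiB")        # ~28 GiB
print("two 3090s (24 GiB each):    48 GiB; three: 72 GiB")

# Fine-tuning in fp16: weights + gradients alone are ~4 bytes/param;
# optimizer states add several more bytes on top of that.
fp16_weights_grads = gib(PARAMS * (2 + 2))
with_8bit_optim    = gib(PARAMS * (2 + 2 + 2))
print(f"fp16 weights + gradients: ~{fp16_weights_grads:.0f} GiB")  # ~112 GiB
print(f"... + 8-bit optim states: ~{with_8bit_optim:.0f} GiB")     # ~168 GiB
```

Which is why inference on a small stack of 3090s looks plausible, while further training lands in the >100 GB, possibly ~200 GB range quoted above.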