A cluster of many $8,000+ GPUs. You're looking at around 350GB of VRAM, so ~15 24GB GPUs: a 3090 will cost around $1,800, so ~$27k on the GPUs, probably another $15k in power, cooling, and infrastructure, $5k in networking, and probably another $20k in other costs to bootstrap it.
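The sizing falls straight out of the VRAM target. A quick sketch, assuming a 24GB RTX 3090 at ~$1,800 street price (both figures are assumptions) and the overhead estimates from the comment above:

```python
import math

# Rough cluster sizing: VRAM target divided by per-card VRAM,
# rounded up, plus the fixed overhead figures quoted above.
vram_needed_gb = 350
vram_per_gpu_gb = 24      # RTX 3090 (assumption: 24GB card)
price_per_gpu = 1_800     # assumption: rough street price

n_gpus = math.ceil(vram_needed_gb / vram_per_gpu_gb)
gpu_cost = n_gpus * price_per_gpu
overhead = 15_000 + 5_000 + 20_000   # power/cooling + network + bootstrap
total = gpu_cost + overhead
print(n_gpus, gpu_cost, total)  # 15 27000 67000
```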
Or wait 10 years: if GPU memory capacity scales with Moore's law, consumer hardware should be able to run a ~400GB model locally.
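The 10-year figure roughly checks out under a doubling-every-two-years assumption. A sketch; the 12GB starting point and the 2-year doubling period are both assumptions, not claims from the thread:

```python
# Moore's-law-ish projection: on-card memory doubles every ~2 years,
# starting from a 12GB consumer card today (both numbers are assumptions).
start_vram_gb = 12
years = 10
doubling_period_years = 2

future_vram_gb = start_vram_gb * 2 ** (years // doubling_period_years)
print(future_vram_gb)  # 384, i.e. roughly the ~400GB needed
```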
One could use $4.5k RTX A6000 48GB cards instead.
They can be joined in pairs via NVLink into a 96GB common memory pool.
That's 7 × $4.5k = $31.5k in GPUs to get 336GB of memory,
or 8 × $4.5k = $36k in GPUs to get 384GB of memory.
Add say $3k per GPU pair for the surrounding computer (MB, CPU, RAM, PSU): 4 × $3k = $12k.
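Putting the A6000 numbers together (8 cards in 4 NVLinked pairs, $3k of host hardware per pair, all prices as quoted above):

```python
# Totals for the 8x A6000 build: GPUs plus per-pair host machines.
gpu_price = 4_500
n_gpus = 8
host_per_pair = 3_000   # MB, CPU, RAM, PSU per 2-GPU box

pairs = n_gpus // 2
total_cost = n_gpus * gpu_price + pairs * host_per_pair
total_vram_gb = n_gpus * 48
print(total_cost, total_vram_gb)  # 48000 384
```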
This is not true. On-prem is extremely common for workloads like this, because after ~6 months you'll have paid more in cloud costs than it would have cost to purchase the GPUs outright. And you don't need to purchase new GPUs every 6 months.
AWS would cost $50-100k/mo for something comparable.
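At those rates the breakeven is quick. A sketch using the ~$48k 8x A6000 build cost from upthread against the low end of the cloud estimate (both rough assumptions):

```python
import math

# Months until cumulative cloud spend passes the on-prem build cost.
onprem_total = 48_000      # assumption: 8x A6000 build incl. host machines
cloud_per_month = 50_000   # low end of the $50-100k/mo AWS estimate

breakeven_months = math.ceil(onprem_total / cloud_per_month)
print(breakeven_months)  # 1
```

Even at the cheapest end of the cloud estimate, the hardware pays for itself in about a month, comfortably inside the ~6-month figure mentioned above.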