If you were running a real business with these would the aim not be to overprovi...

omneity · 2025-06-04T14:15:33

That seems to be the gist of it. You cannot rely on serverless alone; you need one or more pre-warmed instances running at all times. This is rarely mentioned in serverless GPU discussions, yet it has been my experience in general.
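To make the trade-off concrete, here is a toy Python model. It is not Cloud Run's or any provider's actual scheduler. It assumes (hypothetically) a single instance that is reclaimed after `idle_timeout_s` seconds without requests, treats each request as instantaneous, and counts how many requests land on a cold instance. `min_instances` stands in for keeping a pre-warmed instance around. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ScaleToZeroModel:
    """Toy model of one serverless service (hypothetical, not a real provider's algorithm).

    Assumptions: one instance, requests are instantaneous, an idle instance is
    reclaimed after `idle_timeout_s` unless `min_instances >= 1` keeps it warm,
    and booting a fresh instance costs `cold_start_s` seconds.
    """
    cold_start_s: float
    idle_timeout_s: float
    min_instances: int = 0

    def cold_starts(self, arrivals_s: Sequence[float]) -> int:
        """Count requests (sorted arrival times in seconds) that hit a cold instance."""
        count = 0
        last_request: Optional[float] = None  # arrival time of the previous request
        for t in arrivals_s:
            # A pre-warmed floor is always warm; otherwise the instance survives
            # only if the previous request was recent enough.
            warm = self.min_instances >= 1 or (
                last_request is not None and t - last_request <= self.idle_timeout_s
            )
            if not warm:
                count += 1
            last_request = t
        return count

    def cold_start_penalty_s(self, arrivals_s: Sequence[float]) -> float:
        """Total extra latency users pay for cold starts under this model."""
        return self.cold_starts(arrivals_s) * self.cold_start_s


sparse_traffic = [0, 10, 500, 505, 2000]
scale_to_zero = ScaleToZeroModel(cold_start_s=19, idle_timeout_s=300)
pre_warmed = ScaleToZeroModel(cold_start_s=19, idle_timeout_s=300, min_instances=1)
print(scale_to_zero.cold_starts(sparse_traffic), pre_warmed.cold_starts(sparse_traffic))
```

With sparse traffic and scale-to-zero, every request that follows a long idle gap pays the full cold start; a floor of one warm instance removes those, at the cost of paying for that instance while it sits idle.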

nullpointerexp · 2025-06-04T15:42:39

When scaling from 0 to 1 instances, yes, you have to wait 19 seconds.

For scaling from N to N+1 instances: if you configure the right concurrency value (the number of parallel requests one instance can handle), Cloud Run will scale out to additional instances once load reaches some percentage of that value (I think it's 70%). That happens before the instance is fully exhausted, so your users should not experience the 19-second cold start.
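The scale-out behaviour described above can be sketched as a simple threshold rule. This is a hypothetical model, not Cloud Run's published algorithm: it assumes the autoscaler keeps in-flight requests per instance below `target_utilization × concurrency`, and the 0.7 default only mirrors the commenter's recollection. `desired_instances` and its parameters are illustrative names.

```python
import math


def desired_instances(in_flight: int, concurrency: int,
                      target_utilization: float = 0.7,
                      min_instances: int = 0) -> int:
    """Hypothetical threshold autoscaler.

    Keeps each instance below `target_utilization` of its `concurrency` limit,
    so a new instance is requested before existing ones are saturated.
    The 0.7 default is an assumption, not a documented constant.
    """
    if concurrency < 1 or not 0 < target_utilization <= 1:
        raise ValueError("need concurrency >= 1 and 0 < target_utilization <= 1")
    # Requests one instance should carry before the autoscaler adds another.
    per_instance = concurrency * target_utilization
    needed = math.ceil(in_flight / per_instance) if in_flight > 0 else 0
    # A minimum-instances floor keeps pre-warmed capacity even with no traffic.
    return max(needed, min_instances)


print(desired_instances(in_flight=8, concurrency=10))
```

With `concurrency=10`, eight in-flight requests already call for a second instance even though one instance could technically hold ten. That headroom is what lets the new instance finish its cold start before the first one is saturated; it does not help the 0-to-1 case, which is where a minimum-instances floor comes in.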