
If you were running a real business on these, wouldn't the aim be to overprovision and set up autoscaling so that you always have excess capacity?


That seems to be the gist of it. You can't rely on serverless alone; you need one or more pre-warmed instances at all times. This distinction is rarely mentioned in the serverless GPU space, but it has matched my experience in general.
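A minimal sketch of the "overprovision plus a warm floor" policy described above. The function name, the `headroom` fraction and the `min_warm` floor are hypothetical illustrations, not any provider's actual autoscaler. On Cloud Run the floor roughly corresponds to the min-instances setting.

```python
import math


def desired_instances(expected_concurrent_requests: int,
                      per_instance_concurrency: int,
                      min_warm: int = 1,
                      headroom: float = 0.3) -> int:
    """Hypothetical sizing rule: cover expected load plus a headroom margin,
    but never drop below a floor of always-warm instances (so no request
    ever has to wait for a 0 -> 1 cold start)."""
    if per_instance_concurrency <= 0:
        raise ValueError("per_instance_concurrency must be positive")
    # Pad the expected load so there is spare capacity before saturation.
    padded_load = expected_concurrent_requests * (1 + headroom)
    needed = math.ceil(padded_load / per_instance_concurrency)
    # The warm floor wins when traffic is low or zero.
    return max(min_warm, needed)
```

With no traffic, the floor keeps one instance warm. With 10 expected concurrent requests and 4 per instance, the 30% headroom pushes the count from 3 to 4.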


When scaling from 0 to 1 instances, yes, you have to wait 19 seconds.

For scaling from N to N+1 instances: if you configure the correct concurrency value (the number of parallel requests one instance can handle), Cloud Run will start additional instances once utilization reaches some threshold (I think it's 70%). That happens before the existing instances are fully exhausted, so your users should not experience the 19-second cold start.
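A simplified model of the N to N+1 trigger described above, assuming scale-out fires when average in-flight requests per instance reach a target fraction of the configured concurrency. The 0.7 default is the commenter's guess, and the function names are hypothetical. This is not Cloud Run's actual (unpublished) algorithm.

```python
import math


def should_scale_out(in_flight: int, instances: int, max_concurrency: int,
                     target_utilization: float = 0.7) -> bool:
    """Return True once utilization crosses the target, i.e. before any
    instance is fully saturated. With zero instances, any request forces
    a 0 -> 1 scale-up, which is the cold-start case."""
    capacity = instances * max_concurrency
    if capacity == 0:
        return in_flight > 0
    return in_flight / capacity >= target_utilization


def instances_for(in_flight: int, max_concurrency: int,
                  target_utilization: float = 0.7) -> int:
    """Instances needed to keep utilization at or below the target."""
    return math.ceil(in_flight / (max_concurrency * target_utilization))
```

With one instance at concurrency 10, the 8th concurrent request triggers a new instance while two slots are still free. That spare capacity absorbs traffic during the new instance's cold start.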



