I imagine there wouldn’t bd much of a cost to the provider on the API call there so much longer times may be possible. It’s not like this would hold up the LLM in any way, execution would get suspended while the call is made and the TPU/GPU will serve another request.
They need to keep KV cache to avoid prompt reprocessing, so they would need to move it to ram/nvme during longer api calls to use gpu for another request