I imagine there wouldn’t be much of a cost to the provider on the API call there... | Hacker News


danpalmer 10 months ago | on: Gemini 2.5 Flash

I imagine there wouldn’t be much of a cost to the provider on the API call there, so much longer times may be possible. It’s not like this would hold up the LLM in any way: execution would get suspended while the call is made, and the TPU/GPU would serve another request.

suchar 10 months ago

They need to keep the KV cache to avoid reprocessing the prompt, so during longer API calls they would need to move it to RAM/NVMe to free the GPU for another request.
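To make the trade-off concrete, here is a rough sketch of the bookkeeping involved, not how any provider actually does it. The `kv_cache_bytes` helper uses the usual size estimate (keys plus values, per layer, per token); the model dimensions in the usage lines are made-up numbers, and `KVCacheTiers` is a hypothetical toy that only tracks sizes rather than moving real tensors.

```python
from dataclasses import dataclass, field


def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Estimate KV cache size for one request (fp16/bf16 by default)."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


@dataclass
class KVCacheTiers:
    """Toy two-tier tracker: GPU memory plus a host tier (RAM/NVMe stand-in)."""
    gpu_capacity: int
    gpu: dict = field(default_factory=dict)   # request id -> cache size in bytes
    host: dict = field(default_factory=dict)  # caches offloaded during API calls

    def gpu_used(self):
        return sum(self.gpu.values())

    def admit(self, request_id, size):
        if self.gpu_used() + size > self.gpu_capacity:
            raise MemoryError("not enough GPU memory for this KV cache")
        self.gpu[request_id] = size

    def suspend(self, request_id):
        # The request is waiting on an external API call: free its GPU memory
        # by moving the cache to the host tier instead of discarding it.
        self.host[request_id] = self.gpu.pop(request_id)

    def resume(self, request_id):
        # The API call returned: bring the cache back so the prompt does not
        # have to be reprocessed. The cache stays on the host if the GPU is full.
        size = self.host[request_id]
        self.admit(request_id, size)
        del self.host[request_id]


# Hypothetical model: 32 layers, 8 KV heads, head dim 128, 8192-token context.
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=8192)  # 1 GiB

tiers = KVCacheTiers(gpu_capacity=size)
tiers.admit("request-a", size)
tiers.suspend("request-a")       # long API call starts, GPU memory is freed
tiers.admit("request-b", size)   # another request can now use the GPU
```

With these assumed numbers a single long-context request holds about 1 GiB, which is why holding the cache on the GPU for the whole API call is not free, and why resuming `request-a` here would have to wait until `request-b` releases its memory.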
