Hacker News
ivape | 7 months ago | on: Cloud Run GPUs, now GA, makes running AI workloads...
Does anyone actually run a modest-sized app and can share numbers on what one GPU gets you? Assuming something like vLLM for concurrent requests, what kind of throughput are you seeing? Serving an LLM just feels like a nightmare.
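Questions like this usually get answered by measuring rather than guessing. Below is a minimal sketch of the kind of concurrency harness one might point at a vLLM server to get a tokens-per-second number; `generate` here is a hypothetical stand-in (it just simulates latency and a completion length), where in practice you would replace its body with an HTTP call to vLLM's OpenAI-compatible endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a request to a vLLM server's
# OpenAI-compatible completions endpoint. In a real benchmark this
# would issue the HTTP call and return the completion's token count.
def generate(prompt: str) -> int:
    time.sleep(0.01)  # simulated network + decode latency
    return 128        # simulated completion length in tokens

def measure_throughput(prompts, concurrency=8):
    """Fire prompts concurrently and report aggregate tokens/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total = sum(token_counts)
    return total, elapsed, total / elapsed

total, elapsed, tps = measure_throughput(["hello"] * 32, concurrency=8)
print(f"{total} tokens in {elapsed:.2f}s -> {tps:.0f} tok/s")
```

The aggregate number depends heavily on concurrency: vLLM batches in-flight requests, so tokens/sec typically climbs with concurrent clients (at the cost of per-request latency) until the GPU saturates, which is why single-request numbers understate what one GPU can serve.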