Hacker News
ivape | 7 months ago | on: Cloud Run GPUs, now GA, makes running AI workloads...
Does anyone actually run a modest-sized app and can share numbers on what one GPU gets you? Assuming something like vLLM for concurrent requests, what kind of throughput are you seeing? Serving an LLM just feels like a nightmare.
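Questions like this usually get answered by measuring rather than guessing. Below is a minimal sketch of the kind of concurrency harness one might point at a vLLM server to get a tokens-per-second number; `generate` here is a hypothetical stand-in (it just simulates latency and a completion length), where in practice you would replace its body with an HTTP call to vLLM's OpenAI-compatible endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a request to a vLLM server's
# OpenAI-compatible completions endpoint. In a real benchmark this
# would issue the HTTP call and return the completion's token count.
def generate(prompt: str) -> int:
    time.sleep(0.01)  # simulated network + decode latency
    return 128        # simulated completion length in tokens

def measure_throughput(prompts, concurrency=8):
    """Fire prompts concurrently and report aggregate tokens/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    total = sum(token_counts)
    return total, elapsed, total / elapsed

total, elapsed, tps = measure_throughput(["hello"] * 32, concurrency=8)
print(f"{total} tokens in {elapsed:.2f}s -> {tps:.0f} tok/s")
```

The aggregate number depends heavily on concurrency: vLLM batches in-flight requests, so tokens/sec typically climbs with concurrent clients (at the cost of per-request latency) until the GPU saturates, which is why single-request numbers understate what one GPU can serve.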