
But was that with batching? It makes a big difference: you can run many requests in parallel on the same card when doing LLM inference.
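Rough intuition for why batching helps so much (toy model with made-up timings, not measurements from any real card): decode steps are dominated by streaming the model weights from VRAM, and that cost is paid once per step regardless of how many sequences are in the batch, so extra sequences are nearly free until compute or KV-cache bandwidth catches up.

```python
# Toy model of batched LLM decoding throughput.
# Timings below are hypothetical illustrations, not benchmarks.
WEIGHT_READ_MS = 20.0  # per decode step: streaming weights, shared by the whole batch
PER_SEQ_MS = 0.5       # per decode step: extra per-sequence compute / KV-cache reads

def tokens_per_second(batch_size: int) -> float:
    # Each step produces one token per sequence in the batch.
    step_ms = WEIGHT_READ_MS + PER_SEQ_MS * batch_size
    return batch_size * 1000.0 / step_ms

for b in (1, 8, 32):
    print(f"batch={b:2d}: {tokens_per_second(b):7.1f} tok/s")
```

With these numbers, going from batch 1 to batch 32 raises aggregate tokens/sec by well over 10x even though each individual request gets slightly slower, which is why per-request benchmarks and per-card throughput benchmarks tell such different stories.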

