LMArena Is a Cancer on AI (surgehq.ai)
6 points by gk1 18 days ago | 1 comment


Like any LLM benchmark, LMArena is highly flawed, but I do think it has a right to exist. Anecdotally, it has been indicative of which LLM's style I like best, not necessarily of factual accuracy. It hasn't, however, been a very useful tool for finding the best LLM for a given job.

To the article's point though, it's treated as the gold standard, which it isn't. We should have learned that with the sycophancy-gate.

I'm not sure the methodology here is really sound for the question at hand. It's a bit like saying prediction markets don't work because 40% of the people who voted were wrong.

You can't really get around running your own benchmarks for the job at hand if you want 95th-percentile performance on a task.



