
> Know when your LLM app is hallucinating or malfunctioning

It astonishes me that you are willing to make so many deceptive claims on your website like this.

You have no ability to detect with any certainty hallucinations. No one in the industry does.



I think it depends on the use case and how you define hallucinations. We've seen our metrics perform well (i.e., they correlate with human feedback) for use cases like summarization, RAG question-answering pipelines, and entity extraction.

At the end of the day, things like "answer relevancy" are pretty dichotomous, in the sense that it will be pretty clear to a human evaluator whether an answer addresses the question or not.
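That dichotomy is why relevancy is often scored with an LLM-as-judge prompt that forces a yes/no verdict. A hypothetical sketch (the prompt wording, function names, and the stub judge are all assumptions for illustration; a real setup would pass in an actual LLM call):

```python
from typing import Callable

# Hypothetical judge prompt forcing a binary verdict.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Does the answer address the question? Reply YES or NO."
)

def answer_relevancy(question: str, answer: str,
                     judge: Callable[[str], str]) -> bool:
    """Return True if the judge's verdict starts with YES.

    `judge` stands in for any LLM call: it takes the prompt text
    and returns the model's reply as a string.
    """
    verdict = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("YES")

# Stub judges for illustration only (a real judge would be a model call).
print(answer_relevancy("What is 2+2?", "4", lambda p: "YES"))         # True
print(answer_relevancy("What is 2+2?", "I like cats", lambda p: "No"))  # False
```

The binary framing makes the metric easy to audit: any disagreement with a human label is a concrete, inspectable example.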

I wonder if you can elaborate on why you claim that hallucinations can't be detected with any certainty.


clearly the LLM app has added such logic:

```
if (query.IsHallucinated()) { notifyHumanOfHallucination(); }
```

this one line will get them that unicorn eval


I think that LLMs hallucinate by design. I'm not sure we'll ever get to 0% hallucinations, and we should be OK with that (at least for the coming years?). So getting an alert on each hallucination becomes less interesting. What is more interesting, perhaps, is knowing the rate at which this happens, and keeping track of whether that rate increases or decreases over time or with changes to models.
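Tracking a rate rather than alerting on individual cases could be as simple as a sliding window over flagged responses. A minimal sketch (the class name, window size, and flagging input are assumptions; how a response gets flagged is a separate problem):

```python
from collections import deque

class HallucinationRateTracker:
    """Track the fraction of flagged responses over a sliding window.

    Hypothetical sketch: `window` is the number of recent responses
    to keep; `flagged` comes from whatever detector or human review
    process is in place.
    """

    def __init__(self, window: int = 100):
        # deque with maxlen automatically drops the oldest entry.
        self.events: deque[bool] = deque(maxlen=window)

    def record(self, flagged: bool) -> None:
        self.events.append(flagged)

    @property
    def rate(self) -> float:
        # Fraction of flagged responses in the current window.
        return sum(self.events) / len(self.events) if self.events else 0.0

tracker = HallucinationRateTracker(window=5)
for flagged in [False, False, True, False, True]:
    tracker.record(flagged)
print(tracker.rate)  # 2 flagged out of 5 -> 0.4
```

Comparing this rolling rate before and after a model or prompt change is what turns it into a regression signal, rather than a per-response alarm.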



