
> Know when your LLM app is hallucinating or malfunctioning

It astonishes me that you are willing to make so many deceptive claims on your website like this.

You have no ability to detect with any certainty hallucinations. No one in the industry does.



I think it depends on the use case and how you define hallucinations. We've seen our metrics perform well (i.e., they correlate with human feedback) for use cases like summarization, RAG question-answering pipelines, and entity extraction.

At the end of the day, things like "answer relevancy" are pretty dichotomous, in the sense that it will be pretty clear to a human evaluator whether an answer addresses the question or not.
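That dichotomy is why relevancy is often scored with an LLM-as-judge prompt that forces a yes/no verdict. A hypothetical sketch (the prompt wording, function names, and the stub judge are all assumptions for illustration; a real setup would pass in an actual LLM call):

```python
from typing import Callable

# Hypothetical judge prompt forcing a binary verdict.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Does the answer address the question? Reply YES or NO."
)

def answer_relevancy(question: str, answer: str,
                     judge: Callable[[str], str]) -> bool:
    """Return True if the judge's verdict starts with YES.

    `judge` stands in for any LLM call: it takes the prompt text
    and returns the model's reply as a string.
    """
    verdict = judge(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("YES")

# Stub judges for illustration only (a real judge would be a model call).
print(answer_relevancy("What is 2+2?", "4", lambda p: "YES"))         # True
print(answer_relevancy("What is 2+2?", "I like cats", lambda p: "No"))  # False
```

The binary framing makes the metric easy to audit: any disagreement with a human label is a concrete, inspectable example.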

I wonder if you can elaborate on why you claim that hallucinations can't be detected with any certainty.


clearly the LLM app has added such logic:

```
if (query.IsHallucinated()) { notifyHumanOfHallucination(); }
```

this one line will get them that unicorn eval


I think that LLMs hallucinate by design. I'm not sure we'll ever get to 0% hallucinations, and we should be OK with that (at least for the coming years?). So getting an alert on each hallucination becomes less interesting. What is more interesting, perhaps, is knowing the rate at which this happens, and keeping track of whether that rate increases or decreases over time or with changes to models.
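Tracking a rate rather than alerting on individual cases could be as simple as a sliding window over flagged responses. A minimal sketch (the class name, window size, and flagging input are assumptions; how a response gets flagged is a separate problem):

```python
from collections import deque

class HallucinationRateTracker:
    """Track the fraction of flagged responses over a sliding window.

    Hypothetical sketch: `window` is the number of recent responses
    to keep; `flagged` comes from whatever detector or human review
    process is in place.
    """

    def __init__(self, window: int = 100):
        # deque with maxlen automatically drops the oldest entry.
        self.events: deque[bool] = deque(maxlen=window)

    def record(self, flagged: bool) -> None:
        self.events.append(flagged)

    @property
    def rate(self) -> float:
        # Fraction of flagged responses in the current window.
        return sum(self.events) / len(self.events) if self.events else 0.0

tracker = HallucinationRateTracker(window=5)
for flagged in [False, False, True, False, True]:
    tracker.record(flagged)
print(tracker.rate)  # 2 flagged out of 5 -> 0.4
```

Comparing this rolling rate before and after a model or prompt change is what turns it into a regression signal, rather than a per-response alarm.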



