Evals is not suitable for evaluating LLM applications such as RAG, because you have to evaluate on your own data, where no golden test set exists, and the techniques commonly used correlate poorly with human judgement.
We have built the RAGAS framework for this: https://github.com/explodinggradients/ragas
Great project! We're building an open-source platform for building robust LLM apps (https://github.com/Agenta-AI/agenta), and we'd love to integrate your library into our evaluation workflow!