
Evals is not suitable for evaluating LLM applications such as RAG, because one has to evaluate on one's own data, where no golden test data exists, and the techniques used have poor correlation with human judgement. We have built the RAGAS framework for this: https://github.com/explodinggradients/ragas
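
A minimal sketch of how such a reference-free evaluation might look, assuming the ragas package exposes an evaluate() entry point and metric objects like faithfulness and answer_relevancy that score a Hugging Face Dataset with question, contexts, and answer columns (exact names and signatures may differ between versions):

  # Reference-free scoring of RAG outputs with ragas (sketch, not an official example).
  from datasets import Dataset
  from ragas import evaluate
  from ragas.metrics import faithfulness, answer_relevancy

  # One row per question answered by the RAG pipeline; no golden answers required.
  samples = Dataset.from_dict({
      "question": ["What does RAGAS measure?"],
      "contexts": [["RAGAS scores RAG pipelines without golden test data."]],
      "answer": ["It scores RAG pipelines reference-free."],
  })

  # The metrics use an LLM judge under the hood, so an OpenAI API key
  # is assumed to be set in the environment.
  scores = evaluate(samples, metrics=[faithfulness, answer_relevancy])
  print(scores)  # e.g. {'faithfulness': 0.9, 'answer_relevancy': 0.87}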


Great project! We're building an open-source platform for creating robust LLM apps (https://github.com/Agenta-AI/agenta), and we'd love to integrate your library into our evaluation features!


Thank you, that's great. We are working on paradigms for evaluating agents; I'll get in touch with you.



