arnok's comments

I think you are 3 orders of magnitude off in your airplane energy calculation.

3,168,750 kWh ≈ 3.17 GWh (1 GWh = 1,000,000 kWh)


Darn it! You are correct!

Ok, so how about: we made about as much in batteries last year as all the A380s in the world can hold fully loaded in jet fuel. (There aren’t many of them!) Not as impressive, but still.


Did you do any kind of validation? For example, do you have a test set of questions with criteria for what a right answer would be?


I don't have any test sets because I haven't trained any model from scratch. I have built a simple RAG, and my validation comes from users directly, like whether they find the answer useful or not.


The real value of these tools is in the validation, and I don't mean just face validity. User feedback is just face validity.

If you were a doctor and you needed to make a real treatment decision for a real patient, would you use this tool without checking the answer thoroughly, reading the literature yourself and checking to see if it didn't miss any relevant sources? If no, then you might as well skip the tool and do the work yourself. If yes, then you need to know for certain that the answer is correct.

And I don't think it matters if you trained the model yourself. You validate the tool as a whole.

The problem with using user feedback as validation is that users ask questions they don't know the answer to. Therefore, they are unable to judge the correctness of an answer. What you need is a gold standard to validate against.
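To make the gold-standard idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `GOLD_SET` entries, the `required_facts` criterion, and the `toy_tool` stand-in for the RAG are illustrative assumptions, not anything from the tool discussed here. A real gold standard would be written and reviewed by domain experts, and substring matching is only a crude proxy for correctness.

```python
# Hypothetical gold standard: each question comes with facts that a
# correct answer must contain. Real entries would be expert-reviewed.
GOLD_SET = [
    {
        "question": "What is the boiling point of water at sea level in Celsius?",
        "required_facts": ["100"],
    },
    {
        "question": "How many kWh are in one GWh?",
        "required_facts": ["1,000,000"],
    },
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching is less brittle."""
    return " ".join(text.lower().split())


def evaluate(answer_fn, gold_set) -> float:
    """Return the fraction of gold questions whose answer contains every required fact."""
    passed = 0
    for item in gold_set:
        answer = normalize(answer_fn(item["question"]))
        if all(normalize(fact) in answer for fact in item["required_facts"]):
            passed += 1
    return passed / len(gold_set)


def toy_tool(question: str) -> str:
    """Stand-in for the tool under test (assumption: it maps a question to a text answer)."""
    canned = {
        "What is the boiling point of water at sea level in Celsius?":
            "Water boils at 100 degrees Celsius at sea level.",
    }
    return canned.get(question, "I am not sure.")


score = evaluate(toy_tool, GOLD_SET)
print(f"Gold-standard accuracy: {score:.0%}")
```

The point of the design is that the score is computed against answers known in advance, so it does not depend on whether the person asking can judge correctness.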


LSD can be tested as well in the Netherlands. You can get a quantitative analysis, but you lose the sample.

