Can you be more specific than "current reasoning models"? Maybe I missed it, but I have not yet seen any that would not hallucinate wildly.


Let's try it this way: give me one or two prompts that you personally have had trouble with, in terms of hallucinated output and lack of awareness of potential errors or ambiguity. I have paid accounts on all the major models except Grok, and I often find it interesting to probe the boundaries where good responses give way to bad ones, and to see how they get better (or worse) between generations.

Sounds like your experiences, along with zozbot234's, are different enough from mine that they are worth repeating and understanding. I'll report back with the results I see on the current models.


