
> they don't come with any inbuilt sense of when they might be getting things wrong

Spend some time with current reasoning models. Your experience is obsolete if you still hold this belief.



Can you be more specific than "current reasoning models"? Maybe I missed it, but I have not yet seen any that would not hallucinate wildly.


Let's try it this way: give me one or two prompts that you personally have had trouble with, in terms of hallucinated output and lack of awareness of potential errors or ambiguity. I have paid accounts on all the major models except Grok, and I often find it interesting to probe the boundaries where good responses give way to bad ones, and to see how they get better (or worse) between generations.
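
If you want to run the same kind of probe yourself, here's a minimal sketch using the OpenAI Python client. The prompt and model name are placeholders, not recommendations, and the system message is just one way to nudge the model toward flagging uncertainty; the other vendors' SDKs follow a similar request/response shape.

    # Minimal sketch: send one prompt and ask the model to flag its own
    # uncertainty. Assumes the openai package is installed and the
    # OPENAI_API_KEY environment variable is set.
    from openai import OpenAI

    client = OpenAI()

    prompt = "..."  # substitute a prompt that has produced hallucinations for you

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; swap in whichever model you want to probe
        messages=[
            {"role": "system",
             "content": "If you are unsure of any claim, say so explicitly."},
            {"role": "user", "content": prompt},
        ],
    )
    print(response.choices[0].message.content)

Rerunning the same script with different model strings is the cheapest way to see where good responses give way to bad ones between generations.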

Sounds like your experiences, along with zozbot234's, are different enough from mine to be worth reproducing and understanding. I'll report back with the results I see on the current models.



