
I've found the same thing in my own experiments. This behavior is very persistent with LLMs on default hyperparameters and system prompts. Right now I'm exploring how to get these models to produce more human-like interactions, and it seems a very specific, detailed system prompt is essential. These systems are VERY sensitive to the system prompt and user input: output quality varies drastically depending not just on the language you use, but on how it's structured, the order of that structure, and many other nuances like preconditioning the system prompt with the user input. So far it looks possible to get where we need to for this task, but a lot of exploration is still needed to figure out how to structure the whole system together.

That realization is kind of nuts when you think about it. It basically means that once you find the right words, in the right order, for the whole system, you can get a 2x+ improvement in every metric you care about. That's why I'm spending time building an automated solution to find these things for a given model. It's tedious to do manually, but we have the tools to automate the optimization and calibration.
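For anyone curious what the automated version might look like: here's a minimal sketch of the idea, a greedy hill climb over system-prompt variants scored against a small eval set. Everything in it (the append-only mutate step, the toy generate and score stand-ins) is hypothetical scaffolding I'm assuming for illustration, not the actual tool; the real work is in richer mutation operators and a scoring function that actually captures "human-like."

    import random
    from typing import Callable, List

    def mutate(prompt: str, edits: List[str]) -> str:
        # Naive mutation: bolt one candidate instruction onto the prompt.
        # A real search would also reorder, rephrase, and delete sections,
        # since ordering is exactly what these models are sensitive to.
        return prompt + "\n" + random.choice(edits)

    def avg_score(prompt: str, inputs: List[str],
                  generate: Callable[[str, str], str],
                  score: Callable[[str], float]) -> float:
        # Average the quality metric over a fixed set of user inputs.
        return sum(score(generate(prompt, x)) for x in inputs) / len(inputs)

    def search(base: str, edits: List[str], inputs: List[str],
               generate: Callable[[str, str], str],
               score: Callable[[str], float], iters: int = 50):
        best, best_s = base, avg_score(base, inputs, generate, score)
        for _ in range(iters):
            cand = mutate(best, edits)
            s = avg_score(cand, inputs, generate, score)
            if s > best_s:  # greedy hill climb: keep the better prompt
                best, best_s = cand, s
        return best, best_s

    if __name__ == "__main__":
        # Toy stand-ins so the sketch runs end to end; swap in a real
        # model call and a real quality metric (human ratings, a judge
        # model, etc.) for actual use.
        gen = lambda sys_p, usr: sys_p + " :: " + usr
        metric = lambda out: -abs(len(out) - 80)
        edits = ["Answer casually.", "Keep replies under two sentences.",
                 "Avoid bullet points."]
        best, s = search("You are a helpful assistant.", edits,
                         ["hey, how's it going?"], gen, metric)
        print(s, repr(best))

The catch, of course, is that the hill climb is only as good as the scoring function: with a weak metric you just overfit the prompt to the eval set, which is the calibration problem I mentioned.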

