They're literally trained on natural language to output natural language. You'd need to create the hyper-compressed language first, convert all of your training data into it, and then train the models on that. That said, token efficiency per word already varies between languages, with Chinese being something like 30-40% more efficient than English, last I heard (though the exact ratio depends heavily on which tokenizer you measure with).
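You can check this yourself with a tokenizer library. Here's a minimal sketch using OpenAI's tiktoken (`pip install tiktoken`); the sample strings and the choice of the `cl100k_base` encoding are just my picks for illustration, and the numbers will differ for other tokenizers and texts:

```python
import tiktoken

# GPT-4's tokenizer; swap in "o200k_base" for GPT-4o and compare.
enc = tiktoken.get_encoding("cl100k_base")

english = "Artificial intelligence is changing how we write software."
chinese = "人工智能正在改变我们编写软件的方式。"  # rough translation of the line above

for label, text in [("English", english), ("Chinese", chinese)]:
    tokens = enc.encode(text)
    print(f"{label}: {len(text)} chars -> {len(tokens)} tokens")
```

Same rough semantic content, very different character and token counts, which is why any "X% more efficient" figure only makes sense relative to a specific tokenizer.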