Qwen3 8B works pretty well. But for complex planning and navigation tasks, the big models (GPT-4.1, Claude 3.7) are still the best bet. We also let you use your own API keys for the big models.
Exactly the question on my mind. I have an ongoing fantasy of a super powerful, private inference server sitting in my closet that I can throw at stuff like this.