Most of the very large advertised token limits are just marketing bullshit. If you actually try them out, you quickly realise the model can't reliably keep all 100k tokens in working memory. Whether it recalls the right part ends up being mostly luck, so in practice you fall back to roughly 16k tokens anyway, which is already a big step up from the initial 4k, but still quite limiting.
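To give a rough idea of what "just using 16k tokens" looks like in practice, here's a minimal sketch that trims context to a fixed token budget before building the prompt. The tokenizer (tiktoken's cl100k_base) and the exact budget are assumptions on my side, not details from our actual setup.

```python
# Sketch: cap prompt context at a practical token budget.
# Encoding name and budget are illustrative assumptions.
import tiktoken

ENCODING = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer
TOKEN_BUDGET = 16_000                            # practical limit instead of the advertised max

def fit_to_budget(chunks: list[str], budget: int = TOKEN_BUDGET) -> str:
    """Greedily keep the most recent chunks that still fit inside the budget."""
    kept: list[str] = []
    used = 0
    for chunk in reversed(chunks):               # newest context first
        n = len(ENCODING.encode(chunk))
        if used + n > budget:
            break
        kept.append(chunk)
        used += n
    return "\n\n".join(reversed(kept))           # restore original order
```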
- We first struggled with token limits [solved]
- We had issues with consistent JSON output [solved] (see the sketch after this list)
- We had rate limiting and performance issues with the large third-party models [solved]
- We wanted to reduce costs by hosting our own OSS models for tasks of small and medium complexity [solved]
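On the JSON point: one common way to solve it (a sketch of the general pattern, not necessarily exactly what we shipped) is to request JSON mode, validate the response against a schema, and retry on failure. The OpenAI SDK call, the pydantic schema, the model name, and the retry count below are all illustrative assumptions.

```python
# Sketch: validate-and-retry loop for consistent JSON output.
# Schema, model name, and retry count are assumptions.
import json
from pydantic import BaseModel, ValidationError
from openai import OpenAI

client = OpenAI()

class Task(BaseModel):          # hypothetical schema for one extracted task
    title: str
    priority: int

def extract_task(prompt: str, retries: int = 3) -> Task:
    for _ in range(retries):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                      # assumed model
            response_format={"type": "json_object"},  # ask for JSON mode
            messages=[
                {"role": "system", "content": "Reply with a single JSON object "
                 "with keys 'title' (string) and 'priority' (integer)."},
                {"role": "user", "content": prompt},
            ],
        )
        try:
            return Task(**json.loads(resp.choices[0].message.content))
        except (json.JSONDecodeError, ValidationError):
            continue                                  # malformed output -> retry
    raise RuntimeError("model never returned valid JSON")
```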
It's like your product automatically becomes cheaper, more reliable, and more scalable with every major LLM advancement.
Obviously you still need to build up defensibility and focus on differentiating with everything “non-AI”.