
The difference is in how you consume tokens.

The APIs are stateless and report "this is how many tokens you sent" and "this is how many tokens you asked for", so the person making the requests can control the rate of consumption directly. Unless you're being extremely inefficient, or you're using it as part of some other service that makes a significant number of requests (in which case ChatGPT isn't appropriate anyway), the API is likely to be less expensive for simple queries.
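
That per-request accounting can be sketched as a small client-side cost tracker. This is a sketch, not OpenAI's actual billing logic: the per-token rates are illustrative placeholders, and the `usage` dict just mirrors the general shape of a stateless completion response.

```python
# Illustrative per-token rates (placeholders, not real prices).
PROMPT_RATE = 0.0005 / 1000       # assumed $ per token sent
COMPLETION_RATE = 0.0015 / 1000   # assumed $ per token generated

def request_cost(usage: dict) -> float:
    """Cost of one stateless call, from the usage block returned with it."""
    return (usage["prompt_tokens"] * PROMPT_RATE
            + usage["completion_tokens"] * COMPLETION_RATE)

# Example usage block, shaped like what a stateless completion API reports:
usage = {"prompt_tokens": 120, "completion_tokens": 380, "total_tokens": 500}
print(f"${request_cost(usage):.6f}")  # prints $0.000630
```

Because every request carries its own usage numbers, the caller can cap spend per request or per month without the provider having to guess.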

With ChatGPT you don't have insight into the number of tokens generated, or the number used in the background to maintain state within a session. Cutting a person off by token count midway through a session would hurt the product.

So instead, estimate the amount of compute a person uses in a month and base the subscription price on that.


