Whenever I've hit bottlenecks on a single Postgres instance, it's been because of access patterns causing excessive lock contention. Redesigning your application to be compatible with a horizontally scaled Postgres in the first place involves eliminating forms of shared state, which can also greatly improve performance on a vanilla single-machine Postgres instance, so you can get very far on one machine.
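A rough sketch of what eliminating shared state can look like, in plain Python rather than SQL. Everything here is an assumption made up for illustration: the class names, the `hammer` helper, and the use of in-process locks as a stand-in for Postgres row locks. The first counter is the "one hot row everybody updates" pattern; the second splits it into slots so writers rarely wait on each other.

```python
import random
import threading


class HotCounter:
    """One shared value behind one lock: every writer queues on the same lock,
    loosely analogous to every transaction updating the same row."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    def total(self):
        with self._lock:
            return self._value


class SlottedCounter:
    """The same counter split into independent slots. Writers pick a slot at
    random, so two writers only contend when they land on the same slot."""

    def __init__(self, slots=16):
        self._locks = [threading.Lock() for _ in range(slots)]
        self._values = [0] * slots

    def increment(self):
        i = random.randrange(len(self._values))
        with self._locks[i]:
            self._values[i] += 1

    def total(self):
        # Reads pay the cost instead: they have to visit and sum every slot.
        total = 0
        for i, lock in enumerate(self._locks):
            with lock:
                total += self._values[i]
        return total


def hammer(counter, writers=8, increments=1000):
    """Hypothetical helper: run concurrent writers, then return the final total."""

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()


if __name__ == "__main__":
    print(hammer(HotCounter()), hammer(SlottedCounter()))
```

Both counters end up with the same total; the difference is only in how often writers block each other. The slotted layout is also the shape that shards cleanly, since no single slot has to live on every node, which is why the redesign pays off on one machine and on many.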
Like, Citus's FAQ says "if you use Citus, you do not need to manually shard your application, and you do not need to re-architect your application in order to scale out." But the line between application-level and DB-level sharding isn't that sharp. The fundamental limitations of distributed systems surface in Citus's rules* about what you can and cannot do across shards, and you might find yourself re-architecting your application anyway.
* https://docs.citusdata.com/en/stable/develop/reference_worka...