ffs, this attitude causes massively more problems than it solves.
1. You can always change later. Uber switched from Postgres to MySQL when they had already achieved massive scale.
2. You don't know what scaling problems you're going to get until you've scaled.
3. Systems designed to scale properly sacrifice other abilities in order to do that. You're actively hurting your velocity with this attitude.
4. Every single expert in the field who has done this says to start with a monolith and break it out into microservices as the product matures. Yet every startup founds itself on K8s because "we'll need it when we hit scale, so we might as well start with it".
5. Twitter's Fail Whale: the problems that failing to scale brings are smaller than the problems of not being flexible enough in the early stages.
Build it simple, and adapt it as you go. Messing up your architecture and slowing down your development now to cope with a problem you don't have is crazy.
> You don't know what scaling problems you're going to get until you've scaled.
This is the point I keep repeating.
If you find yourself needing to scale, the way you scale likely does not match what anyone else is doing. The way Netflix scaled does not look anything like the way WhatsApp scaled. The application dictates the architecture. Not the other way around. Netflix started as a DVD service. Their primary scaling concerns were probably keeping a LAMP stack running and how the hell to organize, ship, and receive thousands of DVDs a day. These scaling problems have little in common with their current, streaming, scaling problems.
It's a weird thing that developers love to discuss and hype up scale and scaling technology and then turn around and warn against the dangers of premature optimization in code. If you ask me, the mother of all premature optimization is scaling out your architecture to multiple servers, sharding when you don't need to, dealing with load balancing, multiple security layers, availability, redundancy, data consistency, containers, container orchestration, etc. All for a system that could, realistically, run quite adequately on an off-the-shelf Best Buy laptop. We have gigabit ethernet and USB 3 on a Raspberry Pi today, and people are still shocked you could run a site like HN off a single server. We've all been so lobotomized by the cloud hype of the 2010s that we can't even function without AWS holding our hand.
I am partial to the "don't solve problems you don't have" argument which holds true in a lot of cases.
That said, the database is the one part of the system that is very tricky to evolve after the fact. Data migrations are hard. It's worth investing a little bit of time upfront to get it right.
Yes, which is exactly why you shouldn't go with a highly scalable database solution. All of the solutions for really big scale involve storing data in non-normalised form, which means frequent, painful data migrations all through feature development.
Don't do anything obviously complex with your RDBMS and migrations are free. If all you need is a few views, tables and FKs, then migrating between RDBMSes should be low effort if you have a decent RSM or ORM to plug in behind it. And even for more involved setups, I've written low-effort migrations to and from various RDBMSes; it's not black magic.
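To make that concrete, here is a minimal sketch of the kind of plain schema that ports easily (table and column names are made up for illustration; it sticks to standard SQL that runs unchanged on Postgres, MySQL, and most other RDBMSes):

```sql
-- Plain tables, a foreign key, and a view: nothing engine-specific.
CREATE TABLE author (
    id    INTEGER PRIMARY KEY,
    name  VARCHAR(200) NOT NULL
);

CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL,
    title     VARCHAR(200) NOT NULL,
    -- Table-level FK syntax: MySQL silently ignores inline
    -- column-level REFERENCES clauses, so this form is the
    -- portable one.
    FOREIGN KEY (author_id) REFERENCES author (id)
);

CREATE VIEW post_with_author AS
    SELECT p.id, p.title, a.name AS author_name
    FROM post p JOIN author a ON a.id = p.author_id;
```

Stay inside this subset and "migration" is mostly dump-and-reload.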
The little time upfront is "use pgsql unless there is a good reason not to" as your first choice.
If you don't change the schema dramatically, then it doesn't make much sense to migrate to another RDBMS, because most engines have pretty similar query planners (if you're not doing "anything obviously complex").
If you do migrate due to scaling issues, then the schema must evolve, for example: add an in-memory DB for caching, DB sharding/partitioning, table partitioning, hot/cold data split, OLTP/OLAP split, etc.
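For a sense of what one of those schema evolutions looks like, here is a sketch of table partitioning with a hot/cold split in Postgres (10+). The `events` table and its columns are hypothetical:

```sql
-- Split a large table by date range; queries still hit "events".
CREATE TABLE events (
    id         BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    payload    TEXT
) PARTITION BY RANGE (created_at);

-- Hot data: the current year gets its own partition.
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Cold data: everything older lands in an archive partition
-- (which could later move to cheaper storage).
CREATE TABLE events_archive PARTITION OF events
    FOR VALUES FROM (MINVALUE) TO ('2024-01-01');
```

Note the application keeps querying `events` as before, which is part of why changes like this don't necessarily lock you into one engine.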
Scaling issues can present themselves in numerous ways that don't necessarily require adding an in-memory DB, sharding/partitioning, or a hot/cold split; some of those may even be in place already.
In a lot of cases, these can be adopted without locking you out of a migration, since parts of them live at the application level or purely on the DB side. The query planner isn't the be-all and end-all of performance; there are plenty of differences between MySQL and PgSQL performance behaviour that might force you to switch even though the query planner won't drastically change things.
I have not seen comments about technical debt. I think you are right: It is good to take shortcuts to ship faster. When you do that, you accumulate technical debt. I think it is important to identify it and to remain aware of this debt. I've seen too many people in denial who resist change.
> ffs, this attitude causes massively more problems than it solves.
I don't think it causes that many problems to just use MySQL instead of Postgres from the very beginning of a project. I like using Postgres, and I understand that I shouldn't care about scaling, but if I make a good decision from the very beginning, it can't hurt.
I would rather use Postgres and have a RDBMS that is quite strict and migrate data later instead of having a RDBMS that just does what it likes sometimes.
For example, query your table "picture", whose first column "uuid" is a varchar, with the following query:
SELECT * FROM picture WHERE uuid = 123;
I don't know what you expect, but I expect the query to fail because a number is not a string. MySQL thinks otherwise.
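To spell out the difference (exact wording varies by version, but the behaviour is the documented default in both engines):

```sql
-- Table as described above: uuid is a VARCHAR column.
-- CREATE TABLE picture (uuid VARCHAR(36), ...);

SELECT * FROM picture WHERE uuid = 123;

-- Postgres: ERROR: operator does not exist: character varying = integer
--           (no implicit cast between varchar and integer; the
--            query fails loudly)

-- MySQL: implicitly casts each uuid value to a number and compares
--        numerically, so rows like '123' or even '123abc' match 123,
--        producing at most a warning, never an error.
```

That implicit numeric coercion also defeats any index on the varchar column, so it's a correctness problem and a performance trap at once.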
Uber switched because of a very specific problem they had with the internals of Postgres, that was handled differently in MySQL (which I believe is now "solved" anyway).
It's not that MySQL scales better than Postgres, but that Uber hit a particular specific scaling problem that they could solve by switching to MySQL.
You could well use MySQL "because it scales better" and then hit a particular specific problem that would be solved by switching to Postgres.
While the executives are dreaming of exotic cars, the engineers are dreaming of exotic architectures. The difference is that when the CEO says, "It's crucial that I have this Ferrari BEFORE the business takes off," nobody takes them seriously.
The really funny part is that the engineers don't just dream of those architectures, they implement them. That's how you get an app that adds two numbers that runs on K8S, requires four databases, a queuing system, a deploy pipeline, a grafana/prometheus cluster, some ad-hoc Rust kernel driver and a devops team.
MySQL isn't a general solution to problems of scale, because you don't know what problems you're going to have until you have them. So for example if your scaling problem is ACID compliant database updates - say you're the next fintech - then I was under the impression that MySQL would be the last database you'd want to be using. Have I missed something?
I'm no expert and can't answer that. It was just my impression, and I might be wrong, that MySQL is better suited for scaling purposes. Currently I'm working on a SaaS product, and the test instance that runs on Digital Ocean sometimes hits connection limit issues (even with a connection pool). Sure, my code may not be utilizing connections perfectly, but I'm really afraid this will happen in production and I won't know how to fix it. In my test environment I just restart everything, but in a production environment I can't do that all the time.
The default connection limit on Postgres is 100, so you need to ask yourself why you're exhausting all those connections. The issue isn't the DB, it's the code making the connections. Advice: don't fret scaling issues; get your fundamentals right.
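A first diagnostic step on the Postgres side might look like this (standard Postgres settings and views, nothing project-specific):

```sql
-- Check the server's connection cap (default 100):
SHOW max_connections;

-- See who is holding connections and in what state.
-- Lots of 'idle' or 'idle in transaction' rows usually means the
-- application (or its pool) is hoarding or leaking connections
-- rather than the server being genuinely out of capacity:
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```

If most connections are idle, the fix is in the pool configuration or the code's connection handling, not in switching databases.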
> Sure you shouldn't care about scaling at the beginning. But why should you start using a system that you already know won't scale in the future?
Because it's well supported and solid otherwise? There's a wealth of documentation, resources of many kinds, software built around it (debugging, tracing, UIs, etc.). Because there's a solid community available that can help you with your problems?
What alternative technology is there that scales better? I guess MySQL could be it, but doesn't MySQL also come with a ton of its own footguns?
I use Postgres at the moment and I'm happy, except for the process-per-connection part and the upgrade part. Knowing what I know now, I think MySQL would have made me happier. On the other hand, it may have caused other issues I don't have with Postgres. I just hope the Postgres team takes posts like this into account for its roadmap.
Only if I _know_ I'm creating something that will definitely have huge amounts of concurrent users and someone pays me to make it scale from the start.
For a hobby project that might take off or might not, there's really no point in making everything "webscale"[0] just in case.
But you have to get to that future first! If you lose your customers because you can't deliver something on time due to the complexity of your 'scaling-proof' system, or because you can't accommodate changes requested by clients because they would compromise your architecture, scaling will be the last of your worries.
Because the hyperscalable databases are much more difficult to set up, use and administer. It's not a "free" upgrade, it'll slow down everything else you do.