Lots of teams really underestimate cloud costs, on the assumption that the hundreds of millions they are raising will give them enough runway to survive a few years despite losing money for years.
Even scaling would be somewhat of an issue depending on the tech stack. Imagine the cost of running standard Java micro-services where the "solution" to every load problem is to "spin up hundreds more nodes". The worst I have seen was a bank proudly running 8,000 - 10,000 separate micro-services.
Just imagine the daily cost of that. Unjustifiable.
Of course the AWS cloud consultants will happily shill their offerings at "cheap" prices, but in reality the pricing is designed so that costs accumulate into the millions as you scale, with every tiny bit of usage billed, including testing.
So before you build the software, think about what scaling will cost if it becomes widely used, rather than taking the easy approach of just spinning up more nodes and acting as if you have the capital to throw at the problem. You can only do that for so long, until you don't.
I remember that at a previous company it somehow leaked that the AWS bill was 50% of the entire development staff's salaries.
There was nowhere near the same volume of data as Basecamp/Hey, nor was there much processing power needed. It was purely bad engineering accumulated over 10 years.
I was once contracted to work on a project where the GCP bill for Postgres was $60k per month - this was basically my YEARLY rate at that time, just for managed Postgres.
After some time I was quite familiar with their stack and had gathered considerable domain experience. This led to an idea of how to halve the database load (and the cost would presumably fall by a similar percentage), which I wanted to use as leverage during contract renegotiation.
I boldly offered to work for free to halve their database load, in exchange for being paid half the money this optimization would save over the course of one year. This would basically triple my pay, and they would still save money.
They declined, and I moved to a better opportunity.
Last I heard they had to pay a team of 4 new consultants for a year to implement the same idea I had. Without the domain knowledge, the consultants couldn't progress as fast as I suspect I could have (my estimate was 2 months of work).
I know it's very petty, but I regret revealing too many implementation details of the idea during the pitch and allowing the company to contract other consultants to see it done.
If you built up that domain knowledge while being paid top dollar per hour by the same company, then I understand their reluctance to go along with your offer. It feels a little bit extortionate to be honest. I wouldn't go along with it either, not because it's a bad deal in isolation, but because it sets a bad precedent. It basically tells every employee/contractor that if they know a way to add a lot of measurable value, they can use that as a bargaining chip to 3x their pay. This also discourages trying to add any value that isn't as easily expressed in dollars (which is the case for many important things, such as product quality improvements).
I think part of the expectation when contracting somewhere long-term (or just being an employee, for that matter) is that the amount of value you add per hour worked increases sharply over time, and faster than your fee. In other words, initially you're overpaid wrt your value-add, and then that corrects itself over time as you figure out what the company is all about.
I've made similar pitches to clients many times, and one thing I've learned is that, ironically, the problem is promising the actual saving rather than offering a much smaller one.
The challenge is that people don't believe you when you tell them they can save that much, no matter how much evidence you prepare. I'm starting a sales effort for my agency right now, and one of the things we've worked on is promising less than what we determine we can deliver after reviewing the client's costs, and raising our prices, because it's ironically easier to close on the basis of a promise to deliver 20%-30% savings at a relatively high cost than a promise to deliver 50%+ with little effort.
My current and last jobs both had RDBMS bills in excess of $1 million/month. It is staggering. We could buy two fully-loaded 42U racks in separate DCs and be net positive after a few months. I’ve done the math, in great detail.
No go. “It’s hard to hire for that skill set.” Is it $9 million/year hard?! You already have a team lead – me. This shit is not that hard; people will figure it out, I promise.
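For anyone curious, here is a back-of-envelope version of that math. Every figure below is hypothetical, purely to show the shape of the comparison, not anyone's actual bills:

    public class BreakEven {
        public static void main(String[] args) {
            // All figures hypothetical, for illustration only.
            double cloudDbPerMonth    = 1_000_000;    // managed RDBMS bill per month
            double rackHardwareOneOff = 2 * 1_500_000; // two fully-loaded 42U racks, bought outright
            double coloPowerPerMonth  = 2 * 10_000;    // colocation, power, bandwidth per rack
            double extraStaffPerMonth = 3 * 20_000;    // on-call DBAs/SREs you would not otherwise need

            double monthlySavings  = cloudDbPerMonth - coloPowerPerMonth - extraStaffPerMonth;
            double breakEvenMonths = rackHardwareOneOff / monthlySavings;

            // With these numbers: ~3.3 months to break even, then ~$920k saved per month.
            System.out.printf("Break-even after %.1f months, then ~$%.0f/month saved%n",
                    breakEvenMonths, monthlySavings);
        }
    }

The exact figures matter less than the shape: at a seven-figure monthly bill, almost any plausible hardware and staffing estimate pays for itself well within the first year.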
Is 50% that bad? If you instead hired engineers to maintain access to some kind of file storage on the internet, would it cost more or less?
It would be alarming if it were 500% of the staff salary, but at 50% it just seems like the cost of outsourcing to a standard that likely wouldn't be achieved in house.
Considering it was about 100 developers, it was horrible.
The two major problems were:
1. The volume of data itself was not that big (I had a backup on my laptop for reproductions), but the workload was just too heavy for even the biggest instances AWS offers. Downtimes were very frequent. This was mostly due to decisions from 10 years ago.
2. Teams were constantly busy putting out fires but still getting only 1-2% salary increases due to the lack of new features.
EDIT: Since people like these war stories: the major cause of the performance issues was that a single request from an internal user could sometimes trigger hundreds of queries to the database. Or worse: some GET requests would also perform gigantic writes to the Double-Entry Accounting system. It was very risky and very slow.
This was mostly due to over-reliance on abstractions that were too deep. Nobody knew which joins to make in the DB, or was too afraid to write them, so they would instead call 5 or 6 classes and join manually, causing O(N^2) issues.
To give a sense of how stupid it was: one specific optimization I worked on took the rendering time of a certain table from 25 seconds to 2 milliseconds. It was nothing magical.
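To make the pattern concrete, here is a caricature of what the code was effectively doing versus letting the database do the join. Hypothetical schema and plain JDBC, not the actual system:

    import java.sql.*;
    import java.util.*;

    public class QueryHell {

        // Slow path: the "join" happens in application code, one query per row.
        // For N orders with M lines each this fires roughly N * (M + 1) + 1 queries.
        static List<String> renderSlow(Connection db, long customerId) throws SQLException {
            List<String> rows = new ArrayList<>();
            try (PreparedStatement orders = db.prepareStatement(
                    "SELECT id FROM orders WHERE customer_id = ?")) {
                orders.setLong(1, customerId);
                ResultSet o = orders.executeQuery();
                while (o.next()) {
                    long orderId = o.getLong(1);
                    try (PreparedStatement lines = db.prepareStatement(
                            "SELECT product_id, qty FROM order_lines WHERE order_id = ?")) {
                        lines.setLong(1, orderId);
                        ResultSet l = lines.executeQuery();
                        while (l.next()) {
                            // ...and yet another round trip per line for the product name
                            try (PreparedStatement product = db.prepareStatement(
                                    "SELECT name FROM products WHERE id = ?")) {
                                product.setLong(1, l.getLong("product_id"));
                                ResultSet p = product.executeQuery();
                                if (p.next()) {
                                    rows.add(orderId + ": " + p.getString(1) + " x" + l.getInt("qty"));
                                }
                            }
                        }
                    }
                }
            }
            return rows;
        }

        // Fast path: one round trip, the database does the join.
        static List<String> renderFast(Connection db, long customerId) throws SQLException {
            List<String> rows = new ArrayList<>();
            try (PreparedStatement stmt = db.prepareStatement(
                    "SELECT o.id, p.name, l.qty FROM orders o " +
                    "JOIN order_lines l ON l.order_id = o.id " +
                    "JOIN products p ON p.id = l.product_id " +
                    "WHERE o.customer_id = ?")) {
                stmt.setLong(1, customerId);
                ResultSet r = stmt.executeQuery();
                while (r.next()) {
                    rows.add(r.getLong(1) + ": " + r.getString(2) + " x" + r.getInt(3));
                }
            }
            return rows;
        }
    }

In the real system the per-row lookups were hidden behind 5 or 6 layers of services, so nobody ever saw the loop; the database just quietly ate hundreds of queries per page load.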
That does sound like an engineering problem more than anything.
On a side note, migrating to NoSQL might not have a lot of on-paper benefit, but it does force developers to design their tables and queries in a way that prevents this kind of query hell. Which might be worth it on its own.
How does NoSQL (and which flavor are you referring to?) enforce that? RDBMS enforces it in that if you don’t do it correctly, you get referential integrity violations and performance issues. You’d think that would be enough to motivate devs to learn it, but no, let’s use more JSON columns!
It's the human aspect of engineering: in NoSQL you can't join 15 different tables just by running a 200-line SQL command, and that manual burden forces a rethink of what an acceptable design is.
Relational DBs are great, but just like Java design patterns, they get abused because they can be. People are happy doing stuff like that because it's low resistance and low effort, with the consequences building up over the long term.
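To make that concrete, here is a minimal sketch of what designing for the access pattern looks like in a document store. The collection and fields are hypothetical, MongoDB-style, just to show the shape of it:

    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class OrderPage {
        public static void main(String[] args) {
            // Hypothetical collection: the document is written already shaped
            // like the screen that reads it, e.g.
            // {
            //   "_id": 1001, "customerId": 42,
            //   "lines": [ { "productName": "Widget", "qty": 3 },
            //              { "productName": "Gadget", "qty": 1 } ],
            //   "total": 54.47
            // }
            MongoCollection<Document> orders = MongoClients.create("mongodb://localhost")
                    .getDatabase("shop").getCollection("orders");

            // One read serves the whole page; the "join" was paid once, at write time.
            Document page = orders.find(eq("customerId", 42L)).first();
            System.out.println(page);
        }
    }

The flip side is that the join cost moves to write time: rename a product and every embedded copy has to be updated. That forced trade-off is exactly the rethinking I mean.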
In my example the abuse was on the OOP part, not in the relational database part.
Database joins were fine, they just weren’t being made in the database itself, due to absurd amounts of abstraction.
I don’t disagree that rethinking the problem with NoSQL would solve it (or maybe even would have prevented it), but on the other hand I bet having 5 layers of OOP could also mess up a perfect NoSQL design.
My experience from offering devops services on retainer to a number of clients is that the ones that host in cloud environments spend more money on me for similar scale setups than the ones that host on managed setups.
And even if you don't want the hassle of storing the data yourself, there are many far cheaper outsourced options than S3.
> the hundreds of millions they are raising will give them enough runway to survive a few years despite losing money for years.
It's more that the decision makers at every stage are not incentivized to care, or at least, were not during the ZIRP period. This is slowly changing, as evidenced by more and more talks of "cloud exits".
Software engineers are encouraged by the job market to fill their resume with buzzwords and overengineer their solutions.
Engineering managers are encouraged by the job market to increase their headcount, so complicated solutions requiring lots of engineers actually play in their favor.
CTOs are encouraged by the job and VC funding market to make it look like their company is doing groundbreaking things and solving complex problems, so overengineering again plays in their favor. The fact these problems are self-inflicted doesn't matter, because everyone is playing the same game and has no reason to call them out for it.
Cloud providers reward companies/CTOs for behaving that way by extending invites to their conferences, which gives the people involved networking opportunities and "free" exposure for the company to hire more engineers to fuel the dumpster fire even more.
Testing in particular is where AWS is at its most egregious, and the thing I hate most about it.
You don’t get any testing services baked into the pricing; you’re paying production prices to set up and tear down environments just for testing. They have little to nothing in the way of running emulators locally for their services, and it leads to other solutions of varying quality.
It’s outrageous and something I will hold against AWS forever. Not to mention their CDK is for shit. Their APIs are terrible and poorly documented. I don’t know why anyone still chooses them, other than the “nobody got fired for choosing AWS” effect.
Azure, for instance, is really good at providing emulators for lots of its core services for local testing. Firebase is too, though I can’t vouch for the wider GCP ecosystem.
This is where your choice of cloud services comes into play - containerised web apps with Postgres on RDS? Simple to move off onto self-hosting _if_ you can prove a business model that needs scaling. All-in on some proprietary services - less so.
> Even scaling would be somewhat of an issue depending on the tech stack. Imagine the cost of running standard Java micro-services where the "solution" to every load problem is to "spin up hundreds more nodes". The worst I have seen was a bank proudly running 8,000 - 10,000 separate micro-services.
> Just imagine the daily cost of that.
I'm not going to preach for thousands of micro-services necessarily, but they also make scaling easier and cheaper.
Not every service in your application receives the same load, and being able to scale up only the 20% of Lambdas that receive 80% of the traffic results in massive savings too.
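As a toy example of why independent scaling can be cheaper (made-up numbers, only to show the shape of the saving):

    public class ScalingCost {
        public static void main(String[] args) {
            // All numbers hypothetical, for illustration only.
            double costPerNodeHour = 0.20;
            int    baseNodes       = 100; // fleet size needed at baseline load
            int    peakMultiplier  = 5;   // the hot 20% of the system needs 5x capacity at peak

            // Monolith: the whole fleet has to grow to cover the hot path's peak.
            double monolithPeak = baseNodes * peakMultiplier * costPerNodeHour;

            // Split services: only the hot 20% grows, the other 80% stays at baseline.
            double splitPeak = (baseNodes * 0.2 * peakMultiplier + baseNodes * 0.8) * costPerNodeHour;

            // With these numbers: $100.00/hour vs $36.00/hour at peak.
            System.out.printf("Peak hourly cost: monolith $%.2f vs split $%.2f%n",
                    monolithPeak, splitPeak);
        }
    }

Whether that justifies thousands of separate services is another question, but the per-service scaling advantage itself is real.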
That is excessive and it's already $4K a day.