
> Not my problem. Fastly should work as intended.

What's your SLA with them?

Just assuming things will always work because the marketing copy said so is a recipe for disaster. It's hoping that things never go wrong and, when they inevitably do, being caught with your pants down.

Everything fails sometimes. You must know how much your SaaS provider contractually promises, ensure that any SLA breach is something financially acceptable for you, and ensure that you can handle failure time within SLA.
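To put rough numbers on that, here is a minimal sketch of the arithmetic. The function names, the 30-day billing period, and the revenue, fee, and credit figures are illustrative assumptions, not Fastly's actual terms.

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float = 30 * 24) -> float:
    """Minutes of downtime in the period that still satisfy the SLA."""
    return period_hours * 60 * (1 - sla_percent / 100)


def uncovered_loss(outage_minutes: float, revenue_per_minute: float,
                   monthly_fee: float, credit_fraction: float) -> float:
    """Revenue lost during an outage minus the service credit paid back.

    Assumes (hypothetically) that the credit is a fraction of the monthly fee,
    which is how many SaaS SLAs are structured.
    """
    lost_revenue = outage_minutes * revenue_per_minute
    service_credit = monthly_fee * credit_fraction
    return lost_revenue - service_credit


if __name__ == "__main__":
    # A 99.9% monthly SLA still permits roughly 43 minutes of downtime.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes allowed per 30 days")
    # Hypothetical shop: 100 per minute in sales, 1000 monthly fee, 10% credit.
    print(f"{uncovered_loss(60, 100.0, 1000.0, 0.10):.0f} lost after the credit")
```

If the second number is not something the business can absorb, the SLA alone is not enough protection.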



> What's your SLA with them?

Sorry what?

You've just witnessed almost the entire internet break because of a catastrophic cascading outage that affected lots of huge companies, since third party services used and trusted Fastly.

Shopify stores couldn't accept payments on their websites, Coinbase Retail/Pro transactions and trading apps failed to load, and delivery apps stopped loading all of a sudden. These are just a few of the failures this outage caused, and now you are trying to blame this on me for not checking their SLA when millions were indirectly affected?

Fastly offered a product, their main product, a CDN, and its failure took down lots of websites. I don't care if everything fails sometimes. There are sites that should NOT go down because of a configuration issue that Fastly messed up.


> I don't care if everything fails sometimes

You can say you don't care for reality, but it's not going to help you have better systems.

> There are sites that should NOT go down

Then they surely either engineered their system to not 100% rely on Fastly or negotiated appropriate terms with Fastly (Or decided Fastly going down was an acceptable business risk, which it is for nearly everybody). Everything else would be negligent, and surely nobody would be negligent when operating a site that "should NOT go down"?


> You can say you don't care for reality, but it's not going to help you have better systems.

Nowhere in my comment did I say this, so quit the strawman argument.

I know a client using a service that has had 100% uptime for the year and that also serves huge clients. I don't understand why Fastly can't guarantee, at the very least, a failover system to counteract this; theirs clearly didn't work (or didn't even exist).

> (Or decided Fastly going down was an acceptable business risk, which it is for nearly everybody).

Then why did this cascade to almost everybody, even indirectly? Surely their advertised failover system should have kept this from dragging on, yet the outage lasted longer than it should have.

I don't think a store, exchange or trading desk being unable to accept payments from people for an hour is acceptable at all.


> You've just witnessed almost the entire internet break because of a catastrophic cascading outage that affected lots of huge companies, since third party services used and trusted Fastly.

Blame the companies that relied on Fastly being up 100% of the time, even though Fastly explicitly states that they might be down any number of hours, and they will even give you money back for that [1]. If they did offer a 100% SLA, it would probably be out of budget for most users, as that kind of system is prohibitively expensive to run.

Depending on a single CDN like Fastly is building an SPOF into your product. It is no less of a design blunder than whatever Fastly did internally to have an outage. If Shopify lost millions because of a short, simple third-party outage, they have at least as much of a high-priority postmortem to write and issues to address as Fastly.

[1] - https://docs.fastly.com/products/service-availability-sla
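As a rough illustration of removing that single point of failure, here is a minimal sketch of client-side selection between two CDNs. The hostnames and the `pick_origin` helper are hypothetical, and the health probe is passed in as a callable so the sketch makes no claim about how a real probe, DNS failover, or any CDN's API works.

```python
from typing import Callable, Iterable, Optional


def pick_origin(candidates: Iterable[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that passes the health check, else None."""
    for host in candidates:
        try:
            if is_healthy(host):
                return host
        except Exception:
            # A probe that raises (timeout, DNS error) counts as unhealthy.
            continue
    return None


if __name__ == "__main__":
    # Hypothetical hostnames; the stub probe pretends the primary CDN is down.
    cdns = ["assets.primary-cdn.example", "assets.backup-cdn.example"]
    down = {"assets.primary-cdn.example"}
    chosen = pick_origin(cdns, lambda host: host not in down)
    print(chosen or "all CDNs unavailable, serve from origin")
```

Running a second CDN costs money and adds operational work, which is exactly the trade-off against the accepted-risk option.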


The main problem is that they had a failover system; the mystery is where it was during this outage.

Why didn't it trigger? Where was the system that was supposed to prevent further cascading failures?

> Blame the companies that relied on Fastly

So it's everybody's fault Fastly went down now? That is a new one.


If companyA got affected by this, then either: 1- It's companyA's fault for not having a contingency plan or 2- It's companyA's accepted risk that this might happen.

We understand you're upset and passionate about this; perhaps now that more information has been published, you better understand the circumstances that caused this problem.



