The fault is theirs. They have said that they have failover, yet this worldwide outage caused by them just goes to show you that Fastly does not actually have a failover system in place.
> "Fastly’s network has built-in redundancies and automatic failover routing to ensure optimal performance and uptime." - status.fastly.com
Even their status page was down. Very embarrassing. Fastly did not work as advertised and misled its customers.
Edit: Offended flaggers circling around silencing misled Fastly customers. How pathetic.
Even when they said this was a rare [0] case, they knew this case should be handled, but didn't handle it.
> or in the extremely rare case our network isn’t serving traffic.
Reports also came in that this was a service configuration [1] issue, so not only was there no failover system, there was not even any validation automation in place that could have prevented this.
So why didn't the 'automatic failover' kick in during the outage? Where was it then? I don't see anything about 're-routing traffic' anywhere on the status page [0].
We don't know, but the usual scenarios would be "issue impacts failover mechanism too", "failover mechanism overloads other system components, leading to cascading failure" or "something causes failover mechanism to think all is fine".
So, the rarest of cases (our network isn’t serving traffic) just happened right now, and their failover system just took a snooze then, but 'it exists apparently' according to you.
Tell that to the huge clients that lost sales because of this, when all you have to say is: "wE DoN'T kNoW..."
Not the point. They were also told that a failover system would kick in and re-route traffic had there been any issues, but it was nowhere to be seen.
A worldwide outage happened that affected almost all locations and everybody, so the SLA is actually meaningless in this case. Where was the extra redundancy? Where was the failover system? Why were other companies indirectly affected?
As far as I know, even Fastly's status page was down during the outage. The fact that the best answer to this is 'we don't know' tells you everything you need to know. Maybe stop victim blaming in this situation and focus on the main culprit.
Just assuming things will always work because the marketing copy said so is a recipe for disaster. It's hoping that things never go wrong and, when they inevitably do, getting caught with your pants down.
Everything fails sometimes. You must know how much your SaaS provider contractually promises, ensure that any SLA breach is something financially acceptable for you, and ensure that you can handle failure time within SLA.
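To put rough numbers on that, here is a minimal sketch of the arithmetic. The function names, the 30-day period and the revenue figure are all hypothetical, picked only for illustration; they are not taken from Fastly's actual SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Minutes a provider may be down in the period without breaching the SLA."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - sla_percent / 100.0)


def outage_cost(outage_minutes: float, revenue_per_minute: float) -> float:
    """Rough revenue lost while the dependency is down (hypothetical linear model)."""
    return outage_minutes * revenue_per_minute


if __name__ == "__main__":
    # Hypothetical inputs: a 99.9% monthly SLA and 500 units of revenue per minute.
    budget = allowed_downtime_minutes(99.9)
    print(f"downtime allowed within SLA: {budget:.1f} min")
    print(f"loss if all of it is used: {outage_cost(budget, 500.0):.0f}")
```

A 99.9% promise over 30 days still leaves room for roughly 43 minutes of downtime, so the question is whether your business can absorb the loss for that window, not whether the provider will ever fail.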
You've just witnessed almost the entire internet break because of a catastrophic cascading outage that affected lots of huge companies, since third party services used and trusted Fastly.
Shopify stores couldn't accept payments on their websites, Coinbase Retail/Pro transactions and trading apps failed to load, and delivery apps stopped loading all of a sudden. These are just a few of the failures this outage caused, and now you are trying to blame this on me for not checking their SLA when millions were indirectly affected by this?
Fastly's main product is a CDN, and it took down lots of websites. I don't care if everything fails sometimes. There are sites that should NOT go down because of this configuration issue which they messed up.
You can say you don't care for reality, but it's not going to help you have better systems.
> There are sites that should NOT go down
Then they surely either engineered their system to not 100% rely on Fastly or negotiated appropriate terms with Fastly (Or decided Fastly going down was an acceptable business risk, which it is for nearly everybody). Everything else would be negligent, and surely nobody would be negligent when operating a site that "should NOT go down"?
> You can say you don't care for reality, but it's not going to help you have better systems.
Nowhere in my comment did I say this, so quit the strawman argument.
I know a client using a service that has had 100% uptime for the year and that also relies on huge clients. I don't understand why Fastly can't guarantee, at the very least, a failover system to counteract this. Clearly it didn't work (or didn't even exist).
> (Or decided Fastly going down was an acceptable business risk, which it is for nearly everybody).
Then why did this cascade to almost everybody, even indirectly? Surely their advertised failover system should have kept this from dragging on, yet the outage lasted longer than it should have.
I don't think a store, exchange or trading desk not accepting payments from people for an hour is acceptable at all.
> You've just witnessed almost the entire internet break because of a catastrophic cascading outage that affected lots of huge companies, since third party services used and trusted Fastly.
Blame the companies that relied on Fastly being up 100% of the time, even though Fastly explicitly states that they might be down any number of hours, and they will even give you money back for that [1]. If they did offer a 100% SLA, it would probably be out of budget for most users, as systems of that kind are prohibitively expensive to run.
Depending on a single CDN like Fastly is building an SPOF into your product. It is no less of a design blunder than whatever Fastly did internally to have an outage. If Shopify lost millions because of a short, simple third-party outage, they have at least as much of a high-priority postmortem to write and issues to address as Fastly.
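For what avoiding that SPOF can look like on the client side, here is a minimal sketch, not anyone's actual setup. The hostnames and the `is_healthy` probe are hypothetical; a real deployment would more likely do this at the DNS or load-balancer layer.

```python
from typing import Callable, Optional, Sequence


def pick_cdn(hosts: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in preference order, whose health probe passes."""
    for host in hosts:
        try:
            if is_healthy(host):
                return host
        except Exception:
            # A probe that blows up counts as unhealthy; try the next provider.
            continue
    return None  # every provider is down; the caller falls back to origin


if __name__ == "__main__":
    # Hypothetical hostnames; the lambda simulates the primary CDN being down.
    hosts = ["cdn-a.example.com", "cdn-b.example.com"]
    print(pick_cdn(hosts, lambda host: host != "cdn-a.example.com"))
```

The health probe is passed in so the selection logic stays independent of how health is actually measured (HTTP check, synthetic request, external monitor).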
If companyA got affected by this, then either:
1- It's companyA's fault for not having a contingency plan
or
2- It's companyA's accepted risk that this might happen.
We understand you're upset and passionate about this. Perhaps now that more information has been published, you understand better the circumstances that caused this problem.