Meh. Losing sleep sounds like an over-reaction. No system is foolproof. Of course Fastly should do what they can to prevent downtime, but it's still expected that they will go down.
I would blame anyone who claimed otherwise or couldn't deal with it while not having a fallback.
I hear that you're suggesting that those involved shouldn't feel bad because it's systemic / just a job / etc. But the reality is that incidents like this can be very traumatic for those involved, and that's not something they can control. If it were that simple to manage, depression and anxiety would not be a thing.
I think it's best to show a large amount of support and empathy for the individuals having a really bad day today, and to acknowledge how awful they may feel. Some will probably end up reading this thread (I know I would).
And of course, still hold Fastly the business accountable for their response (but objectively, once we understand what the root cause was and what the long-term solution is).
I don't see how it's so traumatic for the engineers involved, unless the company culture at Fastly is really awful and there are punitive repercussions, or attempts to pin responsibility on individuals rather than systems, which I doubt.
Many here have been responsible for web service outages, albeit on much smaller scales, and in my experience it feels awful while it's happening but you quickly forget about it because so does everyone else.
I guess it very much depends on your personality. I screwed up a not very important project for a client 4 years ago while working at a different company, and I still feel bad when I think about it, despite the fact that my company had my back through the entire process and literally everybody involved has moved on and probably forgotten about it.
I wanted to show support to the engineers in the sense that I don't think you should encourage a working culture where you have "massive post-mortems" and expect people to feel bad for extended periods of time over simple mistakes. By not making a big deal out of it, you can also support your staff.
But I think our disagreement mainly stems from how we interpreted the parent comment. I thought it was contradictory: on the one hand claiming to show support, on the other hand emphasizing how big of a catastrophe this was.
I just wanted to say that I think it most likely was a completely natural mistake, only exacerbated by the scale of the company, and that while you should take some action to prevent it in the future, you should not spend so much time dwelling on it. Shit happens, it's fine.
I think government websites being down (UK ones, for example) is the bigger issue. Reddit/Stack Overflow etc. being down isn't that big of a deal imo.