I don't recall sites like Google or Facebook ever being down for maintenance. Ar...

endymi0n · on Nov 27, 2016

A good start would be all of http://highscalability.com - but it more or less boils down to being able to roll back, and that rules out hard schema changes. So the proper and hard way is always a variant of: 1) Create another column, 2) Write to both columns at the same time from the database, 3) Create code to run on the new column, 4) Enable feature switch to run everything on the new column, 5) Remove the code dealing with the old column, 6) Remove the old column.
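To make the steps above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `users`, the columns `name` and `display_name`, and the helpers `save_user`, `load_name`, and `migrate_demo` are all hypothetical names chosen for illustration. The backfill of existing rows is an assumption that is not spelled out in the list, but it is usually needed before reads can move to the new column.

```python
import sqlite3


def save_user(conn, user_id, name):
    # Step 2: every write goes to both the old and the new column,
    # so either one can serve reads and a rollback loses nothing.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


def load_name(conn, user_id, use_new_column):
    # Steps 3-4: the read path picks a column based on a feature switch.
    column = "display_name" if use_new_column else "name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def migrate_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")  # pre-existing row

    # Step 1: a purely additive schema change; old code keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

    # Step 2: new writes hit both columns.
    save_user(conn, 2, "bob")
    # Assumed extra step: backfill rows written before dual writes began.
    conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

    # Steps 3-4: both read paths must agree before the switch is flipped.
    old = [load_name(conn, i, use_new_column=False) for i in (1, 2)]
    new = [load_name(conn, i, use_new_column=True) for i in (1, 2)]

    # Steps 5-6 (delete the old read path, then drop the old column) are
    # left out on purpose: they are the only irreversible part.
    return old, new
```

Until steps 5 and 6 run, flipping the switch back is a complete rollback, which is the point of doing the migration this way.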

If that looks complicated, it is - and you'd better start on these things only if your site earns more money per minute than it costs to pay the engineers and project managers to pull that off.

jholman · on Nov 28, 2016

This is correct, except for your step 4. It should say something like: 4a) Enable feature switch on 1% of requests, ensure that they're working correctly. 4b) Go to 10%. 4c) Start rolling it out across all requests.
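A staged switch like 4a to 4c can be sketched as a deterministic percentage check. This is only an illustration: the function `in_rollout`, the flag name `use_new_column`, and the choice of hashing a request or user key into 100 buckets are assumptions, not something described in the thread.

```python
import hashlib


def in_rollout(request_key: str, percent: int, flag: str = "use_new_column") -> bool:
    """Return True if this key falls inside the first `percent` of 100 buckets.

    Hashing the key (instead of calling random()) keeps the decision stable:
    a key enabled at 1% is still enabled at 10% and at 100%.
    """
    digest = hashlib.sha256(f"{flag}:{request_key}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


# Hypothetical usage: raise the percentage step by step while watching errors.
for stage in (1, 10, 100):
    enabled = sum(in_rollout(f"user-{i}", stage) for i in range(10_000))
    print(f"{stage}% stage: {enabled} of 10000 keys use the new column")
```

Because the bucket depends only on the flag name and the key, lowering the percentage again turns the feature off for exactly the keys that were added last, which keeps a rollback predictable.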

tyingq · on Nov 27, 2016

Both have the advantage of not having to present consistent data to end users.

dotancohen · on Nov 27, 2016

I actually have seen Google down once, I might even have a screenshot. An immediate F5 (possibly after the screenshot) showed them back up. I'm not sure if it was my local Google office down (Israel), but the message was in English.

brianwawok · on Nov 27, 2016

Google is so big that some % of it is always down. The chances of hitting that part are just low, and if you do, a reload will send you to another server.

jsjohnst · on Nov 27, 2016

Google has had a couple of worldwide outages (the most recent was about 2 years ago, if memory serves). That's not "maintenance" per se, it's an emergency outage. Downtime is definitely not always "maintenance".