Hacker News

I don't recall sites like Google or Facebook ever being down for maintenance. Are there any articles that discuss how they manage application layer and database layer migrations?


A good start would be all of http://highscalability.com - but it more or less boils down to being able to roll back, and that rules out hard schema changes. So the proper and hard way is always a variant of: 1) Create another column, 2) Write to both columns at the same time, 3) Create code to run on the new column, 4) Enable a feature switch to run everything on the new column, 5) Remove the code dealing with the old column, 6) Remove the old column.

If that looks complicated, it is - and you'd better only start with these things if your site earns more money per minute than you need to pay engineers and project managers to pull that off.


This is correct, except for your step 4. It should say something like: 4a) Enable the feature switch on 1% of requests and ensure that they're working correctly. 4b) Go to 10%. 4c) Start rolling it out across all requests.
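One common way to get the 1% / 10% / 100% stages is to hash some stable key into 100 buckets and compare the bucket against the current percentage. This is only a sketch: the function name `in_rollout` and the choice to hash a request or user key with SHA-256 are assumptions, not something the comment above specifies.

```python
import hashlib

def in_rollout(request_key: str, percent: int) -> bool:
    """Return True if this key falls into the first `percent` of 100 buckets.

    Hashing makes the decision deterministic, so the same key always gets
    the same answer, and a key enabled at 1% stays enabled at 10%.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent

def handle(request_key: str, percent: int) -> str:
    # Hypothetical request handler: route to the new code path only
    # for the slice of traffic currently inside the rollout.
    return "new_column_path" if in_rollout(request_key, percent) else "old_column_path"
```

Raising `percent` from 1 to 10 to 100 only ever adds keys to the new path, which keeps the behaviour seen by any single user stable while the rollout widens.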


Both have the advantage of not having to present consistent data to end users.


I actually have seen Google down once, I might even have a screenshot. An immediate F5 (possibly after the screenshot) showed them back up. I'm not sure if it was my local Google office down (Israel), but the message was in English.


Google is so big that some % of it is always down. The chances of hitting that part are just low, and if you do, a reload will hit another server.


Google has had a couple of worldwide outages (the most recent was about 2 years ago, if memory serves). That's not "maintenance" per se, it's an emergency crisis outage. Downtime is definitely not always "maintenance".



