So what is the purpose of the massive level of redundancy that you are already paying for when you store a file on S3? I don’t think it’s terribly common for even medium sized companies to have a multi tier1 cloud backup strategy.
Back in the day, we used to talk a lot about how RAID is not a backup strategy. The modern version of that is that S3 is not a backup strategy.
> So what is the purpose of the massive level of redundancy that you are already paying for when you store a file on S3?
You're paying to try to ensure you don't need to restore from backups. Our data lives in an RDS cluster (where we pay for read replicas to try to make sure we don't need to restore from backups) and in S3 (where we pay for durable storage to try to make sure we don't need to restore from backups), but none of that is a backup!
If you're not on AWS, S3 is of course a decent place to store your backups, but storing your backups on S3 when you're already on AWS is, at best, negligent, and treating the durability of S3 as a form of backup is simply absurd.
> I don’t think it’s terribly common for even medium sized companies to have a multi tier1 cloud backup strategy.
The company I work for is on the AWS cloud, so we store our backups on B2 instead. It's no more work than storing them on S3, and it means we still have our data in the event that we, for whatever reason, lose access to the data we have in S3. Who the hell doesn't have offsite backups?
> Back in the day, we used to talk a lot about how RAID is not a backup strategy. The modern version of that is that S3 is not a backup strategy.
This is not remotely the same thing. A RAID offers no protection against logical corruption from an erroneous script or even something as simple as running a truncate on the wrong table. Having a backup of your database on a different storage medium with the same cloud provider protects against vastly more failure modes.
> Who the hell doesn't have offsite backups?
No one. But S3 is already storing your data in three different data centers even if you have a single bucket in one region, and you also have SQL log replication to another region. Multi-region is as easy as enabling replication, but that is only available within a single cloud provider (I can't replicate RDS to Google Cloud SQL, only to RDS in another region). I would guess that a lot of people use that rather than a different cloud provider.
> This is not remotely the same thing. A RAID offers no protection against logical corruption from an erroneous script [...] But S3 is already storing your data in three different data centers
That sounds like...the same argument?
A RAID array stores your data on multiple physical drives in the machine, but offers no protection against logical corruption (where you store the same bad data on every drive), destruction of the machine, or loss of access to the machine.
S3 stores your data in multiple physical data centres in the region, but offers no protection against logical corruption, downtime of the entire region, or loss of access to the cloud.
You can't count replicas as providing durability against any threat that will apply equally to all the replicas.
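To make that concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `Copy` class, the `survivors` helper, and the failure-domain labels are my own assumptions, not anything AWS exposes. Each copy lists the threats it is exposed to, and a copy only helps against a threat it does not share.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    domains: frozenset  # threats this copy is exposed to (hypothetical labels)


def make_copy(name, *domains):
    return Copy(name, frozenset(domains))


def survivors(copies, threat):
    """Return the names of the copies that are not exposed to the threat."""
    return [c.name for c in copies if threat not in c.domains]


# Three replicas in one region: each has its own data centre, but they all
# share the region, the account, and every (possibly bad) write.
s3_replicas = [
    make_copy("dc-1", "dc-1-outage", "region-outage", "account-loss", "bad-write"),
    make_copy("dc-2", "dc-2-outage", "region-outage", "account-loss", "bad-write"),
    make_copy("dc-3", "dc-3-outage", "region-outage", "account-loss", "bad-write"),
]

# A point-in-time backup held by a different provider shares none of those.
offsite = make_copy("offsite-snapshot", "other-provider-outage")

for threat in ["dc-1-outage", "region-outage", "account-loss", "bad-write"]:
    print(
        threat,
        "| replicas only:", survivors(s3_replicas, threat),
        "| with offsite:", survivors(s3_replicas + [offsite], threat),
    )
```

The replicas only help with the single data centre outage. For the other three threats the replica-only list is empty, and the only survivor is the copy that doesn't share the failure domain.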
Storing a file with two tier-1 providers would surely protect you from fire, water, and theft, no? Yet you will also be paying for all the extra copies that Amazon and Google each make. I'm not disagreeing that this is the right strategy, just pointing out that the market offerings and trends don't support it.