SoftNAS Recommended Best Practices for Maximum Data Protection
The following best practices help keep your data safe. Each tip addresses a different risk, so select the guidance that best fits your circumstances and use case.
1) Device Failure Resiliency
Use RAID 10 instead of RAID 0 for high-performance block storage – RAID 10 provides the best balance of read and write IOPS with data redundancy. With RAID 10, ZFS can continue running when a block device (e.g., an EBS volume) fails, because the data is mirrored. Unfortunately, we are seeing an increase in EBS disk failures, especially in the US East region, perhaps due to a combination of aging infrastructure and heavy usage. RAID 10 doubles the underlying storage cost, but for high-performance workloads data protection outweighs cost.
Use RAIDZ1/RAIDZ2 instead of RAID 0 for read-intensive block storage – RAIDZ provides excellent read IOPS, but write IOPS are limited to roughly those of a single VDEV (which is why RAID 10 is preferred when high data ingest rates are also required).
RAID 0 – if you choose RAID 0, you are relying entirely on the absence of EBS disk failures. Documented EBS annual failure rates (AFR) range from 0.1% to 0.5% per volume. When an EBS disk fails underneath a pool configured as RAID 0, data loss will occur! Because reports of EBS failures are increasing, we all need to be aware of this risk and ensure that we and our customers are properly prepared. Choosing RAID 0 must be an informed decision: it means you understand and accept that each EBS volume in the pool has up to a 0.5% chance per year of failing (a lost disk), that the chance of losing the pool grows with every volume you add, and that this is an acceptable risk. For these reasons, RAID 0 is not a recommended configuration.
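To make the AFR figures concrete, the following Python sketch estimates the chance of losing pool data within one year for RAID 0, RAID 10, and a single-VDEV RAIDZ1/RAIDZ2 pool. It assumes every volume fails independently at the same AFR and that no failed disk is repaired during the year; ignoring repair makes the RAID 10 and RAIDZ figures pessimistic, and correlated events such as the multi-zone incident described under High Availability are not modeled at all. The 8-volume pool is a hypothetical example.

```python
"""Rough annual data-loss estimates for an N-volume pool (illustrative only).

Assumptions: independent failures, one fixed per-volume AFR, and no repair
during the year. Correlated failures are not modeled.
"""
from math import comb


def raid0_loss(afr: float, disks: int) -> float:
    # RAID 0 loses data if ANY volume fails: 1 - P(no volume fails).
    return 1 - (1 - afr) ** disks


def raid10_loss(afr: float, disks: int) -> float:
    # RAID 10 (striped 2-way mirrors, even volume count assumed) loses data
    # only if both volumes of some mirror pair fail.
    pairs = disks // 2
    return 1 - (1 - afr ** 2) ** pairs


def raidz_loss(afr: float, disks: int, parity: int) -> float:
    # A single RAIDZ VDEV survives up to `parity` failed volumes
    # (1 for RAIDZ1, 2 for RAIDZ2); data is lost when more than that fail.
    survive = sum(
        comb(disks, k) * afr ** k * (1 - afr) ** (disks - k)
        for k in range(parity + 1)
    )
    return 1 - survive


if __name__ == "__main__":
    afr, disks = 0.005, 8  # upper documented AFR, hypothetical 8-volume pool
    print(f"RAID 0 : {raid0_loss(afr, disks):.4%}")
    print(f"RAID 10: {raid10_loss(afr, disks):.4%}")
    print(f"RAIDZ1 : {raidz_loss(afr, disks, 1):.4%}")
    print(f"RAIDZ2 : {raidz_loss(afr, disks, 2):.4%}")
```

With the 0.5% AFR, the 8-volume RAID 0 pool has about a 3.9% chance (1 - 0.995^8) of losing data in a year, close to eight times the single-volume figure, while the redundant layouts come out orders of magnitude lower under these assumptions.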
2) High Availability
SNAP HA provides a redundant copy of data for primary storage use cases. This helps mitigate EBS failures affecting RAID 0 pools, but it does not eliminate all EBS risks. We recently observed an incident in which EBS failures took down both HA nodes; the instances would not even reboot without help from AWS Support. Statistically, simultaneous EBS failures in two Availability Zones should be extremely rare; however, we now have our first confirmed incident.
We also strongly recommend that customers contact SoftNAS after any HA failover so that we can analyze the event together. A joint investigation helps you find preventable root causes and return to a fully redundant configuration as quickly as possible; it also helps our team implement safeguards and best practices that benefit all our customers.
3) Offsite DR Copy
DR Node – we highly recommend running a third SoftNAS node, with storage snapshots enabled, in a different Region (datacenter), and using rsync to synchronize all files from the primary HA pair to the DR node. Because every file is replicated to the DR node, and the DR node keeps its own snapshots, you have a near-real-time DR backup copy in case of an entire region failure (or data corruption, a multi-zone EBS failure, etc.). Our largest enterprise customers commonly use this configuration, for good reasons. We also have an rsync script available that makes this easy to implement.
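A minimal Python sketch of the primary-to-DR synchronization step, assuming rsync and SSH access are available between the nodes. It is not the SoftNAS rsync script mentioned above; the host name, paths, and the `dr-sync` SSH user are hypothetical placeholders.

```python
import shlex
import subprocess


def build_rsync_command(src: str, dest_host: str, dest_path: str,
                        ssh_user: str = "dr-sync",  # hypothetical user
                        mirror_deletes: bool = False) -> list[str]:
    """Build an rsync command that copies `src` to the DR node over SSH."""
    cmd = [
        "rsync",
        "-a",         # archive mode: recurse, keep permissions, times, links
        "-z",         # compress in transit (useful across regions)
        "-e", "ssh",  # use SSH as the transport
    ]
    if mirror_deletes:
        # Propagate deletions so the DR copy exactly matches the primary.
        cmd.append("--delete")
    # Trailing slash: copy the *contents* of src into dest_path.
    cmd.append(src.rstrip("/") + "/")
    cmd.append(f"{ssh_user}@{dest_host}:{dest_path}")
    return cmd


def sync_to_dr(src: str, dest_host: str, dest_path: str, **kwargs) -> None:
    """Run one synchronization pass; raises CalledProcessError on failure."""
    cmd = build_rsync_command(src, dest_host, dest_path, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Scheduled from cron (for example, every 15 minutes), repeated runs transfer only changed files. Deletions are mirrored only when `mirror_deletes=True`; leaving it off is the more conservative choice, although the DR node's snapshots preserve files that existed when each snapshot was taken.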
Protected DR Node – for the ultimate in protection, customers should create a “protected” cloud account (with the same or another cloud provider) that has very limited access and requires different credentials from the primary cloud account. If the primary account's credentials are ever compromised, an attacker can delete everything in that account (it has happened). With this setup, a protected DR copy resides in a completely separate cloud account (or on premises) that is write-only from the primary cloud account: files can be copied into the protected account, but the user account that writes them has no right to delete them. This ensures there will always be at least one copy of the data in the protected account that cannot be deleted by a malicious user, employee, contractor, hacker, or errant application or script. This provides the highest level of data protection (aside from a backup that does the same thing).
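As one way to implement the write-only arrangement, assuming the protected copy lives in an Amazon S3 bucket owned by the protected account, the sketch below builds a bucket policy that lets the primary account add objects but explicitly denies deletes and the bucket changes that could delete data indirectly. The bucket name and account ID are hypothetical. It also assumes versioning is enabled on the bucket (from the protected account), so that an overwrite adds a new version instead of replacing the old one, and that the primary account's IAM user has its own `s3:PutObject` permission.

```python
import json


def build_write_only_policy(bucket: str, writer_account_id: str) -> dict:
    """Bucket policy, attached in the *protected* account, that lets the
    primary (writer) account add objects but never delete or expire them."""
    writer = {"AWS": f"arn:aws:iam::{writer_account_id}:root"}
    objects = f"arn:aws:s3:::{bucket}/*"
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPrimaryAccountToWrite",
                "Effect": "Allow",
                "Principal": writer,
                "Action": ["s3:PutObject"],
                "Resource": [objects],
            },
            {
                # An explicit Deny overrides any Allow granted elsewhere.
                "Sid": "DenyPrimaryAccountDeletes",
                "Effect": "Deny",
                "Principal": writer,
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": [objects],
            },
            {
                # Block indirect deletion: suspending versioning, adding
                # expiration rules, replacing this policy, deleting the bucket.
                "Sid": "DenyPrimaryAccountBucketChanges",
                "Effect": "Deny",
                "Principal": writer,
                "Action": [
                    "s3:PutBucketVersioning",
                    "s3:PutLifecycleConfiguration",
                    "s3:PutBucketPolicy",
                    "s3:DeleteBucketPolicy",
                    "s3:DeleteBucket",
                ],
                "Resource": [bucket_arn],
            },
        ],
    }
```

`json.dumps(policy, indent=2)` produces the document to attach in the protected account. For stricter immutability, S3 Object Lock can additionally enforce retention periods on the stored objects.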
Local Disk Copy
EBS Snapshots – inside StorageCenter™, we provide an EBS Backup capability that can be used manually to create a consistent set of EBS disk snapshots. Unfortunately, this capability is available only on AWS, because Azure does not yet provide storage snapshot copies. The capability is implemented as a script, which can also be scheduled as a cron job for additional data protection. Some customers use their own scripts and/or third-party tools to manage EBS snapshots. It is always a good idea to keep copies of your data disks as yet another way to recover quickly when the chips are down (e.g., after EBS disk failures).
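For customers who prefer their own script, this Python sketch uses the boto3 EC2 client's `create_snapshots` call, which snapshots all EBS volumes attached to an instance at the same point in time (crash-consistent) and here excludes the boot volume. It is not the StorageCenter EBS Backup script; the client is passed in as a parameter so the function can be exercised without AWS credentials, and the snapshot description format is an arbitrary choice.

```python
from datetime import datetime, timezone


def snapshot_data_volumes(ec2, instance_id: str) -> list[str]:
    """Take one multi-volume, crash-consistent snapshot set of the data
    volumes attached to `instance_id` and return the new snapshot IDs.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    resp = ec2.create_snapshots(
        InstanceSpecification={
            "InstanceId": instance_id,
            "ExcludeBootVolume": True,  # snapshot pool data disks only
        },
        Description=f"SoftNAS data volumes {instance_id} {stamp}",
        CopyTagsFromSource="volume",  # carry volume tags onto snapshots
    )
    return [snap["SnapshotId"] for snap in resp["Snapshots"]]
```

In practice a cron job would call `snapshot_data_volumes(boto3.client("ec2"), instance_id)` for the SoftNAS instance, paired with a retention mechanism (for example, Amazon Data Lifecycle Manager) so that old snapshots are pruned.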
Offsite File Backup Copy
File Backups – every customer must have some means of backing up their files. SoftNAS storage snapshots and HA are not a substitute for proper file backup hygiene. We routinely ask our customers whether they have file backup procedures in place and advise them to implement them if not. Many backup tools for the cloud are available today, in the cloud marketplaces and elsewhere, so there is no excuse for not having file-level and/or disk-level backups. Backups should be stored in a different, protected account (not the same cloud account), for the reasons described under “Protected DR Node” above.