What is Dual Controller HA?
Dual Controller HA™ is an extension to our existing SoftNAS Cloud® high availability solution, SNAP HA™. It is designed to provide high availability for a shared pool of object storage only.
Adding a device to a dedicated storage pool results in the pool being replicated in the usual way, via SyncImage and asynchronous SnapReplicate ZFS send/receive once per minute, ensuring a copy of the pool’s data is maintained on the target node. HA failover operates as always: the dedicated storage devices and pools on each node hold their own distinct, non-shared data, which must be replicated for use in HA (the original design of SNAP HA). SoftNAS SNAP HA™ provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active, while the other is a standby controller. As only one controller is active at a time, this can be considered single-controller HA.
Dual Controller HA™, on the other hand, applies only if a shared pool of object storage, such as AWS S3 or Azure Hot or Cool Blob storage, is specified at storage pool creation. After you add object storage 'disks' via Disk Devices and select Create in Storage Pools, the following dialog will appear. If Shared Storage is selected, Dual Controller HA™ will automatically be applied to the shared pool after SNAP HA™ is configured.
Shared pools operate very differently from dedicated pools from an HA perspective. First, the underlying storage devices are shared across nodes. Such shared devices (e.g., S3 cloud disks, Azure Hot and Cool Blob storage) include their own data redundancy and are typically accessed over a network connection, which enables them to be shared across two or more nodes (only two nodes are currently supported).
A second major difference is the take-over process for shared pools. Volume configuration files are replicated between the primary and secondary controllers (hence Dual Controller). Failover is initiated when the primary controller fails to reply to an IO request within the expected time frame.
During a take-over event, the devices associated with a shared pool must first be mounted by the target node (and sometimes disconnected or unmounted from the original node, if required by the device type). Next, the shared pool is imported using the ZFS import command, and the import is verified to confirm that the pool was imported successfully and is not degraded or faulted. The appropriate level of both debug/trace and info/error logging is provided in the existing HA log files, so that it’s possible to troubleshoot and provide support in the field if errors or issues arise.
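The take-over sequence described above can be sketched in Python. This is an illustrative outline only, not the SoftNAS implementation: the function names, the logger name, and the `mount_device` callback are hypothetical, and the sketch assumes the standard `zpool import` and `zpool list -H -o health` commands are available on the target node.

```python
import logging
import subprocess
from typing import Callable, Iterable, Sequence

log = logging.getLogger("ha.takeover")  # hypothetical logger name


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(list(args), check=True, capture_output=True, text=True)
    return result.stdout


def take_over_shared_pool(
    pool: str,
    devices: Iterable[str],
    mount_device: Callable[[str], None],
    run: Callable[[Sequence[str]], str] = run_command,
) -> str:
    """Mount the shared devices, import the pool, and verify its health."""
    # Step 1: attach every device backing the shared pool to this node.
    # How a device is mounted depends on its type, so it is left to a callback.
    for device in devices:
        log.debug("mounting shared device %s", device)
        mount_device(device)

    # Step 2: import the shared pool on this node.
    log.debug("importing pool %s", pool)
    run(["zpool", "import", pool])

    # Step 3: confirm the pool is healthy (not DEGRADED, FAULTED, etc.).
    health = run(["zpool", "list", "-H", "-o", "health", pool]).strip()
    if health != "ONLINE":
        log.error("pool %s imported but health is %s", pool, health)
        raise RuntimeError(f"pool {pool} is {health} after import")

    log.info("take-over of pool %s complete", pool)
    return health
```

Passing the command runner and the mount step in as parameters keeps the sequence itself easy to exercise without a real ZFS pool; a real deployment would supply device-specific mount logic.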
With this method of failover:
- Very little data needs to be transferred for fail-over to occur.
- There is no need to create duplicate pools of already resilient object storage.
- There is no risk of losing transactional data to the asynchronous replication delays of standard SNAP HA.
To determine if Dual Controller HA is right for your deployment, see AWS Getting Started: Choosing your HA Solution.
No change to Dedicated Pools
As stated above, Dual Controller HA does not change the way SNAP HA is configured, nor does it change how it operates for dedicated pools. SoftNAS has worked very hard to ensure that this feature is a seamless addition, with little to no change to existing functionality or configuration.
Regardless of whether the pool is shared or dedicated, the customer must first define a SnapReplicate™ relationship between the primary and secondary node, then add the SNAP HA relationship. In other words, there is no change to the SnapReplicate/SNAP HA process shown below.
Adding a device to a shared storage pool results in the pool being excluded (skipped) by SnapReplicate; i.e., the data on the underlying device is already shared across nodes, so there is no need to replicate shared storage pools. This involves a change in SnapReplicate’s “pool discovery” logic, forcing it to first read the sharedpools.xml file to get the list of shared pool names, then exclude those pools from the list of pools to be replicated (similar to how pool names not found on the target node get excluded).
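The exclusion logic described above can be illustrated with a short Python sketch. The layout of sharedpools.xml is not documented here, so the `<sharedpools><pool>name</pool></sharedpools>` structure below is an assumption made purely for illustration, as are the function names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set


def read_shared_pool_names(xml_text: str) -> Set[str]:
    """Return the shared pool names listed in the XML text.

    Assumes a hypothetical layout: <sharedpools><pool>name</pool>...</sharedpools>.
    """
    root = ET.fromstring(xml_text)
    return {
        element.text.strip()
        for element in root.iter("pool")
        if element.text and element.text.strip()
    }


def pools_to_replicate(
    local_pools: Iterable[str],
    target_pools: Iterable[str],
    shared_pools: Set[str],
) -> List[str]:
    """Keep only the pools that should be replicated.

    A pool is skipped if it is shared (its data is already visible to both
    nodes) or if no pool of the same name exists on the target node.
    """
    target = set(target_pools)
    return [
        pool
        for pool in local_pools
        if pool not in shared_pools and pool in target
    ]
```

For example, with local pools `pool1`, `s3pool` and `pool2`, target pools `pool1` and `s3pool`, and `s3pool` listed as shared, only `pool1` remains in the replication list: `s3pool` is excluded as shared, and `pool2` is excluded because it is not found on the target.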
This allows SnapReplicate and SNAP HA to function across both types of pools, and to differentiate between them. Existing SNAP HA customer installations continue to operate uninterrupted, and new SoftNAS instances can be paired with both Dual Controller HA shared storage pools and dedicated pools asynchronously replicating via "standard" SNAP HA simultaneously. This also ensures that regardless of which type of pool is selected, the customer can confidently set up SNAP HA with the same documentation.