Dual Controller HA™ is an extension to our existing SoftNAS Cloud® high availability solution, SNAP HA™. It is designed to provide high availability for a shared pool of object storage only.
SoftNAS SNAP HA™ provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active, while the other is on standby. Because only one controller is active at a time, this can be considered single-controller HA. In this original SNAP HA design, each node has dedicated storage devices and pools with their own distinct, non-shared data, which must be replicated for use in HA. Adding a device to a dedicated storage pool results in the pool being replicated in the usual way, via SyncImage and asynchronous SnapReplicate ZFS send/receive once per minute, ensuring a copy of the pool’s data is maintained on the target node. HA failover for dedicated pools operates as always.
Dual Controller HA™, on the other hand, applies only if a shared pool of object storage, such as AWS S3 or Azure Hot or Cool Blob storage, is specified at storage pool creation. After adding object storage 'disks' via Disk Devices and selecting Create in Storage Pools, a dialog appears. If Shared Storage is selected in this dialog, Dual Controller HA™ is automatically applied to the shared pool after SNAP HA™ is configured.
Shared pools operate very differently from dedicated pools from an HA perspective. First, the underlying storage devices are shared across nodes. Such shared devices (e.g., S3 cloud disks, Azure Hot and Cool Blob storage) include their own data redundancy and are typically accessed over a network connection, which enables them to be shared across two or more nodes (only two nodes are currently supported).
A second major difference is the take-over process for shared pools. Volume configuration files are replicated between the primary and secondary controllers (hence Dual Controller). Failover is initiated when the primary controller fails to reply to an I/O request within the expected time frame.
During a take-over event, the devices associated with a shared pool must first be mounted by the target node (and sometimes disconnected or unmounted from the original node, if required by the device type). Next, the shared pool is imported using the ZFS import command, and the import is verified to confirm that the pool was imported successfully and is not degraded or faulted. Both debug/trace and info/error logging are written to the existing HA log files, so that errors or issues arising in the field can be troubleshot and supported.
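The import-and-verify step described above can be sketched as follows. This is a minimal illustration, not the SoftNAS implementation: the function names `run_command` and `import_and_verify` are hypothetical, the device-specific mounting step is omitted, and the sketch assumes the standard `zpool import` and `zpool list -H -o health` commands are available on the target node.

```python
import subprocess


def run_command(args):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout


def import_and_verify(pool, run=run_command):
    """Import a shared pool, then confirm it is neither degraded nor faulted.

    `run` is injectable so the logic can be exercised without a real ZFS host.
    """
    # Step 1: import the pool on the node that is taking over.
    run(["zpool", "import", pool])
    # Step 2: read the pool health; -H drops headers, -o health selects one column.
    health = run(["zpool", "list", "-H", "-o", "health", pool]).strip()
    # Treat anything other than ONLINE (e.g. DEGRADED, FAULTED) as a failed take-over.
    if health != "ONLINE":
        raise RuntimeError(f"pool {pool!r} imported but health is {health}")
    return health
```

Treating every state other than `ONLINE` as a failure is a deliberately strict choice for this sketch; a real take-over routine might log the state and decide differently per device type.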
With this method of failover:
To determine if Dual Controller HA is right for your deployment, see Azure Getting Started: Choosing your HA Solution.
As stated above, Dual Controller HA does not change the way SNAP HA is configured, nor does it change how SNAP HA operates for dedicated pools. SoftNAS has worked very hard to ensure that this feature is a seamless addition, with little to no change to existing functionality or configuration.
Regardless of whether the pool is shared or dedicated, the customer must first define a SnapReplicate™ relationship between the primary and secondary nodes, then add the SNAP HA relationship. In other words, there is no change to the SnapReplicate/SNAP HA process shown below.
Adding a device to a shared storage pool results in the pool being excluded (skipped) by SnapReplicate; i.e., the data on the underlying device is already shared across nodes, so there is no need to replicate shared storage pools. This involves a change in SnapReplicate’s “pool discovery” logic, forcing it to first read the sharedpools.xml file to get the list of shared pool names, then exclude those pools from the list of pools to be replicated (similar to how pool names not found on the target node get excluded).
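The exclusion logic described above can be sketched as follows. The layout of sharedpools.xml is not documented here, so the `<sharedpools><pool name="..."/></sharedpools>` schema is purely an assumption for illustration, as are the function names `read_shared_pool_names` and `pools_to_replicate`.

```python
import xml.etree.ElementTree as ET


def read_shared_pool_names(xml_text):
    """Return the set of shared pool names from XML text (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    # Assumed layout: each shared pool is a <pool name="..."/> element.
    return {elem.get("name") for elem in root.iter("pool") if elem.get("name")}


def pools_to_replicate(source_pools, target_pools, shared_pools):
    """Keep pools that exist on the target and are not shared, preserving order."""
    return [
        name
        for name in source_pools
        if name in target_pools and name not in shared_pools
    ]


# Example: one shared pool, one dedicated pool, one pool missing on the target.
shared = read_shared_pool_names('<sharedpools><pool name="s3pool"/></sharedpools>')
selected = pools_to_replicate(["s3pool", "ebs1", "orphan"], {"s3pool", "ebs1"}, shared)
```

In this example only `ebs1` is selected: `s3pool` is skipped because it is shared, and `orphan` is skipped because it does not exist on the target node.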
This allows SnapReplicate and SNAP HA to function across both types of pools and to differentiate between them. Existing SNAP HA customer installations continue to operate uninterrupted, and new SoftNAS instances can simultaneously be paired using both Dual Controller HA shared storage pools and dedicated pools that replicate asynchronously via "standard" SNAP HA. This also ensures that, regardless of which type of pool is selected, the customer can confidently set up SNAP HA with the same documentation.
Having prepared the environment on both SoftNAS Cloud AWS instances, we can now set up high availability. The first step towards high availability in SoftNAS is to establish replication. SnapReplicate™ makes this as simple as completing a quick wizard.
To establish the secure SnapReplicate relationship between two SoftNAS Cloud® nodes, simply follow the steps given below:
The SnapReplicate/SNAP HA page will be displayed.
The source node must be able to connect via HTTPS to the target node (similar to how the browser user logs into StorageCenter using HTTPS). HTTPS is used to create the initial SnapReplicate configuration. Next, several SSH sessions are established to ensure that two-way communication between the nodes is possible. The connection is established by providing the IP address of the target node.
SnapReplicate™ establishes a replication relationship, one that can be manually triggered or scheduled, but is not automated. For true high availability in a failover situation, SNAP HA™ must be configured as well.
Note: Configuring SnapReplicate™ is a prerequisite for setting up SNAP HA™. If SnapReplicate™ is not configured, the Add SNAP HA™ button will be grayed out.
If you have not yet configured a notification email, you will be given the opportunity to provide one before continuing with SNAP HA™ setup. Provide an email address to which support reports and logs will be sent, and click OK.
The next screen depends upon whether your storage pool uses Microsoft disks added from within the SoftNAS UI (as explained in Adding Block Storage via the SoftNAS UI), Azure Blob Storage disks, or block storage disks added through the Azure Portal.
If you added Azure Blob Storage disks or used the Azure Portal to add your disks, you must first provide Azure account credentials before being prompted to enter your Virtual IP Address.
If (as directed in this guide) you added Microsoft disks using the SoftNAS UI, you have already supplied Azure credentials. These credentials are cached to speed up the process, so the wizard skips ahead to the Virtual IP screen.
Click Finish on the Finish HA Setup screen.
To test, shut down the primary instance. The other instance will become primary after a few moments. Alternatively, select Actions, then Takeover, to simulate a failover.