What is Dual Controller HA?
Dual Controller HA™ is an extension to our existing SoftNAS Cloud® high availability solution, SNAP HA™. It is designed to provide high availability for a shared pool of object storage only.
SoftNAS SNAP HA™ provides NFS, CIFS and iSCSI services via redundant storage controllers. One controller is active, while the other is a standby controller. Because only one controller is active at a time, this can be considered single-controller HA. In this original SNAP HA design, each node has its own dedicated storage devices and pools holding distinct, non-shared data, which must be replicated for use in HA. Adding a device to a dedicated storage pool results in the pool being replicated in the usual way, via SyncImage and asynchronous SnapReplicate ZFS send/receive once per minute, ensuring a copy of the pool’s data is maintained on the target node. HA failover for dedicated pools operates as always.
Dual Controller HA™, on the other hand, applies only if a shared pool of object storage, such as AWS S3 or Azure Hot or Cool Blob storage, is specified at storage pool creation. After adding object storage 'disks' via Disk Devices and selecting Create in Storage Pools, a dialog appears that includes a Shared Storage option. If Shared Storage is selected, Dual Controller HA™ will automatically be applied to the shared pool after SNAP HA™ is configured.
Shared pools operate very differently from dedicated pools from an HA perspective. First, the underlying storage devices are shared across nodes. Such shared devices (e.g., S3 cloud disks, Azure Hot and Cool Blob storage) include their own data redundancy and are typically accessed over a network connection, which enables them to be shared across two or more nodes (only two nodes are currently supported).
A second major difference is the take-over process for shared pools. Volume configuration files are replicated between the primary and secondary controllers (hence Dual Controller). Failover is initiated when the primary controller fails to reply to an IO request within the expected time frame.
During a take-over event, the devices associated with a shared pool must first be mounted by the target node (and sometimes disconnected or unmounted from the original node, if required by the device type). Next, the shared pool is imported using the ZFS import command, and the result is verified to confirm that the pool was imported successfully and is not degraded or faulted. Both debug/trace and info/error logging are written to the existing HA log files, so that errors or issues can be troubleshot and supported in the field.
With this method of failover:
- Very little data needs to be transferred for fail-over to occur.
- There is no need to create duplicate pools of already resilient object storage.
- There is no risk of losing transactional data to the asynchronous replication delays of standard SNAP HA.
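The take-over sequence described above (attach the shared devices, import the pool, verify its health, log each step) can be sketched roughly as follows. This is a minimal illustration, not the SoftNAS implementation: the function names, the logger name, and the `attach_devices` hook are hypothetical, and the only external commands assumed are the standard `zpool import` and `zpool list -H -o health`.

```python
import logging
import subprocess

log = logging.getLogger("ha.takeover")  # hypothetical logger name


def run_command(args):
    """Run a command and return its stdout; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def take_over_pool(pool, attach_devices, run=run_command):
    """Attach shared devices, import the pool, and verify it is healthy.

    attach_devices is a hypothetical, device-specific hook (for example,
    connecting object storage disks on the target node).
    """
    log.info("attaching shared devices for pool %s", pool)
    attach_devices(pool)

    log.info("importing pool %s", pool)
    run(["zpool", "import", pool])

    # 'zpool list -H -o health' prints a single state such as ONLINE, DEGRADED or FAULTED.
    health = run(["zpool", "list", "-H", "-o", "health", pool]).strip()
    if health != "ONLINE":
        log.error("pool %s imported but reports health %s", pool, health)
        raise RuntimeError(f"pool {pool} imported but reports health {health}")

    log.info("pool %s imported and healthy", pool)
    return health


def fake_run(args):
    """Stand-in runner for a dry run; it does not touch any real pool."""
    return "ONLINE\n" if args[:2] == ["zpool", "list"] else ""


print(take_over_pool("sharedpool", attach_devices=lambda pool: None, run=fake_run))
```

Passing the command runner as a parameter keeps the sequence testable without a real ZFS pool. A real take-over would also have to handle the case where the original node still holds the devices.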
To determine if Dual Controller HA is right for your deployment, see Azure Getting Started: Choosing your HA Solution.
No change to Dedicated Pools
As stated above, Dual Controller HA does not change the way SNAP HA is configured, nor does it change how SNAP HA operates for dedicated pools. SoftNAS has worked very hard to ensure that this feature is a seamless addition, with little to no change to existing functionality or configuration.
Regardless of whether the pool is shared or dedicated, the customer must first define a SnapReplicate™ relationship between the primary and secondary nodes, then add the SNAP HA relationship. In other words, there is no change to the SnapReplicate/SNAP HA process shown below.
Adding a device to a shared storage pool results in the pool being excluded (skipped) by SnapReplicate; i.e., the data on the underlying device is already shared across nodes, so there is no need to replicate shared storage pools. This involves a change in SnapReplicate’s “pool discovery” logic, forcing it to first read the sharedpools.xml file to get the list of shared pool names, then exclude those pools from the list of pools to be replicated (similar to how pool names not found on the target node get excluded).
This allows SnapReplicate and SNAP HA to function across both types of pools, and to differentiate between them. Existing SNAP HA customer installations continue to operate uninterrupted, and new SoftNAS instances can be paired with Dual Controller HA shared storage pools and, at the same time, with dedicated pools that replicate asynchronously via "standard" SNAP HA. This also ensures that regardless of which type of pool is selected, the customer can confidently set up SNAP HA with the same documentation.
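The pool discovery change described above can be illustrated with a short sketch. The layout of sharedpools.xml is not documented here, so the `<sharedpools>`/`<pool name="...">` structure below is purely an assumption, as are the function names and pool names; only the filtering idea (skip shared pools, and skip pools missing on the target) comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical file contents; the real sharedpools.xml schema may differ.
SAMPLE_SHAREDPOOLS_XML = """
<sharedpools>
  <pool name="s3pool"/>
</sharedpools>
"""


def read_shared_pool_names(xml_text):
    """Return the set of pool names listed in the (assumed) shared pools XML."""
    root = ET.fromstring(xml_text.strip())
    return {pool.get("name") for pool in root.iter("pool") if pool.get("name")}


def pools_to_replicate(source_pools, target_pools, shared_pools):
    """Keep only pools that exist on the target node and are not shared."""
    target = set(target_pools)
    return [
        name
        for name in source_pools
        if name in target and name not in shared_pools
    ]


shared = read_shared_pool_names(SAMPLE_SHAREDPOOLS_XML)
# 's3pool' is skipped because it is shared; 'pool2' because it is missing on the target.
print(pools_to_replicate(["pool1", "s3pool", "pool2"], ["pool1", "s3pool"], shared))
```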
Having prepared the environment on both SoftNAS Cloud AWS instances, we can now set up high availability. The first step towards high availability in SoftNAS is to establish replication. SnapReplicate™ makes this as simple as completing a quick wizard.
To establish the secure SnapReplicate relationship between two SoftNAS Cloud® nodes, simply follow the steps given below:
- Using a web browser, log in to the SoftNAS StorageCenter™ administrator interface of the source controller (the first instance, on which you created the CIFS-enabled volume).
- In the Left Navigation Pane, select the SnapReplicate™/SNAP HA™ option.
The SnapReplicate/SNAP HA page will be displayed.
- Click the Add Replication button in the Replication Control Panel.
The Add Replication wizard will be displayed. Read the instructions on the screen and then click the Next button.
- In the next step, enter the IP address or DNS name of the remote, target SoftNAS Cloud® controller node in the Hostname or IP Address text entry box. Note that by specifying the replication target's IP address, you are specifying the network path the SnapReplicate™ traffic will take.
The source node must be able to connect via HTTPS to the target node (similar to how the browser user logs into StorageCenter using HTTPS). HTTPS is used to create the initial SnapReplicate configuration. Next, several SSH sessions are established to ensure that two-way communication between the nodes is possible. Both connections use the IP address or DNS name provided in this step.
- Next, provide the username (softnas) and the password (if default, this is the instance ID) of the target instance. Type the password again to verify, then click Next.
The IP address/DNS name and login credentials of the target node will be verified. If there is a problem, an error message will be displayed. Click the Previous button to make the necessary corrections and then click the Next button to continue.
- Read the final instructions and then click the Finish button.
The SnapReplicate™ relationship between the two SoftNAS Cloud® controller nodes will be established. The corresponding SyncImage of the SnapReplicate™ will be displayed.
After data from the volumes on the source node is mirrored to the target, once-per-minute SnapReplicate™ transfers keep the target node hot with data block changes from the source volumes.
The tasks and an event log will be displayed in the Replication Control Panel section. This indicates that a SnapReplicate™ relationship is established and that replication should be taking place.
Configuring SNAP HA™
SnapReplicate™ establishes a replication relationship, one that can be manually triggered or scheduled, but is not automated. For true high availability in a failover situation, SNAP HA™ must be configured as well.
- To configure SNAP HA™, complete the SNAP HA wizard, beginning by clicking Add SNAP HA™.
Note: Configuration of SnapReplicate™ is a prerequisite to setup of SNAP HA™. If SnapReplicate™ is not configured, the Add SNAP HA™ button will be grayed out.
If you have not yet configured a notification email, you will be given the opportunity to provide one before continuing with SNAP HA™ setup. Provide an email address to which support reports and logs will be sent, and click OK.
- Click Add SNAP HA™ once more if this occurs. The Add High Availability wizard will begin.
The next screen depends upon whether your storage pool uses Microsoft disks added from within the SoftNAS UI (as explained in Adding Block Storage via the SoftNAS UI), or whether you added Azure Blob Storage disks or added your block storage disks through the Azure Portal.
If you added Azure Blob Storage or used the Azure Portal to add your disks, then you would first have to provide Azure account credentials before being prompted to enter your Virtual IP Address.
If (as directed in this guide) you added Microsoft disks using the SoftNAS UI, you will have supplied Azure credentials already. In this case, the wizard will skip ahead to the Virtual IP screen. This is because your credentials are cached in order to speed up the process.
- On the Virtual IP screen, enter a virtual IP address that is not in the same CIDR block as the instances. (In simplest terms, choose an address outside the address range that contains the two instances.) Click Next.
- Click Finish on the Finish HA Setup screen.
Your SNAP HA™ pairing is created.
To test, shut down one of the instances. The other will become primary after a few moments. Alternatively, select Actions, then Takeover, to simulate a failover.
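The virtual IP requirement from the wizard (an address that is not in the same CIDR block as the instances) can be checked before you start, using Python's standard `ipaddress` module. This is only a sketch: the function name and the example subnets and addresses are hypothetical, so substitute the CIDR blocks of your own instances.

```python
import ipaddress


def vip_is_outside_instance_networks(vip, instance_cidrs):
    """Return True if vip does not fall inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(vip)
    # strict=False tolerates CIDR strings that have host bits set.
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in instance_cidrs]
    return not any(address in network for network in networks)


# Hypothetical example values, not taken from any real deployment.
instance_subnets = ["10.0.1.0/24", "10.0.2.0/24"]

for candidate in ["10.0.1.50", "172.16.0.10"]:
    ok = vip_is_outside_instance_networks(candidate, instance_subnets)
    print(candidate, "usable as virtual IP" if ok else "inside an instance CIDR block")
```

Comparing against the actual CIDR blocks is more reliable than comparing leading digits, because two addresses can start with the same numbers and still belong to different blocks.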