Symptoms

Without proper configuration, a SoftNAS instance that uses S3-compatible cloud disk extenders can perform poorly. To get the best possible performance from a SoftNAS deployment with S3-compatible cloud disks, follow the guidelines in this article.

Purpose

This best-practice article provides guidance for creating a high-performance SoftNAS instance that uses S3 (or S3-compatible) cloud disk extenders. It is organized into the following categories:

  • General
  • Sizing
  • CPU
  • RAM
  • Network

Resolution

General Guidelines

  • A Storage Pool backed by S3 Cloud Disks should be a JBOD Storage Pool with a one-to-one correspondence between the pool and an S3 Cloud Disk. Using RAID-0, RAID-1, or RAID-Z with S3 Cloud Disks is not recommended. To add capacity to a JBOD Storage Pool, use capacity expansion to add another S3 Cloud Disk.
  • S3 Cloud Disk object storage should be located "near" the instance that is using the object storage. This means in the same region on public clouds and with minimal latency on private clouds.
  • Do not use a "block cache file" on SoftNAS Cloud versions that have support for that feature.
  • Block disks such as EBS or VMDK can be used to provide Read Cache and Write Log for Storage Pools which are backed by S3 Cloud Disks when different performance requirements exist.
  • A Storage Pool backed by S3 Cloud Disks should have its "Sync Mode" set to "Standard".

Sizing

To size a solution that uses Cloud Disk Extender, first size it as a block-storage (VMDK/EBS) implementation, then add resources for the extra work needed to present S3-compatible storage as block storage. Each bucket configured through Cloud Disk Extender is served by additional processes running inside the SoftNAS instance. In other words, for the same overall storage capacity, the more buckets that are configured through Cloud Disk Extender, the more additional resources are required.

CPU

If your instances use cloud disk extenders, configure them with additional processing power (CPU) beyond what traditional block-based storage access requires. Presenting S3 storage as block-based storage requires a number of additional functions, including SSL/TLS key exchange and encryption, MD5 block computations, network stack processing, and any optional encryption. To avoid performance issues:

  • Do not use cloud disk extender on single vCPU instances.
  • 2 vCPU instances may be suitable for test scenarios, but may still prove insufficient if your S3-compatible test/POC environment must deliver reasonable performance.
  • For a production environment, instances with a minimum of 4 vCPUs are highly recommended. Many workloads will perform better with additional vCPUs.
  • For each 75 MB/s of throughput that the workload achieves on block-based storage, two additional vCPUs are highly recommended to sustain the same throughput on S3.
  • CPU utilization should be monitored during proof-of-concept and initial production stages to verify that sufficient CPU has been provisioned for the provided workload.
  • Monit email alerts should be monitored and indications of high CPU utilization should be reviewed with respect to the Cloud Disk Extender configuration.
  • If operating in a trusted environment, and if the S3-compatible object storage being used offers the option, CPU usage can be reduced by using HTTP rather than HTTPS.
  • CPU usage can be further reduced by disabling optional encryption options.

Example:

A customer wants to use S3 object storage to save money over EBS. The current workload runs at 100-150MB/s of throughput on an m4.xlarge instance. First, evaluate the current workload to confirm that it averages a healthy 50% CPU usage. To provide the same 150MB/s of throughput from S3, the general guideline calls for 4 additional vCPUs (two for each 75MB/s) on top of the instance's existing 4 vCPUs. The recommendation is therefore an m4.2xlarge instance, which provides the four additional vCPUs.
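
The arithmetic in this example can be sketched in Python. This is a minimal illustration of the guideline, not a SoftNAS tool: the function and constant names are hypothetical, and rounding a partial 75 MB/s increment up to the next full increment is an assumption that the guideline does not state.

```python
import math

MB_S_PER_STEP = 75         # guideline: each 75 MB/s of block-equivalent throughput...
VCPUS_PER_STEP = 2         # ...calls for two additional vCPUs
PRODUCTION_MIN_VCPUS = 4   # guideline: minimum for production use


def extra_vcpus_for_s3(throughput_mb_s):
    """Additional vCPUs suggested for the given peak throughput in MB/s.

    Assumption: a partial 75 MB/s increment is rounded up.
    """
    steps = math.ceil(throughput_mb_s / MB_S_PER_STEP)
    return steps * VCPUS_PER_STEP


def recommended_vcpus(base_vcpus, throughput_mb_s):
    """Block-storage baseline plus the S3 allowance, never below the production minimum."""
    total = base_vcpus + extra_vcpus_for_s3(throughput_mb_s)
    return max(PRODUCTION_MIN_VCPUS, total)


# The m4.xlarge example: 4 vCPUs today, 150 MB/s peak throughput.
print(extra_vcpus_for_s3(150))    # 4
print(recommended_vcpus(4, 150))  # 8
```

The result is a starting point only; CPU utilization should still be monitored during proof-of-concept as described above.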

RAM

As mentioned previously in this document, each instance of the cloud disk extender runs as a process inside the SoftNAS instance to virtualize the object storage as block storage.

  • Cloud Disk Extender should not be used in production on systems with less than 8GB of RAM.
  • Memory footprints less than 8GB of RAM may be suitable for test or PoC environments only.
  • As a general guideline, provision 512MB of RAM above the memory normally required for a given workload.
  • Remember that half of the RAM is utilized for file-system caching. Additional resources are needed for the network file services and the base operating environment (~2GB of RAM).
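
The RAM guidelines above can be combined into a rough estimate. The sketch below is illustrative only and the function names are hypothetical. The guideline does not say whether the 512MB allowance applies once or per cloud disk, so the `cloud_disks` multiplier is an assumption (it defaults to 1). Treating the ~2GB figure as covering both the network file services and the base operating environment is also one possible reading.

```python
PRODUCTION_MIN_GB = 8.0      # guideline: no production use below 8GB
EXTENDER_OVERHEAD_GB = 0.5   # guideline: 512MB above the normal requirement


def recommended_ram_gb(workload_ram_gb, cloud_disks=1):
    """RAM to provision: workload requirement plus the extender allowance.

    Assumption: the allowance is multiplied by the number of cloud disks.
    """
    needed = workload_ram_gb + EXTENDER_OVERHEAD_GB * cloud_disks
    return max(PRODUCTION_MIN_GB, needed)


def ram_breakdown_gb(total_ram_gb, services_and_os_gb=2.0):
    """Split total RAM into the cache half, the ~2GB base share, and the rest."""
    cache = total_ram_gb / 2
    remaining = total_ram_gb - cache - services_and_os_gb
    return {
        "file_system_cache": cache,
        "services_and_os": services_and_os_gb,
        "remaining": remaining,
    }


print(recommended_ram_gb(16))  # 16.5
print(ram_breakdown_gb(16))
```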

Network

Cloud Disk Extender uses the network interface of an instance to access the object storage. Sufficient network bandwidth must be provisioned to reach maximum performance with Cloud Disk Extender. When estimating the throughput needed to the object store, also account for the network throughput used by network file services (NFS, CIFS, iSCSI, AFP) and by SnapReplicate/SNAP HA. In most configurations and platforms, all of these draw from the same pool of available network bandwidth.

  • A reasonably safe calculation is to take the available network speed on the instance and divide it by 3, allotting 1/3 for file services, 1/3 for replication, and 1/3 for object storage I/O.
  • When calculating, consider that SnapReplicate only replicates the write bandwidth, not the read bandwidth.
  • Be sure to convert properly between bits and bytes when comparing network throughput (usually expressed in bits) to disk throughput (usually expressed in bytes).
  • There is inherent overhead in the protocols used on the network (request/response, headers, checksums, control data, etc.), so full network saturation does not yield the full bandwidth as useful throughput. Plan on only 90% of the link speed as usable throughput.
  • Most clouds (and most data centers) do not provide full link-speed bandwidth on a sustained basis as systems are utilizing shared resources. Systems designed to run at fully provisioned capacity (of any metric) should be assigned to dedicated hosts rather than shared tenancy.
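
As a rough illustration of these rules of thumb, the sketch below estimates the object storage share of a link in MB/s. The function names are hypothetical, and applying the 90% usable-throughput figure before the three-way split is an assumption about how the two guidelines combine.

```python
def mbps_to_mb_s(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


def object_storage_budget_mb_s(link_mbps, usable_fraction=0.9, shares=3):
    """Object storage share of the link, in MB/s.

    Assumption: protocol overhead is removed first (90% usable), then the
    remainder is split evenly among file services, replication, and
    object storage I/O.
    """
    usable_mbps = link_mbps * usable_fraction
    return mbps_to_mb_s(usable_mbps / shares)


# A 1 Gbps link leaves roughly 37.5 MB/s for object storage I/O.
print(round(object_storage_budget_mb_s(1000), 1))  # 37.5
```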

Example:

The customer uses NFS and SnapReplicate with SNAP HA, and would like to use object storage. Expected throughput is about 40MB/s with 90% reads. The calculated network throughput for the source node is as follows:

  • 4MB/s writes to NFS (incoming)
  • 36MB/s reads from NFS (outgoing)
  • 4MB/s writes to SnapReplicate (outgoing)
  • 4MB/s writes to Object Storage (outgoing)
  • 36MB/s reads from Object Storage (incoming)

Total: 40MB/s incoming and 44MB/s outgoing

Converted to bits, this is 320 Mbps incoming and 352 Mbps outgoing.

The calculated network throughput for the target node is as follows:

  • 4MB/s writes from SnapReplicate (incoming)
  • 4MB/s writes to Object Storage (outgoing)

Total: 4MB/s incoming and 4MB/s outgoing

In bits, this works out to 32 Mbps incoming and 32 Mbps outgoing.

A 100 Mbps network connection is not sufficient for this configuration. A 1 Gbps connection should be enough, even allowing for protocol overhead and avoiding 100% saturation of the network.
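
The example totals can be reproduced with a short Python sketch. The function names are hypothetical. The link check treats each direction separately, which assumes a full-duplex link, and reuses the 90% usable-throughput guideline.

```python
def source_node_traffic_mb_s(total_mb_s, read_percent):
    """Return (incoming, outgoing) MB/s on the source node."""
    reads = total_mb_s * read_percent / 100
    writes = total_mb_s - reads
    incoming = writes + reads            # NFS writes in + object storage reads in
    outgoing = reads + writes + writes   # NFS reads out + SnapReplicate out + object storage writes out
    return incoming, outgoing


def target_node_traffic_mb_s(total_mb_s, read_percent):
    """Return (incoming, outgoing) MB/s on the target node (writes only)."""
    writes = total_mb_s * (100 - read_percent) / 100
    return writes, writes                # SnapReplicate in, object storage writes out


def fits_link(traffic_mb_s, link_mbps, usable_fraction=0.9):
    """True if one direction of traffic fits within the usable link speed."""
    return traffic_mb_s * 8 <= link_mbps * usable_fraction


incoming, outgoing = source_node_traffic_mb_s(40, 90)
print(incoming * 8, outgoing * 8)   # 320.0 352.0 (Mbps)
print(fits_link(outgoing, 100))     # False
print(fits_link(outgoing, 1000))    # True
```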

Platform-Specific Notes

Amazon Simple Storage Service (S3)

Transactional Pricing

Amazon S3 (like many other object storage providers) takes a multi-faceted approach to pricing object storage. While capacity is one component of cost, there are also charges for requests to create new objects and to access existing objects. These transaction charges can add up to the point where the perceived cost savings of S3 over EBS disappear, or where S3 even becomes more expensive than EBS. A SoftNAS Solutions Architect can help customers evaluate whether S3-compatible storage is appropriate for specific applications.
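
The effect of transaction charges can be sketched as a simple cost comparison. Everything in this sketch is hypothetical: the function names, the pricing model (a per-GB capacity price plus a price per 1,000 requests), and all the numbers in the usage lines, which are placeholders rather than actual provider prices. Substitute current figures from your provider's price list.

```python
def monthly_object_cost(capacity_gb, price_per_gb,
                        write_requests, price_per_1k_writes,
                        read_requests, price_per_1k_reads):
    """Capacity charge plus request charges for one month."""
    capacity = capacity_gb * price_per_gb
    writes = write_requests / 1000 * price_per_1k_writes
    reads = read_requests / 1000 * price_per_1k_reads
    return capacity + writes + reads


def monthly_block_cost(capacity_gb, price_per_gb):
    """Capacity-only charge for block storage (assumes no per-request fees)."""
    return capacity_gb * price_per_gb


# Placeholder numbers only, NOT real prices.
object_cost = monthly_object_cost(1000, 0.02, 10_000_000, 0.005, 50_000_000, 0.0004)
block_cost = monthly_block_cost(1000, 0.08)
print(object_cost > block_cost)  # request charges can outweigh the capacity savings
```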

Additional Information

Upgrades and Updated Recommendations

SoftNAS Cloud 3.4.6 includes improvements to the implementation of S3 Cloud Disk Extender. "Block Cache File" is no longer supported as a cache mechanism. After upgrading to 3.4.6 (or later) and rebooting, it is possible and recommended to delete "s3cachepool" pools that were used only for block cache file storage. The block devices used for these pools can be reassigned as read cache or write log, or de-provisioned.

Additionally, after upgrading to 3.4.6, it is necessary to reboot the instance in order for all of the improvements to be installed. S3 Cloud Disks will continue to function, but until the system is rebooted not all of the improvements will have been applied.