Symptoms

The wrong sizing selection may cause slowness, failovers, and even outages. In addition, it may lead to a wide range of false alarms, especially in HA mode, where the two machines are unable to check each other.

...

For network measurements, we are interested in the sum of the maximum NetworkIn and the maximum NetworkOut. You can graph them in CloudWatch as follows:

 


1-     From CloudWatch >> Metrics >> EC2

...

3-     Select NetworkIn and NetworkOut

 



4-     Click the “Graphed metrics” tab and, from “Statistics”, choose “Maximum” [for both metrics]

 

 


5-     Zoom into the area of interest and from the drop-down menu before “Actions” choose “Stacked area”



6-     Do that for both instances
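
The same figure can also be pulled programmatically. The following is a minimal sketch, not part of the original procedure: it assumes a boto3 CloudWatch client is passed in as `cloudwatch`, and the function name `peak_network_bytes` and its default window are made up for illustration. It requests the Maximum statistic for NetworkIn and NetworkOut and adds the two peaks together.

```python
from datetime import datetime, timedelta, timezone


def peak_network_bytes(cloudwatch, instance_id, hours=6, period=60):
    """Return max(NetworkIn) + max(NetworkOut), in bytes per period, for one instance.

    `cloudwatch` is assumed to be a boto3 CloudWatch client (or any object
    exposing a compatible get_metric_statistics method).
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    total = 0.0
    for metric in ("NetworkIn", "NetworkOut"):
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName=metric,
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=period,
            Statistics=["Maximum"],
        )
        # No datapoints in the window is treated as a peak of 0.
        total += max((p["Maximum"] for p in resp["Datapoints"]), default=0.0)
    return total
```

With boto3 installed and credentials configured, a call would look like `peak_network_bytes(boto3.client("cloudwatch"), "i-...")`, run once per instance. Note that the two peaks may occur at different times, so the result is an upper bound on the combined traffic.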

 

 

 

 

 

 







II CPU:

For CPU, we are only interested in the maximum value of the CPUUtilization metric. You can plot it for both machines on the same chart as follows:

 


1-     From CloudWatch >> Metrics >> EC2

...

3-     Click the “Graphed metrics” tab and, from “Statistics”, choose “Maximum” for each metric
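
As a rough scripted equivalent of the chart, the sketch below returns the highest CPUUtilization seen for each instance in a time window. It is an illustration only: `peak_cpu_percent` is a hypothetical helper name, and `cloudwatch` is assumed to be a boto3 CloudWatch client supplied by the caller.

```python
from datetime import datetime, timedelta, timezone


def peak_cpu_percent(cloudwatch, instance_ids, hours=6, period=60):
    """Map each instance ID to its highest CPUUtilization (percent) in the window.

    Instances with no datapoints in the window map to None.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    peaks = {}
    for instance_id in instance_ids:
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=start,
            EndTime=end,
            Period=period,
            Statistics=["Maximum"],
        )
        peaks[instance_id] = max(
            (p["Maximum"] for p in resp["Datapoints"]), default=None
        )
    return peaks
```

Passing both HA instance IDs in one call gives the two peaks side by side, which mirrors drawing both machines on the same chart.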

 

 




III EBS volumes

One of the major storage issues is an IOPS bottleneck: when you try to push or pull a large amount of data through a tight hose, you will suffer from high iowait time and a large increase in memory usage, which will badly affect the system.

...

3-     Repeat for VolumeWriteBytes

 

 



4-     From “Graphed metrics”, click the “Statistics” drop-down menu and choose “Sum”

...

7-     Repeat for the second instance
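
Once the Sum datapoints for VolumeReadBytes and VolumeWriteBytes are available (for example, exported from the chart or fetched from the AWS/EBS namespace), the busiest period can be found by adding the two series per timestamp. The sketch below is an assumption-laden illustration: `peak_volume_bytes` is a hypothetical helper, the datapoints are assumed to be dicts with "Timestamp" and "Sum" keys as returned by the CloudWatch API, and it does not reproduce the steps omitted above.

```python
from collections import defaultdict


def peak_volume_bytes(read_points, write_points):
    """Return (timestamp, bytes) for the period with the largest read+write total.

    Each argument is a list of CloudWatch-style datapoints, i.e. dicts with
    "Timestamp" and "Sum" keys. Returns None if both lists are empty.
    """
    totals = defaultdict(float)
    for point in list(read_points) + list(write_points):
        # Add reads and writes that fall in the same period.
        totals[point["Timestamp"]] += point["Sum"]
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])
```

The returned value is in bytes per period, not per second, so it still needs to be converted before it is compared with any per-second volume or instance limit.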


 

 

Pre-support Analysis:

SoftNAS HA relies heavily on the link between the two instances, as well as on proper communication between each instance and its storage backend, and uses both to check the health of each machine. If something affects the link between them [for example, very high network I/O or IOPS demand], it will lead to “Storage heartbeat failure” errors and to NTP and IAM warnings, and it can also cause false-positive failovers, especially on non-EBS-optimized instances.

...

Note: CloudWatch metrics are per minute, whereas the numbers in the link are per second.
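
To compare a per-minute Sum from CloudWatch with a per-second limit, divide it by the length of the period in seconds. A minimal sketch of that arithmetic follows; the helper names `per_second` and `mib_per_second` are invented for this example, and the 60-second default assumes the one-minute period described in the note.

```python
def per_second(total_per_period, period_seconds=60):
    """Convert a per-period Sum (bytes, operations, ...) to a per-second rate."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return total_per_period / period_seconds


def mib_per_second(bytes_per_period, period_seconds=60):
    """Convert a per-period byte Sum to MiB per second."""
    return per_second(bytes_per_period, period_seconds) / (1024 * 1024)
```

For example, `per_second(6000)` gives 100.0, so a Sum of 6000 operations in one minute corresponds to 100 operations per second. If the chart uses a different period (for example 300 seconds), pass that value as `period_seconds` instead.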

 



Additional Information

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/acw-ug.pdf

Update History

01-03-2018 Template Created