
Symptoms


The SoftNAS appliance freezes or hangs during normal operation, most often because of high CPU and memory utilization.

 Purpose

This KB discusses the ways you can recover a hung SoftNAS instance without a reboot. To recover the instance effectively without a reboot, however, you must first identify exactly what is causing the problem.

The three most important things to check when your SoftNAS appliance is performing poorly or is unresponsive are:

  1. MEMORY (RAM) :- A system can run out of memory, or start swapping, when many read/write operations run at the same time and need extra CPU and RAM to process.
  2. CPU :- The CPU is the heart of the system. When it is overutilized it can cause problems including, but not limited to, system lockups, slow response times, and system crashes.
  3. PROCESSES :- Linux processes can sometimes go rogue and consume a noticeable amount of resources, starving other processes of what they need to function properly.

Resolution

 Launch an SSH session (PuTTY or your favorite client).

First let's take a look at the system's overall health by running the commands below:

  • uptime :- This shows the current load averages on the system.
  • htop :- This gives a graphical-style view of the processes that are using the most CPU and memory. It also shows memory usage, swap usage, and the load average. Please refer to figure 1 below for an example.
  • ps --forest -eo user,pid,ppid,%cpu,cmd :- This shows the process tree, with each process's CPU usage listed alongside its child processes. This makes it easier to find the right processes to kill should you need to.
  • ps aux --sort -rss | head -n 20 :- This lists the processes using the most resident memory (RSS), largest first, together with their CPU usage. The first of the 20 lines is the column header.



  Figure 1.
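
The health checks above can also be gathered in one pass. The following is a minimal sketch, not a SoftNAS tool: the function names are hypothetical, and it assumes a Linux kernel that exposes MemAvailable in /proc/meminfo (if that field is missing, the memory figure prints as 0) and the procps version of ps.

```shell
#!/bin/sh
# Hypothetical helper functions for a quick health snapshot; not part of SoftNAS.

# Print the 1-minute load average divided by the number of online CPU cores.
# A value well above 1.00 suggests the CPUs are oversubscribed.
load_per_core() {
    cores=$(getconf _NPROCESSORS_ONLN)
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    awk -v l="$load1" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# Print the percentage of RAM still available, read from /proc/meminfo.
# Assumes the kernel reports MemAvailable; prints 0 if it does not.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' /proc/meminfo
}

# List the N processes using the most CPU (default 5), without the header line.
top_cpu() {
    ps -eo pid,ppid,%cpu,%mem,comm --sort=-%cpu | sed 1d | head -n "${1:-5}"
}

echo "Load per core:     $(load_per_core)"
echo "RAM available (%): $(mem_available_pct)"
echo "Top CPU consumers (pid ppid %cpu %mem command):"
top_cpu 5
```

A high load per core together with a low percentage of available RAM points to the memory and CPU pressure described earlier, and the process list shows where to look first.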



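  1. Now that we've identified the processes that are taking up the most resources, it is time to stop some of them to bring down the system load. You can use the commands below to kill some of the protocol services (CIFS and NFS). Try a plain kill first, which sends SIGTERM and lets the process exit cleanly; use kill -9 (SIGKILL) only if the process does not exit.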
    # kill <process id>
    # kill -9 <process id>

  2. Then you can restart the NAS services
    /var/www/softnas/scripts/start-nasservices.sh

  3. Another very important thing to check is whether you are running out of file descriptors (FDs). Running out of FDs will render your system unusable.
    grep max /var/log/messages
    If the limit has been reached, you will see output like the following:

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

    Feb 19 06:42:18 Localhost kernel: VFS: file-max limit 1000000 reached

  4. If the command in step 3 returns output like the above, it is time to increase your file descriptor limit. Below are the commands to do so. The value 1000000 is an example; choose a value higher than the limit reported in your log.

    sysctl -w fs.file-max=1000000

    Then verify the new value with 'cat /proc/sys/fs/file-max'.

    To make the change permanent, edit the fs.file-max line in /etc/sysctl.conf.

    For example, change fs.file-max = 500000 to fs.file-max = 1000000.

    End result: fs.file-max=1000000

    # sysctl -p :- To commit the change without a reboot
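
The grep in step 3 only finds the problem after the limit has already been hit. As a rough sketch, and assuming the standard Linux /proc layout, the current usage can be compared against fs.file-max before that happens. The function names and the 80 percent warning threshold below are hypothetical, not SoftNAS defaults.

```shell
#!/bin/sh
# Hypothetical helpers for watching file descriptor usage; not part of SoftNAS.

# Print system-wide file handle usage as a percentage of fs.file-max.
# /proc/sys/fs/file-nr holds three fields: allocated, unused, maximum.
fd_usage_pct() {
    awk '{ printf "%d\n", ($1 - $2) * 100 / $3 }' /proc/sys/fs/file-nr
}

# List the N processes holding the most open file descriptors (default 5).
# Output columns: fd count, pid, command name.
# Run as root so that every /proc/<pid>/fd directory is readable.
top_fd_holders() {
    for dir in /proc/[0-9]*; do
        count=$(ls "$dir/fd" 2>/dev/null | wc -l)
        echo "$count ${dir#/proc/} $(cat "$dir/comm" 2>/dev/null)"
    done | sort -rn | head -n "${1:-5}"
}

WARN_PCT=80   # hypothetical warning threshold, adjust to taste

usage=$(fd_usage_pct)
if [ "$usage" -ge "$WARN_PCT" ]; then
    echo "WARNING: ${usage}% of fs.file-max is in use"
else
    echo "OK: ${usage}% of fs.file-max is in use"
fi
echo "Top file descriptor holders (count pid command):"
top_fd_holders 5
```

If a single process holds most of the descriptors, restarting that service may be a better fix than raising fs.file-max.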

Additional Information

You can read about many of the performance-related problems that can affect the overall functionality of SoftNAS here


Update History


 

 KB Created 