Tuesday 23 October 2012

Job Engine


The job engine is a key part of OneFS and is responsible for maintaining the health of your cluster.

On a healthy cluster, the job engine should load at boot time and remain active unless manually disabled.  You can check that the job service is running via the below:

zeus-1# isi services isi_job_d
Service 'isi_job_d' is enabled.

The above shows that the service is enabled.  However, running isi services by itself will not report the status of this service; to see (or modify) isi_job_d you will need to specify the -a option so that all services are returned.

The -a option is a little verbose and returns 58 services as opposed to the default view of just 18, so you might want to pipe the output through grep.

zeus-1# isi services -a | grep isi_job_d
isi_job_d            Job Daemon         Enabled

The below commands can be used to stop and start the job engine.

zeus-1# isi services -a isi_job_d disable
The service 'isi_job_d' has been disabled.

zeus-1# isi services -a isi_job_d enable
The service 'isi_job_d' has been enabled.
  
The isi job list command can be used to see all defined job engine jobs.  Below are some of the more common jobs that you may see running.

Name             Policy     Description                   
--------------------------------------------------------------------------------------
AutoBalance      LOW        Balances free space in the cluster.
Collect          LOW        Reclaims space that couldn't be freed due to node or disk issues.
FlexProtect      MEDIUM     Reprotects the file system.
MediaScan        LOW        Scrubs disks for media-level errors.
MultiScan        LOW        Runs the Collect and AutoBalance jobs concurrently.
QuotaScan        LOW        Updates quota accounting for existing files.
SmartPools       LOW        Enforces SmartPools file policies.
SnapshotDelete   MEDIUM     Frees space associated with deleted snapshots.
TreeDelete       HIGH       Deletes a path in /ifs.

In the above, the Policy column refers to both the job's schedule and its impact (the amount of CPU it can utilise).  Running isi job policy list will return the default schedules.

zeus-1# isi job policy list
Job Policies:                                                                  
Name            Start        End          Impact    
--------------- ------------ ------------ ----------
HIGH            Sun 00:00    Sat 23:59    High

One of the key things to remember is that the job engine only executes one job at a time.  As each job is scheduled it is assigned a job ID, and the job engine executes the job with the lowest priority value (a lower integer means a higher priority).  When two jobs have the same priority, the job with the lowest job ID is executed first.

If for any reason you need a job other than the currently running job to execute, you can either start a job with a lower priority value than anything currently scheduled, or you can pause all currently scheduled jobs apart from the one you want to run.
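
Both approaches look something like the below.  I'm working from memory here, so the exact sub-commands and flags may differ between OneFS releases (isi job --help will confirm); the first two commands pause and later resume a scheduled job, while the third queues a new job with a higher priority (lower number) than anything currently scheduled.

zeus-1# isi job pause AutoBalance
zeus-1# isi job resume AutoBalance
zeus-1# isi job start MultiScan --priority 1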

To see the currently running job, you can execute isi job status; this will also return information on paused, failed and recent jobs.

zeus-1# isi job status          

Running jobs:                                                                  
Job                        Impact Pri Policy     Phase Run Time  
-------------------------- ------ --- ---------- ----- ----------
AutoBalance[8]             Low    4   LOW        1/3   0:00:01

No paused or waiting jobs.
No failed jobs.

Recent job results:                                                                                                                                                                                                                        
Time            Job                        Event                         
--------------- -------------------------- ------------------------------
10/17 15:38:15  MultiScan[1]               Succeeded (LOW) 

When Jobs Fail To Run

There are a few situations where jobs won't run.  The first is when there are no scheduled jobs; this is common on newly commissioned clusters where there is little or no data.  You can manually kick off a job to ensure everything runs as expected.
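
For example, to kick off a MultiScan by hand (the syntax may vary slightly between releases, so check isi job start --help):

zeus-1# isi job start MultiScan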

If you have jobs that are scheduled but are not running, then one of the below may be the reason.

A node is offline or has just rebooted.

The job engine will only run when all nodes are available.  If a node has gone offline, or has only just booted, you may find that no jobs are running.

Coordinator node is unavailable.

The job engine relies on one of the nodes acting as the job coordinator (this is usually the first node in the cluster).  If this node is unreachable, heavily loaded or read-only then jobs will be suspended.  You can identify the coordinator node and its health by running the below.

zeus-1# isi job status -r       
coordinator.connected=True
coordinator.devid=1
coordinator.down_or_read_only=False

The isi job history command can be used to confirm when jobs last ran and how long they took.

--limit     [number of jobs to return, 0 returns all]
-v           [verbose output]
--job       [return information about a particular job type]
  
zeus-1# isi job history --limit=0 --job=AutoBalance -v
Job events:                                                                                                                                                                                                                               
Time            Job                        Event                         
--------------- -------------------------- ------------------------------
10/18 16:37:25  AutoBalance[8]             Waiting
10/18 16:37:25  AutoBalance[8]             Running (LOW)
10/18 16:37:25  AutoBalance[8]             Phase 1: begin drive scan
10/18 16:37:26  AutoBalance[8]             Phase 1: end drive scan
        Elapsed time:                        1 second
        Errors:                              0
        Drives:                              4
        LINs:                                3
        Size:                                0
        Eccs:                                0
10/18 16:37:27  AutoBalance[8]             Phase 2: begin rebalance
10/18 16:37:27  AutoBalance[8]             Phase 2: end rebalance
        Elapsed time:                        1 second
        Errors:                              0
        LINs:                              169
        Zombies:                             0
10/18 16:37:28  AutoBalance[8]             Phase 3: begin check
10/18 16:37:28  AutoBalance[8]             Phase 3: end check
        Elapsed time:                        1 second
        Errors:                              0
        Drives:                              0
        LINs:                                0
        Size:                                0
        Eccs:                                0
10/18 16:37:28  AutoBalance[8]             Succeeded (LOW) 

The job engine logs information to /var/log/messages and /var/log/isi_job_d.log.
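
If you need to dig into a misbehaving job, the usual tools work against those files, for example:

zeus-1# tail -f /var/log/isi_job_d.log
zeus-1# grep -i autobalance /var/log/isi_job_d.log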


Friday 19 October 2012

Upgrades, Removing Nodes & Re-image

OS Updates


With a monthly release schedule, upgrading the OneFS operating system could be seen as a regular task.  Fortunately Isilon support aren't insistent on customers running the latest build.  As with most things, reviewing the build release notes will give you a good view as to whether a build should be deployed.

The upgrade process is very straightforward and can be driven from either the WebUI or the command line, both offering the same upgrade options.

The below covers the upgrade process from the command line.

  • Request build from Isilon support
  • ssh to the cluster, download build (via ftp) to any directory below /ifs
  • cd into the directory containing the build and run isi update

Update Options


-r            [Nodes upgraded & restarted individually (rolling) so cluster remains online]
--manual      [Prompt before rebooting each node following the upgrade]
--drain-time  [Specify how long to give clients to disconnect before rebooting the node]

(Drain time is in seconds by default, but can be given in hours with h, days with d and weeks with w)


If you don't go with the rolling reboot, then you will be prompted to reboot the cluster once the update process has completed.

If you do go with a rolling upgrade, the upgrade process will loop through the nodes in sequential order, starting with the node after the one you are on.  So if you have a four node cluster and run the upgrade from node 2, the upgrade will run in this order: 3, 4, 1, 2.
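
Putting the options above together, a rolling upgrade that gives clients an hour to drain off each node would be kicked off with something like the below (double-check the exact flag spellings against isi update --help on your build); you are then prompted for the image path just as in the session that follows.

zeus-1# isi update -r --drain-time 1h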

zeus-1# isi update 
Connecting to remote and local upgrade processes...
         successfully connected to node [  1].
Loading image...

Please specify the image to update. You can specify the image from:
-- an absolute path (i.e. /usr/images/my.tar)
-- http (i.e. http://host/images/my.tar)
Please specify the image to update:/ifs/build/OneFS_v6.5.4.4_Install.tar.gz
Node version :v6.5.3.1032 B_6_5_3_QFE_1032(RELEASE) (0x605030000040800)
Image version:6.5.4.4 B_6_5_4_76(RELEASE) (0x60504000040004c)
Are you sure you wish to upgrade (yes/no)?yes
Please wait, updating...
Initiating IMDD...
         node[  1] initialized.
Verifying md5...
Installing image...
         node[  1] installed.
Restoring user changes...
         node[  1] restored.
Checking for Firmware Updates...
Firmware update checks skipped...
         node[  1] Firmware check phase completed.
Updating Firmware...
Firmware updates skipped...
         node[  1] Firmware update phase completed.
Upgrade installed successfully.
Reboot to complete the process? (yes/no [yes])yes
Shutting down services.
         node[  1] Services shutdown.

Update Failures


If the upgrade fails because the update process times out when a node fails to shut down / reboot cleanly, you can restart the update process.  Just manually reboot the node that failed and launch the update once more; OneFS will be intelligent enough to continue through the nodes it missed.

Three update logs are written to /var/log for each attempt to upgrade a cluster, named update_engine, upgrade_engine and update_proxy, with update_engine probably being the most useful.
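
When chasing down a failed upgrade, start at the tail of the engine log (file names as listed above; on some builds they may carry a .log suffix):

zeus-1# tail -50 /var/log/update_engine
zeus-1# grep -i error /var/log/update_engine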


Removing Nodes


One of the great things about the Isilon platform is that you can add in nodes of different types (as long as you have a minimum of three of the same type).  This can make replacing older hardware with newer hardware generations much easier.

Removing a node is referred to as SmartFailing.  Multiple nodes can be SmartFailed at the same time, provided that at least 50% + 1 nodes remain within the cluster (in an eight node cluster, at least five nodes must stay, so up to three can be failed at once).  For safety 50% + 2 would be a better limit, as you could then survive a further node failure during the SmartFail without the cluster going read-only.

SmartFail

zeus-1# isi devices -a smartfail -d 1

!! Node 1 is currently healthy and in use. Do you want to initiate the
!! smartfail process on node 1? (yes, [no])

>>> yes

A FlexProtect job will start with a priority of 1, which will cause any other running jobs to pause until the SmartFail process completes.  The time to SmartFail a node will depend on a number of variables such as: node type, amount of data on the node(s), capacity within the cluster, average file size, cluster load and job impact setting.  Finger in the air would suggest 1 - 2 days per node.

isi status displays an S next to any nodes that are SmartFailing.

hermese-1# isi stat -q
Cluster Name: hermese
Cluster Health:     [ATTN]
Cluster Storage:  HDD                 SSD           
Size:             13G (13G Raw)       0             
VHS Size:         0                  
Used:             159M (1%)           0 (n/a)       
Avail:            13G (99%)           0 (n/a)       

                   Health Throughput (bps)    HDD Storage      SSD Storage
ID |IP Address     |DASR|  In   Out  Total| Used / Size      |Used / Size
---+---------------+----+-----+-----+-----+------------------+-----------------
  1|172.30.0.100   |--S-|    0|    0|    0|  159M/  13G(  1%)|    (No SSDs)   
------------------------+-----+-----+-----+------------------+-----------------

 Cluster Totals:        |    0|    0|    0|  159M/  13G(  1%)|    (No SSDs)    


Failing The SmartFail

If during the SmartFail process you decide that you no longer want to fail the node, you can cancel the process by executing a StopFail.

hermese-1# isi devices -a stopfail -d 1

!! This node is currently in the process of being smartfailed. We
!! recommend that you allow the process to complete. Do you want to
!! abort the smartfail process for this node? (yes, [no])

>>> yes
'stopfail' action succeeded.


Re-image / Re-format

In certain scenarios (single-node test clusters) you might want to re-image a node; the isi_reimage command can be used to accomplish this.  When used in conjunction with the -b option, it is possible to re-image the node with any build you have media for.

isi_reimage -b OneFS_v5.5.4.21_Install.tar.gz

The isi_reformat_node command can be used to reset the configuration on a node, format the drives and re-image.  The command performs a variety of functions, such as checking wear on SSD drives, before proceeding with the reformat.

isi_reformat_node with the --factory option will format / re-image the node, turn off the NVRAM battery and power off the node.  Useful if you are pulling a node for long-term storage or shipping it to another site.
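
For example, wiping a node before pulling it from the rack (destructive, and only sensible on a node that is not part of a multi-node cluster):

isi_reformat_node --factory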

As with isi_reimage, you don't want to run either of these commands on a node that is a member of a multi-node cluster.