Intel® Cluster Checker can be integrated with a job scheduler or resource manager, but node configuration requires special handling. This article shows an example that works with OpenPBS or TORQUE.
Intel Cluster Checker stores node configuration as comments within its node list, while the node list produced by a job scheduler lacks this configuration information. To work around this, the two files must be combined into a single runtime node list, which requires additional steps in the job script before executing clck.
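For illustration, the two files might look like the following (hypothetical hostnames; the comment syntax matches the node list entries shown later in this article). The Intel Cluster Checker node list carries configuration in comments, while the scheduler's node file lists bare hostnames, once per process:

### Hypothetical Intel Cluster Checker node list (nodes.list)
node01 #type : compute
node02 #type : compute

### Hypothetical scheduler node file ($PBS_NODEFILE) for a job on node02
node02
node02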
You must confirm that the hostnames in the Intel Cluster Checker node list and the scheduler's node list are identical. If using TORQUE, names in the Intel Cluster Checker node list must exactly match those output by the pbsnodes command.
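As a quick check, the two sets of hostnames can be compared. The following is a sketch that assumes pbsnodes -a prints each node name at the start of an unindented line, as TORQUE does, and that each nodes.list entry begins with the hostname:

### Hypothetical comparison of scheduler and Intel Cluster Checker hostnames
pbsnodes -a | awk 'NF && !/^[[:space:]]/ {print $1}' | sort > scheduler_hostnames
awk '!/^#/ && NF {print $1}' nodes.list | sort > clck_hostnames
diff scheduler_hostnames clck_hostnames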
When submitting the job, lock out each node entirely, regardless of the number of processors or cores available. Intel Cluster Checker performance tests can degrade the performance of other applications running on the same nodes. If exclusive access is not possible, limit Intel Cluster Checker to a level 1 or level 2 health check, which should not disturb other jobs sharing the nodes.
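For example, restricting the run to a level 2 check only changes the level argument on the clck command line (the node list, configuration file, and other options shown here are the ones used in the full example below):

clck -t -L 2 -D -F clck_joblist -c $HOME/config.xml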
The job script will need to do the following:
- Eliminate any duplicate lines in the job node list. TORQUE writes each hostname once per allocated process. Write the de-duplicated list to a new temporary file instead of altering the scheduler's file directly.
- Filter the Intel Cluster Checker node list so that it contains only the nodes in the job node list. This runtime node list can be created with grep.
- Set the Intel Cluster Checker log directory. This saves debug files generated during execution.
- Run clck using the temporary node list.
Example
Here is an example TORQUE 4.2.8 script for a level 4 health check. In this example, the Intel Cluster Checker node list is named “nodes.list” and the XML configuration file is named “config.xml”. Change these to match your file names.
#PBS -V
#PBS -N clckjob
#PBS -q queue_name
#PBS -e localhost:$HOME/clckjob.err
#PBS -o localhost:$HOME/clckjob.log
#PBS -l nodes=5
### Lock nodes to a single job
#PBS -l naccesspolicy=SINGLEJOB
#PBS -l walltime=00:05:00
### clck requires a Bourne-compatible shell
#PBS -S /bin/sh

### Clean up old runs
rm -f tmp_nodelist clck_joblist

### Create log directory
mkdir -p $HOME/CLCK_LOGS

### Set log environment
export CLCK_LOG_DIR=$HOME/CLCK_LOGS

### Sort scheduler node list and remove duplicate lines
sort $PBS_NODEFILE | uniq > tmp_nodelist

### Create updated node list; if a node is listed in tmp_nodelist
### then copy the entry from the configuration node list to the runtime node list
grep -Fwf tmp_nodelist nodes.list > clck_joblist

### Run Intel Cluster Checker
clck -t -L 4 -D -F clck_joblist -c $HOME/config.xml

echo "End of job"
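If the script above is saved as, say, clck_healthcheck.sh (a file name chosen here for illustration), it is submitted like any other batch job:

qsub clck_healthcheck.sh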
Note: There is an unlikely but possible scenario in which a line in the Intel Cluster Checker node list contains one valid node name at the beginning and another at the end. This can cause a problem with the example script. For example:
node01 #type : compute #Same hardware as node02
In this situation, grep copies the entry for node01 into the runtime node list whenever node02 is part of the job, even if node01 is not. Either the grep command must use a stricter match, or the line in the node list must be reworded so that the comment no longer contains another node name. This change resolves the problem:
node01 #type : compute #same hardware as the next compute node
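Alternatively, instead of rewording the comment, the grep line in the job script can be replaced with a stricter match that compares only the hostname at the start of each entry. A minimal sketch using awk (any POSIX awk should work):

### Copy an entry only when its first field is a node allocated to the job
awk 'NR==FNR {nodes[$1]; next} $1 in nodes' tmp_nodelist nodes.list > clck_joblist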