Intel® Cluster Checker 2.x includes the ability to extend its capabilities, allowing the user to create additional custom checks with the <generic> test module. The <generic> test module can evaluate:
- uniformity: execute a command on all nodes and confirm that the output is identical on all nodes. This is the default.
- correctness: execute a command on all nodes and evaluate that the output exactly matches an expected result.
This test module can be used to implement checks specific to a cluster or site, and is especially useful for temporary or one-time checks.
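The difference between the two evaluation modes can be illustrated with a small shell sketch. This is not how Intel® Cluster Checker implements the comparison; the function names check_uniform and check_expected are hypothetical, and each argument simply stands in for the STDOUT collected from one node.

```shell
#!/bin/sh
# Sketch only: mimics the two comparison modes on already-captured output.
# Each argument stands for the STDOUT collected from one node.

# Uniformity: succeed only if every argument equals the first one.
check_uniform() {
    first=$1
    for out in "$@"; do
        [ "$out" = "$first" ] || return 1
    done
    return 0
}

# Correctness: the first argument is the expected result; succeed only
# if every remaining argument matches it exactly.
check_expected() {
    expected=$1
    shift
    for out in "$@"; do
        [ "$out" = "$expected" ] || return 1
    done
    return 0
}

check_uniform "2.6.32.x86_64" "2.6.32.x86_64" && echo "uniform"
check_expected "2.6.32.x86_64" "2.6.32.x86_64" "something-else" || echo "mismatch"
```

A uniformity check can pass even when every node reports the same wrong value, which is why supplying an expected result is the stricter of the two modes.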
Configuration
To include the <generic> module, add the following tag as a direct child element of <cluster> in the configuration file.
<include_module>generic</include_module>
The module must also be configured. Each test is defined by its own <test> element, one element per test. Only two options are available for each <test>:
- <command> (required): the command to execute on all compute nodes. This can be a simple or compound command, or even a script.
- <result> (optional): the expected STDOUT of the executed command. STDOUT must exactly match the <result> value. If <result> is omitted, STDOUT of the executed command on each node must be identical.
For example:
<generic>
  <!-- Confirm that all nodes run kernel "2.6.32.x86_64" -->
  <test>
    <command>uname -r</command>
    <result>2.6.32.x86_64</result>
  </test>
  <!-- Check that user micuser is configured identically on all nodes -->
  <test>
    <command>grep ^micuser /etc/passwd</command>
  </test>
  <!-- Repeat additional <test> elements as needed -->
</generic>
NOTE: The generic test cannot be executed as a superuser. If this is attempted, the test module will report a configuration error and a "Failed" result.
Script Execution
The generic module can be configured to call any command for which the user has execute permission. This includes scripts, allowing the capabilities of the <generic> test module to be greatly extended. Since the <generic> module is capable of arbitrary command execution, care must be taken not to perform unwanted actions, such as deleting needed files.
Custom scripts must exit with a return code of "0" if execution is successful; otherwise, Intel® Cluster Checker will treat the result as a test failure. It is important to remember that all STDOUT produced by the script commands will be evaluated. Any output that does not need to be evaluated should be redirected to a log file.
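As a minimal sketch of these rules, a custom script might keep its diagnostic output in a log file and print a single line for evaluation. The log path, the function name run_check, and the "status=ok" line are placeholders chosen for illustration and are not part of Intel® Cluster Checker.

```shell
#!/bin/sh
# Hypothetical script body for use as a <command> value.
# The log location is an assumption; adjust it for the site.
LOG=${LOG:-${TMPDIR:-/tmp}/generic_check.log}

run_check() {
    # Diagnostic chatter goes to the log, not to STDOUT.
    {
        echo "check started on $(uname -n)"
        date
    } >>"$LOG" 2>&1

    # This single line is the only STDOUT left to be compared.
    echo "status=ok"
}

# With no explicit exit, the script's return code is that of its last
# command; a successful echo therefore yields 0.
run_check
```

A script of this kind can end with an explicit exit 0 once its checks have succeeded, so that the return code does not depend on whichever command happened to run last.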
Node Selection
Generic tests execute on all compute nodes, but not on "head" or "other" type nodes. They also do not execute on k1om architecture nodes. The <generic> test module does not support groups, so node selection cannot be modified in the configuration file.
Output
The results of the generic test are compared either to the expected result or to each other. Errors will include the STDOUT produced by each node. Increasing the verbosity to "3" or greater will also display the STDOUT produced by the <command> executed, even if no error occurs.
If the <command> produces no STDOUT, it will be treated as an error, regardless of uniformity or expected result.
Error Handling
If the command returns a non-zero exit status, the <generic> module will report “no output” regardless of the actual output. For commands such as grep, which exits with a non-zero status when no lines match, this may produce erroneous results. If this happens, pipe the final output through cat or a similar command so that the pipeline as a whole exits with status 0.
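The effect of the trailing cat can be seen in a short shell sketch. The sample text and the function names count_plain and count_via_cat are illustrative only; the sketch assumes a POSIX shell without the pipefail option, where a pipeline's exit status is that of its last command.

```shell
#!/bin/sh
# Count lines matching a pattern in sample text; returns grep's own status.
count_plain() {
    printf 'alpha\nbeta\n' | grep -c "$1"
}

# Same count, but cat is the last command in the pipeline, so the
# pipeline's status is cat's status rather than grep's.
count_via_cat() {
    printf 'alpha\nbeta\n' | grep -c "$1" | cat
}

status=0
count_plain gamma || status=$?
echo "grep alone: exit status $status"

status=0
count_via_cat gamma || status=$?
echo "grep piped through cat: exit status $status"
```

In both cases grep prints a count of 0, but only the second form finishes with an exit status of 0.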
To evaluate the actual return code of a command, prevent command output and echo the return code to STDOUT.
<command>command > /dev/null; echo $?</command>
It may also be advantageous to send the output of command to a log file.
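One way to combine both ideas is a small helper that sends a command's output to a log and prints only its return code. This is a sketch under assumptions: the helper name report_rc and the log path are hypothetical, not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command, keep its output in a log file,
# and print only its return code to STDOUT.
LOG=${LOG:-${TMPDIR:-/tmp}/generic_rc.log}

report_rc() {
    rc=0
    # Both STDOUT and STDERR of the command are appended to the log.
    "$@" >>"$LOG" 2>&1 || rc=$?
    # The echoed code is the only STDOUT left for evaluation, and echo
    # itself succeeds, so the helper returns 0.
    echo "$rc"
}

report_rc true
report_rc false
report_rc sh -c 'exit 3'
```

Because the helper itself always finishes successfully, the printed return code can be compared against an expected <result> value or checked for uniformity across nodes.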
Examples
The following examples demonstrate how the generic module can be used. These tests are only examples and should not be considered the best or only means to perform these checks.
The <generic></generic> opening and closing tags have been left out. Multiple <test> tags can be used in a single <generic> module.
Check that Turbo mode is disabled on all Intel® Xeon Phi™ coprocessors using the micsmc command.
<test>
  <command>/usr/bin/micsmc --turbo status | grep -c enabled | cat</command>
  <result>0</result>
</test>
For this test, counting matches for the desired value “disabled” would yield one match per coprocessor in a host, which prevents successful validation on a cluster whose hosts carry different numbers of coprocessors. Counting matches for the opposite of the desired value gives 0 on every correctly configured host, regardless of how many coprocessors are installed.
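The reasoning can be checked with placeholder text. The status lines below are invented for illustration and do not reproduce real micsmc output; count_matches is a hypothetical helper.

```shell
#!/bin/sh
# Placeholder status text for a one-card host and a two-card host.
# Real micsmc output will look different.
one_card='card0 turbo: disabled'
two_cards='card0 turbo: disabled
card1 turbo: disabled'

count_matches() {
    # $1 = pattern, $2 = text to search.
    # cat keeps the pipeline's exit status at 0 when the count is 0.
    printf '%s\n' "$2" | grep -c "$1" | cat
}

echo "disabled: $(count_matches disabled "$one_card") vs $(count_matches disabled "$two_cards")"
echo "enabled:  $(count_matches enabled "$one_card") vs $(count_matches enabled "$two_cards")"
```

The count for "disabled" differs between the two hosts (1 vs 2), while the count for "enabled" is 0 on both, so only the second form gives a value that can be compared across hosts.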
Confirm that the bridge device br0 is configured on all compute nodes with an IP address.
<test>
  <command>/sbin/ifconfig br0 | grep -c "inet addr"</command>
  <result>1</result>
</test>
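Confirm that the mic.ko module is configured identically for modprobe on all nodes. Runs of whitespace are collapsed to a single space so that differences in spacing alone do not cause a mismatch.
<test>
  <command>grep "^options" /etc/modprobe.conf.d/mic.conf | sed 's/\s\+/ /g'</command>
</test>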
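The whitespace normalization used for the modprobe check can be tried in isolation. The sketch below uses a POSIX bracket expression as a portable spelling of the same idea; the helper name normalize_ws and the option names param_a and param_b are hypothetical and do not refer to real mic.ko parameters.

```shell
#!/bin/sh
# Collapse each run of spaces or tabs into a single space.
normalize_ws() {
    sed 's/[[:space:]][[:space:]]*/ /g'
}

# Two hypothetical option lines that differ only in spacing.
printf 'options mic  param_a=1\tparam_b=2\n' | normalize_ws
printf 'options mic param_a=1 param_b=2\n' | normalize_ws
```

Both invocations print the same line, so nodes whose files differ only in spacing would still be reported as uniform.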
Check firmware uniformity on coprocessors and record this information.
The output is recorded in the Intel® Cluster Checker database, but can also be displayed at execution time by setting verbosity to “3” or greater.
<test>
  <command>/usr/bin/micinfo | grep "Flash Version" | uniq</command>
</test>
Other values can be added by including an additional <test> element for each additional value to be checked.