Intel® Cluster Checker verifies the configuration and performance of Linux-based clusters and checks compliance with the Intel® Scalable System Framework architecture specification. If issues are found, Intel® Cluster Checker diagnoses the problems and may provide recommendations on how to repair the cluster.
Intel® Cluster Checker has the following features:
- Dynamic detection of cluster configuration, operation, and performance issues.
- Problem diagnoses with severity and confidence levels.
- On-demand data collection.
Intel® Cluster Checker is available in the following ways:
- As part of Intel® Parallel Studio XE 2018 Cluster Edition.
- As a stand-alone product for Intel® Scalable System Framework partners.
The following flowchart represents the usage model for Intel® Cluster Checker.
Prerequisites
- Install Intel® Cluster Checker using the bundled installer.
- We recommend running the tool as a non-root user. Before using Intel® Cluster Checker for the first time, set up the runtime environment. Two files are provided for this purpose: clckvars.sh for shells with Bourne syntax and clckvars.csh for shells with C-shell syntax. Source the appropriate file from the command line. For example:
source /opt/intel/clck/2018.0/bin/clckvars.sh
- Create a text file that lists the nodes in the cluster, one hostname per line. In these examples, this file is named "nodefile". Here is an example for one head node and four compute nodes:
frontend #role: head
node1
node2
node3
node4
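For larger clusters it can be easier to generate the nodefile than to type it by hand. The following is a minimal sketch that reproduces the example above. The NODEFILE variable is an assumption of this sketch, not something the product reads, and the hostnames are the placeholder names from the example.

```shell
#!/bin/sh
# Sketch: generate a nodefile for one head node and four compute nodes.
# NODEFILE is a hypothetical variable used only by this script; it lets
# you write the file to a shared location instead of the current directory.
nodefile="${NODEFILE:-./nodefile}"

{
    echo "frontend #role: head"      # head node, annotated with its role
    for i in 1 2 3 4; do
        echo "node$i"                # compute nodes, one hostname per line
    done
} > "$nodefile"

# Show the result so it can be checked before running the collector.
cat "$nodefile"
```

Replace the hostnames and the loop range with the names used in your cluster.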
For detailed system requirements, see the "System Requirements" section in the Intel® Cluster Checker Release Notes.
Step 1: Collect data
Run the following command from a command line. The nodefile should be in a shared, writable location.
clck-collect -a -f nodefile
Step 2: Analyze the data
Run this from a command line:
clck-analyze -f nodefile
Resolve any issues reported in step 2 and repeat steps 1 and 2 until you are satisfied with the results.
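Because steps 1 and 2 are usually repeated several times, it may be convenient to wrap them in a small shell function. The sketch below uses only the two commands shown above. The function names and the CLCK_DRY_RUN variable are made up for this example, and the sketch assumes that clck-collect returns a non-zero exit status on failure, which this guide does not state.

```shell
#!/bin/sh
# Sketch: run the collect and analyze steps in sequence.
# CLCK_DRY_RUN is a hypothetical variable used only by this script;
# when it is set to 1, the commands are printed instead of executed.

run() {
    if [ "${CLCK_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_cluster() {
    # $1: path to the nodefile
    # Step 1: collect data from the nodes listed in the nodefile.
    # Assumption: a non-zero exit status means collection failed.
    run clck-collect -a -f "$1" || return 1
    # Step 2: analyze the collected data.
    run clck-analyze -f "$1"
}
```

After sourcing clckvars.sh, call `check_cluster nodefile` to run both steps, or `CLCK_DRY_RUN=1 check_cluster nodefile` to preview the commands without running them.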
By default, diagnosed signs are not included in the analyzer output. If the analyzer reports issues, it is useful to print the diagnosed signs on subsequent runs. More information about signs and diagnoses is available in the User's Guide. To print diagnosed signs, run:
clck-analyze -f nodefile -p diagnosed_signs
Occasionally the default XML configuration file needs to be modified, for example to produce more output, change test parameters, or change the log level. More information can be found in the User's Guide.
Troubleshooting/FAQ
Files are installed into /opt/intel/clck/2018.0.
- For help with the collector, run:
clck-collect --help
- For help with the analyzer, run:
clck-analyze --help
- To view collected data, use the database query tool.
For help with the query tool, run:
clckdb --help
- To customize the analysis behavior:
Make a copy of the default XML file:
cp /opt/intel/clck/2018.0/etc/clck.xml ~
Edit the XML file options.
To use a custom XML file with the analyzer, run the following (if the custom XML file is named "~/clck.xml"):
clck-analyze -f nodefile -c ~/clck.xml
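Repeating the copy step overwrites any edits already made to the custom file. The sketch below is one way to guard against that. The prepare_config function and the CLCK_ROOT variable are names invented for this example, not part of the product. The default path is the installation directory used in this guide.

```shell
#!/bin/sh
# Sketch: create a private copy of the default XML file without
# overwriting an existing, already edited copy.
# CLCK_ROOT and prepare_config are hypothetical names for this example.
CLCK_ROOT="${CLCK_ROOT:-/opt/intel/clck/2018.0}"

prepare_config() {
    # $1: destination of the custom copy, for example ~/clck.xml
    if [ -e "$1" ]; then
        echo "keeping existing $1" >&2    # do not discard earlier edits
    else
        cp "$CLCK_ROOT/etc/clck.xml" "$1" || return 1
    fi
}
```

Call `prepare_config ~/clck.xml`, edit the copy, and then pass it to the analyzer with the `-c` option as shown above.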
Documentation and Resources
All of the following documents can be found at https://software.intel.com/en-us/intel-cluster-checker-support/documentation:
Document | Description |
---|---|
Intel® Cluster Checker Developer’s Guide | Contains a breakdown of the following components: the knowledge base, the connector, and the database schema. |
Intel® Cluster Checker User's Guide | Contains a description of the product, including the following components and processes: the analyzer, knowledge base, connector, data collection, data providers, and the database schema. |
Intel® Cluster Checker Release Notes | Contains a brief overview of the product, new features, system requirements, installation notes, documentation, known limitations, technical support, and the disclaimer and legal information. |