This article extends the existing How do I use Intel® MKL with Java*? article from the Intel® MKL Knowledge Base and reuses the code samples given there.
Here we explore how to use an Intel® Xeon Phi coprocessor available in the system to speed up mathematical computations by calling Automatic Offload (AO) enabled functions in Intel® MKL. The Compiler Assisted Offload (CAO) model is not supported for offloading computations through JNI and MKL.
Follow steps 1, 2 and 3 as explained in How do I use Intel® MKL with Java*?.
This article explains how to build and run on a Linux platform. Before proceeding, make sure the paths for the Intel compilers, Intel MKL and Java are set.
For easy reference, the steps are repeated here.
To call cblas_dgemm from a Java application, perform the following steps:
Step 1. Create a Java file that declares the cblas_dgemm wrapper (CBLAS.java):
/*CBLAS.java*/
public final class CBLAS {
    private CBLAS() {}
    static {
        /* load the library that contains the wrapper for the cblas function (see step 3) */
        System.loadLibrary("mkl_java_stubs");
    }
    public final static class ORDER {
        private ORDER() {}
        /** row-major arrays */
        public final static int RowMajor=101;
        /** column-major arrays */
        public final static int ColMajor=102;
    }
    public final static class TRANSPOSE {
        private TRANSPOSE() {}
        /** trans='N' */
        public final static int NoTrans=111;
        /** trans='T' */
        public final static int Trans=112;
        /** trans='C' */
        public final static int ConjTrans=113;
    }
    /* native: informs the Java virtual machine that the function is defined externally */
    public static native void dgemm(int Order, int TransA, int TransB, int M, int N, int K, double alpha, double[] A, int lda, double[] B, int ldb, double beta, double[] C, int ldc);
}
Compile CBLAS.java with the Java compiler:
javac CBLAS.java
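The static initializer in CBLAS.java passes only the short name mkl_java_stubs to System.loadLibrary; the JVM expands it to a platform-specific file name. The small check below is a sketch (the class name LibraryNameCheck is ours, not part of the original sample) that prints the file name the JVM will look for and the directories it will search. On Linux the mapped name is libmkl_java_stubs.so, which is the library built in step 3.

```java
// Sketch: show which file System.loadLibrary("mkl_java_stubs") will look for.
// LibraryNameCheck is a hypothetical helper, not part of the original sample.
final class LibraryNameCheck {
    // Platform-specific file name for the stub library loaded by CBLAS.java.
    static String stubFileName() {
        return System.mapLibraryName("mkl_java_stubs");
    }

    public static void main(String[] args) {
        System.out.println("Expected file name: " + stubFileName());
        System.out.println("Search path: " + System.getProperty("java.library.path"));
    }
}
```

Running it with the same -Djava.library.path option as the main application shows whether the directory that holds the stub library is on the search path.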
Step 2. Generate header files from the Java class file created in the previous step.
These headers are used by the C file in the next step.
javah CBLAS
Three header files will be generated: CBLAS.h, CBLAS_ORDER.h and CBLAS_TRANSPOSE.h. Note that javah was removed in JDK 10; on newer JDKs, javac -h . CBLAS.java generates the JNI header CBLAS.h instead.
Step 3. Use the definition of Java_CBLAS_dgemm in the header CBLAS.h generated in step 2 to write the C file CBLAS.c.
Here is the generated header CBLAS.h (do not edit it):
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CBLAS */
#ifndef _Included_CBLAS
#define _Included_CBLAS
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: CBLAS
* Method: dgemm
* Signature: (IIIIIID[DI[DID[DI)V
*/
JNIEXPORT void JNICALL Java_CBLAS_dgemm(JNIEnv *, jclass, jint, jint, jint, jint, jint, jint, jdouble, jdoubleArray, jint, jdoubleArray, jint, jdouble, jdoubleArray, jint);
#ifdef __cplusplus
}
#endif
#endif
The definition of the function Java_CBLAS_dgemm should be used in the wrapper for MKL. Create the C file CBLAS.c:
/*CBLAS.c*/
#include <jni.h>
#include <assert.h>
#include "mkl_cblas.h"
/* JNI wrapper: forwards the Java call CBLAS.dgemm to MKL's cblas_dgemm */
JNIEXPORT void JNICALL Java_CBLAS_dgemm(JNIEnv *env, jclass klass,
        jint Order, jint TransA, jint TransB, jint M, jint N, jint K,
        jdouble alpha, jdoubleArray A, jint lda, jdoubleArray B, jint ldb,
        jdouble beta, jdoubleArray C, jint ldc) {
    jdouble *aElems, *bElems, *cElems;
    /* Obtain C pointers to the Java arrays (the JVM may hand back copies) */
    aElems = (*env)->GetDoubleArrayElements(env, A, NULL);
    bElems = (*env)->GetDoubleArrayElements(env, B, NULL);
    cElems = (*env)->GetDoubleArrayElements(env, C, NULL);
    assert(aElems && bElems && cElems);
    cblas_dgemm((CBLAS_ORDER)Order, (CBLAS_TRANSPOSE)TransA, (CBLAS_TRANSPOSE)TransB,
                (int)M, (int)N, (int)K, alpha, aElems, (int)lda,
                bElems, (int)ldb, beta, cElems, (int)ldc);
    /* Mode 0 copies the result back into C; JNI_ABORT releases the inputs A and B without copying back */
    (*env)->ReleaseDoubleArrayElements(env, C, cElems, 0);
    (*env)->ReleaseDoubleArrayElements(env, B, bElems, JNI_ABORT);
    (*env)->ReleaseDoubleArrayElements(env, A, aElems, JNI_ABORT);
}
This file should be compiled to create the native library libmkl_java_stubs.so (loading this library from Java is described in step 1):
icc -shared -fPIC -o libmkl_java_stubs.so CBLAS.c -I. -I$MKLROOT/include -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm -ldl
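The Signature comment in the generated CBLAS.h, (IIIIIID[DI[DID[DI)V, is the JVM method descriptor of CBLAS.dgemm: I stands for int, D for double, [D for double[] and V for the void return type. As a sanity check that the Java declaration and the C wrapper agree on the parameter order, the descriptor can be rebuilt from the parameter types with the standard java.lang.invoke.MethodType class. The sketch below is illustrative only; the class name DgemmDescriptor is ours.

```java
import java.lang.invoke.MethodType;

// Sketch: rebuild the descriptor of CBLAS.dgemm from its parameter types.
// DgemmDescriptor is a hypothetical helper, not part of the original sample.
final class DgemmDescriptor {
    static String descriptor() {
        MethodType type = MethodType.methodType(
            void.class,                        // return type -> V
            int.class, int.class, int.class,   // Order, TransA, TransB
            int.class, int.class, int.class,   // M, N, K
            double.class,                      // alpha
            double[].class, int.class,         // A, lda
            double[].class, int.class,         // B, ldb
            double.class,                      // beta
            double[].class, int.class);        // C, ldc
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor());
    }
}
```

If the printed string differs from the Signature line in CBLAS.h, the native declaration in CBLAS.java and the header are out of sync and the header should be regenerated.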
Step 4. Create the main application, DgemmComputation.java:
/*DgemmComputation.java*/
import java.util.Random;
public final class DgemmComputation {
    /** Incarnation prohibited. */
    private DgemmComputation() {
    }
    /** No command-line options. */
    public static void main(String[] args) {
        double[] A; double[] B; double[] C;
        int M, N, K, lda, ldb, ldc;
        int[] matrixSize = new int[] {1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576};
        int Order = CBLAS.ORDER.RowMajor;
        int TransA = CBLAS.TRANSPOSE.NoTrans;
        int TransB = CBLAS.TRANSPOSE.NoTrans;
        for (int i = 0; i < matrixSize.length; i++) {
            System.out.println("Matrix size is: " + matrixSize[i]);
            M=matrixSize[i]; N=matrixSize[i]; K=matrixSize[i];
            // Row-major, non-transposed: the leading dimension is the number of columns
            lda=K; ldb=N; ldc=N;
            A = new double[matrixSize[i] * matrixSize[i]];
            B = new double[matrixSize[i] * matrixSize[i]];
            C = new double[matrixSize[i] * matrixSize[i]];
            double alpha=1, beta=-1;
            DgemmComputation compute_dgemm = new DgemmComputation();
            // Initializing A with random values
            A = compute_dgemm.initMatrix(matrixSize[i], A);
            // Initializing B with random values
            B = compute_dgemm.initMatrix(matrixSize[i], B);
            // Compute the function and time the native call
            final long startTime = System.currentTimeMillis();
            CBLAS.dgemm(Order,TransA,TransB,M,N,K,alpha,A,lda,B,ldb,beta,C,ldc);
            final long endTime = System.currentTimeMillis();
            System.out.println("Matrix Size and Total Execution Time (ms):" + matrixSize[i] + "," + (endTime - startTime));
            System.out.println("Computation of dgemm over...");
        }
    }
    // Matrix initialization function
    private double[] initMatrix(int size, double[] mat) {
        int i;
        double value;
        mat = new double[size*size];
        Random rand = new Random();
        for (i=0; i<size*size; i++) {
            value = rand.nextDouble()*i;
            mat[i] = Math.round(value*100.0)/100.0;
        }
        return mat;
    }
    /** Print the matrix X assuming row-major order of elements. */
    private void printMatrix(String prompt, double[] X, int I, int J) {
        System.out.println(prompt);
        for (int i=0; i<I; i++) {
            for (int j=0; j<J; j++) {
                System.out.print("i: " + i + ", j: " + j + " and the value is");
                System.out.print("\t" + X[i*J+j]);
                System.out.println();
            }
        }
    }
}
Compile the application:
javac DgemmComputation.java
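Before timing large problems, it can be useful to confirm on a small matrix that the native call returns what we expect. The following pure-Java routine is a minimal reference sketch of the same operation, C = alpha*A*B + beta*C, restricted to the row-major, non-transposed case used in DgemmComputation.java. The class name ReferenceDgemm is ours and the routine is meant only for checking small inputs, not for performance.

```java
import java.util.Arrays;

// Sketch: naive reference for C = alpha*A*B + beta*C (row-major, no transposition).
// ReferenceDgemm is a hypothetical helper, not part of the original sample.
final class ReferenceDgemm {
    // A is m x k, B is k x n, C is m x n; lda, ldb, ldc are the row strides.
    static void dgemm(int m, int n, int k, double alpha, double[] a, int lda,
                      double[] b, int ldb, double beta, double[] c, int ldc) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // Element (i, p) of A lives at i*lda + p; element (p, j) of B at p*ldb + j
                    sum += a[i * lda + p] * b[p * ldb + j];
                }
                c[i * ldc + j] = alpha * sum + beta * c[i * ldc + j];
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};   // 2 x 2, row-major
        double[] b = {5, 6, 7, 8};   // 2 x 2, row-major
        double[] c = {1, 1, 1, 1};   // 2 x 2, overwritten with the result
        dgemm(2, 2, 2, 1.0, a, 2, b, 2, -1.0, c, 2);
        System.out.println(Arrays.toString(c));   // prints [18.0, 21.0, 42.0, 49.0]
    }
}
```

Calling CBLAS.dgemm with the same small arrays and comparing element by element, within a small tolerance to allow for a different summation order, gives a quick check that the wrapper passes the arguments in the right order.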
Step 5. Execute the application.
Before executing this application, we have to set a couple of environment variables.
To enable Intel MKL to automatically offload computation, use the following command:
export MKL_MIC_ENABLE=1
You can check how the dgemm computation is split across the host CPU and the Intel Xeon Phi coprocessor by setting:
export OFFLOAD_REPORT=2
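Because these variables are read from the environment of the java process, a forgotten export silently leaves all the work on the host. As a hedged sketch, the application could print the two settings at start-up. The class name OffloadEnvCheck is ours, and treating the value 1 as "enabled" simply mirrors the export command above.

```java
import java.util.Map;

// Sketch: report the offload-related environment variables seen by the JVM.
// OffloadEnvCheck is a hypothetical helper, not part of the original sample.
final class OffloadEnvCheck {
    // True when the environment contains MKL_MIC_ENABLE=1, as set in this article.
    static boolean offloadRequested(Map<String, String> env) {
        return "1".equals(env.get("MKL_MIC_ENABLE"));
    }

    // Returns the value of a variable, or "unset" when it is missing.
    private static String show(Map<String, String> env, String name) {
        String value = env.get(name);
        return value == null ? "unset" : value;
    }

    static String describe(Map<String, String> env) {
        return "MKL_MIC_ENABLE=" + show(env, "MKL_MIC_ENABLE")
             + ", OFFLOAD_REPORT=" + show(env, "OFFLOAD_REPORT");
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println(describe(env));
        if (!offloadRequested(env)) {
            System.out.println("MKL_MIC_ENABLE is not 1; no Automatic Offload was requested.");
        }
    }
}
```

This only reports what was requested; whether work is actually offloaded is shown by the offload report that MKL prints at run time.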
java.library.path should point to the directory where the library libmkl_java_stubs.so is placed. This example assumes that the stubs shared library is located in the same directory as the compiled Java classes.
java -Djava.library.path=. DgemmComputation
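The larger matrix sizes in DgemmComputation.java need a lot of Java heap: three n x n arrays of 8-byte doubles. The sketch below (the class name HeapEstimate is ours) computes that lower bound for each size used in the sample and compares it with the maximum heap of the running JVM. Temporary copies, for example any copy the JVM makes in GetDoubleArrayElements, come on top of this figure, so treat it as a minimum.

```java
// Sketch: lower bound on the heap needed for the A, B and C matrices.
// HeapEstimate is a hypothetical helper, not part of the original sample.
final class HeapEstimate {
    // Bytes occupied by three n x n arrays of double (array headers ignored).
    static long matrixBytes(int n) {
        return 3L * n * n * Double.BYTES;
    }

    public static void main(String[] args) {
        int[] sizes = {1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576};
        long maxHeap = Runtime.getRuntime().maxMemory();
        for (int n : sizes) {
            long need = matrixBytes(n);
            System.out.printf("n=%d needs at least %d MiB (max heap: %d MiB)%n",
                              n, need >> 20, maxHeap >> 20);
        }
    }
}
```

If the required size exceeds the reported maximum heap, the run is likely to fail with an OutOfMemoryError; the maximum heap can be raised with the JVM's -Xmx option on the java command line.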
Here is a screenshot of the output from a run on a Haswell E5-2695 V3 server with 2 Intel Xeon Phi coprocessors. You can see that MKL dynamically divides the work among the host CPU and the 2 Xeon Phi coprocessors by itself, and that the division depends on the input matrix sizes. Users can also set the work division explicitly through the MKL_MIC_WORKDIVISION environment variables. For more details, please refer to the How to control the work division article.
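To compare runs with and without offload, the execution times printed by the sample can be converted into a throughput figure. A matrix multiply of an M x K matrix by a K x N matrix performs about 2*M*N*K floating-point operations, so dividing that count by the elapsed time gives an approximate GFLOP/s rate. The helper below is a sketch under that assumption; the class name DgemmThroughput is ours, and the elapsed time in the example is made up purely for illustration.

```java
// Sketch: convert a measured dgemm time into an approximate GFLOP/s figure.
// DgemmThroughput is a hypothetical helper, not part of the original sample.
final class DgemmThroughput {
    // Uses the usual 2*m*n*k operation count; the alpha/beta scaling work is ignored.
    static double gflops(long m, long n, long k, long elapsedMillis) {
        double flops = 2.0 * m * n * k;
        double seconds = elapsedMillis / 1000.0;
        return flops / seconds / 1e9;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: an 8192 x 8192 x 8192 multiply that took 2500 ms.
        System.out.printf("%.1f GFLOP/s%n", gflops(8192, 8192, 8192, 2500));
    }
}
```

Very short runs measured with System.currentTimeMillis() have coarse resolution, so the figure is most meaningful for the larger matrix sizes.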
