This article extends the existing How do I use Intel® MKL with Java*? article from the Intel® MKL Knowledge Base and reuses the code samples presented there.
Here we explore how to use the Intel® Xeon Phi coprocessors available in the system to speed up mathematical computations through the Automatic Offload (AO) enabled functions in Intel® MKL. Note that the Compiler Assisted Offload (CAO) model is not supported for offloading computations through JNI and Intel MKL.
Follow steps 1, 2, and 3 as explained in How do I use Intel® MKL with Java*?.
This article explains how to build and run the example on a Linux platform. Before proceeding, make sure the paths for the Intel compilers, Intel MKL, and Java are set.
For easy reference, the steps are detailed here as well.
To call cblas_dgemm from a Java application, perform the following steps:
Step 1. Create a Java file (CBLAS.java) that declares the cblas_dgemm binding:
/* CBLAS.java */
public final class CBLAS {
    private CBLAS() {}

    static {
        /* load library (which will contain wrapper for cblas function; see step 3) */
        System.loadLibrary("mkl_java_stubs");
    }

    public final static class ORDER {
        private ORDER() {}
        /** row-major arrays */
        public final static int RowMajor = 101;
        /** column-major arrays */
        public final static int ColMajor = 102;
    }

    public final static class TRANSPOSE {
        private TRANSPOSE() {}
        /** trans='N' */
        public final static int NoTrans = 111;
        /** trans='T' */
        public final static int Trans = 112;
        /** trans='C' */
        public final static int ConjTrans = 113;
    }

    /* inform the Java virtual machine that the function is defined externally */
    public static native void dgemm(int Order, int TransA, int TransB,
                                    int M, int N, int K,
                                    double alpha, double[] A, int lda,
                                    double[] B, int ldb,
                                    double beta, double[] C, int ldc);
}
Compile CBLAS.java using the Java compiler:
javac CBLAS.java
Step 2. Generate header files from the Java class file created in the previous step.
These headers will be used for the C file in the next step.
javah CBLAS
Three header files are generated: CBLAS.h, CBLAS_ORDER.h, and CBLAS_TRANSPOSE.h. Only CBLAS.h is used in the next step.
Step 3. Use the declaration of Java_CBLAS_dgemm in the header CBLAS.h generated in step 2 to write the C file CBLAS.c.
Here is the generated header CBLAS.h (do not edit it):
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CBLAS */

#ifndef _Included_CBLAS
#define _Included_CBLAS
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     CBLAS
 * Method:    dgemm
 * Signature: (IIIIIID[DI[DID[DI)V
 */
JNIEXPORT void JNICALL Java_CBLAS_dgemm
  (JNIEnv *, jclass, jint, jint, jint, jint, jint, jint, jdouble, jdoubleArray, jint, jdoubleArray, jint, jdouble, jdoubleArray, jint);

#ifdef __cplusplus
}
#endif
#endif
The declaration of Java_CBLAS_dgemm is used in the wrapper for Intel MKL. Create the C file CBLAS.c:
/* CBLAS.c */
#include <jni.h>
#include <assert.h>
#include "mkl_cblas.h"

JNIEXPORT void JNICALL Java_CBLAS_dgemm
  (JNIEnv *env, jclass klass,
   jint Order, jint TransA, jint TransB,
   jint M, jint N, jint K,
   jdouble alpha, jdoubleArray A, jint lda,
   jdoubleArray B, jint ldb,
   jdouble beta, jdoubleArray C, jint ldc)
{
    jdouble *aElems, *bElems, *cElems;

    /* Obtain (possibly copied) native views of the Java arrays so they can be passed to Intel MKL. */
    aElems = (*env)->GetDoubleArrayElements(env, A, NULL);
    bElems = (*env)->GetDoubleArrayElements(env, B, NULL);
    cElems = (*env)->GetDoubleArrayElements(env, C, NULL);
    assert(aElems && bElems && cElems);

    cblas_dgemm((CBLAS_ORDER)Order, (CBLAS_TRANSPOSE)TransA, (CBLAS_TRANSPOSE)TransB,
                (int)M, (int)N, (int)K, alpha, aElems, (int)lda,
                bElems, (int)ldb, beta, cElems, (int)ldc);

    /* Copy the result back into the Java array C; A and B are released without copy-back. */
    (*env)->ReleaseDoubleArrayElements(env, C, cElems, 0);
    (*env)->ReleaseDoubleArrayElements(env, B, bElems, JNI_ABORT);
    (*env)->ReleaseDoubleArrayElements(env, A, aElems, JNI_ABORT);
}
Compile this file to create the native library libmkl_java_stubs.so (loading this library from Java is described in step 1):
icc -shared -fPIC -o libmkl_java_stubs.so CBLAS.c -I. -I$MKLROOT/include -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm -ldl
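With the stubs library built, you may want to sanity-check the binding before writing the full benchmark in step 4. The class below is a minimal sketch that is not part of the original article; TestDgemm.java is a hypothetical name, and the code assumes only the CBLAS class from step 1 and the library built above:

/* TestDgemm.java - hypothetical sanity check for the JNI binding (not part of the original article) */
public final class TestDgemm {
    public static void main(String[] args) {
        /* Compute C = 1.0*A*B + 0.0*C for 2x2 row-major matrices. */
        double[] A = {1.0, 2.0,
                      3.0, 4.0};
        double[] B = {5.0, 6.0,
                      7.0, 8.0};
        double[] C = new double[4];
        CBLAS.dgemm(CBLAS.ORDER.RowMajor, CBLAS.TRANSPOSE.NoTrans, CBLAS.TRANSPOSE.NoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);
        /* Expected result: 19.0 22.0 43.0 50.0 */
        System.out.println(C[0] + " " + C[1] + " " + C[2] + " " + C[3]);
    }
}

Compile it with javac TestDgemm.java and run it with java -Djava.library.path=. TestDgemm. A problem this small stays on the host because Automatic Offload only offloads sufficiently large matrices, but it confirms that the wrapper library loads and the call signature is correct.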
Step 4. Create the main application, DgemmComputation.java:
/* DgemmComputation.java */
import java.util.Random;

public final class DgemmComputation {
    /** Instantiation prohibited. */
    private DgemmComputation() {}

    /** No command-line options. */
    public static void main(String[] args) {
        double[] A;
        double[] B;
        double[] C;
        int M, N, K, lda, ldb, ldc;
        int[] matrixSize = new int[] {1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576};
        int Order = CBLAS.ORDER.RowMajor;
        int TransA = CBLAS.TRANSPOSE.NoTrans;
        int TransB = CBLAS.TRANSPOSE.NoTrans;

        for (int i = 0; i < matrixSize.length; i++) {
            System.out.println("Matrix size is: " + matrixSize[i]);
            M = matrixSize[i];
            N = matrixSize[i];
            K = matrixSize[i];
            lda = K;
            ldb = N;
            ldc = N;
            A = new double[matrixSize[i] * matrixSize[i]];
            B = new double[matrixSize[i] * matrixSize[i]];
            C = new double[matrixSize[i] * matrixSize[i]];
            double alpha = 1, beta = -1;

            DgemmComputation compute_dgemm = new DgemmComputation();
            // Initializing A with random values
            A = compute_dgemm.initMatrix(matrixSize[i], A);
            // Initializing B with random values
            B = compute_dgemm.initMatrix(matrixSize[i], B);

            // Compute the function
            final long startTime = System.currentTimeMillis();
            CBLAS.dgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
            final long endTime = System.currentTimeMillis();

            System.out.println("Matrix Size and Total Execution Time:" + matrixSize[i] + "," + (endTime - startTime));
            System.out.println("Computation of dgemm over...");
        }
    }

    // Matrix initialization function
    private double[] initMatrix(int size, double[] mat) {
        int i;
        double value;
        mat = new double[size * size];
        Random rand = new Random();
        for (i = 0; i < size * size; i++) {
            value = rand.nextDouble() * i;
            mat[i] = Math.round(value * 100.0) / 100.0;
        }
        return mat;
    }

    /** Print the matrix X assuming row-major order of elements. */
    private void printMatrix(String prompt, double[] X, int I, int J) {
        System.out.println(prompt);
        for (int i = 0; i < I; i++) {
            for (int j = 0; j < J; j++) {
                System.out.print("i: " + i + ", j: " + j + " and the value is");
                System.out.print("\t" + X[i * J + j]);
                System.out.println();
            }
        }
    }
}
Compile the application:
javac DgemmComputation.java
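The benchmark prints the elapsed wall-clock time in milliseconds for each matrix size. To translate a timing into a throughput figure, note that dgemm performs roughly 2*M*N*K floating-point operations. The helper below is a hypothetical addition, not part of the original sample, that does this conversion:

/* GflopsUtil.java - hypothetical helper for interpreting the timings (not part of the original sample) */
public final class GflopsUtil {
    /** dgemm on an MxK by KxN problem performs roughly 2*M*N*K floating-point operations. */
    public static double gflops(long m, long n, long k, long millis) {
        double flops = 2.0 * m * n * k;
        return flops / (millis / 1000.0) / 1.0e9;   // operations per second, in units of 10^9
    }

    public static void main(String[] args) {
        // Illustration only: an 8192x8192x8192 dgemm that took 4000 ms corresponds to ~275 GFLOP/s.
        System.out.println(GflopsUtil.gflops(8192, 8192, 8192, 4000));
    }
}

Also note that the largest sizes in the list allocate several gigabytes per matrix, so you may need to increase the JVM heap size (for example, with the -Xmx option) when running the benchmark in step 5.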
Step 5. Execute the application.
Before executing the application, a couple of environment variables must be set.
To enable Intel MKL to automatically offload computation to the coprocessors, use the following command:
export MKL_MIC_ENABLE=1
You can check how the dgemm computation is split across the host CPU and the Intel Xeon Phi coprocessors by enabling the offload report:
export OFFLOAD_REPORT=2
The java.library.path property should point to the directory where libmkl_java_stubs.so is placed. This example assumes the stubs shared library is located in the same directory as the compiled Java class files.
java -Djava.library.path=. DgemmComputation
Below is a screenshot of the output from a run on a Haswell E5-2695 v3 server with two Intel Xeon Phi coprocessors. You can see that Intel MKL divides the work among the host CPU and the two Xeon Phi coprocessors dynamically, and that the division depends on the input matrix sizes. Users can also control this split with the MKL_MIC_WORKDIVISION environment variables. For more details, refer to the How to control the work division article.
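If you prefer to pin the split instead of relying on MKL's dynamic decision, the work-division variable mentioned above can be exported before launching the JVM. The fraction below is an arbitrary illustration, not a recommendation:

export MKL_MIC_WORKDIVISION=0.5

This asks Intel MKL to offload roughly half of the work of each Automatic Offload computation to the coprocessors and keep the rest on the host; see the How to control the work division article for the full set of controls.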