
Using Intel® MKL in math-intensive Java applications on Intel® Xeon Phi


This article extends the existing Intel® MKL Knowledge Base article How do I use Intel® MKL with Java*? and reuses the code samples from it.

Here we explore how to use an Intel® Xeon Phi coprocessor available in the system to speed up mathematical computations by calling Automatic Offload (AO) enabled functions in Intel® MKL. The Compiler Assisted Offload (CAO) model is not supported for offloading computations made through JNI and MKL.

Follow steps 1, 2 and 3 as explained in How do I use Intel® MKL with Java*?.

This article explains how to build and run on a Linux platform. Before proceeding, make sure the paths for the Intel compilers, Intel MKL and Java are set.

For easy reference, the steps are detailed here too.

To call cblas_dgemm from a Java application, perform the following steps:

Step 1. Create a Java file with the cblas_dgemm declaration (CBLAS.java):

public final class CBLAS {
    private CBLAS() {}

    static {
        /* Load the native library that contains the wrapper for the cblas function (see step 3). */
        System.loadLibrary("mkl_java_stubs");
    }

    public final static class ORDER {
        private ORDER() {}
        /** row-major arrays */
        public final static int RowMajor=101;
        /** column-major arrays */
        public final static int ColMajor=102;
    }

    public final static class TRANSPOSE {
        private TRANSPOSE() {}
        /** trans='N' */
        public final static int NoTrans =111;
        /** trans='T' */
        public final static int Trans=112;
        /** trans='C' */
        public final static int ConjTrans=113;
    }

    /* Tell the Java virtual machine that the function is defined externally. */
    public static native void dgemm(int Order, int TransA, int TransB, int M, int N, int K, double alpha, double[] A, int lda, double[] B, int ldb, double beta, double[] C, int ldc);
}

Compile CBLAS.java with the Java compiler:

javac CBLAS.java

Step 2. Generate header files from the Java class file created in the previous step.

These headers are used by the C file in the next step.

javah CBLAS

Three header files will be generated: CBLAS.h, CBLAS_ORDER.h and CBLAS_TRANSPOSE.h

Step 3. Use the declaration of Java_CBLAS_dgemm in the header CBLAS.h generated in step 2 to write the C file CBLAS.c.

Here is the generated header CBLAS.h. Do not edit it!

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CBLAS */

#ifndef _Included_CBLAS
#define _Included_CBLAS
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     CBLAS
 * Method:    dgemm
 * Signature: (IIIIIID[DI[DID[DI)V
 */
JNIEXPORT void JNICALL Java_CBLAS_dgemm(JNIEnv *, jclass, jint, jint, jint, jint, jint, jint, jdouble, jdoubleArray, jint, jdoubleArray, jint, jdouble, jdoubleArray, jint);

#ifdef __cplusplus
}
#endif
#endif
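The Signature line in the generated header is the JVM descriptor of the native method: I stands for int, D for double, [D for double[], and the trailing V for the void return type. As a small illustration (the class name DgemmDescriptor is made up for this sketch and is not part of the original sample), the same string can be rebuilt from the parameter list declared in CBLAS.java with the standard java.lang.invoke.MethodType class:

```java
import java.lang.invoke.MethodType;

// Hypothetical helper, not part of the original sample.
final class DgemmDescriptor {
    /** Builds the JVM descriptor for the dgemm parameter list declared in CBLAS.java. */
    static String descriptor() {
        MethodType type = MethodType.methodType(
            void.class,                                // return type -> V
            int.class, int.class, int.class,           // Order, TransA, TransB
            int.class, int.class, int.class,           // M, N, K
            double.class, double[].class, int.class,   // alpha, A, lda
            double[].class, int.class,                 // B, ldb
            double.class, double[].class, int.class);  // beta, C, ldc
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // prints (IIIIIID[DI[DID[DI)V
    }
}
```

If the descriptor printed here differs from the one in the header, the Java declaration and the header are out of sync and the header should be regenerated.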

The declaration of the function Java_CBLAS_dgemm should be used in the wrapper for MKL. Create the C file CBLAS.c:

#include <jni.h>
#include <assert.h>
#include "mkl_cblas.h"

JNIEXPORT void JNICALL Java_CBLAS_dgemm (JNIEnv *env, jclass klass, jint Order, jint TransA, jint TransB, jint M, jint N, jint K, jdouble alpha, jdoubleArray A, jint lda, jdoubleArray B, jint ldb, jdouble beta, jdoubleArray C, jint ldc)
{
    jdouble *aElems, *bElems, *cElems;

    /* Get C pointers to the contents of the Java arrays. */
    aElems = (*env)->GetDoubleArrayElements(env, A, NULL);
    bElems = (*env)->GetDoubleArrayElements(env, B, NULL);
    cElems = (*env)->GetDoubleArrayElements(env, C, NULL);
    assert(aElems && bElems && cElems);

    cblas_dgemm((CBLAS_ORDER)Order, (CBLAS_TRANSPOSE)TransA, (CBLAS_TRANSPOSE)TransB, (int)M, (int)N, (int)K, alpha, aElems, (int)lda, bElems, (int)ldb, beta, cElems, (int)ldc);

    /* Mode 0 copies the result back into C; JNI_ABORT releases the inputs A and B without copying back. */
    (*env)->ReleaseDoubleArrayElements(env, C, cElems, 0);
    (*env)->ReleaseDoubleArrayElements(env, B, bElems, JNI_ABORT);
    (*env)->ReleaseDoubleArrayElements(env, A, aElems, JNI_ABORT);
}

Compile this file to create the native library libmkl_java_stubs.so (loading this library from Java is described in step 1):

icc -shared -fPIC -o libmkl_java_stubs.so CBLAS.c -I. -I$MKLROOT/include -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm -ldl

Step 4. Create the main class DgemmComputation.java:

import java.util.Random;

public final class DgemmComputation {
    /** Incarnation prohibited. */
    private DgemmComputation() {
    }

    /** No command-line options. */
    public static void main(String[] args) {

        double[] A; double[] B; double[] C;
        int M, N, K, lda, ldb, ldc;

        int[] matrixSize = new int[] {1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576};
        int Order = CBLAS.ORDER.RowMajor;
        int TransA = CBLAS.TRANSPOSE.NoTrans;
        int TransB = CBLAS.TRANSPOSE.NoTrans;

        for (int i = 0; i < matrixSize.length; i++) {
            System.out.println("Matrix size is: " + matrixSize[i]);
            M=matrixSize[i]; N=matrixSize[i]; K=matrixSize[i];

            // Row-major, non-transposed: the leading dimension is the row length
            lda=K; ldb=N; ldc=N;

            A = new double[matrixSize[i] * matrixSize[i]];
            B = new double[matrixSize[i] * matrixSize[i]];
            C = new double[matrixSize[i] * matrixSize[i]];
            double alpha=1, beta=-1;

            DgemmComputation compute_dgemm = new DgemmComputation();
            // Initializing A with random values
            A = compute_dgemm.initMatrix(matrixSize[i], A);

            // Initializing B with random values
            B = compute_dgemm.initMatrix(matrixSize[i], B);

            // Compute the function
            final long startTime = System.currentTimeMillis();
            CBLAS.dgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
            final long endTime = System.currentTimeMillis();

            System.out.println("Matrix Size and Total Execution Time:" + matrixSize[i] + "," + (endTime - startTime));
            System.out.println("Computation of dgemm over...");
        }
    }

    // Matrix initialization function: fills the array passed in rather than allocating a second one
    private double[] initMatrix(int size, double[] mat) {
        Random rand = new Random();

        for (int i = 0; i < size*size; i++) {
            double value = rand.nextDouble()*i;
            mat[i] = Math.round(value*100.0)/100.0;
        }
        return mat;
    }

    /** Print the matrix X assuming row-major order of elements. */
    private void printMatrix(String prompt, double[] X, int I, int J) {
        System.out.println(prompt);
        for (int i=0; i<I; i++) {
            for (int j=0; j<J; j++) {
                System.out.print("i: " + i + ", j: " + j + " and the value is");
                System.out.println("\t" + X[i*J+j]);
            }
        }
    }
}

Compile the application:

javac DgemmComputation.java
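Before running the large sizes, it can help to check the wrapper on a tiny matrix against a plain Java implementation of the same operation, C = alpha*A*B + beta*C, for row-major, non-transposed matrices. The sketch below is only a reference for such a check; the class name ReferenceDgemm is hypothetical, and the triple loop is far too slow for the matrix sizes used in this article.

```java
import java.util.Arrays;

// Hypothetical reference implementation, not part of the original sample.
final class ReferenceDgemm {
    /**
     * Computes C = alpha*A*B + beta*C for row-major, non-transposed
     * A (m x k), B (k x n) and C (m x n) with leading dimensions lda, ldb, ldc.
     */
    static void dgemm(int m, int n, int k, double alpha, double[] a, int lda,
                      double[] b, int ldb, double beta, double[] c, int ldc) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // Element (i, p) of A times element (p, j) of B
                    sum += a[i * lda + p] * b[p * ldb + j];
                }
                c[i * ldc + j] = alpha * sum + beta * c[i * ldc + j];
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};   // 2 x 2, row-major
        double[] b = {5, 6, 7, 8};
        double[] c = {1, 1, 1, 1};
        // Same alpha=1, beta=-1 as in DgemmComputation
        dgemm(2, 2, 2, 1.0, a, 2, b, 2, -1.0, c, 2);
        System.out.println(Arrays.toString(c)); // prints [18.0, 21.0, 42.0, 49.0]
    }
}
```

Calling the native dgemm with the same small inputs should give the same values up to floating-point rounding.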

Step 5. Execute the application.

Before executing this application, we have to set a couple of environment variables.

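To enable Intel MKL to offload computation automatically, set the MKL_MIC_ENABLE environment variable:

export MKL_MIC_ENABLE=1

You can check how the dgemm computation is split across the host CPU and the Intel Xeon Phi coprocessor by setting the OFFLOAD_REPORT environment variable:

export OFFLOAD_REPORT=2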

java.library.path should point to the directory that contains the library libmkl_java_stubs.so. This example assumes that the stubs shared library is in the same directory as the compiled Java classes.

java -Djava.library.path=. DgemmComputation
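If the JVM reports an UnsatisfiedLinkError, the usual cause is that libmkl_java_stubs.so is not in any directory listed in java.library.path. The following sketch (the class name StubLibraryCheck is an assumption made for illustration) shows one way to check this before loading: it asks the JVM for the platform file name of the library and looks for it in each directory of the search path.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of the original sample.
final class StubLibraryCheck {
    /** Returns the first matching library file found on searchPath, or null if there is none. */
    static Path find(String libName, String searchPath) {
        // On Linux, "mkl_java_stubs" maps to "libmkl_java_stubs.so"
        String fileName = System.mapLibraryName(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        Path found = find("mkl_java_stubs", searchPath);
        System.out.println(found == null
            ? "Stub library not found in java.library.path: " + searchPath
            : "Stub library found at: " + found.toAbsolutePath());
    }
}
```

Run it with the same -Djava.library.path option as the application so that both see the same search path.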

Here is a screenshot of the output from a run on a Haswell E5-2695 V3 server with 2 Intel Xeon Phi coprocessors. You can see that MKL has dynamically divided the work among the host CPU and the 2 Xeon Phi coprocessors by itself, and that the work division depends on the input matrix sizes. Users can also set the division explicitly with the MKL_MIC_WORKDIVISION environment variables. For more details, please refer to the How to control the work division article.

Java MKL Xeon Phi
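The application prints the execution time in milliseconds for each matrix size, which is hard to compare across sizes. A common way to normalize it is to convert to GFLOP/s using the conventional count of 2*M*N*K floating-point operations for a matrix multiply. The helper below is a sketch under that assumption (the class name DgemmThroughput is hypothetical); note that the measured time also includes the JNI array handling, so the figure is lower than what MKL itself achieves.

```java
// Hypothetical helper, not part of the original sample.
final class DgemmThroughput {
    /** GFLOP/s for an m x n x k dgemm that took elapsedMillis, counting 2*m*n*k operations. */
    static double gflops(long m, long n, long k, long elapsedMillis) {
        double flops = 2.0 * m * n * k;
        double seconds = elapsedMillis / 1000.0;
        return flops / seconds / 1e9;
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration: a 1000 x 1000 x 1000 multiply taking 2000 ms
        System.out.println(gflops(1000, 1000, 1000, 2000)); // prints 1.0
    }
}
```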


