This article extends the existing How do I use Intel® MKL with Java*? article from the Intel® MKL Knowledge Base and reuses the code samples presented there.
Here we explore how to use the Intel® Xeon Phi coprocessors available in the system to speed up mathematical computations through the Automatic Offload (AO) enabled functions in Intel® MKL. Note that the Compiler Assisted Offload (CAO) model is not supported for offloading computations through JNI and Intel MKL.
Follow steps 1, 2, and 3 as explained in How do I use Intel® MKL with Java*?.
This article explains how to build and run the example on a Linux platform. Before proceeding, make sure the paths for the Intel compilers, Intel MKL, and Java are set.
For easy reference, the steps are detailed here as well.
To call cblas_dgemm from a Java application, perform the following steps:
Step 1. Create a Java file (CBLAS.java) that declares the cblas_dgemm binding:
/* CBLAS.java */
public final class CBLAS {
    private CBLAS() {}

    static {
        /* load library (which will contain wrapper for cblas function; see step 3) */
        System.loadLibrary("mkl_java_stubs");
    }

    public final static class ORDER {
        private ORDER() {}
        /** row-major arrays */
        public final static int RowMajor = 101;
        /** column-major arrays */
        public final static int ColMajor = 102;
    }

    public final static class TRANSPOSE {
        private TRANSPOSE() {}
        /** trans='N' */
        public final static int NoTrans = 111;
        /** trans='T' */
        public final static int Trans = 112;
        /** trans='C' */
        public final static int ConjTrans = 113;
    }

    /* inform the Java virtual machine that the function is defined externally */
    public static native void dgemm(int Order, int TransA, int TransB,
                                    int M, int N, int K,
                                    double alpha, double[] A, int lda,
                                    double[] B, int ldb,
                                    double beta, double[] C, int ldc);
}
Compile CBLAS.java using the Java compiler:
javac CBLAS.java
Step 2. Generate header files from the Java class file created in the previous step.
These headers will be used for the C file in the next step.
javah CBLAS
Three header files are generated: CBLAS.h, CBLAS_ORDER.h, and CBLAS_TRANSPOSE.h. Only CBLAS.h is used in the next step.
Step 3. Use the declaration of Java_CBLAS_dgemm in the header CBLAS.h generated in step 2 to write the C file CBLAS.c.
Here is the generated header CBLAS.h (do not edit it):
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class CBLAS */

#ifndef _Included_CBLAS
#define _Included_CBLAS
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     CBLAS
 * Method:    dgemm
 * Signature: (IIIIIID[DI[DID[DI)V
 */
JNIEXPORT void JNICALL Java_CBLAS_dgemm
  (JNIEnv *, jclass, jint, jint, jint, jint, jint, jint, jdouble, jdoubleArray, jint, jdoubleArray, jint, jdouble, jdoubleArray, jint);

#ifdef __cplusplus
}
#endif
#endif
The declaration of Java_CBLAS_dgemm is used in the wrapper for Intel MKL. Create the C file CBLAS.c:
/* CBLAS.c */
#include <jni.h>
#include <assert.h>
#include "mkl_cblas.h"

JNIEXPORT void JNICALL Java_CBLAS_dgemm
  (JNIEnv *env, jclass klass,
   jint Order, jint TransA, jint TransB,
   jint M, jint N, jint K,
   jdouble alpha, jdoubleArray A, jint lda,
   jdoubleArray B, jint ldb,
   jdouble beta, jdoubleArray C, jint ldc)
{
    jdouble *aElems, *bElems, *cElems;

    /* Obtain (possibly copied) native views of the Java arrays so they can be passed to Intel MKL. */
    aElems = (*env)->GetDoubleArrayElements(env, A, NULL);
    bElems = (*env)->GetDoubleArrayElements(env, B, NULL);
    cElems = (*env)->GetDoubleArrayElements(env, C, NULL);
    assert(aElems && bElems && cElems);

    cblas_dgemm((CBLAS_ORDER)Order, (CBLAS_TRANSPOSE)TransA, (CBLAS_TRANSPOSE)TransB,
                (int)M, (int)N, (int)K, alpha, aElems, (int)lda,
                bElems, (int)ldb, beta, cElems, (int)ldc);

    /* Copy the result back into the Java array C; A and B are released without copy-back. */
    (*env)->ReleaseDoubleArrayElements(env, C, cElems, 0);
    (*env)->ReleaseDoubleArrayElements(env, B, bElems, JNI_ABORT);
    (*env)->ReleaseDoubleArrayElements(env, A, aElems, JNI_ABORT);
}
Compile this file to create the native library libmkl_java_stubs.so (loading this library from Java is described in step 1):
icc -shared -fPIC -o libmkl_java_stubs.so CBLAS.c -I. -I$MKLROOT/include -Wl,--start-group $MKLROOT/lib/intel64/libmkl_intel_lp64.a $MKLROOT/lib/intel64/libmkl_intel_thread.a $MKLROOT/lib/intel64/libmkl_core.a -Wl,--end-group -openmp -lpthread -lm -ldl
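With the stubs library built, you may want to sanity-check the binding before writing the full benchmark in step 4. The class below is a minimal sketch that is not part of the original article; TestDgemm.java is a hypothetical name, and the code assumes only the CBLAS class from step 1 and the library built above:

/* TestDgemm.java - hypothetical sanity check for the JNI binding (not part of the original article) */
public final class TestDgemm {
    public static void main(String[] args) {
        /* Compute C = 1.0*A*B + 0.0*C for 2x2 row-major matrices. */
        double[] A = {1.0, 2.0,
                      3.0, 4.0};
        double[] B = {5.0, 6.0,
                      7.0, 8.0};
        double[] C = new double[4];
        CBLAS.dgemm(CBLAS.ORDER.RowMajor, CBLAS.TRANSPOSE.NoTrans, CBLAS.TRANSPOSE.NoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);
        /* Expected result: 19.0 22.0 43.0 50.0 */
        System.out.println(C[0] + " " + C[1] + " " + C[2] + " " + C[3]);
    }
}

Compile it with javac TestDgemm.java and run it with java -Djava.library.path=. TestDgemm. A problem this small stays on the host because Automatic Offload only offloads sufficiently large matrices, but it confirms that the wrapper library loads and the call signature is correct.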
Step 4. Create the main application, DgemmComputation.java:
/* DgemmComputation.java */
import java.util.Random;

public final class DgemmComputation {
    /** Instantiation prohibited. */
    private DgemmComputation() {}

    /** No command-line options. */
    public static void main(String[] args) {
        double[] A;
        double[] B;
        double[] C;
        int M, N, K, lda, ldb, ldc;
        int[] matrixSize = new int[] {1024, 2048, 4096, 8192, 12288, 16384, 20480, 24576};
        int Order = CBLAS.ORDER.RowMajor;
        int TransA = CBLAS.TRANSPOSE.NoTrans;
        int TransB = CBLAS.TRANSPOSE.NoTrans;

        for (int i = 0; i < matrixSize.length; i++) {
            System.out.println("Matrix size is: " + matrixSize[i]);
            M = matrixSize[i];
            N = matrixSize[i];
            K = matrixSize[i];
            lda = K;
            ldb = N;
            ldc = N;
            A = new double[matrixSize[i] * matrixSize[i]];
            B = new double[matrixSize[i] * matrixSize[i]];
            C = new double[matrixSize[i] * matrixSize[i]];
            double alpha = 1, beta = -1;

            DgemmComputation compute_dgemm = new DgemmComputation();
            // Initializing A with random values
            A = compute_dgemm.initMatrix(matrixSize[i], A);
            // Initializing B with random values
            B = compute_dgemm.initMatrix(matrixSize[i], B);

            // Compute the function
            final long startTime = System.currentTimeMillis();
            CBLAS.dgemm(Order, TransA, TransB, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
            final long endTime = System.currentTimeMillis();

            System.out.println("Matrix Size and Total Execution Time:" + matrixSize[i] + "," + (endTime - startTime));
            System.out.println("Computation of dgemm over...");
        }
    }

    // Matrix initialization function
    private double[] initMatrix(int size, double[] mat) {
        int i;
        double value;
        mat = new double[size * size];
        Random rand = new Random();
        for (i = 0; i < size * size; i++) {
            value = rand.nextDouble() * i;
            mat[i] = Math.round(value * 100.0) / 100.0;
        }
        return mat;
    }

    /** Print the matrix X assuming row-major order of elements. */
    private void printMatrix(String prompt, double[] X, int I, int J) {
        System.out.println(prompt);
        for (int i = 0; i < I; i++) {
            for (int j = 0; j < J; j++) {
                System.out.print("i: " + i + ", j: " + j + " and the value is");
                System.out.print("\t" + X[i * J + j]);
                System.out.println();
            }
        }
    }
}
Compile the application:
javac DgemmComputation.java
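The benchmark prints the elapsed wall-clock time in milliseconds for each matrix size. To translate a timing into a throughput figure, note that dgemm performs roughly 2*M*N*K floating-point operations. The helper below is a hypothetical addition, not part of the original sample, that does this conversion:

/* GflopsUtil.java - hypothetical helper for interpreting the timings (not part of the original sample) */
public final class GflopsUtil {
    /** dgemm on an MxK by KxN problem performs roughly 2*M*N*K floating-point operations. */
    public static double gflops(long m, long n, long k, long millis) {
        double flops = 2.0 * m * n * k;
        return flops / (millis / 1000.0) / 1.0e9;   // operations per second, in units of 10^9
    }

    public static void main(String[] args) {
        // Illustration only: an 8192x8192x8192 dgemm that took 4000 ms corresponds to ~275 GFLOP/s.
        System.out.println(GflopsUtil.gflops(8192, 8192, 8192, 4000));
    }
}

Also note that the largest sizes in the list allocate several gigabytes per matrix, so you may need to increase the JVM heap size (for example, with the -Xmx option) when running the benchmark in step 5.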
Step 5. Execute the application.
Before executing the application, a couple of environment variables must be set.
To enable Intel MKL to automatically offload computation to the coprocessors, use the following command:
export MKL_MIC_ENABLE=1
You can check how the dgemm computation is split across the host CPU and the Intel Xeon Phi coprocessors by enabling the offload report:
export OFFLOAD_REPORT=2
The java.library.path property should point to the directory where libmkl_java_stubs.so is placed. This example assumes the stubs shared library is located in the same directory as the compiled Java class files.
java -Djava.library.path=. DgemmComputation
Below is a screenshot of the output from a run on a Haswell E5-2695 v3 server with two Intel Xeon Phi coprocessors. You can see that Intel MKL divides the work among the host CPU and the two Xeon Phi coprocessors dynamically, and that the division depends on the input matrix sizes. Users can also control this split with the MKL_MIC_WORKDIVISION environment variables. For more details, refer to the How to control the work division article.
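If you prefer to pin the split instead of relying on MKL's dynamic decision, the work-division variable mentioned above can be exported before launching the JVM. The fraction below is an arbitrary illustration, not a recommendation:

export MKL_MIC_WORKDIVISION=0.5

This asks Intel MKL to offload roughly half of the work of each Automatic Offload computation to the coprocessors and keep the rest on the host; see the How to control the work division article for the full set of controls.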