Channel: Intel Developer Zone Articles

Using SPIR for fun and profit with Intel® OpenCL™ Code Builder

Introduction

In this short tutorial we give a brief introduction to Khronos SPIR, touch on the differences between a SPIR binary and an Intel proprietary Intermediate Binary, demonstrate a couple of ways of creating SPIR binaries using tools shipped with Intel® INDE, and show a way of consuming SPIR binaries in your OpenCL program.

What is SPIR?

SPIR stands for Standard Portable Intermediate Representation. It is a portable binary encoding of OpenCL C device programs based on LLVM IR. The main goal of SPIR is to let application developers avoid shipping their kernels in source form while maintaining portability between vendors and devices. Note that SPIR is a Khronos extension, so you should check whether the device you are planning to deploy to supports the cl_khr_spir extension. All 4th Generation and above Intel® Processors running Windows and Android OSes with the latest drivers support SPIR. If you installed Intel® OpenCL™ Code Builder, which is now part of the Intel® INDE and Intel® Media Server Studio suites, you can query platform and device information from within Microsoft Visual Studio:

Platform Info

For example, the latest Intel® Windows graphics driver for 4th Generation Intel® Core Processors and above (10.18.14.4080) exposes three OpenCL devices: OpenCL 1.2 CPU and GPU devices and an experimental OpenCL 2.0 CPU device.

Platform Info Details

All of these devices support the cl_khr_spir extension:

Device Extensions SPIR
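
The same check can be made programmatically on the extension string that an OpenCL query returns. The following is a minimal sketch, not code from the sample: has_extension is a hypothetical helper name, and it assumes the usual space-separated format of the extension list. It matches whole tokens only, which is slightly stricter than a plain substring search and avoids matching a longer extension name that merely begins with cl_khr_spir.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the sample or of OpenCL).
 * Returns 1 if `name` appears as a whole space-separated token in
 * `ext_list`, 0 otherwise. `ext_list` is meant to be the string obtained
 * from clGetDeviceInfo with CL_DEVICE_EXTENSIONS or from
 * clGetPlatformInfo with CL_PLATFORM_EXTENSIONS. */
int has_extension(const char *ext_list, const char *name)
{
    size_t name_len;
    const char *p;

    assert(name != NULL && name[0] != '\0');
    if (ext_list == NULL)
        return 0;

    name_len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* A match counts only if it is delimited by spaces or string ends. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += name_len; /* keep searching after this partial match */
    }
    return 0;
}
```

A caller would pass the queried string and the name to look for, for example has_extension(extensions, "cl_khr_spir"), once for the platform and once for each device it intends to target.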

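How is a SPIR binary different from an Intermediate Binary?

A SPIR binary is portable both between devices from a single vendor and between devices from different vendors. For example, on a computer with an Intel processor with Intel® Processor Graphics and an NVIDIA or AMD graphics card, you should see at least three devices: an Intel CPU device, an Intel GPU device, and an NVIDIA or AMD device. Your SPIR binary should work on all three of them, provided each device reports the cl_khr_spir extension. An Intermediate Binary, by contrast, is built for one particular device: as the sample later in this article shows, it has to be rebuilt when you switch from the GPU device to the CPU device.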

How to produce a SPIR binary with the Intel command line compiler?

Intel® OpenCL™ Code Builder comes with command line compilers for OpenCL C. At the command line, type the following:

ioc64 -cmd=build -input=SobelKernels.cl -device=gpu -spir64=SobelKernels_x64.spir -bo="-cl-std=CL1.2"

or you can type the following shorter version:

ioc64 -cmd=build -input=SobelKernels.cl -spir64=SobelKernels_x64.spir -bo="-cl-std=CL1.2"

Note that specifying the device only lets you verify that your kernel compiles for that particular device; it does not affect the result. In the first case the kernel is built for the GPU device, in the second for the default CPU device, but the resulting SPIR file is the same in both cases. Also note that you should be able to produce SPIR files even if the platform you are developing on does not support SPIR. SPIR generation is also supported on Linux as part of Intel® OpenCL™ Code Builder that comes with Intel® Media Server Studio: you can use either the standalone Kernel Builder tool or the Eclipse plug-in to generate SPIR binaries on supported Linux platforms.

How to produce a SPIR binary with Intel® INDE’s Kernel Builder?

Open a solution with the OpenCL file in it. Right-click the OpenCL file and select Create Code Builder Session from the popup menu:

 Create Code Builder Session

Click on the resulting Session in the Code Builder Session Explorer and select Build:

 Build

If the build is successful, you should see .spir files among the artifacts.

Build Artifacts

Please use the _x64.spir version for your purposes. Note that the corresponding SobelKernels_x86.ll and SobelKernels_x64.ll files above are textual representations of the SPIR binaries SobelKernels_x86.spir and SobelKernels_x64.spir respectively. Advanced developers can examine these files for their analysis work. Note that when you build your application in Win32 mode, the CPU device requires the _x86.spir file, while the GPU device always consumes the _x64.spir file. When you build your application in x64 mode, both the CPU device and the GPU device consume the same _x64.spir file.
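
The file-selection rule above can be captured in a small helper. This is only a sketch under the rule as stated in this article: spir_suffix and spir_file_name are hypothetical names, and the two flags (whether the application is a 64-bit build and whether the target is the GPU device) are assumed to be determined by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the SPIR file suffix for a target device.
 * Rule from the text: only the CPU device in a Win32 (32-bit) build
 * needs the _x86.spir file; every other combination uses _x64.spir. */
const char *spir_suffix(int app_is_64bit, int device_is_gpu)
{
    if (app_is_64bit || device_is_gpu)
        return "_x64.spir";
    return "_x86.spir";
}

/* Hypothetical helper: writes "<base><suffix>" into `out`.
 * Returns 0 on success, -1 if the result does not fit in `out_size` bytes. */
int spir_file_name(char *out, size_t out_size, const char *base,
                   int app_is_64bit, int device_is_gpu)
{
    const char *suffix = spir_suffix(app_is_64bit, device_is_gpu);
    size_t base_len, suffix_len;

    assert(out != NULL && base != NULL);
    base_len = strlen(base);
    suffix_len = strlen(suffix);
    if (base_len + suffix_len + 1 > out_size)
        return -1;

    memcpy(out, base, base_len);
    memcpy(out + base_len, suffix, suffix_len + 1); /* copies the NUL too */
    return 0;
}
```

One way for the caller to obtain the first flag is to test sizeof(void *) == 8 in the host program, since that reflects the mode the application itself was built in.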

How to consume a SPIR binary in your OpenCL™ program?

Before trying to build a program from a SPIR binary, check whether the platform and the device(s) you are planning to target support the cl_khr_spir extension. Call clGetPlatformInfo with the CL_PLATFORM_EXTENSIONS flag and search the returned string for cl_khr_spir. You should also call clGetDeviceInfo with the CL_DEVICE_EXTENSIONS flag to check SPIR support on each device of your platform. Next, read the binary file (in our case SobelKernels_x64.spir) into a character array with regular C or C++ APIs, and create a program from it with a clCreateProgramWithBinary call. Then build the program with clBuildProgram, providing “-x spir” in addition to regular optimization flags such as “-cl-mad-enable”. You are now ready to create your kernels as you normally would.
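
The file-reading and build-option steps described above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code from the sample: read_binary_file and spir_build_options are hypothetical helper names, and the OpenCL calls appear only in a comment (with context, device, binary_status and err assumed to be set up by the caller) so that the block compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
 * On success returns the buffer and stores its length in *size_out;
 * the caller frees it. Returns NULL (and *size_out == 0) on failure. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    FILE *f;
    long len;
    unsigned char *buf;

    assert(path != NULL && size_out != NULL);
    *size_out = 0;

    f = fopen(path, "rb"); /* binary mode matters on Windows */
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0 || (len = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    buf = (unsigned char *)malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL || fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size_out = (size_t)len;
    return buf;
}

/* Hypothetical helper: writes "-x spir" plus optional extra flags to `out`.
 * Returns 0 on success, -1 if the result does not fit in `out_size` bytes. */
int spir_build_options(char *out, size_t out_size, const char *extra_flags)
{
    static const char base[] = "-x spir";
    int has_extra = (extra_flags != NULL && extra_flags[0] != '\0');
    size_t need = sizeof base; /* includes the terminating NUL */

    assert(out != NULL);
    if (has_extra)
        need += 1 + strlen(extra_flags); /* separating space + flags */
    if (need > out_size)
        return -1;

    strcpy(out, base);
    if (has_extra) {
        strcat(out, " ");
        strcat(out, extra_flags);
    }
    return 0;
}

/* With a context and device already created, the results would be used
 * roughly like this (the variables here are assumed to exist):
 *
 *   const unsigned char *bin = buf;
 *   cl_program prog = clCreateProgramWithBinary(context, 1, &device,
 *                         &size, &bin, &binary_status, &err);
 *   err = clBuildProgram(prog, 1, &device, options, NULL, NULL);
 */
```

Keeping the “-x spir” flag in one helper makes it harder to forget when additional optimization flags are passed along with it.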

Advantages of a SPIR Binary

A SPIR binary is portable between devices and between vendors.

A SPIR binary is smaller than the native Intermediate Binary.

Disadvantages of a SPIR Binary

Additional time is required to build a program from a SPIR binary as opposed to building a program from an Intermediate Binary, since additional translation and optimization steps are involved.

You need to provide build flags when building a program from a SPIR binary, since some of the optimizations specified by the flags are done by later compilation stages.

Building and Running a SPIR Sample

Make sure to install Intel® OpenCL™ Code Builder, which comes with either Intel® INDE or Intel® Media Server Studio (see the References section). Open the Sobel_OCL solution (the source code for the sample is available at the end of the article) and build it:

Building SPIR Sample

Now go ahead and create a new Code Builder Kernel Development session as follows:

New Kernel Development Session

In the New Session popup dialog, name the session SobelKernels, specify the directory where your solution is located as the location, select Add CL Files: in the Session Content section, and select the SobelKernels.cl file provided in the solution. Uncheck the Duplicate files to session folder check box and click the Done button:

New Session Dialog

Now you can select Session ‘SobelKernels’ and build it:

Build Session

You should see Build Artifacts and Kernels folders populated:

Build Results

Now you have everything ready to run your workload. Open a command window, change to the directory where your solution file is located, and type the following at the command prompt:

.\x64\Release\Sobel_OCL.exe 100 gpu intel 2048 2048 show_CL spir

You should see the following output:

Sobel Output

Notice that four *_validation.ppm files appeared in your directory. When you examine them, they should all contain the following picture of a nicely Sobelized dog:

Sobelized Dog

Now you can run the tutorial on the CPU OpenCL device by typing the following:

.\x64\Release\Sobel_OCL.exe 100 cpu intel 2048 2048 show_CL spir

You can also run the tutorial with the openclc and ir options on the GPU device:

.\x64\Release\Sobel_OCL.exe 100 gpu intel 2048 2048 show_CL openclc

.\x64\Release\Sobel_OCL.exe 100 gpu intel 2048 2048 show_CL ir

Examine the OpenCL C Program Build Log printed at the beginning of each run and compare it with the SPIR run on the GPU device. Notice that there is no build log output with the ir option, since the Intermediate Binary is already fully built and ready to go.

You can also run the tutorial with the openclc and ir options on the CPU device; just remember, before running with the ir option, to change the session options

Session Options

and specify the CPU device there:

Options Dialog

and don’t forget to provide appropriate build options:

Options Dialog Build Options

and select the right Session Architecture:

Options Dialog General Options

then rebuild the session. Notice that SobelKernels.asm appears under the Build Artifacts folder, and that SobelKernels.ir is overwritten with the Intermediate Binary for the CPU device. You can now safely run

.\x64\Release\Sobel_OCL.exe 100 cpu intel 2048 2048 show_CL ir

Conclusion

In this short tutorial we gave a brief introduction to SPIR and its differences from the Intel proprietary Intermediate Binary format, showed ways to produce SPIR binaries with the Intel command line compiler shipped with Intel® INDE and with the Intel® Kernel Builder add-on to Microsoft Visual Studio, and then showed how to consume the resulting SPIR binary in your OpenCL program.

Acknowledgements

Thank you to Uri Levy for reviewing this article and providing great feedback!

References

  1. Khronos SPIR website: https://www.khronos.org/spir
  2. Khronos SPIR FAQ: https://www.khronos.org/faq/spir
  3. Intel® INDE: https://software.intel.com/en-us/intel-inde
  4. Intel® Media Server Studio: https://software.intel.com/en-us/intel-media-server-studio
  5. Intel® Code Builder for OpenCL™ API for Linux*: https://software.intel.com/en-us/articles/intel-code-builder-for-opencl-api
  6. User Manual for OpenCL™ Code Builder: https://software.intel.com/en-us/code-builder-user-manual

About the Author

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping the Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos, as well as the GPU-Quicksort and Sierpinski Carpet in OpenCL 2.0 videos.

You might also be interested in the following:

Optimizing Simple OpenCL Kernels: Modulate Kernel Optimization

Optimizing Simple OpenCL Kernels: Sobel Kernel Optimization

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

Sierpiński Carpet in OpenCL 2.0

Download the Sample
