Recipe: Building and Optimizing the Hogbom Clean Benchmark for Intel® Xeon Phi™ Coprocessors

Overview

This article provides a recipe for compiling and running the Hogbom Clean benchmark for the Intel® Xeon Phi™ coprocessor and discusses the various optimizations applied to the code.

Introduction

Hogbom Clean is a part of the ASKAP benchmark package. The ASKAP benchmark package is used to benchmark a variety of platforms for the Australian SKA Pathfinder (ASKAP) Science Data Processor. The Hogbom Clean (tHogbomClean) benchmark implements the kernel of the Hogbom Clean deconvolution algorithm.

Preliminaries

1)   This recipe assumes that you are using a system equipped with an Intel Xeon Phi coprocessor. If they are not already present, install the Intel® Manycore Platform System Stack (Intel® MPSS) and the Intel® C++ Compiler 13.1 or higher on your host system.
2)   Download the ASKAP benchmarks from: https://github.com/ATNF/askap-benchmarks
3)   Running the benchmark requires a point spread function (PSF) image and a dirty image (the image to be cleaned) in the work directory. These can be downloaded from:
•   http://www.atnf.csiro.au/people/Ben.Humphreys/dirty.img
•   http://www.atnf.csiro.au/people/Ben.Humphreys/psf.img

Compiling and running on the Intel Xeon Phi Coprocessor

1)   Set up the compiler environment:
$ source /opt/intel/composer_xe_2013_sp1/bin/compilervars.sh intel64
2)   Unpack the source code and build the executables for the Intel Xeon Phi coprocessor:
$ unzip askap-benchmarks-master.zip
$ cd askap-benchmarks-master/tHogbomCleanMIC/
$ make clean
$ make
3)   Because the benchmark offloads work to the coprocessor, execution begins on the host. Run the benchmark from the host, and ensure that the PSF image and the dirty image are present in the work directory:
$ ./tHogbomCleanMIC

Modifications and Optimizations

Several changes were made to the OpenMP version of the benchmark so that it runs on the Intel® Many Integrated Core (Intel® MIC) Architecture and achieves optimal performance. The OpenMP version and the Intel MIC architecture version of the code can be found in the tHogbomCleanOMP and tHogbomCleanMIC directories, respectively, of the GitHub repository. The benchmark uses the Intel® Xeon Phi™ coprocessor in offload mode, in which the host offloads a portion of the work to the coprocessor. To enable offloading, various functions in the benchmark were decorated with __declspec(target(mic)) to inform the compiler that they are intended for use on the coprocessor. STL vectors were also replaced with simple arrays.
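The decoration and the move to plain arrays can be illustrated with a minimal sketch. It is not taken from the benchmark: the MIC_TARGET macro, the sumAbs kernel, and the sumAbsOffload wrapper are hypothetical names, and the data clauses chosen here are an assumption about how such an offload might be written. On compilers without Intel offload support, the guards make the code run entirely on the host.

```cpp
#include <cassert>
#include <cstddef>

// __INTEL_OFFLOAD is predefined by the Intel compiler when offload is enabled.
// On any other compiler the decoration expands to nothing and everything
// runs on the host.
#if defined(__INTEL_OFFLOAD)
#define MIC_TARGET __declspec(target(mic))
#else
#define MIC_TARGET
#endif

// Hypothetical kernel (not from the benchmark): sum of absolute values.
// The decoration asks the compiler to also build it for the coprocessor.
MIC_TARGET float sumAbs(const float* data, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += (data[i] < 0.0f) ? -data[i] : data[i];
    }
    return sum;
}

// Host-side wrapper: ships a plain array to the coprocessor and runs the
// kernel there when offload is available.
float sumAbsOffload(const float* data, std::size_t n)
{
    assert(data != 0 && n > 0);
    float sum = 0.0f;
#if defined(__INTEL_OFFLOAD)
    // A raw pointer plus an explicit length tells the runtime how much to copy.
    #pragma offload target(mic) in(data : length(n)) inout(sum)
#endif
    {
        sum = sumAbs(data, n);
    }
    return sum;
}
```

A raw pointer with an explicit length gives the offload runtime a contiguous buffer to copy, which is one plausible reason for replacing STL vectors with simple arrays.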

Three primary optimizations were applied to the subtractPSF and findPeak functions. The first two help the compiler vectorize the code, whereas the third eliminates a critical section. The three optimizations are discussed in the following sections. We encourage you to compare the OpenMP and Intel MIC architecture versions of the code to better understand the optimizations.

subtractPSF – Simplifying the loop index

Most of the computation in this function is concentrated in its two for loops. Expanding the macros and simplifying the loop index allows the compiler to vectorize the loop.
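The idea can be sketched as a before-and-after pair. This is an illustration under stated assumptions, not the benchmark's code: the POS_TO_IDX macro, the Overlap struct, and both function names are hypothetical, images are assumed square and stored row by row, and scale stands for whatever factor multiplies the PSF before subtraction.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the index helpers used by the original loop.
#define POS_TO_IDX(width, x, y) ((y) * (width) + (x))

// Bounds of the region where the shifted PSF overlaps the residual image.
struct Overlap { int startx, stopx, starty, stopy, diffx, diffy; };

Overlap computeOverlap(int psfWidth, int resWidth,
                       int peakX, int peakY, int psfPeakX, int psfPeakY)
{
    assert(psfWidth > 0 && resWidth > 0);
    Overlap o;
    o.diffx = peakX - psfPeakX;
    o.diffy = peakY - psfPeakY;
    o.startx = std::max(0, o.diffx);
    o.starty = std::max(0, o.diffy);
    o.stopx = std::min(resWidth - 1, peakX + (psfWidth - psfPeakX - 1));
    o.stopy = std::min(resWidth - 1, peakY + (psfWidth - psfPeakY - 1));
    return o;
}

// Before: both subscripts go through the macro on every iteration.
void subtractPsfMacro(const float* psf, int psfWidth, float* residual,
                      int resWidth, const Overlap& o, float scale)
{
    for (int y = o.starty; y <= o.stopy; ++y) {
        for (int x = o.startx; x <= o.stopx; ++x) {
            residual[POS_TO_IDX(resWidth, x, y)] -=
                scale * psf[POS_TO_IDX(psfWidth, x - o.diffx, y - o.diffy)];
        }
    }
}

// After: row offsets are hoisted out of the inner loop, which then walks
// both arrays with a single unit-stride index.
void subtractPsfRows(const float* psf, int psfWidth, float* residual,
                     int resWidth, const Overlap& o, float scale)
{
    const int count = o.stopx - o.startx + 1;
    for (int y = o.starty; y <= o.stopy; ++y) {
        float* resRow = residual + y * resWidth + o.startx;
        const float* psfRow = psf + (y - o.diffy) * psfWidth + (o.startx - o.diffx);
        for (int i = 0; i < count; ++i) {
            resRow[i] -= scale * psfRow[i];
        }
    }
}
```

Both functions compute the same result. The second form gives the compiler a simpler inner loop to analyze; whether it is actually vectorized depends on the compiler and its flags.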

findPeak – Vectorizing the loop before the critical section

In its modified form, the loop lets the compiler reuse the result of the fabsf function call (reducing the number of calls) and recognize the indexed max idiom, which it can then vectorize.
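A minimal sketch of a loop in this shape follows. The Peak struct and the findPeakSerial name are hypothetical, and the loop is shown without OpenMP so that only the idiom is visible. std::fabs applied to a float is used as the C++ counterpart of fabsf.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical result type: the signed value at the peak and its index.
struct Peak {
    float value;
    std::size_t pos;
};

Peak findPeakSerial(const float* image, std::size_t n)
{
    assert(image != 0 && n > 0);
    float absMax = 0.0f;      // running maximum of |image[i]|
    std::size_t maxPos = 0;   // index at which that maximum was seen
    for (std::size_t i = 0; i < n; ++i) {
        // The absolute value is computed once per element and compared
        // against a running maximum that is already an absolute value.
        const float a = std::fabs(image[i]);
        if (a > absMax) {
            absMax = a;
            maxPos = i;
        }
    }
    Peak p;
    p.value = image[maxPos];
    p.pos = maxPos;
    return p;
}
```

Keeping the running maximum as an absolute value means the comparison needs one absolute-value call per element rather than two, and the loop body reduces to a plain "maximum and its index" pattern.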

findPeak – Eliminating the critical section

Critical sections serialize threads and are detrimental to the performance of parallel code. The effect is amplified on the Intel Xeon Phi coprocessor because of its large number of threads, which makes it imperative to reduce the number of critical sections or, if possible, eliminate them completely. In this case, the critical section can be eliminated entirely and replaced with a serial loop, as the tHogbomCleanMIC version of the code demonstrates.
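One way to replace a critical section with a serial loop is sketched below: each thread writes its local result into its own slot of an array, and a short serial loop after the parallel region picks the overall winner. This is an assumption about the general technique, not a copy of the benchmark's code; the ThreadPeak struct and the findPeakNoCritical name are hypothetical. The guards let the code compile and run single-threaded when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cmath>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical per-thread result: largest |value| seen and where.
struct ThreadPeak {
    float absValue;
    float value;
    long pos;
};

ThreadPeak findPeakNoCritical(const float* image, long n)
{
    assert(image != 0 && n > 0);
#ifdef _OPENMP
    const int maxThreads = omp_get_max_threads();
#else
    const int maxThreads = 1;
#endif
    // One slot per thread. A negative absValue marks "no element seen".
    ThreadPeak empty = { -1.0f, 0.0f, 0 };
    std::vector<ThreadPeak> perThread(maxThreads, empty);

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        float absMax = -1.0f;
        long maxPos = 0;
        #pragma omp for schedule(static)
        for (long i = 0; i < n; ++i) {
            const float a = std::fabs(image[i]);
            if (a > absMax) {
                absMax = a;
                maxPos = i;
            }
        }
        // Each thread writes only its own slot, so no lock is needed.
        perThread[tid].absValue = absMax;
        perThread[tid].pos = maxPos;
    }

    // Serial loop over a handful of per-thread results replaces the
    // critical section.
    ThreadPeak best = perThread[0];
    for (int t = 1; t < maxThreads; ++t) {
        if (perThread[t].absValue > best.absValue) {
            best = perThread[t];
        }
    }
    best.value = image[best.pos];
    return best;
}
```

The serial loop runs over one entry per thread, so its cost is small compared with the scan of the image. Adjacent slots may share a cache line; padding each slot is a possible refinement that is left out here for brevity.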

Summary

The Hogbom Clean benchmark was ported to run on the Intel Xeon Phi coprocessor in offload mode. Three key optimizations were applied to the OpenMP version of the benchmark to achieve optimal performance on the coprocessor.
