Channel: Intel Developer Zone Articles

Optimizing Canny Edge Detector with Intel® Cilk™ Plus


Introduction

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It is one of the most commonly used image processing tools. This article demonstrates how Intel® Cilk™ Plus technology improved the performance of a Canny edge detector implementation.

Solution

The optimization is based on the original code by Mike Heath and Prof. Sudeep Sarkar (University of South Florida, 1996).

The algorithm runs in multiple stages, each of which contains a number of computational loops. The hotspot loops are in Gaussian smoothing, derivative computation, and non-maximal suppression. We applied pragma simd to vectorize those loops in different ways.
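As an illustration of the kind of loop involved, here is a minimal sketch of an x-derivative pass marked for vectorization. It is not the article's code: the function name `derivative_x`, its signature, and the border handling are assumptions made for this example. It also uses the OpenMP `#pragma omp simd` form rather than the Cilk Plus `#pragma simd`, so that it builds with compilers that lack Cilk Plus support (a compiler without OpenMP enabled simply ignores the pragma).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: horizontal derivative of a row-major image.
 * in and out each hold rows*cols elements and must not overlap. */
void derivative_x(const short *in, short *out, int rows, int cols)
{
    assert(rows > 0 && cols > 1);
    for (int r = 0; r < rows; r++) {
        const short *src = in + (size_t)r * cols;
        short *dst = out + (size_t)r * cols;

        /* One-sided differences at the left and right borders. */
        dst[0] = (short)(src[1] - src[0]);
        dst[cols - 1] = (short)(src[cols - 1] - src[cols - 2]);

        /* Interior pixels: unit-stride accesses and no dependence
         * between iterations, so the loop is a natural SIMD candidate. */
        #pragma omp simd
        for (int c = 1; c < cols - 1; c++)
            dst[c] = (short)(src[c + 1] - src[c - 1]);
    }
}
```

Here the innermost loop already walks memory contiguously, so the pragma can go directly on it.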

The innermost loop in Gaussian smoothing accesses the image array with non-unit stride, and its trip count is short (less than 20), so vectorizing that inner loop is inefficient. We instead used pragma simd to vectorize the loop that encloses it; in the SIMD version below, the r and c loops are also interchanged so that the vectorized c loop makes unit-stride accesses. This gives a 4x performance improvement on systems supporting Intel® AVX instructions.
 
Scalar version:
   for (c=0; c<cols; c++) {
      for (r=0; r<rows; r++) {
         sum = 0.0;
         dot = 0.0;
         for (rr=(-center); rr<=center; rr++) {
            if (((r+rr) >= 0) && ((r+rr) < rows)) {
               dot += tempim[(r+rr)*cols+c] * kernel[center+rr];
               sum += kernel[center+rr];
            }
         }
         (*smoothedim)[r*cols+c] = (short int)(dot*BOOSTBLURFACTOR/sum + 0.5);
      }
   }
SIMD version:
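   /* Loops interchanged: r is now outermost, so the vectorized c loop
      walks tempim and smoothedim with unit stride. */
   for(r=0; r< rows;r++){
#pragma simd linear(c:1) private(sum, dot)
      for(c=0; c<cols; c++){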
         sum = 0.0;
         dot = 0.0;
         for(rr=-center; rr<=center; rr++){
            if (((r+rr) >= 0) && ((r+rr) < rows)) {
               dot += tempim[(r+rr)*cols+c] * kernel[center+rr];
               sum += kernel[center+rr];
            }
         }
         (*smoothedim)[r*cols+c] = (short int)(dot*BOOSTBLURFACTOR/sum + 0.5);
      }
   }
With this change, the code ran 4x faster on our machine.
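One way to gain confidence in a loop interchange like this is to run both orderings on the same input and compare the outputs. The sketch below restates the two loops as standalone functions for that purpose. The function names `blur_y_scalar` and `blur_y_simd`, their signatures, and the value of `BOOSTBLURFACTOR` are assumptions made for this example, and the vectorized variant uses the OpenMP `#pragma omp simd` form in place of the Cilk Plus `#pragma simd` so that it compiles without Cilk Plus support.

```c
#include <assert.h>

#define BOOSTBLURFACTOR 90.0 /* assumed value; the article does not state it */

/* Column-by-column traversal, following the scalar version. */
void blur_y_scalar(const float *tempim, const float *kernel, int center,
                   int rows, int cols, short *out)
{
    assert(center >= 0 && rows > 0 && cols > 0);
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            float sum = 0.0f, dot = 0.0f;
            for (int rr = -center; rr <= center; rr++) {
                if (r + rr >= 0 && r + rr < rows) {
                    dot += tempim[(r + rr) * cols + c] * kernel[center + rr];
                    sum += kernel[center + rr];
                }
            }
            out[r * cols + c] = (short)(dot * BOOSTBLURFACTOR / sum + 0.5);
        }
    }
}

/* Row-by-row traversal with the c loop marked for vectorization.
 * sum and dot are declared inside the loop body, so each SIMD lane
 * gets its own copy without a private clause. */
void blur_y_simd(const float *tempim, const float *kernel, int center,
                 int rows, int cols, short *out)
{
    assert(center >= 0 && rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++) {
        #pragma omp simd
        for (int c = 0; c < cols; c++) {
            float sum = 0.0f, dot = 0.0f;
            for (int rr = -center; rr <= center; rr++) {
                if (r + rr >= 0 && r + rr < rows) {
                    dot += tempim[(r + rr) * cols + c] * kernel[center + rr];
                    sum += kernel[center + rr];
                }
            }
            out[r * cols + c] = (short)(dot * BOOSTBLURFACTOR / sum + 0.5);
        }
    }
}
```

Each output pixel is computed by the same sequence of operations in both functions, so only the order in which pixels are visited changes. Calling both on a small test image and comparing the two output buffers element by element is a quick sanity check before measuring performance.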
 
