Issue 1351: OpenCL build - kernel_ThresholdRectToPix() not accounting for padding bits in the output pix?!

zhivko.tabakov@gmail.com · jimregan · commit 07be522e437b · 2015-05-13T21:23:23.000+01:00
https://code.google.com/p/tesseract-ocr/issues/detail?id=1351

What steps will reproduce the problem?
1. Use a tesseract build with OpenCL.
2. Pass a full color image whose width is not a multiple of 32.
3. Recognition is far too slow and does not recognize anything.

I read the article at http://www.sk-spell.sk.cx/tesseract-meets-the-opencl-first-test and decided to give OCL a try. The initial result was as described in step 3 above. After some debugging I figured out that the problem is that the OCL version of threshold rect generation does not account for the padding bits at the end of each line of the output pix.

To test this I made a quick fix in oclkernels.h, replacing the definition of kernel_ThresholdRectToPix (see the diff below). Just a reminder: it is necessary to force OCL kernel recompilation after changing this source (e.g. delete “kernel - <device>.bin” from the exec folder).

The fix works, but I am not sure about it, since the original source apparently works for other people (as per the article). If I am right, the OS and GPU are irrelevant because the bug is algorithmic, but mine are Windows/AMD.

A similar fix is applicable to kernel_ThresholdRectToPix_OneChan(), but there the input array might have some padding bytes as well, so its indexing will need further adjustments. I can come up with a proof/fix for it as well; I have not played with it yet.

Disclaimer: I have no prior experience with image processing, the tesseract source, GPU computing, or OpenCL (but please do explain if I am wrong).
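The index arithmetic behind the padding argument can be checked on the host. The following is only a minimal sketch in plain C, not code from tesseract: the helper names (words_per_line, pad_bits, first_input_pixel) are hypothetical, and it assumes that input rows are tightly packed and that one input vector element corresponds to one pixel (which is what the patch appears to rely on when it switches to uchar4, assuming NUM_CHANNELS is 4).

```c
#define PIXELS_PER_WORD 32

/* Words per output line of a 1-bit image, rounded up to a whole 32-bit word.
   Hypothetical helper; in the kernel this value arrives as the wpl argument. */
static unsigned words_per_line(unsigned width) {
    return (width + PIXELS_PER_WORD - 1) / PIXELS_PER_WORD;
}

/* Padding bits at the end of each output line (same formula as `pad` in the patch). */
static unsigned pad_bits(unsigned width) {
    return PIXELS_PER_WORD * words_per_line(width) - width;
}

/* Index of the first input pixel covered by output word w.
   Assumption: input rows are tightly packed, so row r starts at pixel r * width.
   Without the subtraction, word w would start at 32 * w, which drifts by
   pad_bits(width) pixels for every completed output line. */
static unsigned first_input_pixel(unsigned w, unsigned width) {
    unsigned wpl = words_per_line(width);
    unsigned row = w / wpl;              /* output line that word w belongs to */
    return PIXELS_PER_WORD * w - row * pad_bits(width);
}
```

For a width of 40 this gives 2 words per line and 24 padding bits; output word 2 (the first word of the second line) then starts at input pixel 40 rather than 64. When the width is a multiple of 32 the padding is zero and the corrected index equals the original one, which would be consistent with the original kernel working for some images.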
diff --git a/opencl/oclkernels.h b/opencl/oclkernels.h
@@ -1045,19 +1045,19 @@ KERNEL(
 // imageData is input image (24-bits/pixel)
 // pix is output image (1-bit/pixel)
 KERNEL(
-\n#define CHAR_VEC_WIDTH 8 \n
+\n#define CHAR_VEC_WIDTH 4 \n
 \n#define PIXELS_PER_WORD 32 \n
 \n#define PIXELS_PER_BURST 8 \n
 \n#define BURSTS_PER_WORD (PIXELS_PER_WORD/PIXELS_PER_BURST) \n
  typedef union {
   uchar s[PIXELS_PER_BURST*NUM_CHANNELS];
-  uchar8 v[(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH];
+  uchar4 v[(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH];
  } charVec;
 
 __attribute__((reqd_work_group_size(256, 1, 1)))
 __kernel
 void kernel_ThresholdRectToPix(
-    __global const uchar8 *imageData,
+    __global const uchar4 *imageData,
     int height,
     int width,
     int wpl, // words per line
@@ -1066,6 +1066,7 @@ void kernel_ThresholdRectToPix(
     __global int *pix) {
 
     // declare variables
+    uint pad = PIXELS_PER_WORD * wpl - width;//number of padding bits at the end of each output line
     int pThresholds[NUM_CHANNELS];
     int pHi_Values[NUM_CHANNELS];
     for ( int i = 0; i < NUM_CHANNELS; i++) {
@@ -1076,22 +1077,22 @@ void kernel_ThresholdRectToPix(
     // for each word (32 pixels) in output image
     for ( uint w = get_global_id(0); w < wpl*height; w += get_global_size(0) ) {
         unsigned int word = 0; // all bits start at zero
-
+        //decrease the pixel index for the padding at the end of each output line (=number of lines * padding)
+        uint pxIdxOffset = ( w / wpl) * pad;// = ( ( PIXELS_PER_WORD * w) / ( width + pad)) * pad;
         // for each burst in word
         for ( int b = 0; b < BURSTS_PER_WORD; b++) {
-
             // load burst
             charVec pixels;
             for ( int i = 0; i < (PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH; i++ ) {
-                pixels.v[i] = imageData[w*(BURSTS_PER_WORD*(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH) + b*((PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH)  + i];
+                pixels.v[i] = imageData[w*(BURSTS_PER_WORD*(PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH) + b*((PIXELS_PER_BURST*NUM_CHANNELS)/CHAR_VEC_WIDTH)  + i - pxIdxOffset];
             }
 
             // for each pixel in burst
             for ( int p = 0; p < PIXELS_PER_BURST; p++) {
                 for ( int c = 0; c < NUM_CHANNELS; c++) {
                     unsigned char pixChan = pixels.s[p*NUM_CHANNELS + c];
                     if (pHi_Values[c] >= 0 && (pixChan > pThresholds[c]) == (pHi_Values[c] == 0)) {
-                        word |=  (0x80000000 >> ((b*PIXELS_PER_BURST+p)&31));
+                        word |=  (((uint)0x80000000) >> ((b*PIXELS_PER_BURST+p)&31));
                     }
                 }
             }