A two-dimensional block sum is computed by summing every element contained in a block of size m×n that lies within a matrix of size M×N, where M>m and N>n. When the block sum is computed for an m×n window around every element of an M×N matrix, producing a new matrix of dimensions (M−m+1)×(N−n+1) in which each element is the block sum of the corresponding window, the operation is called a sliding window block sum computation.
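The definition above can be illustrated with a minimal, unoptimized sketch. The function name `sliding_block_sum` and the choice to index each window by its top-left corner are assumptions made for illustration only; the sketch shows what is being computed, not a fast technique for computing it.

```python
def sliding_block_sum(matrix, m, n):
    """Return the (M-m+1) x (N-n+1) matrix of m x n block sums.

    Hypothetical helper for illustration: output element (i, j) is the sum
    of the m x n window whose top-left corner is at (i, j) in the input.
    """
    M = len(matrix)
    N = len(matrix[0])
    if not (0 < m <= M and 0 < n <= N):
        raise ValueError("block must be non-empty and fit inside the matrix")

    result = []
    for i in range(M - m + 1):          # window row offset
        row = []
        for j in range(N - n + 1):      # window column offset
            total = 0
            for di in range(m):         # rows inside the window
                for dj in range(n):     # columns inside the window
                    total += matrix[i + di][j + dj]
            row.append(total)
        result.append(row)
    return result


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(sliding_block_sum(image, 2, 2))  # [[12, 16], [24, 28]]
```

This direct form performs m·n additions per output element and recomputes the overlap between neighboring windows each time, which is the cost a faster technique would aim to reduce.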
Sliding window block sum computation is a common and important step in many key low-level vision kernels. In the Harris Corner Detection algorithm (described in C. Harris and M. Stephens, “A Combined Corner and Edge Detector,” Alvey Vision Conference, 1988), the block sum of squares of pixel intensity gradients over a sub-window around every pixel must be computed to identify sub-windows that are potentially good corners. This block sum of squares of pixel intensity gradients is therefore a good feature to track. Similarly, in the ORB feature detection and description algorithm (E. Rublee, V. Rabaud, K. Konolige, G. Bradski, “ORB: An Efficient Alternative to SIFT or SURF,” ICCV, 2564-2571, 2011), every pixel in the window region around an identified feature is smoothed by substituting a 5×5 block sum around that pixel. Examples of such sliding window block sum calculations are numerous in the embedded vision space.
Given the importance of sliding window block sum computation in vision applications, a fast technique to compute block sums for a sliding window would speed up many vision kernels. Because vision algorithms typically perform similar computations across large image blocks or across the entire image, and must also operate at high frame rates (frames per second, FPS), vector single instruction multiple data (SIMD) engines are well suited to vision tasks. In these applications, high-capacity vector processing can boost performance.