In processing two-dimensional (2-D) images, it is known to apply a small matrix or kernel to successive pixel blocks to generate output pixels. For example, in 2-D convolution operations, an n×m matrix (“convolution mask”) is typically applied to an image following a raster pattern: for each pixel in the image, the convolution mask is centered on that pixel and convolved with the corresponding n×m pixels in the image to compute an output pixel value. The output pixels so generated then collectively form a new (processed) digital image. Depending on the convolution mask used, 2-D convolution operations can filter out noise, enhance object edges, or achieve other desired effects on a digital image. Similarly, in 2-D correlation operations, a matrix is applied to an image in raster mode and combined with each pixel and its neighboring pixels to generate a corresponding output pixel. These and other kernel-based 2-D processing operations can be implemented in software or hardware and applied to still images or frames in video sequences.
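The raster-order application of a convolution mask described above can be illustrated with a minimal Python sketch. The function name `convolve2d`, the list-of-lists image representation, and the zero-valued treatment of pixels outside the image are assumptions made for illustration only; the text above does not specify how border pixels are handled.

```python
def convolve2d(image, mask):
    """Apply an n x m mask to every pixel of `image` in raster order.

    `image` and `mask` are lists of rows. Pixels outside the image are
    treated as zero (an assumption; other border policies are possible).
    """
    rows, cols = len(image), len(image[0])
    n, m = len(mask), len(mask[0])
    cy, cx = n // 2, m // 2              # centre of the mask
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):                # line by line
        for x in range(cols):            # edge to edge
            acc = 0
            for i in range(n):
                for j in range(m):
                    # Flipped indexing makes this a true convolution;
                    # using y - cy + i and x - cx + j instead would
                    # give a correlation.
                    sy = y + cy - i
                    sx = x + cx - j
                    if 0 <= sy < rows and 0 <= sx < cols:
                        acc += mask[i][j] * image[sy][sx]
            out[y][x] = acc
    return out
```

With a mask that is zero everywhere except for a one at its center, the output image equals the input image, which is a convenient sanity check.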
FIG. 1 shows an exemplary image 100 having a 26×20 array of pixels, each small square representing one pixel. A 3×3 kernel 102 may be applied to the image 100, starting from the top left corner (i.e., Pixel A1) and passing from edge to edge, line by line in raster mode. For each pixel in the image 100, the digital values of a corresponding pixel block (that pixel and its eight neighboring pixels) need to be retrieved from a memory device or an input buffer before those pixel data are combined with the values of the kernel matrix. The steps of retrieving the pixel data and combining the pixel data with the kernel are typically pipelined, driven by a clock at pixel rate. As the kernel is advanced to the next pixel, for example, from Pixel K16 (marked “1”) to Pixel L16 (marked “2”) in the image 100, only one new column of pixel data (i.e., those of Pixels M15-17) needs to be retrieved, because the other two columns of the new pixel block were already retrieved for the previous block. That is, during each clock cycle in the pipelined process, only one new column of pixel data is retrieved from the memory or input buffer, and one output pixel value is usually generated at the same time. Thus, when those pixel blocks located completely within the boundaries of the image 100 (“internal pixel blocks”) are processed, there can be a one-to-one relationship between the rate of retrieving pixel data columns and the outgoing pixel rate.
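The fetch-one-column-and-output-one-pixel pattern for internal pixel blocks can be sketched in software as a three-column sliding window. This is only an illustrative model, not a description of any particular hardware: the function name `process_internal_row`, the helper `fetch_column`, and the use of a plain weighted sum (no kernel flip) are assumptions. Each loop iteration stands in for one clock cycle.

```python
from collections import deque


def process_internal_row(image, kernel, y):
    """Process the internal pixels of row `y` with a 3x3 kernel.

    Returns (outputs, priming_fetches, steady_fetches). Row `y` must
    not be the first or last row of the image.
    """
    cols = len(image[0])

    def fetch_column(x):
        # One "column fetch": the three vertically adjacent pixels
        # in column x that the kernel covers on this row.
        return [image[y - 1][x], image[y][x], image[y + 1][x]]

    window = deque(maxlen=3)   # holds the three most recent columns

    # Priming: two columns must be present before the first output.
    priming_fetches = 0
    for x in (0, 1):
        window.append(fetch_column(x))
        priming_fetches += 1

    outputs = []
    steady_fetches = 0
    for x in range(1, cols - 1):           # internal pixels only
        window.append(fetch_column(x + 1))  # one new column per step
        steady_fetches += 1
        # window[c][r] is the pixel at kernel row r, kernel column c.
        acc = sum(kernel[r][c] * window[c][r]
                  for r in range(3) for c in range(3))
        outputs.append(acc)                 # one output pixel per step
    return outputs, priming_fetches, steady_fetches
```

Once the window is primed, `steady_fetches` always equals `len(outputs)`, which is the one-to-one relationship between column fetches and output pixels noted above.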
However, such a fetch-one-column-and-output-one-pixel timing pattern cannot be maintained when the kernel reaches an edge of the image and is about to start scanning and operating on a new line or frame. FIG. 2 illustrates this problem, again with the image 100 and the kernel 102. The pixel block covered by the kernel 102 in FIG. 2, which is centered on Pixel Z16, may be referred to as an “edge pixel block.” As the kernel 102 moves from the last pixel on line 16 (Pixel Z16) to the first pixel on line 17 (Pixel A17), two new columns of pixel data (i.e., those of Pixels A16-18 and B16-18) need to be fetched before the output pixel value corresponding to Pixel A17 can be computed. A similar problem exists with kernels of other sizes and when the kernel transitions to a new frame in a video sequence. In order to maintain the outgoing pixel rate during the transition, conventional 2-D image processing approaches would require either extra clock cycles for the extra column(s) to be fetched or a memory bandwidth much larger than what is used for non-edge pixel blocks. Neither of these solutions is desirable, because both are inefficient.
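The difference between an internal step and a line transition can be made concrete by counting the pixel columns that a new pixel block needs but the previous block did not hold. The sketch below is a hypothetical counting model, not part of any described apparatus: the function name `new_columns_needed` and the zero-based (column, row) coordinates are assumptions, and columns falling outside the image are assumed to be supplied by padding rather than fetched.

```python
def new_columns_needed(prev_center, next_center, width, k=3):
    """Count in-image pixel columns required by the block centred on
    `next_center` that were not part of the block centred on
    `prev_center`. Centres are zero-based (column, row) pairs and
    `k` is the (odd) kernel width.
    """
    half = k // 2

    def block_columns(center):
        x, y = center
        # A column is identified by its horizontal position and by the
        # row the kernel is centred on, since the rows it spans change
        # when the kernel moves to a new line.
        return {(cx, y)
                for cx in range(x - half, x + half + 1)
                if 0 <= cx < width}   # out-of-image columns are padded

    return len(block_columns(next_center) - block_columns(prev_center))


# Columns A..Z map to indices 0..25; line 16 maps to row index 15.
internal_step = new_columns_needed((10, 15), (11, 15), width=26)  # K16 -> L16
line_transition = new_columns_needed((25, 15), (0, 16), width=26)  # Z16 -> A17
```

Under these assumptions the internal step needs one new column and the line transition needs two, matching the counts given above for the 3×3 kernel 102; calling the function with `k=5` shows the transition cost growing with kernel width.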