The invention relates to methods and apparatus for performing digital filtering and convolution operations.
Filtering in the digital domain may be performed using a finite impulse response (FIR) filter defined by a set of filter coefficients. The coefficients are the filter's response to a unit impulse and are chosen to give a desired frequency response. Filtering is performed by convolving the input signal with the filter coefficients. Two-dimensional arrays of input data can be filtered using a matrix of filter coefficients, often referred to as a kernel.
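To make the convolution of a two-dimensional array with a kernel concrete, the following minimal Python sketch computes it directly. The function name `convolve2d_valid` is hypothetical, and it assumes plain lists of rows and keeps only the "valid" region (output positions where the kernel fits entirely inside the input), so no boundary padding is assumed.

```python
def convolve2d_valid(image, kernel):
    """Direct 2-D convolution of `image` with `kernel`, 'valid' region only.

    Both arguments are lists of equal-length rows of numbers. The kernel is
    flipped, as true convolution requires; for the symmetric kernels typical
    of smoothing filters this makes no difference to the result.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            # Multiply-accumulate over every kernel coefficient.
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[kh - 1 - i][kw - 1 - j] * image[y + i][x + j]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    ones4 = [[1] * 4 for _ in range(4)]
    box3 = [[1] * 3 for _ in range(3)]
    print(convolve2d_valid(ones4, box3))
```

Each output sample costs one multiply-accumulate per kernel coefficient, so a 5×5 kernel needs 25 per pixel when applied this way.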
Traditionally, there are two methods of implementing two-dimensional separable filtering and convolution, that is, filtering with a kernel that can be expressed as the outer product of a vertical and a horizontal one-dimensional filter. The first method uses multiple passes: the first pass filters the data along one dimension, and the second pass filters the result along the second dimension. This method is simple to implement, but it requires reading data from memory twice (first the input data, then the intermediate results) and writing results twice (first the intermediate results, then the final results). In some applications, particularly computer graphics, the filtered results may additionally be alpha blended with other input data, which requires reading that other data from memory and writing the blended output back to memory.
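The two-pass method can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented apparatus: the names `filter_rows`, `transpose`, and `separable_filter_two_pass` are hypothetical, only the "valid" region is computed, and the vertical pass reuses the row filter by transposing the intermediate array. The `intermediate` array stands for the data that a two-pass implementation writes to memory after the first pass and reads back for the second.

```python
def filter_rows(data, taps):
    """Apply a 1-D FIR filter along every row ('valid' region only)."""
    n = len(taps)
    return [
        [sum(taps[n - 1 - k] * row[x + k] for k in range(n))
         for x in range(len(row) - n + 1)]
        for row in data
    ]


def transpose(data):
    """Swap rows and columns of a list-of-rows array."""
    return [list(col) for col in zip(*data)]


def separable_filter_two_pass(image, h_taps, v_taps):
    """Filter with the separable kernel K[i][j] = v_taps[i] * h_taps[j]."""
    # Pass 1: horizontal filtering. In a two-pass implementation this
    # intermediate result is written to memory.
    intermediate = filter_rows(image, h_taps)
    # Pass 2: vertical filtering, which re-reads the intermediate result.
    # Transposing lets the row filter run down the columns.
    return transpose(filter_rows(transpose(intermediate), v_taps))


if __name__ == "__main__":
    img = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]
    print(separable_filter_two_pass(img, [1, 2, 1], [1, -1]))
```

For a separable kernel this gives the same result as a direct two-dimensional convolution with the outer-product kernel, while needing only kh + kw multiply-accumulates per pixel instead of kh × kw.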
These memory reads and writes go to off-chip memory, such as DDR or SDRAM, which has high latency relative to the speed of the processor. The repeated reads and writes increase memory bandwidth requirements and power consumption and reduce the performance of the overall system.
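The bandwidth cost can be estimated with simple arithmetic. The sketch below assumes that each pass reads one full frame and writes one full frame, and it ignores caching and alpha blending. The 60 frames per second rate and the function names are hypothetical, chosen only for illustration.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one full frame in bytes."""
    return width * height * bytes_per_pixel


def offchip_traffic_per_frame(width, height, bytes_per_pixel, passes):
    """Bytes read from plus written to off-chip memory per output frame,
    assuming each pass reads one full frame and writes one full frame."""
    return passes * 2 * frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    fps = 60  # hypothetical frame rate, for illustration only
    for passes in (2, 1):
        per_frame = offchip_traffic_per_frame(1920, 1080, 4, passes)
        print(f"{passes}-pass: {per_frame:,} bytes/frame, "
              f"{per_frame * fps / 1e6:.1f} MB/s at {fps} fps")
```

Under these assumptions, a 1920×1080 frame at 4 bytes per pixel is 8,294,400 bytes, so two passes move 33,177,600 bytes per frame, twice the 16,588,800 bytes moved by a single pass.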
The second method uses only a single pass. It reads multiple lines oriented along one dimension into line buffers in on-chip storage, and filtering is then performed on the lines held in those buffers. The line buffers hold one full line per kernel row, so their size grows with the vertical size of the filter kernel. For a 1920×1080 screen resolution, a 5×5 filter kernel, and 4 bytes per pixel, this method requires 5 × 1920 × 4 = 38,400 bytes, or 37.5 Kbytes, of on-chip memory. For a 9×9 filter kernel, the line buffers need 67.5 Kbytes. Such a large on-chip memory is expensive and consumes a large amount of power, which makes it unacceptable for low-power applications such as embedded systems.
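The single-pass line-buffer method and its memory cost can be sketched as follows. This is a minimal illustration, assuming the buffer holds exactly one full line per kernel row (which matches the figures above) and that rows arrive in raster order; the names `line_buffer_bytes` and `stream_filter` are hypothetical, and a `collections.deque` stands in for the on-chip line buffer.

```python
from collections import deque


def line_buffer_bytes(width, kernel_height, bytes_per_pixel):
    """On-chip storage needed to hold one full line per kernel row."""
    return width * kernel_height * bytes_per_pixel


def stream_filter(rows, kernel):
    """Single-pass 2-D filter ('valid' region only).

    Rows arrive one at a time, as if read once from off-chip memory, and
    only the most recent kernel_height rows are retained.
    """
    kh, kw = len(kernel), len(kernel[0])
    buffer = deque(maxlen=kh)  # the oldest line is evicted automatically
    for row in rows:
        buffer.append(row)
        if len(buffer) < kh:
            continue  # not enough lines buffered yet
        lines = list(buffer)  # lines[0] is the oldest (topmost) row
        yield [
            sum(kernel[kh - 1 - i][kw - 1 - j] * lines[i][x + j]
                for i in range(kh) for j in range(kw))
            for x in range(len(row) - kw + 1)
        ]


if __name__ == "__main__":
    for k in (5, 9):
        size = line_buffer_bytes(1920, k, 4)
        print(f"{k}x{k} kernel: {size} bytes = {size / 1024} Kbytes")
```

Run as a script, this prints 38400 bytes (37.5 Kbytes) for the 5×5 kernel and 69120 bytes (67.5 Kbytes) for the 9×9 kernel.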
In view of the foregoing, what is needed is a two-dimensional filtering apparatus and method that has drastically reduced power and memory requirements.