Filtering is perhaps the most fundamental operation of image processing and computer vision. In the most general sense of the term “filtering”, the value of the filtered (output) image at a given location is a function of the values of the image function in a neighbourhood around the corresponding location in the input image. The image function may be an intensity variable or a colour vector, among other properties.
Low-pass filtering, for example, computes a weighted average of pixel values in a neighborhood. The weights, which determine the impulse response, distinguish the filter type. Typical applications of low-pass filtering are image smoothing and de-noising. The implicit assumption in these applications is that natural images usually have an image function whose value varies slowly over space, so that nearby pixels are likely to have similar values. Local deviations from this “smooth” function are likely to be noise. The noise values that “corrupt” the smooth pixel values exhibit lower mutual correlation than the signal values themselves, and the noise is assumed to be a zero-mean process. Averaging the pixel values therefore suppresses the noise while largely preserving the signal (the desired image function). The assumption of slow spatial variation of the image function fails at edges, which are consequently blurred by linear low-pass filtering. To overcome this, nonlinear filters have been proposed, which allow edge-preserving filtering.
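The trade-off described above can be illustrated with a one-dimensional sketch. It assumes a uniform-weight (box) kernel and border replication, neither of which is specified in the text; the function name `box_filter_1d` and the test signals are hypothetical and chosen only for illustration.

```python
import numpy as np

def box_filter_1d(signal, radius):
    """Moving average with uniform weights over 2*radius + 1 samples."""
    kernel = np.full(2 * radius + 1, 1.0 / (2 * radius + 1))
    # Replicate border samples so the output has the same length as the input.
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Zero-mean noise on a constant signal: averaging reduces its spread.
rng = np.random.default_rng(0)
noisy_flat = 0.5 + rng.normal(0.0, 0.1, size=1000)
smoothed_flat = box_filter_1d(noisy_flat, radius=2)
print(np.std(noisy_flat), np.std(smoothed_flat))

# Clean step edge: the same filter spreads the jump over several samples.
step = np.array([0.0] * 8 + [1.0] * 8)
blurred_step = box_filter_1d(step, radius=2)
print(np.round(blurred_step[5:11], 2))
```

On the flat signal the standard deviation drops, while on the step the single jump becomes a ramp through intermediate values, which is the blurring of edges that motivates nonlinear filters.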
One example of non-linear edge-preserving filtering is the method of “bilateral filtering”. In bilateral filtering, the influence of a given input pixel on the output depends not only on the position, but also on the value of that input pixel. A linear filter comprises a set of weights which depend only on the position of a pixel with respect to the centre of the neighborhood. By contrast, the weights in a bilateral filter take into account both the spatial deviation of an input pixel from the central location and the deviation in value from the central pixel. For example, a bilateral filter can be defined for an image I at position p as:
$$\mathrm{bf}(I)_p = \frac{1}{W_p} \sum_{q \in N(p)} G_{\sigma_s}\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\left(\lvert I_p - I_q \rvert\right)\, I_q$$
$$W_p = \sum_{q \in N(p)} G_{\sigma_s}\left(\lVert p - q \rVert\right)\, G_{\sigma_r}\left(\lvert I_p - I_q \rvert\right)$$
This defines a weighted average over a neighborhood, where the weight is the product of a Gaussian $G_{\sigma_s}$ on the spatial distance and another Gaussian $G_{\sigma_r}$ on the pixel value difference. The latter is also known as the “range weight”. The effectiveness of the bilateral filter in respecting strong edges comes from the inclusion of this range term in the weight. The range weight prevents pixels on opposite sides of an image edge from influencing one another, since, although such pixels are spatially close, they are remote in range (value).
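The definition above can be written as a direct (brute-force) sketch. It assumes a single-channel floating-point image, a square neighbourhood N(p) of a given `radius` clipped at the image border, and unnormalized Gaussians (any constant factor cancels in the division by the weight sum). The function name and parameters are hypothetical; this is an illustration of the formula, not an efficient implementation.

```python
import numpy as np

def bilateral_filter(image, sigma_s, sigma_r, radius):
    """Brute-force bilateral filter for a 2-D grayscale image."""
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape
    output = np.zeros_like(image)
    for y in range(height):
        for x in range(width):
            # Square neighbourhood N(p), clipped at the image border.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            patch = image[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # Spatial weight: Gaussian on the distance between p and q.
            spatial_w = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            # Range weight: Gaussian on the value difference I_p - I_q.
            range_w = np.exp(-((patch - image[y, x]) ** 2) / (2 * sigma_r ** 2))
            weights = spatial_w * range_w
            # Weighted average, normalized by the sum of weights W_p.
            output[y, x] = np.sum(weights * patch) / np.sum(weights)
    return output

# A vertical step edge: left half 0, right half 1.
step_img = np.zeros((6, 6))
step_img[:, 3:] = 1.0
edge_kept = bilateral_filter(step_img, sigma_s=2.0, sigma_r=0.1, radius=2)
edge_blurred = bilateral_filter(step_img, sigma_s=2.0, sigma_r=1e6, radius=2)
print(edge_kept[0], edge_blurred[0])
```

With a small `sigma_r` the step survives almost unchanged, because pixels across the edge receive a negligible range weight. With a very large `sigma_r` the range term is nearly constant and the filter behaves like an ordinary Gaussian blur.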
In the publication by J. Chen, S. Paris and F. Durand, entitled “Real-time Edge-aware Image Processing with the Bilateral Grid”, ACM Transactions on Graphics (TOG), vol. 26, 2007, a data structure called the “bilateral grid” was developed based on the principles of the bilateral filter. The bilateral grid proposed was a 3D array combining the two-dimensional spatial domain with a one-dimensional reference range dimension. Typically, the range axis is image-intensity. In such a three-dimensional space, the extra (intensity) dimension makes the Euclidean distance meaningful for edge-aware image manipulation. A bilateral grid is regularly sampled in each dimension. Thus, the grid may have lower resolution in both spatial domain and range than the original intensity image. Intuitively, the sampling rate of the spatial axes controls the amount of smoothing, while the sampling rate of the range axis controls the degree of edge preservation. The sampling gives rise to a structure corresponding to an array of cells or bins in three dimensions. Each pixel in the image is counted in one of the cells. The basic bilateral grid can then be defined by the following equations:
Initialization: for all grid nodes $(i, j, k)$,
$$\Gamma(i, j, k) = (0, 0)$$
Filling: for all pixel positions $(x, y)$,
$$\Gamma\left([x/s_s],\; [y/s_s],\; [I(x,y)/s_r]\right) \mathrel{+}= \left(I(x,y),\, 1\right)$$
Here $[\cdot]$ is the closest integer (rounding) operation, while $s_s$ and $s_r$ are the sampling rates of the spatial axes and range axis, respectively. As can be seen from the foregoing equations, the creation of a bilateral grid involves promoting source data into a higher number of dimensions. In the current case, a two-dimensional image is promoted into three dimensions. Note that the bilateral grid can be filled with any type of data such as scalars and vectors. For many operations on the grid, it is important to maintain information about the occupancy or weight of each grid cell. Here, occupancy refers to the number of input pixels accumulated in each cell. A homogeneous-coordinate representation is a convenient way to accommodate this requirement. Thus, color vectors may be stored, for example, as (wR, wG, wB, w). The homogeneous coordinates lend themselves to calculations such as weighted averaging. To return to standard coordinates, the (weighted) colour-coordinates are divided (normalized) by the homogeneous coordinate, w.
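The initialization and filling steps can be sketched as follows for a grayscale image. The sketch assumes non-negative intensities, stores each cell as the homogeneous pair (sum of intensities, pixel count), and uses `np.rint` for rounding, which sends exact halves to the nearest even integer; the text does not specify a tie-breaking rule. The function name `create_bilateral_grid` is hypothetical.

```python
import numpy as np

def create_bilateral_grid(image, s_s, s_r):
    """Accumulate (intensity, 1) per pixel into a 3-D grid of homogeneous pairs."""
    image = np.asarray(image, dtype=np.float64)
    height, width = image.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # Cell indices: rounded spatial and range coordinates.
    i = np.rint(xs / s_s).astype(int)
    j = np.rint(ys / s_s).astype(int)
    k = np.rint(image / s_r).astype(int)
    shape = (i.max() + 1, j.max() + 1, k.max() + 1)
    # Initialization: every cell starts at (0, 0).
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    # Filling: np.add.at accumulates correctly when pixels share a cell.
    np.add.at(sums, (i, j, k), image)
    np.add.at(counts, (i, j, k), 1.0)
    return np.stack([sums, counts], axis=-1)

image = np.array([[10, 10, 200, 200]] * 4, dtype=np.float64)
grid = create_bilateral_grid(image, s_s=2, s_r=64)
print(grid.shape)

# Back to standard coordinates: divide by the occupancy where it is non-zero.
occupied = grid[..., 1] > 0
mean_intensity = np.zeros(grid.shape[:3])
mean_intensity[occupied] = grid[..., 0][occupied] / grid[..., 1][occupied]
```

Dark and bright pixels land in different range layers even when they are spatial neighbours, so later operations on the grid do not mix them. The total occupancy over all cells equals the number of input pixels.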
To recover (extract) an image from the bilateral grid, a process known as “slicing” can be used. This is symmetrical with creation of the grid, in that, if a grid maintains the full resolution of an image, slicing will restore the original image exactly. To “slice” the bilateral grid using a reference image E, the grid is accessed, using tri-linear interpolation, as follows:
$$\Gamma\left(x/s_s,\; y/s_s,\; E(x,y)/s_r\right)$$
Note that implicit in this slicing operation is the conversion from homogeneous coordinates to standard coordinates (by dividing by the occupancy/weight value). The operations performed on the grid between its creation and slicing define the edge-aware image-processing. The particular operations to be performed will often indirectly determine the required grid sampling rates.
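A minimal sketch of slicing is given below, together with a compact grid builder so that the example is self-contained. It assumes a grayscale image with non-negative values, cells stored as (sum, count) pairs, coordinates clamped to the grid bounds, and a result of zero wherever the interpolated weight is zero. The names `build_grid` and `slice_grid` are hypothetical.

```python
import numpy as np

def build_grid(image, s_s, s_r):
    """Fill a grid of homogeneous (sum, count) pairs from a grayscale image."""
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    i = np.rint(xs / s_s).astype(int)
    j = np.rint(ys / s_s).astype(int)
    k = np.rint(image / s_r).astype(int)
    grid = np.zeros((i.max() + 1, j.max() + 1, k.max() + 1, 2))
    np.add.at(grid[..., 0], (i, j, k), image)
    np.add.at(grid[..., 1], (i, j, k), 1.0)
    return grid

def slice_grid(grid, reference, s_s, s_r):
    """Sample the grid at (x/s_s, y/s_s, E(x,y)/s_r) with tri-linear interpolation."""
    ys, xs = np.mgrid[0:reference.shape[0], 0:reference.shape[1]]
    coords = (xs / s_s, ys / s_s, reference / s_r)
    sizes = grid.shape[:3]
    lows, fracs = [], []
    for c, size in zip(coords, sizes):
        c = np.clip(c, 0, size - 1)          # stay inside the grid
        low = np.floor(c).astype(int)
        lows.append(low)
        fracs.append(c - low)
    result = np.zeros(reference.shape + (2,))
    # Blend the 8 surrounding cells; both homogeneous components are interpolated.
    for corner in np.ndindex(2, 2, 2):
        weight = np.ones(reference.shape)
        idx = []
        for low, frac, size, d in zip(lows, fracs, sizes, corner):
            idx.append(np.minimum(low + d, size - 1))
            weight = weight * (frac if d else 1.0 - frac)
        result += weight[..., None] * grid[idx[0], idx[1], idx[2]]
    # Convert from homogeneous to standard coordinates.
    values, weights = result[..., 0], result[..., 1]
    return np.where(weights > 0, values / np.maximum(weights, 1e-12), 0.0)

image = np.array([[0, 50, 100], [150, 200, 250]], dtype=np.float64)
full_res = build_grid(image, s_s=1, s_r=1)
restored = slice_grid(full_res, image, s_s=1, s_r=1)
print(np.allclose(restored, image))
```

With unit sampling rates and integer intensities, each pixel occupies its own cell and the sample points coincide with grid nodes, so slicing with the image itself as reference returns the original values, as the text states for a full-resolution grid.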
The description above, in which image intensity and cell occupancy are accumulated, represents just one example of a bilateral grid. Other uses and constructions are also possible. Several of these are explored in the publication by J. Chen et al., as referenced above. For example, it is not essential that range information (such as intensity) is actually accumulated in the grid. Equally, it is possible that one image defines the range dimension in the grid, but properties of another image are actually accumulated in the cells. It is also possible to construct a grid in more than three dimensions, for example, having two spatial dimensions and two range dimensions. Higher dimensional grids are able to represent joint variations in a larger set of quantities or measurements.
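As one illustration of these variations, the sketch below lets one image (here called `guide`) define the range coordinate while values from a second image (`data`) are accumulated in the cells. Both names, and the function `create_cross_grid`, are hypothetical; the sketch assumes two single-channel images of equal size and a non-negative guide.

```python
import numpy as np

def create_cross_grid(guide, data, s_s, s_r):
    """Grid whose range axis comes from `guide` but whose cells accumulate `data`."""
    guide = np.asarray(guide, dtype=np.float64)
    data = np.asarray(data, dtype=np.float64)
    ys, xs = np.mgrid[0:guide.shape[0], 0:guide.shape[1]]
    i = np.rint(xs / s_s).astype(int)
    j = np.rint(ys / s_s).astype(int)
    k = np.rint(guide / s_r).astype(int)   # range coordinate from the guide image
    grid = np.zeros((i.max() + 1, j.max() + 1, k.max() + 1, 2))
    np.add.at(grid[..., 0], (i, j, k), data)   # accumulate the other image's values
    np.add.at(grid[..., 1], (i, j, k), 1.0)    # occupancy
    return grid

guide = np.array([[0, 100], [0, 100]], dtype=np.float64)
data = np.array([[1, 2], [3, 4]], dtype=np.float64)
cross = create_cross_grid(guide, data, s_s=2, s_r=100)
print(cross[0, 0, 0], cross[0, 0, 1])
```

All four pixels share one spatial cell here, but the guide splits them into two range layers, so the accumulated `data` values stay separated according to the guide's intensities.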
In many practical applications the bilateral grid can be generated as a heavily sub-sampled representation. For example, a 70×70×10 grid may be sufficient for an 8-megapixel image in many cases. However, even with such a relatively low resolution, the computational requirements for typical processing operations on the grid are significant. In one sense, this is inevitable, due to the inherent expansion in the dimensionality of the underlying data. The complexity increases rapidly as the resolution in the multi-dimensional grid is increased or as operations are performed that use neighbourhoods of larger size.
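To give a rough sense of this growth, the sketch below counts the cells in a grid and the multiply-add operations for one pass of a dense cubic neighbourhood operation. The dense (non-separable) neighbourhood of `radius` cells in every direction is an assumption made only for this estimate, and the function name is hypothetical.

```python
def dense_filter_cost(grid_shape, radius):
    """Return (cell count, multiply-adds) for one dense pass over a 3-D grid."""
    nx, ny, nz = grid_shape
    cells = nx * ny * nz
    taps = (2 * radius + 1) ** 3   # cells in a cubic neighbourhood
    return cells, cells * taps

print(dense_filter_cost((70, 70, 10), radius=1))    # (49000, 1323000)
print(dense_filter_cost((140, 140, 20), radius=1))  # (392000, 10584000)
print(dense_filter_cost((70, 70, 10), radius=2))    # (49000, 6125000)
```

Under these assumptions, doubling the resolution along every axis multiplies the cost by eight, and growing the neighbourhood radius from 1 to 2 multiplies it by roughly 4.6 (125 taps instead of 27).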