Digital image processing involves implementing mathematical algorithms on a computer or other electronic processor to manipulate or otherwise process a picture stored in digital form. As stored in memory, a digital image is a grouping of numerical data representing an image in encoded form. For example, an image can be represented as an array of pixels (picture elements), each of which constitutes a small portion of the overall image. In the simplest of digital encoding schemes, a picture is cast in terms of black and white pixels only, with the black pixels represented as a binary “1” and the white pixels represented as a binary “0.” The pixels in grey scale and color images can each be encoded as a string of bits, e.g., 8 bits/pixel in a grey scale picture and 256 bits/pixel in a color picture. Image processing may be used for purposes such as image modification, pattern recognition, and feature extraction. Feature extraction involves detecting or isolating various desired portions of a digitized image or video stream. For example, feature extraction may be used to automatically identify buildings and roadways in an aerial image, or to identify and process characters in an image containing text.
The Radon transform is a mathematical algorithm that is occasionally used in image processing and analysis for detecting linear structures and other features in a digital image. The Radon transform integrates a function over lines in a plane, mapping a function of position to a function of slope and y-intercept, per the following equation:
      R    ⁢          {              f        ⁡                  (                      x            ,            y                    )                    }        =            ∫              -        ∞            ∞        ⁢                  ∫                  -          ∞                ∞            ⁢                        f          ⁡                      (                          x              ,              y                        )                          ⁢                  δ          ⁡                      (                          y              -                              (                                  mx                  +                  b                                )                                      )                          ⁢                                  ⁢                  ⅆ          x                ⁢                                  ⁢                              ⅆ            y                    .                    where “m” is the line slope and “b” is the y-axis intercept. The Radon transform can also be expressed with respect to “line normal” parameters, where instead of being characterized in terms of a slope and y-axis intercept, a line is characterized in terms of a value “r” or “ρ”, the perpendicular distance from the line to the coordinate origin, and an angle “α” or “θ” between the perpendicular and the x-axis (see FIG. 2A):
                              R          ′                ⁡                  (                      r            ,            α                    )                    ⁡              [                  f          ⁡                      (                          x              ,              y                        )                          ]              =                  ∫                  -          ∞                ∞            ⁢                        ∫                      -            ∞                    ∞                ⁢                              f            ⁡                          (                              x                ,                y                            )                                ⁢                      δ            ⁡                          (                              r                -                                  x                  ⁢                                                                          ⁢                  cos                  ⁢                                                                          ⁢                  α                                -                                  y                  ⁢                                                                          ⁢                  sin                  ⁢                                                                          ⁢                  α                                            )                                ⁢                                          ⁢                      ⅆ            x                    ⁢                                          ⁢                      ⅆ            y                                ,An application of the Radon transform for image processing is shown, for example, in U.S. Pat. No. 6,873,721 to Beyerer et al. However, it is more typically the case that a Hough transform is used for such applications. The Hough transform is a discreet form of the Radon transform, and is thereby well suited to processing digital images in addition to being less computationally intensive.
As noted, digital images contain a number of data points/pixels. Some of these data points will be indicative of features in the original picture, and some may result from noise. For example, FIG. 1A shows an aerial view (e.g., from a surveillance aircraft) of a road, a nearby structure, and brush/trees. FIG. 1B shows a possible digital image of the aerial view, and FIG. 1C shows a pixilated version of FIG. 1B. As can be seen, the image contains noise from signal interference, equipment limitations, environmental conditions, or the like, which partially obscures the features of interest. Accordingly, the features of interest are difficult to identify, especially within the context of automatic computer processing. The purpose of the Hough transform is to statistically determine the line equations that best fit the data in the image, as indicated in FIG. 1D.
The Hough transform involves converting data from Cartesian space (e.g., the x-y coordinate plane) to “Hough space.” In the former, line data (for example) is expressed in terms of x and y coordinates, line slope, and a y-axis intercept. In the latter, line data is represented in terms of a normal parameterization. To elaborate, FIG. 2A shows a line in Cartesian space. The line can be represented by the standard line equation y=mx+b, where “m” is the slope of the line and “b” is the y-axis intercept. The line can also be represented in terms of normal parameters “θ” and “ρ”. The value ρ represents the length between the coordinate origin and the nearest point on the line, which according to elementary geometry is the length of the line segment that is both normal to the line and that intersects the origin. θ is the angle between the x axis and normal, as indicated. The line equation for this geometry is given by:x cos θ+y sin θ=ρθ→[0, π]As should be appreciated, each pair of normal parameters (ρ, θ) uniquely defines a line. Similarly, every line in Cartesian space (FIG. 2A) corresponds to a single, unique point (ρ, θ) on the ρ and θ coordinate axes in Hough space (FIG. 2B).
While a line in the x-y plane corresponds to a point in the ρ-θ Hough space plane, the infinite number of lines that can be drawn through a point in x-y space map to a sinusoidal curve in the ρ-θ plane. (In other words, each line is a ρ-θ point in Hough space, and all the possible lines through a single point in the x-y plane form a sinusoidal curve in Hough space.) This relationship is illustrated for a point (3, 3) in FIGS. 3A and 3B. The Hough transform takes advantage of this relationship, and the fact that the sinusoidal curves in Hough space corresponding to points along a line in x-y space have a common intersection point. In other words, for two or more co-linear points in the x-y plane, each will map to a separate sinusoidal curve in the ρ-θ plane, with the curves intersecting at one and only one point (θ→0 to π; 2π=360°). An example in shown in FIGS. 4A and 4B. FIG. 4A shows four co-linear points A1-A4. Each point maps to a sinusoidal curve A1′-A4′, respectively, in the normal parameterization plane (Hough space) as shown in FIG. 4B. (Again, each curve in Hough space corresponds to the all the possible lines drawn through a point on the x-y plane per the equation ρ=x cos θ+y sin θ.) The four curves A1′-A4′ intersect at one point P1 corresponding to the normal parameters (ρ′, θ′). These normal parameters in turn define the line in the x-y plane along which points A1-A4 fall. More generally, a set of points which form a straight line will produce Hough transforms which cross at the parameters for that line.
The Hough transform is used in digital image processing by translating data points (e.g., image pixels) from x-y space to curves in Hough space, and then, in effect, identifying points of intersections between the curves. The points where curves intersect in turn provide the line equations, in terms of (ρ, θ) pairs, for the line(s) that best fit the data. The curves in Hough space can also be analyzed to determine points where a maximum number of curves intersect at one point. Noise data falling outside the maximum values is eliminated, rendering the Hough transform useful for noisy images/signals. FIGS. 5A-5C show a simplified example. In FIG. 5A, an image is stored as an array of pixels (here, a 7×7 array), with white pixels having a 0 value and black pixels, representing data points (possible features of interest in the image), having a 1 value. The pixels are logically arranged with respect to the x-y coordinate axes, as indicated. FIG. 5B shows an accumulator table, which could be implemented as a memory array. (Cells in the accumulator array with a zero value are not shown, for clarity purposes.) Values for ρ are logically arranged along one axis, with divisions according to a selected resolution/granularity level and depending on the size of the pixel array. For example, for a 7×7 pixel array it might be expected to have values for ρ of between 7 and −7. Possible gradations might be 0.1, 0.2, or 0.5. The values for θ are along the other axis. Again, different levels of gradation are possible, e.g., values from 0 to π in π/8 increments.
In carrying out the Hough transform, each pixel point in x-y space is translated to discreet points on a curve in Hough space, according to the equation ρ=x cos θ+y sin θ. Thus, for pixel A (x=2, y=0) in FIG. 5A, the equation would be ρ=2 cos θ+0 sin θ=2 cos θ. Then, a series of values for ρ are calculated for the point, one for each gradation along the θ axis. (Thus, for pixel A, there would be a series of values, for example, of: ρ=2 cos 0; 2 cos π/8; 2 cos π/4; 2 cos 3π/8; 2 cos π/2; 2 cos 5π/8; 2 cos 3π/4; 2 cos 7π/8; and 2 cos π.) For each value, the appropriate cell in the accumulator table is incremented by 1. Subsequently, this is done for every black pixel in the x-y plane. The values for the data in FIG. 5A are plotted in FIG. 5B, which also graphically illustrates how the accumulator data corresponds to the sin wave plots, e.g., as in FIG. 4B. Many of the accumulator cells will have a zero or 1 value. Accumulator cells with a 2 value indicate a spot where two curves in Hough space intersect, corresponding to two co-linear points in the x-y plane. An accumulator cell with a maximum value corresponds to a line that best fits the data, while excluding noise. For example, cell B in FIG. 5B has an accumulator value of 4, corresponding to 4 co-linear points in x-y space. Cell B also provides the equation of the line through the 4 points, e.g., ρ=−1.4, θ=3π/4=135°, illustrated in FIG. 5C. Note that the equation of the line is given even though (i) certain data points along the line are missing, and (ii) the image contains noise pixels such as pixels C and D. Regarding pixel C in FIG. 5A, the (ρ, θ) values given by cells E and F in FIG. 5B correspond to the line formed by pixels C and A. (The values in cells E and F are equivalent, i.e., define the same line in x-y space.) Regarding pixel D in FIG. 5A, cell G in FIG. 5B corresponds to the line formed by pixels D and H in FIG. 5A. The algorithm could choose to ignore these lines as noise by selecting the accumulator cell with the maximum data as representing the best fit for all the data. If there is more than one accumulator cell with large or maximum value, each indicates a likely feature line in the digital image.
The Hough transform can also be adapted for use in identifying circular image features, features identified according to other regular mathematical formulas, or random features (e.g., using the generalized Hough transform). Further information about the Hough transform and its implementation in the context of digital image processing can be found in a number of U.S. patents, including U.S. Pat. No. 5,832,138 to Nakanishi et al., the aforementioned patent to Beyerer et al., U.S. Pat. No. 5,063,604 to Weiman, U.S. Pat. No. 6,732,046 to Joshi, U.S. Pat. No. 3,069,654 to Hough, U.S. Pat. No. 6,826,311 to Wilt, and U.S. Pat. No. 5,220,621 to Saitoh, all hereby incorporated by reference herein in their entireties.
In image processing or analysis using the Hough transform or otherwise, it is typical for an image to be processed in row-column order such that each pixel in the source image is visited as part of a transformation or some other form of analysis. For example, as described above with reference to FIGS. 5A-5C, in a typical algorithm using the Hough transform, each pixel of interest (e.g., a 1 or other non-zero value) in the array in FIG. 5A would be sequentially translated into Hough space, namely, into a plurality of normal parameters, for incrementing the appropriate accumulator cells in the memory array in FIG. 5B. In general, this type of process is written in a high-level programming language using nested “for” loops, such as the following:
FOR INTEGER y ← 0 TO height DO BEGIN  FOR INTEGER x ← 0 TO width DO BEGIN    perform some action on (x, y)  ENDENDIn operation, such a program would first visit the y=0 row, and perform an action on each x in that row (e.g., x→0 to 6 in FIG. 5A). For example, for each pixel in the row the program might perform a Hough transform as described above. Then, the program would move on to the y=1 row, the y=2 row, and so on until the last row in the matrix was reached.
As should be appreciated, this type of program offers a convenient structure for image analysis in the sense that certain optimizations can be applied to allow for the fast sequential access of pixel data. This program structure is also well suited for use in transformations where it is imperative that every pixel is examined. However, considering the large amount of data typically present in a digital image (e.g., potentially hundreds of thousands of pixels) and the processor-intensive computations involved in processing images by way of a Hough or similar transform (e.g., many computations per pixel), it is not well suited for applications with limited processor and/or memory resources, e.g., in wireless devices such as mobile phones, portable computer terminals, and digital cameras.