The present invention generally relates to digital signal pattern matching and, more specifically, some embodiments of the invention relate to methods and apparatus for matching segments across multiple related images.
Digital signal pattern matching involves a plurality of multi-dimensional digital signals and is a process of matching all or part of one or more of the signals to all or part of another one or more of the signals. For example, where the digital signals represent digitized images and the plurality relates to a sequence of digital images, a digital signal matching process might be used to identify corresponding portions of two images that have a known relationship. One such relationship is where the digital images in the plurality form a timed sequence, such as a video sequence, and the two images being processed are adjacent in time.
Matching is often used to detect motion in the video sequence by identifying an object in the scene captured in one digital image, identifying that same object in another image and noting the change in position from one image to the other. For example, where the video sequence being processed is a football game, the process might be used to detect the motion of an object such as a football. It should be noted that the matching processes described herein are not limited to actual objects in the scene, but might reference portions of objects. For example, a video sequence of a beach ball having several solid-colored portions of differing colors might be processed with each differently colored portion being treated as a different object.
An image frame can be divided into a plurality of segments such that each pixel is associated with exactly one segment or, in some cases, with a small number of segments where the pixel is judged to lie on a border at which those segments meet. Typically, a segment is a portion of a digital signal wherein the pixel color values are substantially uniform. Several methods of dividing an image frame into segments according to the pixel color values of the image frame are described in Prakash II. As used herein, a segment is a portion of the image frame or frames that is a contiguous region having a relatively small amount of color variation throughout. For example, one image frame might include a region where the color values of the pixels in that region are all blue or variations of blue such that the region can be identified as a segment.
In the general case of motion matching, a segment of one image is identified from one digital signal representing an image frame and matched to a corresponding segment or portion of another image represented by another digital signal. The matching process, in this example, might consider the blue region (mentioned above) a segment and seek to match that segment with a similar blue region in another image frame.
While it need not be the case, matching is often an attempt to "track" a segment in a video sequence as it moves within the frame window of the video sequence. Thus, digital signal pattern matching can be used in various applications such as video compression, medical imaging and object tracking. For example, a digital image processor can determine how a segment moved from one image frame of a video sequence to the next image frame of the video sequence by noting the position of the segment in a first image frame, extracting that segment and matching it against a second image frame, noting the position of the corresponding (matched) segment found in the second image frame and using the difference between the positions as an indication of motion. Often, the motion between two frames of an N-dimensional sequence is described as an N-dimensional vector. Thus, where the video sequence is a sequence of two-dimensional images, the motion of a segment S can be expressed by the two-dimensional vector $\mathbf{u}_S = (x_S, y_S)$, where $x_S$ is the relative displacement of the segment in the horizontal direction and $y_S$ is the relative displacement of the segment in the vertical direction. Typically, the displacements are measured in pixels.
One known method for digital signal pattern matching is the exhaustive search method, which compares a segment from a first image frame against the pixels of a second image frame at each possible location of the corresponding segment in the second image frame, within a limited search area. The method is computationally expensive because the pixel color values of the segment (usually organized in an N-dimensional array, where N is the dimension of the images) are compared to candidate corresponding pixels about as many times as there are pixels in the second image frame.
Logically, this process can be described as follows. Suppose one of the pixels of the segment (or any other pixel with a fixed relationship to one or more of the pixels of the segment) is designated the segment reference pixel and one of the pixels of the image frame to which the segment is to be matched is designated the frame reference pixel. Then, each step of comparing is a process of overlaying the segment reference pixel on the frame reference pixel and then performing a calculation such as a sum of the differences of overlaying pixels. This step is then repeated for each possible frame reference pixel until the corresponding segment, or a best choice for the corresponding segment, is identified.
Where the calculation is the sum of the differences of overlaying pixels, the sum will be zero if an exact match occurs, i.e., if the segment is positioned over the image frame such that each pixel in the segment overlays a pixel in the image frame having the same pixel color value as the pixel in the segment. An acceptable match occurs when the sum of the absolute values of the differences (referred to herein as the L1 norm) is less than or equal to some threshold. Lower L1 norms indicate better matches.
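The comparison loop described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular system: it assumes grayscale frames stored as nested lists indexed [y][x], a boolean mask marking the segment's pixels in the first frame, and a square search area of hypothetical radius `search_radius`; the function name `sad_exhaustive_search` is likewise illustrative. A candidate displacement u = (dx, dy) places segment pixel (x, y) over second-frame pixel (x + dx, y + dy).

```python
def sad_exhaustive_search(frame1, mask, frame2, search_radius):
    """Exhaustive search minimizing the L1 norm (sum of absolute differences).

    frame1, frame2: equal-size 2-D lists of grayscale values, indexed [y][x].
    mask: 2-D list of booleans marking the segment's pixels in frame1.
    Returns (u, cost) where u = (dx, dy) is the best displacement found.
    """
    height, width = len(frame1), len(frame1[0])
    segment = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    best_u, best_cost = None, float("inf")
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            # Skip displacements that would push part of the segment off frame2.
            if not all(0 <= x + dx < width and 0 <= y + dy < height for x, y in segment):
                continue
            # L1 norm of the differences between overlaying pixels.
            cost = sum(abs(frame1[y][x] - frame2[y + dy][x + dx]) for x, y in segment)
            if cost < best_cost:
                best_u, best_cost = (dx, dy), cost
    return best_u, best_cost


# Usage: a 2x2 segment that moves by (2, 1) between the two frames.
frame1 = [[0] * 6 for _ in range(6)]
frame2 = [[0] * 6 for _ in range(6)]
mask = [[False] * 6 for _ in range(6)]
for (x, y), value in {(1, 1): 9, (2, 1): 8, (1, 2): 7, (2, 2): 6}.items():
    frame1[y][x] = value
    mask[y][x] = True
    frame2[y + 1][x + 2] = value
print(sad_exhaustive_search(frame1, mask, frame2, search_radius=2))  # ((2, 1), 0)
```

Each candidate position costs one pass over the segment's pixels, which is why the total work grows with both the segment area and the size of the search area.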
The widely used video compression method of the Moving Picture Experts Group (MPEG) attempts to match segments comprising blocks of 16×16 pixels by placing each block at each possible location within an image frame, or portion thereof, and subtracting the pixel values. Once again, a match occurs when the sum of the absolute values of the differences between the pixel values is close to or equal to zero.
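A block-based variant in the spirit of this approach can be sketched by partitioning the first frame into fixed-size blocks and running a full search for each block. This is an illustrative sketch only, not code from any MPEG encoder: the function name, the `radius` parameter and the nested-list frame layout are assumptions, and the usage example uses 4×4 blocks rather than 16×16 to keep the data small.

```python
def block_motion_vectors(frame1, frame2, block=16, radius=7):
    """Full-search block matching: one displacement per block of frame1.

    Frames are equal-size 2-D lists indexed [y][x]. Returns a dict mapping each
    block's top-left corner (bx, by) to the (dx, dy) with the smallest sum of
    absolute differences, considering only placements fully inside frame2.
    """
    height, width = len(frame1), len(frame1[0])
    vectors = {}
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            best_u, best_cost = None, float("inf")
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Keep the displaced block entirely inside frame2.
                    if not (0 <= bx + dx <= width - block and 0 <= by + dy <= height - block):
                        continue
                    cost = sum(abs(frame1[by + j][bx + i] - frame2[by + dy + j][bx + dx + i])
                               for j in range(block) for i in range(block))
                    if cost < best_cost:
                        best_u, best_cost = (dx, dy), cost
            vectors[(bx, by)] = best_u
    return vectors


# Usage: 8x8 frames with distinct values; frame2's content is frame1 moved by (1, 1).
frame1 = [[8 * y + x for x in range(8)] for y in range(8)]
frame2 = [[frame1[y - 1][x - 1] if x >= 1 and y >= 1 else 100 for x in range(8)]
          for y in range(8)]
vectors = block_motion_vectors(frame1, frame2, block=4, radius=2)
print(vectors[(0, 0)])  # (1, 1)
```

Blocks whose true content has moved partly out of the frame (here, those touching the right or bottom edge) simply receive the best available approximation.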
In other prior-art approaches, the matching routines are not limited to the L1 norm; any norm whose minimization indicates a match is sufficient. Mathematically, this family is known as the Lp norm, where p ≥ 1.
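The sketch below, with an illustrative function name `lp_cost`, computes the Lp norm of the pixel differences between a segment and one candidate position. The example values are made up to show that the choice of p can change which candidate is preferred: the L1 norm favors a single large difference over two moderate ones, while the L2 norm favors the opposite.

```python
def lp_cost(segment_values, candidate_values, p=1.0):
    """Lp norm of the pixel differences (p = 1 gives the L1 norm)."""
    if p < 1:
        raise ValueError("the Lp norm requires p >= 1")
    total = sum(abs(a - b) ** p for a, b in zip(segment_values, candidate_values))
    return total ** (1.0 / p)


segment = [10, 10]
candidate_a = [12, 12]   # differences (2, 2): two moderate errors
candidate_b = [13, 10]   # differences (3, 0): one large error
for p in (1, 2):
    print(p, lp_cost(segment, candidate_a, p), lp_cost(segment, candidate_b, p))
```

With p = 1 the costs are 4 and 3, so candidate_b wins; with p = 2 they are about 2.83 and 3, so candidate_a wins.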
The exhaustive search method is described below with reference to FIGS. 1A and 1B. In those figures, different colors are represented by different cross-hatching. Referring to FIG. 1A, a segment 10 has been identified and is to be matched against an image frame 11 (shown in FIG. 1B). FIG. 1B shows an image frame 11 containing six segments, labeled segment 12, segment 14, segment 16, segment 18, segment 20 and segment 22 (the background).
In the matching process, segment 10 is overlaid on image frame 11 at a given position and the color values of each pixel of segment 10 are subtracted from the values of corresponding pixels in image frame 11. In order to subtract pixel values, we assume a monochromatic image with linear color values; i.e., the difference between a pixel value of 30 and 31 is the same as the difference between a pixel value of 80 and 81. The monochromatic image is a function of the various color components of the original image. In another embodiment, motion matching routines are separately applied to each color component of the original image. Multiple possible matches can be generated in this manner. In this case, the intersection of the results is taken in order to determine the correct match. When the segment 10 is directly over a matching segment (segment 12, in this case) of image frame 11, then the sum of the absolute values of the differences will be zero or close to zero. The best match, as segment 10 is placed at each location within image frame 11, occurs when segment 10 is directly over segment 12 of image frame 11. Thus, the best match for segment 10 in FIG. 1B is segment 12.
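Matching each color component separately and intersecting the results can be sketched as follows. The helper names, the per-channel data layout (a list of ((x, y), value) pairs for the segment and a nested list for each frame channel) and the acceptance `threshold` are illustrative assumptions; the example is a one-row image in which the first channel alone is ambiguous and the second channel resolves the ambiguity.

```python
def matches_for_channel(seg_pixels, frame, radius, threshold):
    """Displacements (dx, dy) whose L1 cost in one color channel is <= threshold.

    seg_pixels: list of ((x, y), value) pairs for the segment in this channel.
    frame: 2-D list indexed [y][x] holding the same channel of the second image.
    """
    height, width = len(frame), len(frame[0])
    accepted = set()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not all(0 <= x + dx < width and 0 <= y + dy < height for (x, y), _ in seg_pixels):
                continue
            cost = sum(abs(v - frame[y + dy][x + dx]) for (x, y), v in seg_pixels)
            if cost <= threshold:
                accepted.add((dx, dy))
    return accepted


def match_by_channel_intersection(seg_channels, frame_channels, radius, threshold):
    """Match each color channel separately; keep only displacements every channel accepts."""
    result = None
    for seg_pixels, frame in zip(seg_channels, frame_channels):
        accepted = matches_for_channel(seg_pixels, frame, radius, threshold)
        result = accepted if result is None else result & accepted
    return result


# Usage: a one-pixel segment in a one-row image with two color channels.
seg_channels = [[((0, 0), 5)], [((0, 0), 1)]]
frame_channels = [[[0, 5, 0, 5, 0, 0]], [[0, 1, 0, 2, 0, 0]]]
print(sorted(matches_for_channel(seg_channels[0], frame_channels[0], 3, 0)))  # [(1, 0), (3, 0)]
print(match_by_channel_intersection(seg_channels, frame_channels, 3, 0))  # {(1, 0)}
```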
Another known method of pattern matching is the traditional correlation method. The traditional correlation method entails overlaying the segment on the image frame, as described above and, at each pixel location, multiplying the segment pixel color value by the pixel color value of the corresponding image frame pixel. As the segment is overlaid in different locations, the goal of the method is to find the location that maximizes the sum of the products from each of the pixel locations.
For example, FIG. 2A shows a graph, with the horizontal axis representing displacement in an image frame and the vertical axis representing pixel value. The pixel values of a first image segment are greatest at the spikes of graph portion 21. FIG. 2B is a graph showing a low graph portion 22 and two spikes (graph portion 23). The spikes of graph portion 23 represent a second image segment that matches the first image segment. If graph portion 21 is placed over graph portion 22, as seen in FIG. 2C, and the corresponding pixel values are multiplied, the sum of the products will be fairly small. Similarly, still referring to FIG. 2C, if graph portion 21 is placed almost on top of graph portion 23, but slightly offset, the sum will be greater. However, the largest sum occurs when graph portion 21 is placed directly over graph portion 23. Therefore, the second image segment is considered to be the best match.
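The one-dimensional situation of FIGS. 2A to 2C can be mimicked with a short sketch that slides a segment along a signal and keeps the offset with the largest sum of products. The function name and signal values are invented for illustration and are not taken from the figures.

```python
def best_correlation_offset(segment, signal):
    """Slide `segment` along `signal`; return (offset, score) maximizing the sum of products."""
    best_offset, best_score = None, float("-inf")
    for offset in range(len(signal) - len(segment) + 1):
        window = signal[offset:offset + len(segment)]
        score = sum(a * b for a, b in zip(segment, window))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


segment = [0, 9, 1, 9, 0]                    # two spikes, like graph portion 21
signal = [1, 1, 1, 1, 0, 9, 1, 9, 0, 1, 1]   # a low portion, then the matching spikes
print(best_correlation_offset(segment, signal))  # (4, 163)
```

Here the low region at offset 0 scores 19, the two slightly offset positions (offsets 2 and 6) score 90, and the exact overlay at offset 4 scores highest, mirroring the behavior described for FIG. 2C.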
The traditional correlation method is a variation of the exhaustive search method described above (the L1 norm method being another variation). The traditional correlation method can be made faster than the L1 norm method by taking advantage of certain mathematical properties of correlation, in particular its relationship to the Fourier transform. Correlation of signals is a well-known process of shifting one signal over another, multiplying the signals and summing (or integrating) the products.
Consider two signals defined as functions of an independent variable t. The correlation of the two signals can be determined using Fourier transforms. Specifically, if p(t) and q(t) are the two signals, the correlation $C_{p,q}(v)$ of p( ) and q( ), that is, the sum (or integral) over t of $p(t)\,q(t-v)$, is:

$$C_{p,q}(v) = \mathcal{F}^{-1}\left[P(s)\,Q^*(s)\right]$$

where $\mathcal{F}^{-1}$ is the inverse Fourier transform, P(s) is the Fourier transform of p(t) and $Q^*(s)$ is the complex conjugate of the Fourier transform of q(t), with q real-valued (as pixel values are). Where the signals have more than one dimension, t, s and v can be expressed as vectors $\mathbf{t}$, $\mathbf{s}$ and $\mathbf{v}$ of the appropriate dimension (vectors are denoted herein as bolded variables).
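The correlation relationship above can be checked numerically for discrete one-dimensional signals. The sketch below assumes the discrete Fourier transform, so shifts are circular (indices wrap modulo the signal length). It uses a naive O(n²) transform written with Python's standard `cmath` module purely to make the identity visible; a practical implementation would use a fast Fourier transform library instead. The function names are illustrative.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: X(s) = sum_t x(t) exp(-2*pi*i*s*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * s * t / n) for t in range(n))
            for s in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[s] * cmath.exp(2j * cmath.pi * s * t / n) for s in range(n)) / n
            for t in range(n)]


def correlation_direct(p, q):
    """C(v) = sum_t p(t) * q(t - v), with circular indexing."""
    n = len(p)
    return [sum(p[t] * q[(t - v) % n] for t in range(n)) for v in range(n)]


def correlation_fourier(p, q):
    """C(v) = inverse DFT of P(s) * conj(Q(s)); valid for real-valued q."""
    P, Q = dft(p), dft(q)
    return [c.real for c in idft([a * b.conjugate() for a, b in zip(P, Q)])]


p = [1.0, 3.0, 0.0, 2.0]
q = [0.0, 2.0, 1.0, 3.0]
direct = correlation_direct(p, q)
via_fourier = correlation_fourier(p, q)
print(all(abs(a - b) < 1e-9 for a, b in zip(direct, via_fourier)))  # True
```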
Relating this to the traditional correlation method introduced above, the segment being matched can be represented by the function f( ) such that the pixel located at a location represented by a vector r will have a pixel color value of f(r) and the image frame with which the segment is being matched can be represented by the function g( ) such that the pixel located at a location in that image frame represented by a vector r will have a pixel color value of g(r). For simplicity, assume that the segment is part of a first image frame and the image frame with which the segment is being matched is a second image frame and the first and second image frames are represented by N-dimensional arrays of pixel values having the same array sizes. To avoid interference in the correlation process from pixels in the first image frame that are not part of the segment, those pixels might have their pixel color values adjusted to zero, so that they do not contribute to any sum of multiplied pixel values; i.e., f(r)=0 for all r that point to pixels outside the segment being matched.
Using the above notation and functions, the sum of the products when the first and second images are aligned and overlaid is shown in Equation 1.

$$j = \sum_{\mathbf{r}} f(\mathbf{r})\, g(\mathbf{r}) \qquad \text{(Equ. 1)}$$
Now consider j as a function of u, a vector representing an offset between the first and second images. In other words, when the first and second image frames are offset by u and the corresponding pixel color values are multiplied and summed, the result is shown in Equation 2.

$$j(\mathbf{u}) = \sum_{\mathbf{r}} f(\mathbf{r})\, g(\mathbf{r}-\mathbf{u}) \qquad \text{(Equ. 2)}$$
Using the above equations, the process of matching a segment to an image frame can be transformed into a process of identifying the value of u that yields the desired value of j(u), a task that takes only one or a few calculations for each value of u once j( ) is determined. The function j( ) can be determined efficiently by noting that Equation 2 is in the form of a correlation (equivalently, a convolution of f with a reflected copy of g). Thus, j(u) is the inverse Fourier transform of F(s)G*(s), where F(s) is the Fourier transform of f(r) and G*(s) is the complex conjugate of the Fourier transform of g(r).
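For discrete images the Fourier-transform statement can be checked in a few lines. The derivation below is a sketch under stated assumptions: each image is an array of length n along every dimension, shifts are circular (indices taken modulo n, as the discrete Fourier transform implies), g is real-valued, and $F(\mathbf{s}) = \sum_{\mathbf{r}} f(\mathbf{r})\, e^{-2\pi i\, \mathbf{s}\cdot\mathbf{r}/n}$, with $G(\mathbf{s})$ defined likewise.

$$
\begin{aligned}
\sum_{\mathbf{u}} j(\mathbf{u})\, e^{-2\pi i\, \mathbf{s}\cdot\mathbf{u}/n}
  &= \sum_{\mathbf{u}} \sum_{\mathbf{r}} f(\mathbf{r})\, g(\mathbf{r}-\mathbf{u})\, e^{-2\pi i\, \mathbf{s}\cdot\mathbf{u}/n} \\
  &= \sum_{\mathbf{r}} f(\mathbf{r})\, e^{-2\pi i\, \mathbf{s}\cdot\mathbf{r}/n} \sum_{\mathbf{m}} g(\mathbf{m})\, e^{2\pi i\, \mathbf{s}\cdot\mathbf{m}/n} \qquad (\mathbf{m} = \mathbf{r}-\mathbf{u}) \\
  &= F(\mathbf{s})\, G^*(\mathbf{s}).
\end{aligned}
$$

The last step uses the fact that, for real-valued g, the sum over m is the complex conjugate of G(s). Applying the inverse transform to both sides recovers j(u). Because the shifts wrap around, zero-padding each dimension to at least the sum of the two extents minus one makes this circular result agree with the non-wrapping sum of Equation 2.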
There are numerous other methods of matching segments from one frame with segments from another frame, but previous techniques have either been computationally expensive or useful only for image frames with very specific characteristics. Therefore, what is needed is an improved method for matching a segment within an image frame in the general case without extensive computation.
The present invention provides an efficient method of matching a segment in one image with a segment in another image. Fourier transforms are used to aid in the process.
A method of identifying a displacement of a segment present in a first image and a second image, according to one embodiment of the invention, the displacement representing a relative change in position of the segment between the first and second images, the method comprising: deriving a first image frame by copying the first image and setting all pixel values to zero for pixels that do not fall within the segment; deriving an inverse image frame, wherein a pixel value in the inverse image frame is an inverse of a corresponding pixel value in the second image, with corresponding pixel values being for pixels located in corresponding positions; and determining the value of u that yields a minimum for the absolute value of the quantity j(u) − A, wherein, ideally,

$$j(\mathbf{u}) = \sum_{\mathbf{r}} f(\mathbf{r}-\mathbf{u})\, h(\mathbf{r}) = A,$$
r is a variable that ranges over the second image, f is a function that returns a pixel value at the pixel location of the first image frame, the pixel location being specified by the function argument, h is a function that represents the inverse image frame and returns a value that is the inverse of the value that a function g returns, g is a function that returns a pixel value at the pixel location of the second image specified by the function argument, u is a motion vector representing the correct displacement of the segment within g for a match, and A equals the area of the segment represented as a number of pixels.
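The first embodiment can be sketched directly from the definitions above. This is a minimal illustration under stated assumptions, not the patented implementation: frames are nested lists indexed [y][x], every pixel value of the second image is assumed strictly positive so that its inverse exists, f(r − u) is treated as zero where r − u falls outside the first image frame, the search is limited to a hypothetical `max_shift`, and u is written as (dx, dy). The function name is illustrative. Since j(u) is itself a correlation of f and h, it could instead be evaluated with Fourier transforms in the manner described for Equation 2; the direct loop is used here only for clarity.

```python
def match_by_inverse_frame(first_frame, second_image, area, max_shift):
    """Brute-force sketch: minimize |j(u) - A| with j(u) = sum_r f(r - u) * h(r), h = 1/g.

    first_frame: 2-D list [y][x] (f), pixels outside the segment already set to 0.
    second_image: 2-D list [y][x] (g) of strictly positive values, same size.
    area: A, the number of pixels in the segment.
    Returns ((dx, dy), |j(u) - A|) for the best u.
    """
    height, width = len(second_image), len(second_image[0])
    # Inverse image frame h(r) = 1 / g(r); assumes g(r) > 0 everywhere.
    h = [[1.0 / value for value in row] for row in second_image]
    best_u, best_err = None, float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            j = 0.0
            for y in range(height):          # r ranges over the second image
                for x in range(width):
                    sx, sy = x - dx, y - dy  # r - u, a position in the first image frame
                    if 0 <= sx < width and 0 <= sy < height:  # f taken as 0 outside
                        j += first_frame[sy][sx] * h[y][x]
            err = abs(j - area)
            if err < best_err:
                best_u, best_err = (dx, dy), err
    return best_u, best_err


# Usage: a two-pixel segment (values 2 and 8) that moves by (1, 1); background 7.
first_frame = [[0] * 5 for _ in range(5)]
first_frame[1][1], first_frame[1][2] = 2, 8
second_image = [[7] * 5 for _ in range(5)]
second_image[2][2], second_image[2][3] = 2, 8
print(match_by_inverse_frame(first_frame, second_image, area=2, max_shift=2))  # ((1, 1), 0.0)
```

At u = (1, 1) each segment pixel contributes f/g = 1 (2/2 and 8/8), so j(u) equals the segment area A = 2 and the error is zero.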
A method of identifying a displacement of a segment present in a first image and a second image, according to another embodiment of the invention, the displacement representing a relative change in position of the segment between the first and second images, the method comprising: deriving a first image frame by copying the first image and setting all pixel values equal to zero for pixels that do not fall within the segment; deriving an inverse image frame, wherein a pixel value in the inverse image frame is an inverse of a corresponding pixel value in the first image frame, with corresponding pixel values being for pixels located in corresponding positions; and determining the value of u that yields a minimum for the absolute value of the quantity j(u) − A, wherein, ideally,

$$j(\mathbf{u}) = \sum_{\mathbf{r}} h(\mathbf{r})\, g(\mathbf{r}-\mathbf{u}) = A,$$
r is a variable that ranges over the first image, h is a function that represents the inverse image frame and returns a value that is the inverse of the value that a function f returns, f is a function that returns a pixel value at a pixel location of the first image frame, g is a function that returns a pixel value at a pixel location of the second image, u is a motion vector representing the correct displacement of the segment within g for a match, and A equals the area of the segment represented as a number of pixels.
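The second embodiment differs in which frame is inverted. The sketch below is a brute-force illustration with an illustrative function name and a hypothetical search limit `max_shift`. It assumes segment pixel values are non-zero, that pixels outside the segment (value zero in the first image frame) contribute nothing rather than being inverted, and that g(r − u) is treated as zero outside the second image. Taking the formula literally, a match satisfies g(r − u) = f(r), so a segment that moves by (1, 1) is found at u = (−1, −1).

```python
def match_by_inverse_segment(first_frame, second_image, area, max_shift):
    """Brute-force sketch: minimize |j(u) - A| with j(u) = sum_r h(r) * g(r - u), h = 1/f.

    first_frame: 2-D list [y][x] (f); segment pixels non-zero, all others 0.
    second_image: 2-D list [y][x] (g), same size.
    area: A, the number of pixels in the segment.
    Returns ((dx, dy), |j(u) - A|) for the best u.
    """
    height, width = len(first_frame), len(first_frame[0])
    # Inverse of the first image frame; pixels outside the segment stay 0 (assumption).
    h = [[1.0 / value if value != 0 else 0.0 for value in row] for row in first_frame]
    best_u, best_err = None, float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            j = 0.0
            for y in range(height):          # r ranges over the first image
                for x in range(width):
                    gx, gy = x - dx, y - dy  # r - u, a position in the second image
                    if h[y][x] != 0.0 and 0 <= gx < width and 0 <= gy < height:
                        j += h[y][x] * second_image[gy][gx]
            err = abs(j - area)
            if err < best_err:
                best_u, best_err = (dx, dy), err
    return best_u, best_err


# Usage: a two-pixel segment (values 2 and 8) that moves by (1, 1); background 7.
first_frame = [[0] * 5 for _ in range(5)]
first_frame[1][1], first_frame[1][2] = 2, 8
second_image = [[7] * 5 for _ in range(5)]
second_image[2][2], second_image[2][3] = 2, 8
print(match_by_inverse_segment(first_frame, second_image, area=2, max_shift=2))  # ((-1, -1), 0.0)
```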
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.