The invention relates to matching a template to samples.
A template is a reference object having certain quantifiable characteristic properties or attributes that can be related to corresponding quantifiable characteristic properties or attributes of other objects referred to as samples. A distance measure between these properties or attributes is computed, and the sample producing the smallest distance measure relative to the template is considered a “match.” Template matching is an important task in the fields of computer vision, image processing, and pattern and voice recognition. In image processing, for example in video applications, template matching is applied to block motion estimation, stereo correspondence, pattern matching, and accumulative nonlinear optimization.
In general, given a template $T^d$ (where $d$ is the dimension of the template) and $r$ samples in a search range, $S_i^d$, $i = 1, \ldots, r$, the goal is to find the sample among the $S_i^d$, $i = 1, \ldots, r$, that has the smallest distance measure (i.e., the minimum error) to the template $T^d$. The dimension $d$ of the template $T^d$ refers to the number of attributes characterizing the template. These attributes are matched with the attributes of the samples $S_i^d$. The dimension $d$ will also be referred to as the dimensionality of the attribute space. The distance between the sample $S_i^d$ and the template $T^d$ is frequently defined as the sum of absolute differences, $D(T^d, S_i^d) = \sum_{j=1}^{d} |T(j) - S_i(j)|$, or as the sum of squared differences, $D(T^d, S_i^d) = \sum_{j=1}^{d} |T(j) - S_i(j)|^2$, where $j$ indexes the coordinates in attribute space. Here, the distance measure $D(T^d, S_i^d)$ represents the matching error between the sample $S_i^d$ and the template $T^d$. For the application of accumulative nonlinear optimization, the distance measure (or error measure) can be defined as $D(S_i) = \sum_{j=1}^{d} F(S_i)$, in which $F(S_i)$ is a positive function. The goal of the optimization is to find the $S_i$ for which $D(S_i)$ is minimal in the search range.
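The two distance measures defined above can be sketched as follows. This is only an illustration: the function names `sad` and `ssd` and the representation of templates and samples as plain Python sequences of numbers are assumptions, not part of the original description.

```python
from typing import Sequence


def sad(template: Sequence[float], sample: Sequence[float]) -> float:
    """Sum of absolute differences over all d attributes."""
    return sum(abs(t - s) for t, s in zip(template, sample))


def ssd(template: Sequence[float], sample: Sequence[float]) -> float:
    """Sum of squared differences over all d attributes."""
    return sum((t - s) ** 2 for t, s in zip(template, sample))


# Hypothetical 3-dimensional template and sample (d = 3).
template = [1, 2, 3]
sample = [2, 2, 5]
print(sad(template, sample), ssd(template, sample))
```

Both measures are sums of non-negative terms, so a sum taken over only some of the attributes can never exceed the sum taken over all of them.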
The complexity of the straightforward algorithm for finding the global minimum using exhaustive search is of the order O(r*d). Because the dimensionality of the attribute space and the number of samples can be very large in many applications, template matching can be a time-consuming bottleneck. Consider, for example, a video compression application. Each frame in a video sequence is first divided into square image blocks for block motion estimation, which usually uses template matching techniques. Each block of the current frame is compared with the blocks in a search range of the previous frame. The attributes used in this application are usually the chromatic values of the pixels in the block. As a simple example, suppose the block size is chosen to be 16×16, with each pixel containing R, G, and B information. The attribute space is then 768-dimensional (d=16×16×3=768). If the search range is chosen to be 64×64, then the number of samples will be 4096 (i.e., r=4096). Notice that this operation must be repeated for each block in the frame and for each frame in the video sequence.
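To make the O(r*d) cost concrete, the following is a minimal sketch of the exhaustive search, together with the arithmetic for the block-matching example. The function name `exhaustive_match` and the choice of the sum of absolute differences as the distance measure are assumptions made for illustration.

```python
def exhaustive_match(template, samples):
    """Return (index, error) of the sample with the smallest full distance.

    Every one of the d attributes is visited for every one of the r samples,
    so the cost is on the order of r * d difference terms.
    """
    best_index, best_error = -1, float("inf")
    for i, sample in enumerate(samples):
        error = sum(abs(t - s) for t, s in zip(template, sample))
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error


# Arithmetic for the example in the text.
d = 16 * 16 * 3          # 768 attributes per block
r = 64 * 64              # 4096 candidate samples in the search range
terms_per_block = r * d  # difference terms evaluated per block
print(d, r, terms_per_block)
```

For the figures given in the text, this amounts to 3,145,728 difference terms for a single block, before the repetition over blocks and frames is taken into account.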
In the past few decades, many methods have been proposed to speed up the computations of template matching for different applications. For example, in the field of object recognition, the samples in the search space can be fixed and compared with different templates (inputs). A K-dimensional binary search tree (d=K) can then be used to partition the search space of the samples in advance into hyper-rectangular buckets. The search process for the matching sample includes a global search of the order O(log r) for a target bucket and a local search for the desired sample in the target bucket and the neighboring buckets in the K-dimensional space. The performance degrades exponentially with increasing dimensionality because the query hyper-sphere tends to intersect many more adjacent buckets, causing the number of points to be examined to increase dramatically. Another drawback of this method is that whenever the samples in the search space change, a new K-dimensional binary search tree has to be built, which is quite time-consuming and is therefore usually done in advance, and only when the search space is static. Because of these two drawbacks, this method is not suitable for speeding up template matching where the samples in the search space are dynamic or where the dimensionality of the attribute space is large, as in the case of motion estimation for video coding.
References which describe the above-mentioned K-dimensional binary search tree include:
1. J. L. Bentley, “Multidimensional Binary Search Trees Used for Associative Searching,” Comm. ACM, vol. 18, no. 9, pp. 509-517, Sep. 1975.
2. J. L. Bentley, “Multidimensional Binary Search Trees in Database Applications,” IEEE Trans. Software Engineering, vol. 5, no. 4, pp. 333-340, July 1979.
For those applications for which it is impractical to partition and restrict the search space before a search, other methods need to be used to reduce the number of computations. In video compression applications, for example, gradient descent methods (e.g., three-step search) have been used to narrow the search space for block motion estimation, i.e., to omit the search in certain dimensions of the search space. The search space can also be restricted by using motion prediction from neighboring blocks or from previous frames. Another way of reducing the computational cost is to omit the search in certain dimensions of the attribute space by using a subsampling or early jump-out technique while accumulating the respective distance measures. However, none of the above-mentioned approaches guarantees that a global minimum will be found.
A reference which describes a three-step search is: T. Koga, K. Linuma, A. Hirano and T. Ishiguro, “Motion compensated interframe coding for video conferencing,” Proceedings of National Telecommunication Conference, pp. G5.3.1-5.3.5, November 1981.
A reference which describes the use of motion prediction is: J. Chalidabbongse and C. C. Jay Kuo, “Fast motion vector estimation using multiresolution-spatio-temporal correlations,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, pp. 477-488, 1997.
U.S. Pat. No. 5,682,209 describes an early jump-out (or early-exit) technique which omits some dimensions of the attribute space, instead of omitting some dimensions of the search space as is the case in a three-step search.
The invention features a method of omitting certain dimensions of the attribute space to find the global optima using a different strategy than conventional approaches, such as a three-step search and an early jump-out, neither of which can guarantee finding the globally optimal match.
In general, in one aspect, the invention provides an optimal match between attributes of a template and a sample, the attributes to be matched defining an attribute space, by first computing distance measures between the template and each of the samples in a first subspace of the attribute space that has a lower dimensionality than the attribute space. The smallest of the computed distance measures is then determined, and a new distance measure is computed, in a second subspace of a higher dimensionality than the first subspace, for the sample that has the smallest distance measure in the first subspace. The new distance measure is compared with the distance measures previously determined in the first subspace for the other samples. A new minimal distance measure is determined, and the process is repeated until the dimensionality of the subspace in which the minimal distance measure was computed is equal to the dimensionality of the attribute space.
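The procedure described in this aspect can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the name `progressive_match` is hypothetical, the distance measure is assumed to be the sum of absolute differences, the first subspace is assumed to be one-dimensional, each step is assumed to raise the dimensionality by exactly one, and the minimum is found by a plain linear scan.

```python
def progressive_match(template, samples):
    """Return (index, error) of a globally optimal sample.

    errors[i] holds the distance accumulated so far for sample i, and
    dims[i] holds the number of attribute dimensions it covers.
    """
    d = len(template)
    # First subspace: only the first attribute is compared for every sample.
    errors = [abs(template[0] - s[0]) for s in samples]
    dims = [1] * len(samples)
    while True:
        # Sample with the smallest accumulated error; on ties, prefer the
        # sample that has already accumulated more dimensions.
        i = min(range(len(samples)), key=lambda k: (errors[k], -dims[k]))
        if dims[i] == d:
            # Its full distance is no larger than any partial distance, and
            # partial distances can only grow, so this is a global optimum.
            return i, errors[i]
        # Extend only this sample into the next higher-dimensional subspace.
        j = dims[i]
        errors[i] += abs(template[j] - samples[i][j])
        dims[i] = j + 1


print(progressive_match([1, 2, 3], [[0, 0, 0], [1, 2, 4], [5, 5, 5]]))
```

The stopping test relies on every term of the distance measure being non-negative, which holds for both the sum of absolute differences and the sum of squared differences.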
Advantageous embodiments of the invention may include one or more of the following features.
The attributes are physical properties of an object, such as size, spatial orientation, luminance and color information of the object. For example, the attributes can include characteristic features of a video image, such as the luminance and chromatic values of the pixels in a predefined block. In another example, the attributes can include characteristic features of an audio signal, for example, an audio signal generated and compared in speech recognition applications.
The dimensionality of the template attribute space can be different from that of the sample attribute space, but is preferably identical. The distance measures are preferably first computed in a first subspace having a dimensionality equal to one. The dimensionality of the second subspace is preferably at least one higher than the dimensionality of the first subspace.
A list of ordered candidates of globally best matches (e.g., the three best matches) can be compiled, instead of just one single globally optimal match, by continuing the remaining matching process after the globally optimal match is found and removed from the samples to be matched. With a similar process, multiple global optima can be found by continuing the matching process until a sample with larger matching error (i.e., larger than the global minimum error) is found.
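One way to picture the ordered list of best matches is to keep the matching process running after each optimum is found and removed. The sketch below makes several assumptions not stated in the text: the name `best_matches`, the use of Python's `heapq` module as the priority structure, the sum of absolute differences as the distance measure, and one added dimension per step.

```python
import heapq


def best_matches(template, samples, k):
    """Return up to k (index, error) pairs, ordered from best to worst."""
    d = len(template)
    # Heap entries are (accumulated error, -dimensions accumulated, index);
    # the negated dimension count breaks ties in favor of deeper samples.
    heap = [(abs(template[0] - s[0]), -1, i) for i, s in enumerate(samples)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        error, neg_dims, i = heapq.heappop(heap)
        dims = -neg_dims
        if dims == d:
            # Fully accumulated and minimal among the remaining samples:
            # record it and leave it out of the heap.
            found.append((i, error))
            continue
        error += abs(template[dims] - samples[i][dims])
        heapq.heappush(heap, (error, -(dims + 1), i))
    return found


samples = [[0, 0, 0], [1, 2, 4], [5, 5, 5], [1, 2, 3]]
print(best_matches([1, 2, 3], samples, 3))
```

Stopping instead at the first recorded sample whose error exceeds that of the first match would yield all samples tied at the global minimum.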
In general, in applications where the computational time is strictly limited, a sub-optimal match between attributes of a template and a sample can be provided by repeating the following process for a predetermined computation duration. In this case, the sample having the largest accumulated dimensionality for the given time duration is chosen to be the optimal match. If more than one sample having the largest dimensionality is found, the sample with the smallest accumulated error measure will be chosen to be the optimal match. The process begins with computing distance measures between the template and each of the samples in a first subspace of the attribute space that has a lower dimensionality than the attribute space. The smallest distance measure of the computed distance measures is then determined and a new distance measure is computed in a second subspace of a higher dimensionality than the first subspace for the sample that has the smallest distance measure in the first subspace. The new distance measure is compared with the distance measures previously determined in the first subspace for the other samples. A new minimal distance measure is computed and the process is repeated until a time limit for performing the computation has been exhausted.
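The time-limited variant can be sketched as below. To keep the example deterministic, the time limit is modeled as a fixed number of accumulation steps, `max_steps`; this stand-in for a wall-clock duration, the name `budgeted_match`, and the sum-of-absolute-differences distance are all assumptions for illustration.

```python
def budgeted_match(template, samples, max_steps):
    """Return (index, accumulated_error, proven_optimal).

    max_steps is a hypothetical stand-in for the computation time limit.
    """
    d = len(template)
    errors = [abs(template[0] - s[0]) for s in samples]
    dims = [1] * len(samples)
    for _ in range(max_steps):
        i = min(range(len(samples)), key=lambda k: (errors[k], -dims[k]))
        if dims[i] == d:
            # Finished within the budget: the match is globally optimal.
            return i, errors[i], True
        errors[i] += abs(template[dims[i]] - samples[i][dims[i]])
        dims[i] += 1
    # Budget exhausted: pick the sample with the largest accumulated
    # dimensionality, breaking ties by the smallest accumulated error.
    best = min(range(len(samples)), key=lambda k: (-dims[k], errors[k]))
    return best, errors[best], False


samples = [[0, 0, 0], [1, 2, 4], [5, 5, 5]]
print(budgeted_match([1, 2, 3], samples, 1))
print(budgeted_match([1, 2, 3], samples, 10))
```

When the budget runs out, the returned error is generally a partial sum over fewer than d dimensions, so it should not be compared directly with full distances.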
Advantageous embodiments of the invention may include one or more of the following features.
The attributes include characteristic features of a video image, and the time limit for the computation is associated with a frame rate of the video image. The video image includes pixels, and the attributes include the luminance or chromatic values of the pixels in a predefined block.
In computing the distance measures between the template and each of the samples, the order of the samples for making the computations can be predetermined. For example, the order may be sequential, based on a raster scan (e.g., from top to bottom, right to left), spiral, or any other predefined order. In certain applications, the order can be determined from a histogram, for example, of a video image. In such an application, the histogram may represent a frequency table for each luminance value appearing in the video image.
In general, in another aspect, the invention can be implemented in the form of digital electronic circuitry, or in computer hardware, firmware, software, or combinations of these forms. For example, the invention can be implemented in a computer program product tangibly embodied in a machine-readable storage device for execution by a programmable processor; and method steps of the invention can be performed by a programmable processor executing a program of instructions to perform the above-described functions by operating on input data and generating output. The invention can advantageously be implemented in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object-oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language. Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM disks. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
To provide for interaction with a user, the invention can be implemented on a computer system having a display device such as a monitor or LCD screen for displaying information to the user and a keyboard and a pointing device such as a mouse or a trackball by which the user can provide input to the computer system. The computer system can be programmed to provide a graphical user interface through which computer programs interact with users.
Among the advantages of the invention are one or more of the following.
The invention advantageously and dramatically speeds up the computation of template matching while still ensuring that a globally optimal match is found. Many conventional schemes for speeding up the computation of template matching, such as gradient descent, subsampling, or early jump-out techniques, may become trapped in a local minimum and thus do not guarantee that the globally optimal match is determined.
The reason that the computational cost of finding global optima can still be greatly reduced is as follows. In the process of calculating the distance measures between the samples and the template, most distance measures are computed only in a low-dimensional space with a dimensionality considerably less than that of the attribute space, and at the higher dimensionalities only for very few samples. In most cases, intermediate accumulations of the distance measures are already greater than the global minimum. Therefore, these accumulations can be terminated, eliminating the need to perform otherwise time-consuming computations associated with the remaining dimensions. Also, only the sample having the minimal distance measure to the template is selected to accumulate the distance measure in the next higher dimension. The higher dimension is recorded and the samples are sorted again according to the updated distance measures. The above process is repeated until the sample having the minimal distance measure has accumulated the error components in all dimensions d. With a suitable data structure, such as a heap, the overhead of each sorting step, except for the step for the first dimension, is less than log r. In block-matching motion estimation, for example, the computational cost can be reduced by about 90% to 99%, depending on the content of the template and the samples.
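The role of the heap, and the number of difference terms actually evaluated, can be illustrated with the sketch below. It assumes Python's `heapq` module as the heap, the sum of absolute differences as the distance measure, and randomly generated integer data; the name `match_with_count` is hypothetical. Random data is used only to exercise the code, and the resulting count says nothing about the savings on real video content.

```python
import heapq
import random


def match_with_count(template, samples):
    """Return (index, error, terms), where terms counts evaluated differences."""
    d = len(template)
    # Heap entries are (accumulated error, -dimensions accumulated, index).
    heap = [(abs(template[0] - s[0]), -1, i) for i, s in enumerate(samples)]
    heapq.heapify(heap)   # initial ordering for the first dimension
    terms = len(samples)  # one term per sample so far
    while True:
        error, neg_dims, i = heap[0]  # current minimum, not yet removed
        dims = -neg_dims
        if dims == d:
            return i, error, terms
        error += abs(template[dims] - samples[i][dims])
        terms += 1
        # Replace the root and restore heap order in O(log r) time.
        heapq.heapreplace(heap, (error, -(dims + 1), i))


rng = random.Random(0)
d, r = 48, 200
template = [rng.randrange(256) for _ in range(d)]
samples = [[rng.randrange(256) for _ in range(d)] for _ in range(r)]
index, error, terms = match_with_count(template, samples)
print(index, error, terms, "of", r * d, "terms in an exhaustive search")
```

Only the root of the heap changes at each step, which is why re-establishing the order is much cheaper than sorting all r samples again.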
The invention can also provide a list of globally best matches. In some applications, a list of best matches is very useful for robust computation, human judgment, or other further processing.
Another advantage of this invention is that global optimality can be sacrificed to satisfy the time constraint required by some applications. For example, in video coding, if the time allowed for motion estimation is limited, the invention can provide a sub-optimal match by choosing the sample that has the largest accumulated dimensionality when the time limit is reached. If more than one sample has the largest dimensionality, the one with the smallest distance measure is chosen as the best match.
Other advantages and features of the invention will be apparent from the following description and from the claims.