This invention relates generally to correlation of pixelated images in the digital domain, and more specifically to a correlator enabling low-latency image processing, advantageously in self-contained fashion on a digital signal processor (DSP) deployed on an integrated circuit chip.
Devices having a tracking capability (such as a hand-held scanner) require navigation functionality in order to maintain awareness of the device's present position on a piece of work. The surface texture of the work can provide a frame of reference for navigation. A known effective technique for enabling such navigation is to shine light at an angle on the work, and to process the resulting reflection, which will include the surface texture shadow of the work. This technique enables navigation using, for example, the fiber texture on the surface of a piece of paper from which an image is being scanned.
Part of such a navigation technique is correlation. In a series of frames representing portions of the image captured during motion across the image, correlation produces a numerical representation of “how much the current frame looks like the previous frame.” Deriving this numerical representation is analogous to laying a photographic slide of a current image over a negative of a reference image, and then moving the slides around until the least amount of light gets through. The numerical representation sought in correlation corresponds to the amount of light that actually gets through at that nadir point, and thus quantifies the “best fit” between the two images.
Correlation is typically performed in the digital domain in accordance with techniques described with reference to FIGS. 1A-1D. Reference image 101 on FIG. 1A comprises, for example, a 6×6 array of reference pixels R0-R35. Each reference pixel R0-R35 will be understood to be a digital value representative of the information seen by that pixel when the image was captured. Compare image 102 on FIG. 1A comprises a 6×6 array of compare pixels C0-C35, clipped for the purposes of correlation to a 4×4 array 103. Referring now to FIG. 1B, compare array 103 is overlaid “dead center” on reference image 101, generating 16 calculations 104 as shown on FIG. 1B. In FIG. 1B, exemplary use is made in calculations 104 of the function (Rx − Cy)², although other functions of R and C may be used in correlation, such as |Rx − Cy|.
The aggregate sum of all 16 calculations 104 on FIG. 1B goes forward to form output value O4 on result surface 105 depicted on FIG. 1D. With further reference to FIG. 1D, result surface 105 is typically a 3×3 array of output values O0-O8.
Turning now to FIG. 1C, array 103 is now overlaid, for example, on reference image 101 one reference pixel to the right of dead center. The aggregate sum of calculations 104 on FIG. 1B corresponding to this overlay yields output value O5 on result surface 105 on FIG. 1D. With further reference to FIG. 1C, array 103 is now overlaid, for example, on reference image 101 one pixel diagonally up and to the right of dead center. The aggregate sum of calculations 104 on FIG. 1B corresponding to this overlay yields output value O2 on result surface 105 on FIG. 1D.
The result of the foregoing process is that result surface 105 on FIG. 1D comprises a series of output values O0-O8, each representative of correlation between array 103 and the corresponding patch of reference image 101 when array 103 is “moved around” reference image 101. The lowest value of O0-O8 is the “best fit” and is the correlation value for reference image 101 and compare image 102.
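The overlay-and-sum procedure of FIGS. 1A-1D can be sketched in Python as follows. This is a minimal illustration only, not the implementation of any particular correlator: the function name `result_surface` and the parameters `patch` and `shifts` are hypothetical, images are assumed to be square lists of lists, and the row-major ordering of O0-O8 (O0 at upper left) is inferred from the positions given above for O2, O4 and O5.

```python
def result_surface(ref, compare, patch=4, shifts=1):
    """Serially compute a (2*shifts+1) x (2*shifts+1) result surface.

    ref, compare: square images of equal size (6x6 in the example).
    patch:        side of the clipped compare array (4 in the example).
    shifts:       how far the clipped array is moved from dead center.
    """
    margin = (len(ref) - patch) // 2      # offset of the clipped array
    size = 2 * shifts + 1
    surface = [[0] * size for _ in range(size)]
    # One output value at a time, as in the serial technique described.
    for dy in range(-shifts, shifts + 1):        # dy = -1 is "up"
        for dx in range(-shifts, shifts + 1):    # dx = +1 is "right"
            total = 0
            for r in range(patch):
                for c in range(patch):
                    cval = compare[margin + r][margin + c]
                    rval = ref[margin + r + dy][margin + c + dx]
                    total += (rval - cval) ** 2  # the (Rx - Cy)^2 function
            surface[dy + shifts][dx + shifts] = total
    return surface


def best_fit(surface):
    """Return the lowest output value on the result surface."""
    return min(min(row) for row in surface)
```

With the default arguments, `surface[1][1]` plays the role of O4, `surface[1][2]` of O5 and `surface[0][2]` of O2.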
Although exemplary use in FIGS. 1A-1D has been made of a 6×6 reference image 101 and compare image 102 (the compare image clipped to 4×4 to facilitate “movement” over reference image 101) in order to generate a 3×3 result surface, there is no limitation on these numbers to perform correlation according to the foregoing technique. Any size of reference image and compare image may be correlated, and the amount of “movement” enabled will dictate the size (and resolution) of the result surface.
Correlators of the current art using this technique typically store entire frames of digitized input pixel values in memory and then correlate the frames using an off-chip processor. Calculations are generally done serially for each output value over the result surface, calculations for the next output value not started until the previous output value has been determined. This results in a long latency from completion of the digitization of a frame until the result surface against the previous reference frame is calculated. There is also a high hardware overhead requiring at least two memory regions for the reference frame and the compare frame.
This type of batch processing causes slowdowns that could be remedied by more continuous, parallel processing of correlation calculations. It would also be advantageous to be able to perform correlation on-chip, which might become more feasible if the hardware requirements were optimized.
There is therefore a need in the art to perform correlation calculations in more of a “streaming” fashion, preferably on-chip.
These and other objects, features and technical advantages are achieved by a correlator in which indexed patches of pixels on the current and reference frames are presented to correlation cells for processing in a “streaming” fashion.
The inventive correlator derives its inventive concept from recognizing, in the current example illustrated on FIGS. 1A-1D, that the pixel values in compare array 103 (pixel values C7-C10, C13-C16, C19-C22, and C25-C28 on FIG. 1A) are each used once, and only once, in the calculation of every output value O0-O8. Thus, for example, if an architecture is used in which pixel value C7 is presented to nine calculators concurrently, and the appropriate reference pixel values are sent at the same time to the calculators, the nine calculators may each execute a different calculation in unison, where each of the calculations is one of those required to determine a corresponding one of the output values. C7 is then not needed again, all of the calculations requiring C7 having been made.
Repeating this process for a stream of compare pixel values C7-C10, C13-C16, C19-C22 and C25-C28 (as used in the example of FIG. 1A) enables all output values O0-O8 to be determined simultaneously after 16 iterations of the concurrent process. This “streaming” process dramatically reduces the latency required to perform these calculations in comparison to corresponding “batch” systems of the prior art. The only difference over the prior art process described in the previous section is that according to the inventive correlator, none of the output values are known until the 16th and final iteration is complete, whereupon all output values O0-O8 manifest themselves concurrently. In contrast, in the prior art, calculation of one output value is generally completed before the next is started. This difference is not disadvantageous, however, since the next step in analysis of output values is typically to identify the lowest one. It does not matter, therefore, if the values of output values manifest themselves serially or concurrently, since identification of the lowest value cannot be made until all output values are known.
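The reordering described above, in which each compare pixel value is visited once and contributes to all nine running sums, can be sketched in software as follows. This is an illustrative model under stated assumptions, not the hardware itself: the name `stream_correlate` and the `(row, col, value)` stream format are hypothetical, and the inner loop over accumulators runs serially here, whereas the correlator described performs those nine calculations concurrently.

```python
def stream_correlate(ref, compare_stream, shifts=1):
    """Accumulate every output value while reading each compare pixel once.

    compare_stream: iterable of (row, col, value) tuples, one per pixel of
                    the clipped compare array (hypothetical format).
    Returns (accumulators, index_of_lowest, iterations).
    """
    offsets = [(dy, dx)
               for dy in range(-shifts, shifts + 1)
               for dx in range(-shifts, shifts + 1)]
    acc = [0] * len(offsets)          # one running sum per output O0..O8
    iterations = 0
    for row, col, cval in compare_stream:
        # One iteration: the same compare value feeds every accumulator,
        # each paired with a differently shifted reference pixel.
        for k, (dy, dx) in enumerate(offsets):
            acc[k] += (ref[row + dy][col + dx] - cval) ** 2
        iterations += 1               # cval is never needed again
    # No output is final until the stream ends; then pick the lowest.
    best = min(range(len(acc)), key=acc.__getitem__)
    return acc, best, iterations
```

For the 6×6 example, the stream would carry the 16 values at rows 1-4 and columns 1-4 of the compare image, so `iterations` comes out as 16 and `acc` holds O0-O8 in row-major order.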
While the inventive correlator is used for image processing (two dimensions) in a preferred embodiment, there is no reason why its principles will not apply to n-dimensional problems.
The architecture of the inventive correlator is, in a preferred embodiment, an array of correlation cells each containing a delay pipe, a math unit and an accumulator. These correlation cells are tiled together to allow simultaneous processing by all cells. The array is disposed so that each cell accumulates an output value in a result surface. There is no electrical limit to the number of correlation cells that may be tiled together. A preferred embodiment uses nine cells tiled together into a 3×3 correlation result surface. Other embodiments have been tested in accordance with the present invention having twenty-five cells tiled together into a 5×5 correlation result surface.
A stream of compare pixel values is presented to the array wherein each compare pixel value is presented to each cell concurrently. A reference memory supplies the appropriate reference pixel values to the cells to enable all calculations for that compare pixel value to be done concurrently. The results of those calculations are summed in each cell's accumulator. The process is repeated for each compare pixel value in the stream. When all compare pixel values in the stream have been processed, the values in the accumulators are compared. Generally, the lowest value is accepted as the correlation value.
It is therefore a technical advantage to speed up processing of correlation calculations by executing n calculations concurrently, where n is the number of output values expected in the result surface.
It is a further technical advantage of the present invention to speed up processing of correlation calculations by presenting compare pixel values in a stream to calculation units concurrently, sets of appropriate corresponding reference pixel values also presented to the calculation units synchronously in a stream. Such architecture enables simultaneous calculation and accumulation of output values in a streaming fashion.
It is a yet further advantage of the present invention to reduce hardware requirements for correlation by obviating the need for a designated memory region to store a frame of compare pixel values while calculation of a correlation result is in progress. By using an array of calculation cells in accordance with the present invention, the architecture may advantageously be embodied entirely on-chip in a digital signal processor (DSP).
It is a still further advantage of the present invention to optimize reference memory resources when correlating according to the invention. Reference pixel values may be “passed”, when appropriate for a calculation, from one calculation cell to the next, requiring less than a complete refresh of all cells from reference memory each time a new compare pixel value is presented to all cells.
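One way such passing could work is sketched below. The specific scheme is an assumption made for illustration and is not taken from the text above: as the compare stream advances one pixel to the right along a row, each cell takes over the reference value that its right-hand neighbor used in the previous iteration, so only the right-most column of cells is refreshed from reference memory, and all cells are reloaded at the start of each row. The names `stream_with_passing`, `held` and `fetches` are hypothetical.

```python
def stream_with_passing(ref, compare, lo=1, hi=5, shifts=1):
    """Streaming correlation that counts reference-memory fetches.

    Rows and columns lo..hi-1 of `compare` form the clipped compare array.
    Returns (result_surface, number_of_reference_fetches).
    """
    size = 2 * shifts + 1
    acc = [[0] * size for _ in range(size)]
    fetches = 0
    for row in range(lo, hi):
        held = None   # reference values currently held by the cell array
        for col in range(lo, hi):
            if held is None:
                # Assumed: at the start of a row every cell is loaded
                # from reference memory.
                held = [[ref[row + i - shifts][col + j - shifts]
                         for j in range(size)] for i in range(size)]
                fetches += size * size
            else:
                # Assumed: each cell receives its right-hand neighbor's
                # previous value; only the right-most column is fetched.
                for i in range(size):
                    fresh = ref[row + i - shifts][col + shifts]
                    held[i] = held[i][1:] + [fresh]
                fetches += size
            cval = compare[row][col]
            for i in range(size):
                for j in range(size):
                    acc[i][j] += (held[i][j] - cval) ** 2
    return acc, fetches
```

Under these assumptions, the 6×6 example needs 72 reference fetches for the 16 compare pixels, against 144 if all nine cells were refreshed from reference memory on every iteration, while the accumulated result surface is unchanged.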
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.