1. Technical Field
The present invention relates to image processing and enhancement by fusion of plural image frames. The invention performs frame fusion on a pixel-by-pixel basis by estimating velocities and occlusions between two frames. For each pixel, the possible matchings are those that minimize changes in a selected parameter of the image (generally the grey-level).
2. Background Art
Finding velocities in a sequence of images requires following points along their motion in the image. That is, when not occluded, the pixels of the current image are associated with the pixels of the next image. This association is based on the relative constancy of a given quantity estimated from the images at each pixel. In general, this quantity is the grey-level value of the pixel, since it does not vary greatly during motion. However, it may be defined on other measurements such as the curvature, the gradient and so forth. Given a velocity, one can define a measure that indicates whether or not this velocity is accurate. We call this measure the “error”. The error is based on the variation of the quantity along the motion defined by the velocity. Possible velocities will have small errors attached. Estimating the velocities in this case consists of finding velocities that have small errors. Unfortunately, this property is not enough to define a unique velocity field. Indeed, there might exist in the next frame many points having the same grey level (or other selected quantity) as a given point of the current frame. This is the well-known aperture problem, which must be solved in order to find the velocities. The probability of matching plural points in the image with the same velocity decreases as the number of points increases. Many techniques try to exploit this observation. For example, the well-known correlation technique matches by neighborhood (generally defined by a square). However, this arbitrary neighborhood might be too large and therefore mix points having different velocities, or conversely too small to solve the aperture problem. The neighborhood around each point should be composed of only the points that move with the same velocity; this set of points shall be referred to in this specification as a “region”.
The problem is then that such “regions” are usually defined by velocities while being relied upon to provide an estimate of these same velocities.
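The error attached to a candidate velocity can be illustrated with a minimal sketch. It assumes, purely for illustration, that the selected quantity is the grey level, that the velocity is a whole-pixel translation, and that the error is a sum of squared differences; the function name `matching_error` is hypothetical.

```python
def matching_error(current, nxt, velocity):
    """Sum of squared grey-level differences along an integer translation.

    current, nxt: 2-D lists of grey levels with the same shape.
    velocity: (dx, dy) displacement in whole pixels (an assumption made
    for simplicity). Pixels that would leave the frame are skipped.
    """
    dx, dy = velocity
    height, width = len(current), len(current[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            tx, ty = x + dx, y + dy
            if 0 <= tx < width and 0 <= ty < height:
                diff = nxt[ty][tx] - current[y][x]
                total += diff * diff
    return total
```

On a frame of uniform grey level every candidate velocity yields the same zero error, which is the ambiguity described above: a small error alone does not single out one velocity.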
A scene or image can include moving objects. Recovering the velocities requires partitioning the scene into objects (regions) and attributing to each region a model of velocity. The following sub-problems are easy to solve: (a) given the velocities, find the regions; and (b) given the regions, find the velocities. Unfortunately, in order to solve the entire problem exactly, one has to find regions and velocities simultaneously. Conventional approaches are based on the sequential use of techniques which solve one of the sub-problems stated above. The dominant motion approach involves a sequential estimation of the dominant motion and the extraction of the attached region. This approach therefore uses techniques that solve the first sub-problem on velocities that are obtained based upon the assumption of a dominant motion. A technique disclosed in Bouthemy et al., “Motion segmentation and qualitative dynamic scene analysis from an image sequence”, The International Journal of Computer Vision Vol. 10, No. 2, pages 157-182, April 1993, employs sequential use of techniques which solve, alternately, the first and then the second sub-problem. This sequence of processes has not been proved to converge, and requires a good initialization of both region and velocity estimates. A technique disclosed in Schweitzer, “Occam algorithms for computing visual motion”, IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 17, No. 11, pages 1033-1042 (1995) employs a similar sequential process, but uses a splitting algorithm where regions are rectangles. This latter technique is sure to converge, but suffers from the over-simplification of the description of a region as a rectangle. Another disadvantage is that the initialization as one region for the entire picture might lead to a fixed point far from the solution.
The aim of this latter technique is not necessarily to find the “good” velocities, but to find the best mapping in terms of compression. The problem with these techniques is that, when estimating velocities, they rely on regions that are unknown (and therefore possibly erroneous), and when estimating regions, they rely on velocities that are equally unknown.
Many techniques dealing with the problem of finding a unique global motion of a scene have been developed successfully. Although these techniques cannot be applied in general to recover multiple motions, some attempts have been made in particular cases. The most significant example is the technique of Bouthemy et al. referred to above. The hypothesis of a dominant image motion proposed in Cloutier et al., “Segmentation and estimation of image motion by a robust method”, Proc. IEEE pages 805-809 (1995), assumes that the observed scene is made up of moving objects having very different sizes (for example a small object and a large background). A least-median-of-squares estimator based on optical flow constraints is applied to the entire image to extract the model of the dominant motion. Then, the first sub-problem is solved according to the knowledge of the dominant velocity: the region corresponding to the dominant motion is found. Once this dominant object has been detected, it is removed from the region of analysis, and the same process is repeated on the remaining part of the image. Two limitations on the use of this technique are: first, the underlying hypothesis is in general too restrictive for a real sequence, and, secondly, the link between dominant motion and dominant object must be investigated. Indeed, once the dominant motion has been computed, one has to decide for each point whether or not it moves according to the dominant motion and therefore whether or not it belongs to the dominant object. This decision is made by local estimates around each pixel and by an a priori thresholding, and is therefore very sensitive to noise.
Bouthemy et al.'s Motion Segmentation
Bouthemy et al. assume in their publication cited above that they initially have a segmentation of the velocities (for example obtained by the dominant motion approach), and they propose a technique to improve its quality. They start their algorithm with the segmentation Ri, Vi, where Vi is the velocity model associated with the region Ri. Then, they move the boundary of the region in order to decrease an energy which balances the matching error against the length of the boundaries. They recompute the velocity within the region when a significant change in the shape of the region occurs. The initial velocity is used to initialize the new estimation. Their algorithm suffers from several problems. First, the initial segmentation has to be near the solution. Their algorithm therefore has to be seen as a way to improve the quality of a velocity estimate rather than as an algorithm that calculates the velocity. Secondly, the algorithm has not been proved to converge. Moreover, it is very sensitive to local extrema. Thirdly, it attributes one (and only one) velocity to each region, and the segmentation of the region is based on these velocities. It is in a sense a segmentation from velocity estimates, whereas it should be a velocity estimate from a segmentation. Finally, the occlusions are not taken into account.
Schweitzer's Algorithm
The publication by Schweitzer cited above formulates the problem of motion estimation as a search for a function that can accurately predict frames. It balances the velocity field based upon determinations of (a) how good the prediction is and (b) how simple it is. The first requirement is measured as usual by the error terms. The simplicity of the vector field is set by Schweitzer in terms of encoding length. His algorithm is based on a segmentation procedure by splitting rectangles. Each rectangle is split horizontally or vertically into two other rectangles if the splitting increases the quality of the prediction more than a cost based on the increase of the complexity (appearance of a new rectangular region). Unfortunately, given a rectangle, the location of the split or boundary is problematic. In the algorithm of Schweitzer, one needs estimates of the velocities for each point in the rectangles. And, the segmentation depends on the pre-calculated velocities. Finally, the rectangle-based segmentation might not be sufficient to take into account non-rectangular objects.
Morel et al.'s Grey-Scale Segmentation of Images
A grey-scale segmentation technique is disclosed in Morel et al., “Variational methods in image segmentation”, in H. Brezis, editor, Progress in Nonlinear Differential Equations and Their Applications, Birkhauser, 1995. It produces a piece-wise constant image that approximates the original image. The approximation is scaled: the larger the scale, the bigger the regions (the pieces of the segmentation). They propose to balance the quality of the approximation (which is measured by the grey-level difference between the original image and its approximation) against the complexity of the approximation (measured by the total length of the boundaries). They initialize the process by considering each pixel as a region. Then they merge regions if the merging decreases the following energy:
$$E = \int (\mu(x) - \mu_0(x))^2 \, dx + \lambda \, \mathrm{Length}(B_\mu),$$
where $\mu_0$ denotes the original image, $\mu$ its piece-wise constant approximation, $B_\mu$ the boundaries of the regions of $\mu$, and $\lambda$ a scale parameter. The algorithm ends when merging is no longer possible. Of course, Morel et al.'s algorithm for segmenting grey-scale images does not give any information about velocities.
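A discrete version of the energy above can be sketched as follows. The sketch assumes that the piece-wise constant approximation takes the mean grey level of each region and that boundary length is counted as the number of 4-neighbour pixel edges separating different regions; both are illustrative discretisation choices, and the function name `segmentation_energy` is hypothetical.

```python
def segmentation_energy(image, labels, scale):
    """Fidelity term plus scale times boundary length, for a label map.

    image: 2-D list of grey levels (the original image).
    labels: 2-D list of region labels with the same shape.
    scale: the scale parameter weighting the boundary length.
    """
    sums, counts = {}, {}
    for row_img, row_lab in zip(image, labels):
        for value, lab in zip(row_img, row_lab):
            sums[lab] = sums.get(lab, 0.0) + value
            counts[lab] = counts.get(lab, 0) + 1
    # Assumed approximation: each region takes its mean grey level.
    means = {lab: sums[lab] / counts[lab] for lab in sums}
    fidelity = sum((value - means[lab]) ** 2
                   for row_img, row_lab in zip(image, labels)
                   for value, lab in zip(row_img, row_lab))
    height, width = len(labels), len(labels[0])
    boundary = 0
    for y in range(height):
        for x in range(width):
            # Count each edge between differently labelled neighbours once.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                boundary += 1
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                boundary += 1
    return fidelity + scale * boundary
```

Evaluating the energy for the same image under coarser and finer label maps shows the role of the scale: a small scale favours many small regions, while a large scale makes the boundary term dominate and favours merging.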
The invention is embodied in a process for obtaining information from at least two image frames of a sequence of frames, each of the frames including an array of pixels, each pixel having an amplitude, one of the two frames being designated as a reference frame and the other being a non-reference frame, the process including:
(1) defining a set of velocities with which the motion of pixels between the two frames may be modeled;
dividing each one of the two frames into plural regions;
(2) determining an error for each one of at least some of the velocities by carrying out the following steps for each one of the regions and for each union of pairs of the regions:
(A) mapping each pixel of the non-reference frame into the reference frame in accordance with the one velocity,
(B) computing an error amount which is a function of a difference in pixel amplitude attributable to the mapping;
(C) designating a minimum one of the error amounts computed for the velocities as the error for the one velocity, whereby a respective error is associated with each of the regions and with each union of pairs of the regions without regard to velocity; and
(3) merging qualified ones of the regions by the following steps:
(A) computing for each pair of regions a merging scale which depends upon a gain including a function of (a) the sum of the errors of each pair of regions and (b) the error of the union of the pair of regions;
(B) merging each pair of the regions for which the merging scale meets a predetermined criteria.
The merging scale preferably depends also upon a cost including a function of (a) the sum of the lengths of the boundaries of each pair of regions and (b) the length of the boundary of the union of the pair of regions.
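Steps (2)(A) to (2)(C) above can be sketched for a single region as follows. The sketch assumes whole-pixel translational velocities and a squared grey-level difference as the error amount; the names `velocity_errors` and `region_error` are hypothetical.

```python
def velocity_errors(reference, other, pixels, velocities):
    """Error of every candidate velocity over one region (steps A and B).

    reference, other: 2-D lists of pixel amplitudes of the same shape.
    pixels: (x, y) coordinates of the region in the non-reference frame.
    velocities: iterable of (dx, dy) integer translations (an assumption).
    """
    height, width = len(reference), len(reference[0])
    errors = {}
    for dx, dy in velocities:
        total = 0.0
        for x, y in pixels:
            tx, ty = x + dx, y + dy  # map the pixel into the reference frame
            if 0 <= tx < width and 0 <= ty < height:
                diff = reference[ty][tx] - other[y][x]
                total += diff * diff
        errors[(dx, dy)] = total
    return errors


def region_error(reference, other, pixels, velocities):
    """Smallest error over the velocity set (step C)."""
    return min(velocity_errors(reference, other, pixels, velocities).values())
```

Calling `region_error` on two regions separately and then on the concatenation of their pixel lists gives the three errors that enter the gain. Under this additive error, the error of the union is never smaller than the sum of the two separate errors, because the union must share a single velocity.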
The step of determining an error for each one of at least some of the velocities can include determining an error for each one of all of the velocities.
The process can further include, after the step of merging:
erasing the individual pairs of regions which have been merged and defining their unions as individual regions; and
repeating the steps of (a) computing an error for each one of at least some of the velocities, (b) computing a merging scale and merging each pair of regions for which the merging scale meets the criteria, whereby the process includes plural repetitive iterations.
Preferably the step of determining an error for each one of at least some of the velocities includes determining the error for a limited set of the velocities, the limited set of the velocities corresponding to those velocities associated with the N smallest errors computed during a prior iteration of the process, wherein N is an integer.
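Restricting a region to the N velocities with the smallest errors can be sketched as below, assuming the errors from the prior iteration are held in a dictionary keyed by velocity; the function name `best_velocities` is hypothetical.

```python
import heapq


def best_velocities(errors, n):
    """Return the n velocities with the smallest errors, best first.

    errors: dict mapping each velocity to the error computed for the
    region during the prior iteration.
    """
    return heapq.nsmallest(n, errors, key=errors.get)
```

Only the velocities returned here would then be re-evaluated for the region in the next iteration, which keeps the cost per iteration bounded as regions grow.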
If each limited set of velocities associated with the N smallest errors is different for different regions, then the step of determining an error includes:
designating as the maximum error for a given region the largest error computed for that region in any prior iteration of the process;
and the step of computing the merging scale includes determining for each pair of regions whether a velocity included in the limited velocity set of one of the regions is not included in the limited velocity set of the other of the pair of regions, and assigning as the corresponding error for the other region the maximum error.
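The substitution of the maximum error for a missing velocity can be sketched as follows. The sketch assumes that the error of a union for a given velocity is the sum of the two regions' errors for that velocity, which holds for errors accumulated pixel by pixel; the function name `union_error` is hypothetical.

```python
def union_error(errors_a, max_a, errors_b, max_b):
    """Error of the union of two regions from their limited error tables.

    errors_a, errors_b: dicts mapping each retained velocity to the
    region's error for that velocity.
    max_a, max_b: largest error computed for each region in any prior
    iteration. A velocity missing from one table is charged that
    region's maximum error.
    """
    candidates = set(errors_a) | set(errors_b)
    return min(errors_a.get(v, max_a) + errors_b.get(v, max_b)
               for v in candidates)
```

Because the maximum error is an upper bound on the true value, the substitution can only overestimate the error of the union and therefore errs on the side of not merging.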
The mapping includes computing a new pixel amplitude in accordance with a weighted average of pixel amplitudes mapped into the reference frame, wherein the weight of each pixel amplitude mapped into the reference frame is a decreasing function of the mapped pixel's distance from a given pixel location in the reference frame.
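The weighted average described above can be sketched for one pixel location. The Gaussian weight used here is only one assumed example of a decreasing function of distance; the function name `fused_amplitude` and the parameter `sigma` are hypothetical.

```python
import math


def fused_amplitude(target, samples, sigma=0.5):
    """Weighted average of mapped amplitudes around one reference pixel.

    target: (x, y) pixel location in the reference frame.
    samples: iterable of ((x, y), amplitude) pairs already mapped into
    the reference frame, possibly at non-integer positions.
    sigma: width of the assumed Gaussian weight exp(-d^2 / (2 sigma^2)).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), amplitude in samples:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        weight = math.exp(-d2 / (2.0 * sigma * sigma))
        weighted_sum += weight * amplitude
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no samples contribute to this pixel")
    return weighted_sum / weight_total
```

Samples that land close to the target location dominate the result, while distant samples contribute little, so the fused value varies smoothly as mapped pixels shift by sub-pixel amounts.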
The mapping step of mapping pixels from the non-reference frame to the reference frame is a forward mapping, and the process can further include determining which ones of the pixels are occluded by carrying out the following steps:
(I) determining which pixels were not matched from the non-reference frame to the reference frame by the merging step following the forward mapping step and removing the pixels not matched from the reference frame;
(II) performing the step of determining an error and the step of merging, except that the mapping step includes a backward mapping of mapping from the reference frame to the non-reference frame, the backward mapping step employing a version of the reference frame in which the pixels not matched have been removed;
(III) determining which pixels were not matched from the reference frame to the non-reference frame by the merging step following the backward mapping step and removing the pixels not matched from the non-reference frame;
(IV) comparing the pixels remaining in the reference frame with the pixels remaining in the non-reference frame, and repeating steps I, II and III if there is a difference beyond a predetermined threshold.
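The alternation of steps I to IV can be sketched as a loop over two sets of surviving pixels. The callables `forward` and `backward` are hypothetical stand-ins for the full error-and-merge procedure: each takes the source and target pixel sets and returns the target pixels that were matched. Comparing the two frames by the difference in the number of remaining pixels is likewise an assumption made for illustration.

```python
def settle_visible_pixels(ref_pixels, other_pixels, forward, backward,
                          threshold=0, max_rounds=20):
    """Alternate forward and backward matching until the frames agree.

    forward(other, ref): reference pixels matched from the non-reference
    frame (hypothetical callable).
    backward(ref, other): non-reference pixels matched from the
    reference frame (hypothetical callable).
    Returns the pixels remaining in each frame; the rest are treated as
    occluded.
    """
    ref, other = set(ref_pixels), set(other_pixels)
    for _ in range(max_rounds):
        ref &= forward(other, ref)        # step I: drop unmatched reference pixels
        other &= backward(ref, other)     # steps II and III: backward pass
        if abs(len(ref) - len(other)) <= threshold:  # step IV (assumed test)
            break
    return ref, other
```

The `max_rounds` cap is a safeguard added in this sketch so that the loop terminates even if the two sets never agree to within the threshold.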
The process assigns a velocity to each remaining pixel in the non-reference frame and then adds the remaining pixels of the non-reference frame to the reference frame in accordance with the velocity assigned to each pixel of the non-reference frame to produce an enhanced frame.
The process can further include deblurring the image of the enhanced frame to produce a super frame.
The dividing step can initialize the regions so that each pixel is an individual region.
The model velocities include at least one of: the set of translational velocities, the set of rotational velocities, or the set of zooms.
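The three families of model velocities can be sketched as coordinate maps. The parameterisation below (a displacement, an angle about a centre, and a scale factor about a centre) is an assumed one, and the function names are hypothetical.

```python
import math


def translation(dx, dy):
    """Map (x, y) to (x + dx, y + dy)."""
    return lambda x, y: (x + dx, y + dy)


def rotation(angle, cx=0.0, cy=0.0):
    """Rotate by angle (radians) about the centre (cx, cy)."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda x, y: (cx + c * (x - cx) - s * (y - cy),
                         cy + s * (x - cx) + c * (y - cy))


def zoom(factor, cx=0.0, cy=0.0):
    """Scale distances from the centre (cx, cy) by factor."""
    return lambda x, y: (cx + factor * (x - cx), cy + factor * (y - cy))
```

Each function returns a callable that moves a pixel location from one frame to the other, so a set of velocities is simply a collection of such callables evaluated over a region.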
Preferably, the unions of pairs of regions constitute unions of pairs of adjacent regions only.
The merging scale is computed as a ratio obtained by dividing the cost by the gain, and the predetermined criteria include a maximum scalar value of the ratio above which merging is disallowed. The scalar value is selected in a range between an upper limit at which the entire image is merged and a lower limit at which no pixels are merged.
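The ratio test can be sketched as follows, taking the cost and the gain as already computed numbers. Treating a non-positive gain as a pair that is never merged is an assumption of this sketch, and the function names are hypothetical.

```python
def merging_scale(cost, gain):
    """Ratio of cost to gain for one pair of regions.

    A non-positive gain is mapped to infinity so that the pair can never
    satisfy a finite maximum (an assumption of this sketch).
    """
    if gain <= 0:
        return float("inf")
    return cost / gain


def should_merge(cost, gain, max_scale):
    """True when the merging scale does not exceed the allowed maximum."""
    return merging_scale(cost, gain) <= max_scale
```

Raising `max_scale` admits more pairs and yields larger regions, while lowering it admits fewer, which matches the range between the two limits described above.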
The process further includes defining the set of velocities as a simple set during the first one of the iterations of the process, and supplementing the set of velocities with additional velocities as the size of the regions grows. Preferably, the simple set includes the set of translational velocities, and the additional velocities include the set of rotational velocities and the set of zoom velocities.
The reference and non-reference frames can lie in a moving sequence of frames depicting an image having motion, the process further including designating one of the sequence of frames as the reference frame and successively designating others of the sequence of frames as the non-reference frame, and performing all of the foregoing steps for each one of the successive designations of the non-reference frame, whereby the super frame contains information from all the frames of the sequence. Furthermore, the process can include designating successive ones of the sequence of frames as the reference frame and repeating all of the foregoing steps for each designation of the reference frame so that a super frame is constructed for each one of the sequence of frames.
The step of assigning a velocity to each remaining pixel can be carried out by selecting the velocity for the region of that pixel having the minimum error.
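Selecting the minimum-error velocity of each region for its pixels can be sketched as below, assuming regions are stored as lists of pixel coordinates and errors as dictionaries keyed by velocity; the function name `assign_velocities` is hypothetical.

```python
def assign_velocities(region_pixels, region_errors):
    """Give every pixel the minimum-error velocity of its region.

    region_pixels: dict mapping a region id to its list of (x, y) pixels.
    region_errors: dict mapping a region id to a dict velocity -> error.
    Returns a dict mapping each pixel to its assigned velocity.
    """
    assignment = {}
    for region, pixels in region_pixels.items():
        errors = region_errors[region]
        best = min(errors, key=errors.get)  # velocity with the smallest error
        for pixel in pixels:
            assignment[pixel] = best
    return assignment
```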