1. Technical Field
The present invention relates to image processing and enhancement by fusion of plural image frames. The invention performs frame fusion on a pixel-by-pixel basis by estimating velocities and occlusions between two frames. For each pixel, the possible matchings are those that minimize changes in a selected parameter of the image (generally the grey-level).
2. Background Art
Finding velocities in a sequence of images requires following points along their motion in the image. That is, when not occluded, the pixels of the current image are associated with the pixels of the next image. This association is based on the relative constancy of a given quantity estimated from the images at each pixel. In general, this quantity is the grey-level value of the pixel, since it does not vary greatly during motion. It may, however, be defined from other measurements such as the curvature, the gradient and so forth. Given a velocity, one can define a measure that indicates whether or not this velocity is accurate. We call this measure the "error". The error is based on the variation of the quantity along the motion defined by the velocity. Possible velocities will have small errors attached. Estimating the velocities in this case consists of finding velocities that have small errors. Unfortunately, this property is not enough to define a unique velocity field. Indeed, there might exist in the next frame many points having the same grey level (or other selected quantity) as a given point of the current frame. This is the well-known aperture problem, which must be solved in order to find the velocities. The probability of matching plural points in the image with the same velocity decreases as the number of points increases. Many techniques try to exploit this observation. For example, the well-known correlation technique tries to match by neighborhood (generally defined by a square). However, this arbitrary neighborhood might be too large and therefore mix points having different velocities, or conversely too small to solve the aperture problem. The neighborhood around each point should be composed of only the points that move with the same velocity; this set of points shall be referred to in this specification as a "region".
The problem is then that such "regions" are usually defined by velocities while being relied upon to provide an estimate of these same velocities.
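The error measure and the square-neighborhood matching described above can be illustrated with a minimal sketch. It assumes frames stored as lists of rows of grey levels, integer velocities, and a squared grey-level difference as the error; the function names and the exhaustive search are hypothetical choices made for illustration and are not taken from any cited technique.

```python
def pixel_error(frame0, frame1, x, y, vx, vy):
    """Squared grey-level change of pixel (x, y) when moved by (vx, vy)."""
    return (frame1[y + vy][x + vx] - frame0[y][x]) ** 2


def window_error(frame0, frame1, x, y, vx, vy, radius):
    """Sum of pixel errors over a square window centred on (x, y)."""
    total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            total += pixel_error(frame0, frame1, x + dx, y + dy, vx, vy)
    return total


def best_velocities(frame0, frame1, x, y, radius, max_speed):
    """Return every integer velocity with minimal window error at (x, y).

    Assumption: the caller keeps all accessed indices inside both frames;
    this sketch performs no bounds checking.
    """
    scores = {}
    for vy in range(-max_speed, max_speed + 1):
        for vx in range(-max_speed, max_speed + 1):
            scores[(vx, vy)] = window_error(frame0, frame1, x, y, vx, vy, radius)
    lowest = min(scores.values())
    return sorted(v for v, s in scores.items() if s == lowest)
```

On a straight vertical edge that moves one pixel to the right, the function returns several velocities with the same minimal error even with a 3x3 window, which is the ambiguity discussed above. On a corner, the single-pixel match is still ambiguous, while the 3x3 window leaves one velocity.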
A scene or image can include moving objects. Recovering the velocities requires partitioning the scene into objects (regions) and attributing to each region a model of velocity. The following sub-problems are easy to solve: (a) Given the velocities find the regions; and (b) Given the regions find the velocities. Unfortunately, in order to solve the entire problem exactly, one has to find regions and velocities simultaneously. Conventional approaches are based on the sequential use of techniques which solve one of the sub-problems stated above. The dominant motion approach involves sequentially estimating the dominant motion and extracting the attached region. This approach therefore uses techniques that solve the first sub-problem on velocities that are obtained based upon the assumption of a dominant motion. A technique disclosed in Bouthemy et al., "Motion segmentation and qualitative dynamic scene analysis from an image sequence", The International Journal of Computer Vision Vol. 10, No. 2, pages 157-182, April 1993, employs sequential use of techniques which solve, alternately, the first and then the second sub-problem. This sequence of processes is not proved to converge, and requires a good initialization of both region and velocity estimates. A technique disclosed in Schweitzer, "Occam algorithms for computing visual motion", IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 17, No. 11, pages 1033-1042 (1995) employs a similar sequential process, but uses a splitting algorithm where regions are rectangles. This latter technique is sure to converge, but suffers from the over-simplification of the description of a region as a rectangle. Another disadvantage is that initializing with one region for the entire picture might lead to a fixed point far from the solution. The aim of this latter technique is not necessarily to find the "good" velocities, but to find the best mapping in terms of compression.
The problem with these techniques is that when solving the first sub-problem they try to find regions from unknown (and therefore possibly erroneous) velocities, and when solving the second sub-problem they try to find velocities from unknown regions.
Many techniques dealing with the problem of finding a unique global motion of a scene have been developed successfully. Although these techniques cannot in general be applied to recover multiple motions, some attempts have been proposed for particular cases. The most significant example is the technique of the publication by Bouthemy et al. referred to above. The hypothesis of a dominant image motion proposed in Cloutier et al., "Segmentation and estimation of image motion by a robust method", Proc. IEEE pages 805-809 (1995), assumes that the observed scene is made up of moving objects having very different sizes (for example a small object and a large background). A least-median-of-squares estimator based on optical flow constraints is applied to the entire image to extract the model of the dominant motion. Then, the first sub-problem is solved using the knowledge of the dominant velocity: the region corresponding to the dominant motion is found. Once this dominant object has been detected, it is removed from the region of analysis, and the same process is repeated on the remaining part of the image. Two limitations on the use of this technique are: first, the underlying hypothesis is in general too restrictive for a real sequence, and, secondly, the link between dominant motion and dominant object must be investigated. Indeed, once the dominant motion has been computed, one has to decide for each point whether or not it moves according to the dominant motion and therefore whether or not it belongs to the dominant object. This decision is made from local estimates around each pixel and an a priori threshold, and is therefore very sensitive to noise.
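The estimate, extract and repeat loop of the dominant motion approach can be sketched as follows. This is a simplified illustration and not the method of Cloutier et al.: it assumes that a flow vector is already available for each sample, restricts the motion model to a pure translation, and draws the candidate models from the observed vectors themselves. The function names and the `threshold` parameter are hypothetical.

```python
from statistics import median


def dominant_translation(flows):
    """Least-median-of-squares pick of one translation among the samples.

    Each distinct observed flow vector is tried as a candidate model; the
    winner has the smallest median squared residual over all samples.
    """
    def median_sq_residual(candidate):
        cx, cy = candidate
        return median((fx - cx) ** 2 + (fy - cy) ** 2 for fx, fy in flows)

    # sorted() makes tie-breaking deterministic.
    return min(sorted(set(flows)), key=median_sq_residual)


def dominant_region(flows, model, threshold):
    """Indices of samples whose squared residual to the model is at most threshold."""
    mx, my = model
    return [i for i, (fx, fy) in enumerate(flows)
            if (fx - mx) ** 2 + (fy - my) ** 2 <= threshold]


def peel_motions(flows, threshold):
    """Repeatedly extract the dominant motion and remove its region."""
    remaining = list(range(len(flows)))
    layers = []
    while remaining:
        subset = [flows[i] for i in remaining]
        model = dominant_translation(subset)
        inliers = {remaining[j] for j in dominant_region(subset, model, threshold)}
        layers.append((model, sorted(inliers)))
        remaining = [i for i in remaining if i not in inliers]
    return layers
```

The fixed `threshold` plays the role of the a priori thresholding mentioned above: each sample is assigned to the dominant object or not by a hard local test, which is where the sensitivity to noise enters.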
Bouthemy et al.'s Motion Segmentation
Bouthemy et al. assume in their publication cited above that they initially have a segmentation of the velocities (for example obtained by the dominant motion approach), and they propose a technique to improve its quality. They start their algorithm with the segmentation $R_i$, $V_i$, where $V_i$ is the velocity model associated with the region $R_i$. Then, they make the boundary of the region move in order to decrease an energy which balances the matching error against the length of the boundaries. They recompute the velocity within the region when a significant change of shape of the region occurs. The initial velocity is used to initialize the new estimation. Their algorithm suffers from several problems. First, the initial segmentation has to be near the solution. Their algorithm therefore has to be seen as a way to improve the quality of a velocity estimate rather than as an algorithm that calculates the velocity. Secondly, the algorithm is not proved to converge. Moreover, it is very sensitive to local extrema. Thirdly, it attributes one (and only one) velocity to each region, and the segmentation of the region is based on these velocities. It is in a sense a segmentation from a velocity estimate, whereas it should be a velocity estimate from a segmentation. Finally, occlusions are not taken into account.
Schweitzer's Algorithm
The publication by Schweitzer cited above formulates the problem of motion estimation as a search for a function that can accurately predict frames. It balances the velocity field based upon determinations of (a) how good the prediction is and (b) how simple it is. The first requirement is measured as usual by the error terms. The simplicity of the vector field is set by Schweitzer in terms of encoding length. His algorithm is based on a segmentation procedure by splitting rectangles. Each rectangle is split horizontally or vertically into two other rectangles if the splitting increases the quality of the prediction by more than a cost based on the increase in complexity (the appearance of a new rectangular region). Unfortunately, given a rectangle, the location of the split or boundary is problematic. In the algorithm of Schweitzer, one needs estimates of the velocities for each point in the rectangles, and the segmentation depends on these pre-calculated velocities. Finally, the rectangle-based segmentation might not be sufficient to take into account non-rectangular objects.
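The split rule described above (split a rectangle only when the gain in prediction quality exceeds the added complexity cost) can be sketched as follows. This is an illustration under stated assumptions, not Schweitzer's algorithm: the rectangle holds one pre-computed scalar per pixel, each rectangle is modelled by a single constant, the complexity cost is a hypothetical fixed `split_cost` rather than an encoding length, and every possible cut position is tried exhaustively.

```python
def sse(values):
    """Sum of squared deviations from the mean (error of one constant model)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(grid, split_cost):
    """Return ('h' or 'v', index, gain) for the best worthwhile split, else None.

    grid is a list of rows holding one pre-computed scalar per pixel.
    A split is worthwhile only if its error reduction exceeds split_cost.
    """
    rows, cols = len(grid), len(grid[0])
    whole = sse([v for row in grid for v in row])
    best = None
    for r in range(1, rows):  # horizontal cut between rows r-1 and r
        top = [v for row in grid[:r] for v in row]
        bottom = [v for row in grid[r:] for v in row]
        gain = whole - sse(top) - sse(bottom)
        if gain > split_cost and (best is None or gain > best[2]):
            best = ("h", r, gain)
    for c in range(1, cols):  # vertical cut between columns c-1 and c
        left = [v for row in grid for v in row[:c]]
        right = [v for row in grid for v in row[c:]]
        gain = whole - sse(left) - sse(right)
        if gain > split_cost and (best is None or gain > best[2]):
            best = ("v", c, gain)
    return best
```

The sketch makes the dependence noted above explicit: the per-pixel values in `grid` must exist before any split can be evaluated, so the resulting segmentation can be no better than those pre-calculated estimates.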
Morel et al.'s Grey-Scale Segmentation of Images
A grey-scale segmentation technique is disclosed in Morel et al., "Variational methods in image segmentation", in H. Brezis, editor, Progress in Nonlinear Differential Equations and Their Applications, Birkhauser, 1995. It produces a piece-wise constant image that approximates the original image. The approximation is scaled: the larger the scale, the bigger the regions (the pieces of the segmentation). They propose to balance the quality of the approximation (which is measured by the grey-level difference between the original image and its approximation) against the complexity of the approximation (measured by the total length of the boundaries). They initialize the process by considering each pixel as a region. Then they merge regions if the merging decreases the following energy:
$$E = \int (u(x) - u_o(x))^2 \, dx + \lambda \, \mathrm{Length}(B_u)$$
where $u_o$ denotes the original image, $u$ its piece-wise constant approximation, $B_u$ the boundaries of the regions of $u$, and $\lambda$ a scale parameter. The algorithm ends when merging is no longer possible. Morel et al.'s algorithm for segmenting grey-scale images does not, however, give any information about velocities.
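The merging procedure can be illustrated with a minimal one-dimensional sketch. It assumes a 1-D signal in place of an image, so that the boundary length reduces to the number of boundaries between adjacent regions, and it merges the pair giving the largest energy decrease at each step. The function name `merge_segments` and the greedy ordering are assumptions made for illustration, not details taken from Morel et al.

```python
def merge_segments(signal, lam):
    """Greedy 1-D region merging for E = sum (u - u_o)^2 + lam * (#boundaries).

    Returns a list of (pixel count, mean grey level), one entry per region.
    """
    # Each region is (pixel count, sum of grey levels); start with one per pixel.
    regions = [(1, float(v)) for v in signal]
    while len(regions) > 1:
        best_i, best_delta = None, 0.0
        for i in range(len(regions) - 1):
            (na, sa), (nb, sb) = regions[i], regions[i + 1]
            # Rise in the data term when both regions share a single mean.
            rise = na * nb / (na + nb) * (sa / na - sb / nb) ** 2
            delta = rise - lam  # removing one boundary saves lam
            if delta < best_delta:
                best_i, best_delta = i, delta
        if best_i is None:
            break  # no merge lowers the energy
        (na, sa), (nb, sb) = regions[best_i], regions[best_i + 1]
        regions[best_i:best_i + 2] = [(na + nb, sa + sb)]
    return [(n, s / n) for n, s in regions]
```

A larger `lam` makes each boundary more expensive, so more merges are accepted and the regions grow, which matches the role of the scale parameter described above.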