Video sequences typically comprise a series of non-interlaced frames of video data, or a series of interlaced fields of video data. Interlaced sequences are produced by fields which carry data on alternate lines of a display, such that a first field will carry data for alternate lines, and a second field will carry data for the missing lines. The fields are thus spaced both temporally and spatially. Every alternate field in a sequence will carry data at the same spatial locations. A pair of interlaced fields may be referred to as a frame and a non-interlaced field may also be referred to as a frame. Embodiments of this invention will be described in terms of a non-interlaced system, but the invention is equally applicable to systems processing either interlaced or non-interlaced video data.
Identification of motion in video sequences is well known and results in a motion vector for each pixel, or group of pixels, that describes the motion between one frame and another. A well-known motion estimation method based on block matching techniques will be used to illustrate the invention, although other types of motion estimator would be equally suitable. Block based motion estimation methods generally consider two consecutive frames from a video sequence and subdivide them into multiple regions known as blocks or macroblocks. In a motion search procedure, each block is compared with pixel data from various candidate locations in the previous frame. The relative position of the best match gives a vector that describes the motion in the scene at that block position. Collectively, the set of motion vectors at each block position in a frame is known as the motion vector field for that frame. Note that use of the term “vector field” should not be confused with the use of “field” or “video field” to describe the data in an interlaced video sequence.
FIG. 1 illustrates a typical example of a block matching motion estimator. In all figures, including FIG. 1, motion vectors are shown with the head of the arrow at the centre of the block to which the vector corresponds. The frames are divided into blocks, and an object 101 in the previous frame has moved to position 102 in the current frame. The previous position of the object is shown superimposed on the current frame as 103. Motion estimation is performed for blocks rather than for objects, where a block of pixels in the current frame is matched with a block sized pixel area in the previous frame which is not necessarily block aligned. For example, block 104 is partially overlapped by the moving object 102, and has contents as illustrated at 105. Motion estimation for block 104, if it performs well, will find the pixel data area 106 in the previous frame, which can also be seen to contain the pixels illustrated in 105, i.e. a good match has been found. Superimposed back onto the current frame, the matching pixel data area is at 107. The motion vector associated with block 104 is therefore as illustrated by arrow 108.
Many block based motion estimators select their output motion vector by testing a set of motion vector candidates with a method such as a sum of absolute differences (SAD) or mean of squared differences (MSD), to identify motion vectors which give the lowest error block matches. FIG. 2 illustrates the candidate evaluation process for the block 201 in the current frame which has pixel contents shown in 211. In this simple example system, three motion vector candidates 206, 207 and 208 are considered which correspond to candidate pixel data areas at locations 202, 203 and 204 in the previous frame. The pixel contents of these pixel data areas can be seen in 212, 213 and 214 respectively. It is apparent that the pixel data at location 202 provides the best match for block 201 and should therefore be selected as the best match/lowest difference candidate. Superimposed back onto the current frame, the matching pixel data area is at 205 and the associated motion vector is 206.
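The candidate evaluation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the estimator of any particular system: the function names (sad, extract, best_candidate) are hypothetical, frames are assumed to be lists of rows of single-channel pixel values, and each candidate vector (dx, dy) is assumed to describe motion from the previous frame to the current frame, so that the matching pixel data area lies at the block position minus the vector.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel areas."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract(frame, x, y, size):
    """Return the size-by-size pixel area whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def best_candidate(current, previous, bx, by, size, candidates):
    """Return (vector, error) for the candidate with the lowest SAD.

    The block at (bx, by) in the current frame is compared with the pixel
    data area at (bx - dx, by - dy) in the previous frame for each candidate.
    Candidates that point outside the previous frame are skipped; None is
    returned if no candidate is usable.
    """
    block = extract(current, bx, by, size)
    height, width = len(previous), len(previous[0])
    best = None
    for dx, dy in candidates:
        px, py = bx - dx, by - dy
        if not (0 <= px <= width - size and 0 <= py <= height - size):
            continue
        error = sad(block, extract(previous, px, py, size))
        if best is None or error < best[1]:
            best = ((dx, dy), error)
    return best
```

A mean of squared differences could be substituted for sad without changing the selection loop. An estimator aiming for "true motion" rather than "lowest error" would typically add further constraints to the choice of candidates, which this sketch does not attempt.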
Different systems have different requirements of the motion estimation. In a video encoder, the requirement is to form the most compact representation of a frame, by reference to a previous frame from the sequence. The requirement is generally to find motion vectors which give the lowest error block matches, and while the resulting motion vectors are usually representative of the actual motion of objects in the scene, there is no requirement that this is always the case. In other applications, such as de-interlacing or frame rate conversion, it is more important that the motion vectors represent the true motion of the scene, even if other distortions in the video mean that the block matches do not always give the lowest error. By applying appropriate constraints to the candidate motion vectors during motion search, the results can be guided towards “lowest error” or “true motion” as necessary.
Frame rate conversion systems are also well known. In such a system, a sequence of input fields or frames at an input frame rate is changed to a different output frame rate. Conventionally this is done by repeating and/or dropping frames until the desired output frame rate is achieved. An alternative method that results in smoother output motion is called motion compensated frame interpolation. In a motion compensated frame interpolation system, motion estimation is used to determine motion vectors that represent the true motion of objects in a scene and these motion vectors are used to create additional frames in which moving objects are represented at the correct intermediate positions. We describe the temporal position of a frame, with respect to the original frames of the input sequence, as the time instance of the frame. For a frame rate doubling application, where an interpolated frame is required at the temporal midpoint between two input frames at t=−1 and t=0, the time instance of the interpolated frame is at t=−0.5.
Motion compensated frame interpolation systems require the creation of intermediate frames between the two input frames. The temporal position at which each interpolated frame must be output is known from the input and output frame rates and is called the “ideal time instance” of the frame. In a conventional motion compensated frame interpolation system, each interpolated frame is created using the ideal time instance to determine the intermediate positions of objects in the scene.
A block diagram of a frame rate conversion system is shown in FIG. 15. The system comprises a video input 1501 which goes to a memory 1502 to store the input frame history and also to a motion estimation unit 1503 which performs motion estimation by comparing the current input frame from the video input 1501 with a previous frame from the memory 1502. Motion vectors are sent to an interpolation unit 1504 which constructs an interpolated frame from the input frames and provides a video output 1507. Knowing the motion vectors allows the interpolation unit 1504 to place pixels such that objects in the interpolated frame appear in the appropriate positions according to the trajectory of their motion. A timing control unit 1505 calculates the ideal time instances for the interpolated output frames. An image analysis unit 1506 may also analyse the input frame data to detect events such as scene changes and cross-fades where motion estimation is known to struggle. In these situations it is preferable to set the ideal time instances to t=−1 and thereby repeat the original input frame.
FIG. 3 illustrates an example frame rate conversion system where three input frames are used to create an output sequence with double the number of output frames. Output 1 can be seen to be a direct copy of the frame provided as Input 1. Output 2 must be created at the temporal midpoint between the Input 1 time instance and the Input 2 time instance. Output 3 can be seen to be a direct copy of the frame provided as Input 2. Output 4 must be created at the temporal midpoint between the Input 2 time instance and the Input 3 time instance. Output 5 can be seen to be a direct copy of the frame provided as Input 3.
FIG. 4 illustrates an example frame rate conversion system where 3 input frames are used to create an output sequence with 2.5× the number of output frames. Output 1 can be seen to be a direct copy of the frame provided as Input 1. Output 2 must be created at a time instance ⅖ of the way between Input 1 and Input 2. Output 3 must be created at a time instance ⅘ of the way between Input 1 and Input 2. Input 2 is not shown in the output sequence. Output 4 must be created at a time instance ⅕ of the way between Input 2 and Input 3. Output 5 must be created at a time instance ⅗ of the way between Input 2 and Input 3. Output 6 can be seen to be a direct copy of the frame provided as Input 3.
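The output positions in FIG. 3 and FIG. 4 follow directly from the ratio of output to input frame rate. The following Python sketch shows one way the calculation might be arranged; the function name ideal_time_instances and the zero-based input index are assumptions made for illustration. Each output frame is described by the index of the preceding input frame and the fraction of the input frame interval that has elapsed, so that a fraction f corresponds to the time instance t = f − 1 in the notation used above, and a fraction of zero denotes a direct copy of an input frame.

```python
from fractions import Fraction


def ideal_time_instances(num_outputs, rate_ratio):
    """List (previous_input_index, fraction) for each output frame.

    rate_ratio is the output frame rate divided by the input frame rate,
    given as a Fraction so that the positions are computed exactly.
    """
    step = 1 / Fraction(rate_ratio)  # input frame intervals per output frame
    instances = []
    for k in range(num_outputs):
        position = k * step
        previous_index = position.numerator // position.denominator
        instances.append((previous_index, position - previous_index))
    return instances


# 2.5x conversion as in FIG. 4 (input index 0 corresponds to Input 1)
for index, fraction in ideal_time_instances(6, Fraction(5, 2)):
    print(index, fraction)
```

Exact rational arithmetic is used here so that positions such as ⅖ and ⅘ do not accumulate rounding error over a long sequence; a fixed-point counter would serve the same purpose.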
Various schemes exist for creating each block of pixels in an interpolated output frame. Where motion estimation has identified a perfect match between areas of pixel data in the input frames, interpolation may be as simple as copying pixels from one of the input frames to a position in the interpolated frame that is determined by the motion vector and the time instance of the interpolated frame. Where the match identified by motion estimation is not perfect, it is necessary to generate the block of pixels in the interpolated output frame from some combination of the pixel data from the input frames. Detailed description of appropriate methods is beyond the scope of this application, but will be known to those skilled in the art. A simple method suitable for illustrating the concept of this invention is the use of a weighted blend between the motion compensated pixel data from the two adjacent input frames.
FIG. 5 illustrates a weighted blend interpolation system. Each pixel value in an interpolated pixel area can be considered to be an interpolation between the pixel values from the corresponding locations in the pixel data areas at the motion compensated positions in the adjacent input frames. In FIG. 5, an interpolation between a black pixel (A) from the previous input frame and a white pixel (B) from the current input frame is required. The interpolated pixel at an intermediate time instance between the input frames can be created by an appropriate weighted blend between the black pixel and the white pixel. A suitable weighting function is to use the fractional time instance of the interpolated frame relative to the time instances of the input frames. For example, if the interpolated frame's time instance is located one quarter of the temporal distance between the previous input frame (containing pixel A) and the current input frame (containing pixel B), the interpolated pixel should use a 75% contribution from pixel A and a 25% contribution from pixel B, i.e. a ¾:¼ blend between the pixel A and pixel B. If the interpolated pixel was created at the midpoint between the two frames, an equal contribution from each pixel would be used, i.e. a ½:½ blend between the pixel A and pixel B.
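The weighting function just described can be written as a short Python sketch. The names blend_pixel and blend_area are hypothetical, pixel values are assumed to be plain numbers on a single channel, and fraction is assumed to be the fractional time instance measured from the previous input frame (0) to the current input frame (1).

```python
def blend_pixel(pixel_a, pixel_b, fraction):
    """Blend pixel A (previous frame) with pixel B (current frame).

    fraction is the temporal distance from the previous frame, so the
    contribution of A is (1 - fraction) and the contribution of B is fraction.
    """
    return (1 - fraction) * pixel_a + fraction * pixel_b


def blend_area(area_a, area_b, fraction):
    """Apply blend_pixel to every pixel of two equally sized pixel areas."""
    return [[blend_pixel(a, b, fraction) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(area_a, area_b)]
```

With a black pixel A of value 0 and a white pixel B of value 255, a fraction of 0.25 gives the ¾:¼ blend of the example above, and a fraction of 0.5 gives the equal blend.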
The quality of interpolated frames produced by motion compensated frame interpolation is largely dependent upon the performance of the motion estimator. In situations where the motion estimator performs badly, and produces non-true motion vectors, interpolation using these poor vectors causes artefacts. Poor motion vectors may arise due to complex or erratic motion, objects being occluded/revealed between input frames, aliasing, or objects being scaled or rotated so that they cannot be described by a simple translation of pixels from one area of the screen to another.
FIG. 6 illustrates a conventional motion compensated frame interpolation system that creates output frames at double the input frame rate. In this example, the previous input frame's time instance is t=−1 and the current input frame's time instance is t=0, i.e. 100% of the input frame time interval. The motion vector that describes the motion of the object from position 601 to position 602 during this interval is shown as 603. Other vectors in the motion vector field have been omitted for clarity. As interpolated frames must be output at the time instance halfway between the input frame time instances, i.e. at 50% of the input frame time interval, the ideal time instance is known to be t=−0.5 and the object's position at that time instance can be interpolated at a position half way along the motion vector, as shown at 604. The pixel data required to represent the object in the block sized pixel area 614 is created by interpolating between the block sized pixel area 611 (contents shown as 611-A) and the block of pixels 612 (contents shown as 612-A). The 50% blend result determined by the ideal time instance, between 611-A and 612-A, is shown as 614-A. This method works well when the motion vector describing the movement of the object over an input frame interval accurately describes the true motion of an object. The resulting objects are smoothly interpolated, moving linearly from their positions in the previous input frame to their positions in the current input frame.
FIG. 7 illustrates a conventional motion compensated frame interpolation system that creates output frames at double the input frame rate in a case where motion estimation has performed poorly. In this example, the previous input frame's time instance is t=−1 and the current input frame's time instance is t=0. The motion vector that describes the true motion of the object from position 701 to position 702 during this interval could not be accurately determined by the motion estimator (for some reason). In place of the true motion vector, the motion estimator introduced the non-true motion vectors shown as 703 and 704. Again, other vectors in the motion vector field have been omitted for clarity. As interpolated frames must be output at the time instance halfway between the input frames the ideal time instance is t=−0.5. The object will therefore be interpolated at positions half way along the motion vectors 703 and 704. Interpolating along the motion vector 703, a block sized pixel area 715 is created by interpolating between the block sized pixel area 711 (containing the object pixels as shown in 711-A) and the block of pixels 722 (containing the background pixels as shown in 722-A). The result of the 50% blend between block sized pixel areas 711-A and 722-A is shown as 715-A as a ½:½ blend between the object and the background pixels. Interpolating along the motion vector 704, a block sized pixel area 716 is created by interpolating between the block sized pixel area 721 (containing the background pixels shown in 721-A) and the block of pixels 712 (containing the object pixels shown in 712-A). The result of the 50% blend between block sized pixel areas 721-A and 712-A is shown as 716-A as a ½:½ blend between the object and the background pixels. The interpolated frame now includes two interpolated versions of the original object, with a different opacity to the object in the input frames, and located in the wrong position. 
Interpolation using non-true motion vectors therefore causes significant visual artefacts in interpolated frames.
FIG. 8 illustrates how motion compensated frame interpolation can be used to generate two intermediate frames between a pair of input frames, at time instances other than the midpoint between the two input frames. This is suitable for an application such as 2.5× frame rate conversion. Object 801 in the previous frame has moved to position 811 in the current frame. The previous object position is superimposed on the interpolated frames as 821 and 831, and the current position of the object is superimposed on the interpolated frames as 824 and 834. Motion estimation has performed well, and the motion vector describing the motion of the object between input frames is shown as 826 and 836. In order to increase the frame rate it is necessary to create interpolated frames at fractional points between the time instance of the previous input frame and the time instance of the current input frame. In the example shown of the first span in a 2.5× frame rate conversion, it is necessary to create interpolated frames at ⅖ and ⅘ of the time instance span (i.e. t=−0.6 and t=−0.2). Assuming linear interpolation of object position it can be expected that in the first interpolated frame, the object will have moved ⅖ of the distance along the path defined by the motion vector 826 at the ⅖ time instance between the input frames. The interpolated block sized area of pixels at position 823 in the first interpolated frame can therefore be created by interpolation between the block sized pixel area 802 and the pixels in block 812 with a blend of ⅗ from the previous input frame and ⅖ from the current input frame. Similarly, assuming linear interpolation of object position, it can be expected that in the second interpolated frame the object will have moved ⅘ of the distance along the path defined by the motion vector 836 at the ⅘ time instance between the input frames. 
The interpolated block sized area of pixels at position 833 in the second interpolated frame can therefore be created by interpolation between the block sized pixel area 802 and the pixels in block 812 with a blend of ⅕ from the previous input frame and ⅘ from the current input frame.
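The placement and blending steps of FIG. 6 and FIG. 8 can be combined in a single Python sketch. It is offered under several assumptions rather than as the method of any particular interpolator: the function name interpolate_block is hypothetical, frames are lists of rows of single-channel pixel values, the vector (dx, dy) describes motion from the previous frame to the current frame, the interpolated position is rounded to whole pixels, and all pixel areas are assumed to lie inside the frames.

```python
from fractions import Fraction


def interpolate_block(previous, current, px, py, vector, size, fraction):
    """Create one interpolated pixel area along a motion vector.

    (px, py) is the top-left corner of the pixel area in the previous frame,
    and the matching area in the current frame is at (px + dx, py + dy).
    fraction is the portion of the input frame interval that has elapsed.
    Returns the top-left corner of the interpolated area and its pixels.
    """
    dx, dy = vector
    # Linear interpolation of position along the motion vector.
    position = (px + round(fraction * dx), py + round(fraction * dy))
    prev_area = [row[px:px + size] for row in previous[py:py + size]]
    cur_area = [row[px + dx:px + dx + size]
                for row in current[py + dy:py + dy + size]]
    # Weighted blend: (1 - fraction) from previous, fraction from current.
    pixels = [[(1 - fraction) * a + fraction * b for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(prev_area, cur_area)]
    return position, pixels


# First span of a 2.5x conversion: interpolated frames at 2/5 and 4/5.
previous = [[0] * 12 for _ in range(8)]
current = [[0] * 12 for _ in range(8)]
previous[1][1] = 100   # object pixel in the previous frame
current[6][11] = 100   # same object after moving by (10, 5)
for fraction in (Fraction(2, 5), Fraction(4, 5)):
    print(interpolate_block(previous, current, 1, 1, (10, 5), 1, fraction))
```

When the vector is accurate, both source areas contain the object and the blend reproduces it at the intermediate position. When the vector is not the true motion, the same arithmetic blends object pixels with background pixels, which is the artefact illustrated in FIG. 7.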
To improve visual quality of motion compensated interpolation systems it is desirable to minimize the visibility of artefacts that arise in regions of poor motion estimation.