An observable scene in the real physical world can be represented as temporally and spatially related video data in a memory of a computer system. The video data represent the optical flow of the scene over time and space as varying light intensity values.
Efficient integration and representation of the video data is a key component in video encoding schemes, for example MPEG encoding. Video encoding can reduce the amount of memory used to store the data, as well as reduce the time required to process and communicate the data.
Scenes are generally composed of foreground and background components. Typically, the smaller foreground is in motion with respect to the larger and relatively static background. For example, in a stadium scene, one or more players move back and forth against the background of the grandstands while the camera pans and zooms to follow them. Various parts of the background are revealed as the players move in the foreground.
One of the more pertinent problems in video encoding is the encoding of uncovered background in scenes. Classical motion-compensated prediction schemes are unable to predict newly revealed background areas, and therefore these areas are encoded inefficiently.
To overcome this problem, background memory techniques are known. More precisely, these techniques identify still regions of the scene as background, which can be stored in a long-term memory. Whenever a background area is uncovered, and provided that the background area has previously been observed, data unavailable with classical prediction techniques can be retrieved from the background memory.
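The background memory idea can be illustrated with a minimal sketch. It assumes one-dimensional frames of integer intensities and a simple frame-difference test to decide which pixels are still; the names `update_background`, `build_background`, `predict_pixel`, and `still_threshold` are hypothetical and are not taken from any particular encoder.

```python
from typing import List, Optional

Frame = List[int]                 # one scan line of intensity values
Memory = List[Optional[int]]      # None marks "background never observed here"


def update_background(memory: Memory, prev: Frame, cur: Frame,
                      still_threshold: int = 2) -> None:
    """Store pixels that stayed (nearly) constant between two frames."""
    for i, (p, c) in enumerate(zip(prev, cur)):
        if abs(c - p) <= still_threshold:
            memory[i] = c


def build_background(frames: List[Frame], still_threshold: int = 2) -> Memory:
    """Run the update over every pair of successive frames."""
    memory: Memory = [None] * len(frames[0])
    for prev, cur in zip(frames, frames[1:]):
        update_background(memory, prev, cur, still_threshold)
    return memory


def predict_pixel(memory: Memory, prev: Frame, i: int,
                  uncovered: bool) -> Optional[int]:
    """Use the long-term memory for uncovered pixels, else the previous frame."""
    if uncovered:
        return memory[i]          # may be None if never observed
    return prev[i]


# Background intensity 10; a two-pixel textured object (200, 90) moves right.
frames = [
    [200, 90, 10, 10, 10, 10],
    [10, 200, 90, 10, 10, 10],
    [10, 10, 200, 90, 10, 10],
]
memory = build_background(frames)
```

If the object then moves back to the left, pixel 3 is uncovered. The previous frame holds the object value at that position, whereas the long-term memory still holds the background value observed earlier. Pixels that were never seen as still remain `None`, which corresponds to the condition that the background must have been observed previously. Note that a frame-difference test of this kind would also mark the interior of a uniformly colored object as still; it is only a stand-in for a real still-region detector.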
These techniques are effective for video-conferencing or video-phone sequences which are typically characterized by a still background. However, the model of a static background does not hold true for more complex scenes which include camera motion, e.g., panning or zooming, and multiple moving objects.
In order to integrate temporal information, a mosaic representation, also referred to as a salient still, has been shown to be efficient. Basically, these techniques estimate the camera motion using global motion estimation, and align successive images, e.g., frames, in a video sequence by cancelling contributions due to camera motion. A mosaic is built by temporally integrating the aligned frames or images in a memory. In this way, the mosaic captures the information in multiple aligned frames of a video sequence.
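The mosaicking steps described above (global motion estimation, alignment, temporal integration) can be sketched as follows. The sketch makes strong simplifying assumptions: frames are one-dimensional, the camera motion is a pure integer translation found by exhaustive search, and temporal integration is a plain average. The functions `estimate_shift` and `build_mosaic` are illustrative names only.

```python
from typing import Dict, List

Frame = List[float]


def estimate_shift(prev: Frame, cur: Frame, max_shift: int) -> int:
    """Return d minimizing the mean absolute difference of cur[i] vs prev[i + d]."""
    best_shift, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        diffs = [abs(cur[i] - prev[i + d])
                 for i in range(len(cur)) if 0 <= i + d < len(prev)]
        if not diffs:
            continue              # no overlap for this candidate shift
        cost = sum(diffs) / len(diffs)
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift


def build_mosaic(frames: List[Frame], max_shift: int) -> List[float]:
    """Align each frame to the first one and average overlapping samples."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    offset = 0                    # accumulated camera displacement
    for k, frame in enumerate(frames):
        if k > 0:
            offset += estimate_shift(frames[k - 1], frame, max_shift)
        for i, value in enumerate(frame):
            sums[offset + i] = sums.get(offset + i, 0.0) + value
            counts[offset + i] = counts.get(offset + i, 0) + 1
    return [sums[p] / counts[p] for p in sorted(sums)]


# A static scene viewed through a 5-sample window that pans to the right.
scene = [3, 8, 1, 9, 4, 7, 2, 6, 5, 0]
frames = [scene[o:o + 5] for o in (0, 1, 3)]
mosaic = build_mosaic(frames, max_shift=3)
```

Because every sample is averaged into the mosaic regardless of whether it belongs to the background, a moving foreground object would be smeared across the positions it occupied in the aligned frames.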
However, these techniques are typically applied without distinction between the background and foreground of the scene. That is, the camera motion is only representative of the background motion. Therefore, the foreground appears blurred. Furthermore, as the foreground is integrated into the mosaic representation, the problem of uncovered background remains unsolved. Also, as far as video encoding is concerned, no residual signals are transmitted, leading to noticeable coding artifacts.
Sprites are well-known in the field of computer graphics. Sprites correspond to synthetic video objects which can be animated, and overlaid onto a synthetic or natural scene. For instance, sprites are widely used in video games. More recently, sprites have been proposed for video encoding.
The sprite can be either a synthetic object or a natural object determined from a sequence of images. In the latter case, the technique is applied to a video object whose motion can be modeled by a rigid body motion. Since the technique is applied to a coherently moving body, instead of the entire frame, some of the problems of frame-aligned mosaicking are alleviated.
Generally, these techniques require that the sprite is identified and available before the encoding of a sequence of images begins. The sprite can be encoded using intraframe coding techniques. The encoded sprite can then be transmitted along with rigid body motion information to a decoder. The decoder can warp the sprite using the motion information to render the scene.
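The decoder-side warping step can be illustrated with a short sketch. It assumes that the rigid body motion is a two-dimensional rotation plus translation, that the sprite is sampled with nearest-neighbor interpolation, and that `None` marks transparent sprite pixels; the function `render` and its parameters are hypothetical and do not reflect any specific bitstream syntax.

```python
import math
from typing import List, Optional

Image = List[List[int]]
Sprite = List[List[Optional[int]]]   # None marks a transparent sprite pixel


def render(background: Image, sprite: Sprite,
           angle: float, tx: float, ty: float) -> Image:
    """Overlay the sprite, rotated by `angle` and shifted by (tx, ty)."""
    frame = [row[:] for row in background]
    c, s = math.cos(angle), math.sin(angle)
    h, w = len(sprite), len(sprite[0])
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            # Inverse rigid transform: map the output pixel back into the sprite.
            dx, dy = x - tx, y - ty
            u = round(c * dx + s * dy)
            v = round(-s * dx + c * dy)
            if 0 <= u < w and 0 <= v < h and sprite[v][u] is not None:
                frame[y][x] = sprite[v][u]
    return frame


background = [[0] * 4 for _ in range(4)]
sprite = [[1, 2], [3, 4]]
shifted = render(background, sprite, angle=0.0, tx=1, ty=2)
rotated = render(background, sprite, angle=math.pi / 2, tx=2, ty=1)
```

Only the three motion parameters change from frame to frame in this sketch, while the sprite itself is sent once. Inverse mapping is used so that every output pixel receives at most one sprite sample, which avoids holes in the rendered frame.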
In the case of natural scenes, an analysis stage is required prior to encoding in order to build the static sprite. During the analysis, segmentation, global estimation and warping are performed on the video data of the frames. As a result, this method introduces a very significant delay, since a large number of frames must be analyzed to build the static sprite. For many real-time encoding applications, this delay is unacceptable.
Taking the above into consideration, it is desired to provide a method and apparatus which can dynamically build a sprite in a memory. Furthermore, the method should be suitable for real-time video data encoding.