1. Field of the Invention
Methods and apparatuses consistent with the present invention relate to video compression and, more particularly, to improving the compression efficiency of the motion vectors of an unsynchronized picture by efficiently predicting those motion vectors from the motion vectors of a lower layer.
2. Description of the Prior Art
Currently, with advances in information and communication technologies, including the Internet, multimedia communication is increasing rapidly alongside text messaging and voice communication. Existing text-based communication systems cannot meet consumers' diverse needs, so multimedia services that can deliver various forms of information, such as text, images and music, are increasing. Because multimedia data is typically large, a large-capacity storage medium and a wide bandwidth are required to store and transmit it. Accordingly, compression coding techniques are generally applied when transmitting multimedia data, including text, image and audio data.
Generally, data compression works by removing data redundancy. Data can be compressed by removing spatial redundancy, such as the repetition of the same color or object within a picture; temporal redundancy, such as little or no change between adjacent frames of a moving picture or the continuous repetition of a sound in audio; and visual/perceptual redundancy, which exploits the insensitivity of human vision and perception to high frequencies. In conventional video encoding methods, temporal redundancy is removed by temporal prediction based on motion compensation, while spatial redundancy is removed by a spatial transform.
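Temporal prediction based on motion compensation starts by estimating, for each block of the current picture, where its content lies in a reference picture. The following is a minimal sketch of integer-pixel full-search block matching with a sum-of-absolute-differences (SAD) cost; the function names, the list-of-rows picture format and the search window are illustrative assumptions, not part of any particular codec.

```python
# Minimal full-search block-matching sketch (integer-pixel, SAD cost).
# Pictures are lists of rows of luma samples; no sub-pixel refinement.


def sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between the bsize x bsize block of `cur`
    at (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, bsize, search_range):
    """Return ((dx, dy), cost) for the displacement within +/-search_range
    that minimises SAD; candidates outside the reference picture are skipped."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + bsize > width or y0 + bsize > height:
                continue  # candidate block would fall outside the reference
            cost = sad(cur, ref, bx, by, dx, dy, bsize)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Toy 8x8 reference with unique sample values. In the current picture,
    # the content at (x, y) equals the reference content at (x + 2, y + 1).
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, bx=2, by=2, bsize=4, search_range=2))  # ((2, 1), 0)
```

Only the motion vector and the (ideally small) prediction residual then need to be coded, which is how the temporal redundancy is removed.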
After the redundancies are removed, multimedia data is transmitted over a transmission medium or a communication network, and these differ in performance because existing transmission media have different transmission speeds. For example, an ultra high-speed communication network can transmit several tens of megabits of data per second, while a mobile communication network has a transmission speed of 384 kilobits per second. A scalable video encoding method is used to support such diverse transmission environments and to transmit a multimedia data stream at a rate suited to each environment.
Such a scalable video encoding method makes it possible to truncate a portion of a compressed bit stream and thereby adjust the resolution, frame rate and signal-to-noise ratio (SNR) of the video decoded from the truncated bit stream. For scalable video coding, standardization work is already in progress under MPEG-4 Part 10. In particular, much research has already been carried out on implementing scalability in multilayer-based video encoding methods. In one example of such multilayered video coding, the multilayer structure consists of a base layer, a first enhancement layer and a second enhancement layer, which have different resolutions (QCIF, CIF and 2CIF) and different frame rates.
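Truncating a layered bit stream to fit a channel can be illustrated with a small extractor. This is a sketch under stated assumptions: the `Layer` record, the `extract` helper and all bit rates are hypothetical, and only the layer resolutions follow the example above.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    resolution: str
    rate_kbps: int  # hypothetical rate contributed by this layer alone


# Hypothetical three-layer stream; resolutions follow the example in the text,
# bit rates are made up for illustration.
LAYERS = [
    Layer("base", "QCIF", 128),
    Layer("enhancement 1", "CIF", 256),
    Layer("enhancement 2", "2CIF", 1024),
]


def extract(layers, channel_kbps):
    """Keep layers from the base upward while the cumulative rate fits the channel.

    Each enhancement layer depends on the layers below it, so extraction stops at
    the first layer that does not fit instead of skipping over it."""
    kept, total = [], 0
    for layer in layers:  # ordered base layer first
        if total + layer.rate_kbps > channel_kbps:
            break
        kept.append(layer)
        total += layer.rate_kbps
    return kept, total


if __name__ == "__main__":
    for channel in (384, 20_000):
        kept, total = extract(LAYERS, channel)
        print(channel, [layer.resolution for layer in kept], total)
```

With these made-up rates, a 384 kbps mobile channel would receive the QCIF and CIF layers, while a fast network would receive all three; the encoder runs once and only the extraction differs.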
As in single-layer coding, multilayer coding requires motion vectors (MVs) to be obtained for each layer in order to remove temporal redundancy. The motion vectors may be searched separately for each layer, or searched in one layer and used in the other layers (either as they are or after being up- or down-sampled). The former approach has the advantage of yielding accurate motion vectors for every layer, but the disadvantage that the motion vectors generated for each layer become overhead. Accordingly, in the former approach, it is important to remove the redundancy between the motion vectors of the respective layers efficiently.
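One common way to exploit this inter-layer redundancy is to predict an upper-layer motion vector from the co-located lower-layer motion vector and code only the difference. The sketch below assumes a dyadic spatial ratio (for example, QCIF to CIF, so the lower-layer vector is scaled by 2), integer motion vector units, and H.264-style signed Exp-Golomb code lengths as a rough bit cost; the helper names are hypothetical and this is not the JSVM syntax.

```python
# Sketch: predict an upper-layer MV from the co-located lower-layer MV and
# code only the residual. Assumes a dyadic spatial ratio and integer MV units.


def scale_mv(lower_mv, scale=2):
    """Up-sample a lower-layer MV to the upper layer's resolution."""
    return (lower_mv[0] * scale, lower_mv[1] * scale)


def encode_mv(upper_mv, lower_mv, scale=2):
    """Residual actually transmitted for the upper-layer MV."""
    px, py = scale_mv(lower_mv, scale)
    return (upper_mv[0] - px, upper_mv[1] - py)


def decode_mv(residual, lower_mv, scale=2):
    """Decoder side: add the same prediction back to the residual."""
    px, py = scale_mv(lower_mv, scale)
    return (residual[0] + px, residual[1] + py)


def se_bits(value):
    """Length in bits of a signed Exp-Golomb code se(v)."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * ((code_num + 1).bit_length() - 1) + 1


if __name__ == "__main__":
    upper, lower = (7, -4), (3, -2)
    residual = encode_mv(upper, lower)
    assert decode_mv(residual, lower) == upper
    print("direct:", sum(se_bits(c) for c in upper), "bits")  # 14 bits
    print("residual:", residual, sum(se_bits(c) for c in residual), "bits")  # (1, 0) 4 bits
```

When the two layers see similar motion, the residual is close to zero and costs far fewer bits than coding the upper-layer vector directly.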
At present, the Joint Video Team (JVT), a collaboration between the Moving Picture Experts Group (MPEG) of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) and the Video Coding Experts Group (VCEG) of the International Telecommunication Union (ITU), has been researching a moving picture coding method that is at the core of next-generation multimedia services. In particular, for scalable video coding, the draft “Joint Scalable Video Model (JSVM) 1.0 document (Hong Kong, January, 2005)” has already been prepared, and technical additions based on this draft are ongoing.
The JSVM 1.0 standard uses a multilayer scalable video coding method. The H.264 method is adopted as the encoding method for each layer of the multilayer structure, and motion compensated temporal filtering (MCTF) is adopted to provide temporal scalability within each layer.
FIG. 1 illustrates an example of a scalable video coding structure having two layers.
In FIG. 1, a white tetragon indicates a low frequency picture, and a black tetragon indicates a high frequency picture. In this coding structure, the upper layer has a frame rate of 30 Hz and includes four temporal levels produced by a hierarchical MCTF decomposition. Likewise, the lower layer has a frame rate of 15 Hz and includes three temporal levels.
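How the hierarchical decomposition produces these temporal levels can be sketched with a dyadic Haar MCTF in lifting form. This sketch deliberately omits motion compensation (all motion vectors are assumed to be zero), and the group-of-pictures sizes of 8 (upper layer) and 4 (lower layer) are assumptions chosen so that the level counts match the four and three levels described for FIG. 1.

```python
# Sketch of a dyadic Haar MCTF decomposition with motion compensation
# omitted (all motion vectors assumed zero), to show how temporal levels arise.


def mctf_levels(frames):
    """frames: list of equal-length sample lists; len(frames) must be a power of two.

    Returns {temporal_level: [(poc, kind, samples), ...]}. The first
    decomposition stage yields the highest temporal level."""
    n = len(frames)
    stages = n.bit_length() - 1
    if n != 1 << stages:
        raise ValueError("GOP size must be a power of two")
    pictures = list(enumerate(frames))  # (poc, samples) pairs
    levels = {}
    level = stages
    while len(pictures) > 1:
        lows = []
        highs = []
        for (poc_a, a), (poc_b, b) in zip(pictures[0::2], pictures[1::2]):
            h = [xb - xa for xa, xb in zip(a, b)]  # predict step: high band
            low = [xa + xh / 2 for xa, xh in zip(a, h)]  # update step: (a + b) / 2
            highs.append((poc_b, "H", h))
            lows.append((poc_a, low))
        levels[level] = highs
        pictures = lows
        level -= 1
    poc, samples = pictures[0]
    levels[0] = [(poc, "L", samples)]
    return levels


if __name__ == "__main__":
    upper = mctf_levels([[float(i)] for i in range(8)])  # 8 pictures at 30 Hz
    lower = mctf_levels([[float(i)] for i in range(4)])  # 4 pictures at 15 Hz
    for lvl in sorted(upper):
        print(lvl, [poc for poc, _, _ in upper[lvl]])
    print("upper levels:", len(upper), "lower levels:", len(lower))
```

In this toy layout, the upper layer's highest temporal level holds the high frequency pictures at POC 1, 3, 5 and 7, which are exactly the time instants the half-rate lower layer never samples.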
The JSVM 1.0 standard discloses a technique for predicting the motion vectors of an upper-layer picture from a lower-layer picture whose temporal position, i.e., picture order count (POC), matches the POC of that upper-layer picture. For example, the motion vectors of high frequency pictures 15 and 16 of the upper layer in FIG. 1 can be efficiently predicted from the motion vectors of high frequency pictures 17 and 18 of the lower layer, which have the same respective temporal positions. Because these pictures share temporal positions, their motion vectors can also be expected to be similar.
Although the motion vectors of pictures that have corresponding lower-layer pictures, such as pictures 15 and 16 (hereinafter referred to as “synchronized pictures”), can be efficiently predicted from the motion vectors of those lower-layer pictures, it is difficult to apply this motion vector prediction method to pictures that have no corresponding lower-layer picture, such as high frequency pictures 11, 12, 13 and 14 at temporal level 3 (hereinafter referred to as “unsynchronized pictures”). Consequently, only two methods are used for such pictures: encoding the motion vectors independently, and encoding them using the spatial relationship between motion vectors.
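The distinction between synchronized and unsynchronized pictures, and the fallback to spatial prediction, can be sketched as follows. This assumes that each layer's POC is a frame index at that layer's own frame rate, so two pictures are co-located when their display times match, and it uses a simplified H.264-style component-wise median of three neighbouring motion vectors as the spatial predictor; all helper names are hypothetical.

```python
from fractions import Fraction


def classify(upper_pocs, upper_hz, lower_pocs, lower_hz):
    """Label each upper-layer picture by whether a lower-layer picture shares its
    display time. POCs are taken as frame indices at each layer's own rate."""
    lower_times = {Fraction(p, lower_hz) for p in lower_pocs}
    return {
        p: "synchronized" if Fraction(p, upper_hz) in lower_times else "unsynchronized"
        for p in upper_pocs
    }


def median_predictor(left, top, top_right):
    """Component-wise median of three neighbouring MVs (simplified spatial predictor)."""
    xs = sorted(mv[0] for mv in (left, top, top_right))
    ys = sorted(mv[1] for mv in (left, top, top_right))
    return (xs[1], ys[1])


def mv_predictor(status, lower_mv, neighbours, scale=2):
    """Inter-layer prediction when a co-located lower-layer picture exists,
    otherwise fall back to spatial prediction from neighbouring blocks."""
    if status == "synchronized":
        return (lower_mv[0] * scale, lower_mv[1] * scale)
    return median_predictor(*neighbours)


if __name__ == "__main__":
    status = classify(range(8), 30, range(4), 15)
    print([p for p, s in status.items() if s == "unsynchronized"])  # [1, 3, 5, 7]
```

With a 30 Hz upper layer and a 15 Hz lower layer, every odd-numbered upper-layer picture is unsynchronized, so only the spatial fallback is available for it.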