1. Field of the Invention
The present invention relates to a video compression method, and more particularly to a method and an apparatus for improving the compression efficiency of motion vectors (MVs) by effectively predicting a motion vector of an enhanced layer from a motion vector of a base layer in a video coding method employing a multi-layer structure.
2. Description of the Prior Art
With the development of information and communication technology, including the Internet, image communication as well as text and voice communication has increased. Existing text-based communication methods cannot satisfy the various requirements of consumers. Accordingly, multimedia services that can provide various types of information, such as text, images, and movies, have increased. Further, because multimedia data are large, they require a large-capacity storage medium and a wide transmission bandwidth. Accordingly, in order to transmit multimedia data including text, images, and audio, it is necessary to use a compression coding method.
A basic principle of data compression is to eliminate redundancy in the data. That is, data can be compressed by eliminating spatial redundancy, such as repetition of the same colors or objects in an image; temporal redundancy, such as little or no change between adjacent frames in a moving picture or continuous repetition of the same sound in audio; or visual redundancy, which exploits the fact that human eyesight and perception are insensitive to high frequencies.
Currently, most video coding standards are based on motion-compensated predictive coding. That is, temporal redundancy is eliminated by temporal filtering based on motion compensation, and spatial redundancy is eliminated by a spatial transform.
In order to transmit the multimedia data generated after redundancy is eliminated, a transmission medium is necessary, and transmission performance differs according to the transmission medium. Transmission media currently in use have various transmission speeds, from an ultra high speed communication network capable of transmitting data at a speed of several tens of Mbytes per second to a mobile communication network having a transmission speed of 384 kbits per second.
In such environments, in order to support a transmission medium having various transmission speeds or transmit multimedia data at a transmission rate suitable for transmission environments, a data coding method having scalability is more suitable.
Scalability here refers to a coding scheme that enables a decoder or a pre-decoder to partially decode a single compressed bit stream according to conditions such as bit rate, error rate, or system resources. The decoder or the pre-decoder can extract a portion of a bit stream coded by a scalable coding method and restore a multimedia sequence having a different picture quality, resolution, or frame rate.
Meanwhile, standardization work for scalable video coding is being carried out by the Moving Picture Experts Group-21 (MPEG-21) part-13, and a wavelet-based scheme is recognized as a powerful spatial transform method. Further, a technology proposed in a published patent application (US published number 2003/0202599 A1) of Philips, Co., Ltd has attracted considerable attention.
In addition, even coding schemes that do not use a wavelet-based compression method, such as the conventional MPEG-4 or H.264, have achieved spatial and temporal scalability by employing a multi-layer structure.
Scalable video implemented as a single layer has scalable features focused only on the single layer. In contrast, in scalability employing a multi-layer structure, the scalability can be designed to obtain an optimum performance with respect to each layer. For instance, when a multi-layer structure is formed with a base layer, a first enhanced layer, and a second enhanced layer, the layers can be distinguished from each other according to a quarter common intermediate format (hereinafter, referred to as a QCIF), a common intermediate format (hereinafter, referred to as a CIF), or a 2CIF. Further, SNR scalability and temporal scalability can be accomplished in each layer.
However, since each layer has its own motion vectors (MVs) for eliminating temporal redundancy, the bit budget consumed by motion vectors increases considerably in comparison with a single-layer structure. Accordingly, the motion vectors used in each layer take up a large portion of the bit budget assigned to the entire compression. That is, effectively eliminating redundancy in the motion vectors of each layer has a great influence on the overall video quality.
FIG. 1 is a view showing one example of a scalable video codec using a multi-layer structure. First, a base layer is defined as a QCIF, 15 Hz (frame rate), a first enhanced layer is defined as a CIF, 30 Hz, and a second enhanced layer is defined as a standard definition (SD), 60 Hz. When a CIF 0.5 M stream is required, only the SNR is adjusted, down to 0.5 M, within the CIF, 30 Hz, 0.7 M stream of the first enhanced layer. In this way, spatial scalability, temporal scalability, and SNR scalability can be achieved. As shown in FIG. 1, since the number of motion vectors increases and thus an overhead of about twice that of the existing single-layer scalability occurs, motion prediction through a base layer is important.
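The extraction described for FIG. 1 can be illustrated with a minimal sketch. The layer table mirrors the resolutions and frame rates given above; only the 0.7 M rate of the first enhanced layer comes from the text, and the other two rates, the `Layer` class, and the `select_stream` function are hypothetical names and values introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    resolution: str
    frame_rate_hz: int
    full_rate_mbps: float  # bit rate of the layer when nothing is truncated


LAYERS = (
    Layer("base layer", "QCIF", 15, 0.1),            # 0.1 is a hypothetical rate
    Layer("first enhanced layer", "CIF", 30, 0.7),   # 0.7 M is taken from the text
    Layer("second enhanced layer", "SD", 60, 3.0),   # 3.0 is a hypothetical rate
)


def select_stream(resolution: str, requested_mbps: float) -> Tuple[str, int, float]:
    """Pick the layer with the requested resolution and cap its bit rate.

    Capping the rate stands in for SNR scalability: the layer keeps its
    resolution and frame rate, and only its quality (bit rate) is reduced.
    """
    for layer in LAYERS:
        if layer.resolution == resolution:
            delivered = min(requested_mbps, layer.full_rate_mbps)
            return layer.name, layer.frame_rate_hz, delivered
    raise ValueError(f"no layer with resolution {resolution!r}")
```

Under these assumptions, a request for a CIF 0.5 M stream selects the first enhanced layer at 30 Hz and reduces its rate from 0.7 M to 0.5 M.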
However, the conventional motion prediction through a base layer in a multi-layer structure employs a method of compressing the difference between the motion vectors obtained in each layer. Hereinafter, the conventional method will be described with reference to FIG. 2. In a video transmission having a low bit rate, picture quality may be improved when bits spent on the motion vectors, on the sizes and positions of the variable blocks used for motion prediction, and on other information regarding motion prediction determined according to such variable blocks (hereinafter collectively referred to as motion information) are saved and the saved bits are assigned to texture information. Accordingly, when the motion information is also layered after motion prediction and the layered information is transmitted, picture quality may be improved.
In a motion prediction using a variable block size, a 16 by 16 macroblock may be used as a basic unit of prediction. Herein, each macroblock may be constructed from a combination of 16 by 16, 16 by 8, 8 by 16, 8 by 8, 8 by 4, 4 by 8, and 4 by 4 blocks. Further, a corresponding motion vector may be obtained according to various pixel accuracies such as 1 pixel accuracy, ½ pixel accuracy, or ¼ pixel accuracy. Such motion vectors can be layered according to the following steps.
First, a motion search of a 16 by 16 block size is performed according to 1 pixel accuracy. The generated motion vector becomes the base layer of the motion vector. FIG. 2 shows a motion vector 1 of a macroblock in the base layer.
Second, a motion search of a 16 by 16 block size and an 8 by 8 block size is performed according to ½ pixel accuracy. The difference between a motion vector found through this motion search and the motion vector of the base layer is the motion vector difference of the first enhanced layer, and this value is subsequently transmitted to a decoder. Motion vectors 11 to 14 as shown in FIG. 2 are obtained by determining a variable block size in the first enhanced layer and finding motion vectors for the determined block size. However, the values actually transmitted are the differences obtained by subtracting the motion vector 1 of the base layer from the motion vectors 11 to 14. That is, referring to FIG. 3, the motion vector differences of the first enhanced layer become vectors 15 to 18.
Third, a motion search of all sub-block sizes is performed according to ¼ pixel accuracy. The difference between a motion vector found through this motion search and the value obtained by adding the motion vector 1 of the base layer and the motion vector difference of the first enhanced layer becomes the motion vector difference of the second enhanced layer, and this value is transmitted. For instance, the motion vector difference in a macroblock A is the value obtained by subtracting the motion vector 14 from a motion vector 142, and this value is equal to the value obtained by subtracting the sum of the difference vector 18 and the motion vector 1 from the motion vector 142.
Lastly, motion information of the three layers is respectively encoded.
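The layering steps above can be sketched for a single block as follows. This is a minimal illustration rather than the codec's actual procedure: it assumes every motion vector is stored as an integer pair in quarter-pixel units (so a 1 pixel vector is a multiple of 4 and a ½ pixel vector is a multiple of 2), and the function names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]  # (dx, dy) in quarter-pixel units (assumed representation)


def add(a: MV, b: MV) -> MV:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: MV, b: MV) -> MV:
    return (a[0] - b[0], a[1] - b[1])


def layer_differences(base_mv: MV, el1_mv: MV, el2_mv: MV) -> Tuple[MV, MV, MV]:
    """Encoder side: turn three searched vectors into the transmitted values.

    base_mv: vector found at 1 pixel accuracy (sent as is).
    el1_mv:  vector found at 1/2 pixel accuracy in the first enhanced layer.
    el2_mv:  vector found at 1/4 pixel accuracy in the second enhanced layer.
    """
    d1 = sub(el1_mv, base_mv)            # first enhanced layer difference
    d2 = sub(el2_mv, add(base_mv, d1))   # second enhanced layer difference
    return base_mv, d1, d2


def reconstruct(base_mv: MV, d1: MV, d2: MV) -> Tuple[MV, MV, MV]:
    """Decoder side: rebuild each layer's vector from the transmitted values."""
    el1_mv = add(base_mv, d1)
    el2_mv = add(el1_mv, d2)
    return base_mv, el1_mv, el2_mv
```

For example, with a base vector of (8, -4), a first enhanced layer vector of (10, -4), and a second enhanced layer vector of (11, -5), the transmitted differences are (2, 0) and (1, -1), and `reconstruct` recovers the three original vectors. When an enhanced layer splits the macroblock into sub-blocks, the same subtraction would be applied per sub-block against the co-located vector of the lower layer.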
As shown in FIG. 2, original motion vectors are divided into vectors in three layers. Frames having motion vectors are divided into frames of a base layer and enhanced layers as described above. Accordingly, the entire motion vector information is organized into a group as shown in FIG. 1. In this way, the base layer becomes motion vector information having the highest priority, and it is a component which must necessarily be transmitted.
Accordingly, a bit rate of the base layer must be smaller than or equal to a minimum bandwidth supported by a network and a transmission bit rate of both the base layer and the enhanced layers must be smaller than or equal to a maximum bandwidth supported by the network.
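The two bandwidth conditions can be written as a simple check. The sketch below assumes all rates are expressed in the same unit (for example kbit/s); the function and parameter names are hypothetical.

```python
from typing import Sequence


def fits_network(base_rate: int,
                 enhanced_rates: Sequence[int],
                 min_bandwidth: int,
                 max_bandwidth: int) -> bool:
    """Return True when both bandwidth conditions hold.

    1. The base layer alone fits the minimum bandwidth of the network.
    2. The base layer plus all enhanced layers fits the maximum bandwidth.
    """
    base_ok = base_rate <= min_bandwidth
    total_ok = base_rate + sum(enhanced_rates) <= max_bandwidth
    return base_ok and total_ok
```

With illustrative numbers, a 100 kbit/s base layer and enhanced layers of 300 and 500 kbit/s satisfy both conditions on a network offering between 128 and 1000 kbit/s, whereas a 200 kbit/s base layer would violate the first condition.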
When the aforementioned method is employed to cover a wide range of spatial resolutions and bit rates, a proper vector accuracy is determined according to the spatial resolution, thereby achieving scalability for motion information.
As described above, in order to effectively compress the motion vector of the enhanced layer, a motion prediction is performed by means of the motion vector of the base layer. Since this prediction is an important factor in reducing the bits used for motion vectors, it has an important influence on compression performance.
However, the conventional method does not use the correlation with adjacent motion vectors; it simply obtains the difference from the motion vector of a lower layer and encodes the obtained difference. Accordingly, the prediction is not performed well, and thus the motion vector differences in an enhanced layer increase, which has a negative influence on compression performance.