1. Field
The invention relates generally to coding of multimedia data, and more specifically to coding of video data.
2. Background
Due to the explosive growth and great success of the Internet and wireless communication, coupled with increasing demand for multimedia services, streaming multimedia over the Internet or wireless channels has drawn tremendous attention. For example, multimedia data such as video data are transmitted over a network and can be streamed by one or more clients such as mobile phones and televisions. The transmission mode can be either uni-cast or multi-cast. In the case of wireless communication systems, the air interface could be implemented by using one of the following technologies: code division multiple access (CDMA), frequency division multiple access (FDMA), orthogonal frequency division multiple access (OFDMA), time division multiple access (TDMA), Global System for Mobile Communication (GSM), and wideband CDMA (WCDMA).
Prior to their transmission, video data are coded. Many video coding standards exist, including MPEG-2, MPEG-4, H.263, H.264 and the like. Video data comprise three types of frames: I frames (intraframes), P frames (predictive frames) and B frames (bi-directional frames).
Turning first to I frames, they are coded without reference to any other frames. That is, they are coded using just the information in the frame itself, in the same way still images are coded by, for example, using the discrete cosine transform (DCT), quantization, run-length encoding and so on. This is called intracoding. There are generally one or two I frames associated with each second of video data. Complex frames are encoded as I frames.
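The intracoding steps named above (transform, quantization, run-length encoding) can be illustrated with a minimal sketch. It is not taken from any particular standard: the 8×8 block size, the single uniform quantization step, the raster scan order (real codecs typically use a zigzag scan), and all function names are assumptions chosen for illustration.

```python
import math

N = 8  # block size; an assumption for this illustration


def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block with orthonormal scaling."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide each coefficient by the step and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def run_length_encode(values):
    """Encode a flat list as (zero_run, nonzero_value) pairs.

    Trailing zeros are replaced by a single "EOB" (end of block) marker.
    """
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs


def intracode_block(block, step):
    """Code one block using only the information in the block itself."""
    levels = quantize(dct_2d(block), step)
    flat = [v for row in levels for v in row]  # raster scan (assumption)
    return run_length_encode(flat)
```

For a perfectly flat block, all of the energy falls into the single DC coefficient, so the run-length output collapses to one pair followed by the end-of-block marker. This is the kind of compaction intracoding relies on for smooth image regions.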
With respect to P and B frames, both are coded with reference to other frames, that is, they are intercoded. P frames are coded with reference to a previous frame, which is called forward prediction. B frames are coded with reference to one or both of a previous frame (forward prediction) and a next frame (backward prediction). Use of forward, backward, or both forward and backward prediction allows fewer bits to be used for coding because only the changes from one frame to the next are coded.
Furthermore, in video coding B frames are introduced to provide better functionalities such as temporal scalability and coding efficiency. B frames could use motion compensated prediction from their neighboring past and future frames as explained above. These reference frames are encoded and then reconstructed before the B frames. Each block, e.g., a 16×16 block of pixels or macroblock (MB), in the B frame could use prediction from either direction or both directions and thus these options provide temporal scalability. Coding efficiency is achieved because only the residual data or data difference between the B frame and a reference frame determined after the prediction will be transformed, quantized and coded.
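The choice between forward, backward, and bi-directional prediction for a block in a B frame can be sketched as follows. This is a simplified illustration under stated assumptions: motion search is omitted (the co-located block in each reference frame serves as the prediction), the bi-directional prediction is a rounded average of the two references, and the selection criterion is the sum of absolute differences (SAD). The function names are hypothetical.

```python
def residual(block, prediction):
    """Element-wise difference between the current block and its prediction."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


def sad(res):
    """Sum of absolute differences, used here as a simple cost measure."""
    return sum(abs(v) for row in res for v in row)


def average(a, b):
    """Rounded average of two reference blocks (bi-directional prediction)."""
    return [[(x + y + 1) // 2 for x, y in zip(arow, brow)]
            for arow, brow in zip(a, b)]


def choose_prediction(block, past_ref, future_ref):
    """Pick the prediction direction with the smallest residual.

    Assumes zero motion: the co-located reference blocks are used as-is.
    Returns the chosen mode and the residual that would be transformed,
    quantized and coded.
    """
    candidates = {
        "forward": past_ref,
        "backward": future_ref,
        "bidirectional": average(past_ref, future_ref),
    }
    best = min(candidates, key=lambda m: sad(residual(block, candidates[m])))
    return best, residual(block, candidates[best])
```

When the current block lies midway between the past and future references, the bi-directional candidate produces an all-zero residual, which is why having the extra prediction option can reduce the amount of data left to code.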
To code multimedia data such as a video frame effectively, an appropriate quantization parameter must be determined for coding that frame.
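Why the quantization setting matters can be seen in a small sketch that compares a fine and a coarse quantization step on the same coefficients. The uniform quantizer, the use of the nonzero-level count as a rough proxy for bit cost, and the function names are assumptions for illustration only; the mapping from a quantization parameter to a step size is standard-specific and is not modeled here.

```python
def quantize(coeffs, step):
    """Uniform quantization of a list of transform coefficients."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from quantized levels."""
    return [level * step for level in levels]


def rate_distortion(coeffs, step):
    """Return (nonzero level count, mean squared error) for a given step.

    The nonzero count is only a rough stand-in for the number of bits.
    """
    levels = quantize(coeffs, step)
    recon = dequantize(levels, step)
    nonzero = sum(1 for level in levels if level != 0)
    mse = sum((c - r) ** 2 for c, r in zip(coeffs, recon)) / len(coeffs)
    return nonzero, mse
```

A larger step leaves fewer nonzero levels to code but increases the reconstruction error, so the parameter has to balance bit cost against quality.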