Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images, the illusion of motion being created by displaying the images one after the other, typically at a rate between 15 and 30 frames per second.
Each frame of an uncompressed digital video sequence comprises an array of image pixels. In a commonly used digital video format, known as the Quarter Common Interchange Format (QCIF), a frame comprises an array of 176×144 pixels (i.e. 25,344 pixels). In turn, each pixel is represented by a certain number of bits, which carry information about the luminance and/or colour content of the region of the image corresponding to the pixel. Commonly, a so-called YUV colour model is used to represent the luminance and chrominance content of the image. The luminance, or Y, component represents the intensity (brightness) of the image, while the colour content of the image is represented by two chrominance or colour difference components, labelled U and V.
Colour models based on a luminance/chrominance representation of image content provide certain advantages compared with colour models that are based on a representation involving primary colours (that is Red, Green and Blue, RGB). The human visual system is more sensitive to intensity variations than it is to colour variations and YUV colour models exploit this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way, the amount of information needed to code the colour information in an image can be reduced with an acceptable reduction in image quality.
The lower spatial resolution of the chrominance components is usually attained by sub-sampling. Typically, each frame of a video sequence is divided into so-called “macroblocks”, which comprise luminance (Y) information and associated chrominance (U, V) information, which is spatially sub-sampled. FIG. 1 illustrates one way in which macroblocks can be formed. As shown in FIG. 1, a frame of a video sequence is represented using a YUV colour model, each component having the same spatial resolution. Macroblocks are formed by representing a region of 16×16 image pixels in the original image as four blocks of luminance information, each luminance block comprising an 8×8 array of luminance (Y) values and two, spatially corresponding, chrominance components (U and V) which are sub-sampled by a factor of two in both the horizontal and vertical directions to yield corresponding arrays of 8×8 chrominance (U, V) values. According to certain video coding recommendations, such as International Telecommunication Union (ITU-T) recommendation H.26L, the block size used within the macroblocks can be other than 8×8, for example 4×8 or 4×4 (see T. Wiegand, “Joint Model Number 1”, Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, Sections 2.2 and 2.3). ITU-T recommendation H.26L also allows macroblocks to be organized together to form so-called “slices”. More specifically, each slice is formed from a number of consecutive macroblocks in coding order and is encoded in such a way that it can be decoded independently without making reference to any other slice of the same frame. This arrangement is advantageous, as it tends to limit the propagation of artefacts in the decoded video that may arise due to transmission errors. While there is no specific limitation on the way in which slices may be constructed, one straightforward scheme is to group all the macroblocks in a single row of a frame together as a slice. This arrangement, together with the division of a QCIF format image into 16×16 macroblocks, is illustrated in FIG. 2.
As can be seen from FIG. 2, a QCIF image comprises 11×9 macroblocks (in this case grouped into 9 slices of 11 consecutive macroblocks each). If the luminance blocks and chrominance blocks are represented with 8 bit resolution (that is by numbers in the range 0 to 255), the total number of bits required per macroblock is (16×16×8)+2×(8×8×8)=3072 bits. The number of bits needed to represent a video frame in QCIF format is thus 99×3072=304,128 bits. This means that the amount of data required to transmit/record/display an uncompressed video sequence in QCIF format, represented using a YUV colour model, at a rate of 30 frames per second, is more than 9 Mbps (million bits per second). This is an extremely high data rate and is impractical for use in video recording, transmission and display applications because of the very large storage capacity, transmission channel capacity and hardware performance required.
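By way of illustration only, the arithmetic above can be reproduced with a short sketch. The function and constant names are hypothetical and are not taken from any codec; the figures used (16×16 luminance samples, two 8×8 chrominance blocks, 8 bits per sample, 30 frames per second) are those given in the text.

```python
# Illustrative sketch: names are hypothetical, figures come from the text.
MB_SIZE = 16          # a macroblock covers 16x16 luminance samples
CHROMA_SIZE = 8       # each chrominance block is 8x8 after 2:1 sub-sampling
BITS_PER_SAMPLE = 8   # values in the range 0 to 255


def bits_per_macroblock(bits=BITS_PER_SAMPLE):
    """Bits for one macroblock: 16x16 luminance plus two 8x8 chrominance blocks."""
    luma_bits = MB_SIZE * MB_SIZE * bits
    chroma_bits = 2 * CHROMA_SIZE * CHROMA_SIZE * bits
    return luma_bits + chroma_bits


def uncompressed_bit_rate(width, height, fps, bits=BITS_PER_SAMPLE):
    """Bits per second for an uncompressed sequence of width x height frames."""
    macroblocks = (width // MB_SIZE) * (height // MB_SIZE)
    return macroblocks * bits_per_macroblock(bits) * fps


qcif_frame_bits = uncompressed_bit_rate(176, 144, fps=1)
qcif_rate = uncompressed_bit_rate(176, 144, fps=30)
print(bits_per_macroblock(), qcif_frame_bits, qcif_rate)
```

For a QCIF frame the sketch gives 11×9 = 99 macroblocks and a rate slightly above 9 million bits per second, in agreement with the figures stated above.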
If video data is to be transmitted in real-time over a fixed line network such as an ISDN (Integrated Services Digital Network) or a conventional PSTN (Public Switched Telephone Network), the available data transmission bandwidth is typically of the order of 64 kbits/s. In mobile videotelephony, where transmission takes place at least in part over a radio communications link, the available bandwidth can be as low as 20 kbits/s. This means that a significant reduction in the amount of information used to represent video data must be achieved in order to enable transmission of digital video sequences over low bandwidth communication networks. For this reason, video compression techniques have been developed which reduce the amount of information transmitted while retaining an acceptable image quality.
Video compression methods are based on reducing the redundant and perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spatial, temporal and spectral redundancy. “Spatial redundancy” is the term used to describe the correlation (similarity) between neighbouring pixels within a frame. The term “temporal redundancy” expresses the fact that objects appearing in one frame of a sequence are likely to appear in subsequent frames, while “spectral redundancy” refers to the correlation between different colour components of the same image.
There is often a significant amount of spatial redundancy between the pixels that make up each frame of a digital video sequence. In other words, the value of any pixel within a frame of the sequence is substantially the same as the value of other pixels in its immediate vicinity. Typically, video coding systems reduce spatial redundancy using a technique known as “block-based transform coding”, in which a mathematical transformation, such as a two-dimensional Discrete Cosine Transform (DCT), is applied to blocks of image pixels. This transforms the image data from a representation comprising pixel values to a form comprising a set of coefficient values representative of spatial frequency components. This alternative representation of the image data reduces spatial redundancy significantly and thereby produces a more compact representation of the image data.
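The effect of the transform can be illustrated with a minimal sketch of a two-dimensional DCT applied to a single 8×8 block. The orthonormal DCT-II scaling used here is an assumption (the text does not specify a normalisation), the function name is hypothetical, and the direct four-loop form is chosen for clarity rather than speed.

```python
import math

N = 8  # block size assumed for this sketch


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical spatial frequency index
        for v in range(N):      # horizontal spatial frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# A perfectly uniform block: every pixel equals its neighbours.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

For the uniform block, all 64 pixel values collapse into a single non-zero coefficient at position (0, 0); the remaining 63 coefficients are zero to within floating-point error. This is the extreme case of the compaction described above.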
Frames of a video sequence which are compressed using block-based transform coding, without reference to any other frame within the sequence, are referred to as INTRA-coded or I-frames.
Generally, video coding systems not only reduce the spatial redundancy within individual frames of a video sequence, but also make use of a technique known as “motion-compensated prediction”, to reduce the temporal redundancy in the sequence. Using motion-compensated prediction, the image content of some (often many) of the frames in a digital video sequence is “predicted” from one or more other frames in the sequence, known as “reference” frames. Prediction of image content is achieved by tracking the motion of objects or regions of an image between a frame to be coded (compressed) and the reference frame(s) using “motion vectors”. As in the case of INTRA-coding, motion compensated prediction of a video frame is typically performed macroblock-by-macroblock.
Frames of a video sequence, compressed using motion-compensated prediction, are generally referred to as INTER-coded or P-frames. Motion-compensated prediction alone rarely provides a sufficiently precise representation of the image content of a video frame and therefore it is typically necessary to provide a so-called “prediction error” (PE) frame with each INTER-coded frame. The prediction error frame represents the difference between a decoded version of the INTER-coded frame and the image content of the frame to be coded. More specifically, the prediction error frame comprises values that represent the difference between pixel values in the frame to be coded and corresponding reconstructed pixel values formed on the basis of a predicted version of the frame in question. Consequently, the prediction error frame has characteristics similar to a still image and block-based transform coding can be applied in order to reduce its spatial redundancy and hence the amount of data (number of bits) required to represent it.
In order to illustrate the operation of a video coding system in greater detail, reference will now be made to FIGS. 3 and 4. FIG. 3 is a schematic diagram of a generic video encoder that employs a combination of INTRA- and INTER-coding to produce a compressed (encoded) video bit-stream. A corresponding decoder is illustrated in FIG. 4 and will be described later in the text.
The video encoder 100 comprises an input 101 for receiving a digital video signal from a camera or other video source (not shown). It also comprises a transformation unit 104 which is arranged to perform a block-based discrete cosine transform (DCT), a quantizer 106, an inverse quantizer 108, an inverse transformation unit 110, arranged to perform an inverse block-based discrete cosine transform (IDCT), combiners 112 and 116, and a frame store 120. The encoder further comprises a motion estimator 130, a motion field coder 140 and a motion compensated predictor 150. Switches 102 and 114 are operated co-operatively by control manager 160 to switch the encoder between an INTRA-mode of video encoding and an INTER-mode of video encoding. The encoder 100 also comprises a video multiplex coder 170 which forms a single bit-stream from the various types of information produced by the encoder 100 for further transmission to a remote receiving terminal or, for example, for storage on a mass storage medium, such as a computer hard drive (not shown).
Encoder 100 operates as follows. Each frame of uncompressed video provided from the video source to input 101 is received and processed macroblock by macroblock, preferably in raster-scan order. When the encoding of a new video sequence starts, the first frame to be encoded is encoded as an INTRA-coded frame. Subsequently, the encoder is programmed to code each frame in INTER-coded format, unless one of the following conditions is met: 1) it is judged that the current macroblock of the frame being coded is so dissimilar from the pixel values in the reference frame used in its prediction that excessive prediction error information is produced, in which case the current macroblock is coded in INTRA-coded format; 2) a predefined INTRA frame repetition interval has expired; or 3) feedback is received from a receiving terminal indicating a request for a frame to be provided in INTRA-coded format.
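The mode selection rules just listed can be summarised in a short sketch. The function names, the string labels and the cost threshold are hypothetical; in particular, the text does not say how "excessive prediction error information" is measured, so a simple numeric threshold is assumed here.

```python
def frame_mode(is_first_frame, frames_since_intra, intra_interval, intra_requested):
    """Frame-level choice: first-frame rule plus conditions 2) and 3)."""
    if is_first_frame:
        return "INTRA"                      # a new sequence starts with an INTRA frame
    if frames_since_intra >= intra_interval:
        return "INTRA"                      # 2) repetition interval has expired
    if intra_requested:
        return "INTRA"                      # 3) receiving terminal requested INTRA
    return "INTER"


def macroblock_mode(mode_of_frame, prediction_error_cost, cost_threshold):
    """Macroblock-level choice: condition 1) applies within an INTER frame."""
    if mode_of_frame == "INTRA":
        return "INTRA"
    if prediction_error_cost > cost_threshold:
        return "INTRA"                      # 1) prediction is too poor to be useful
    return "INTER"


print(frame_mode(False, 12, 100, False), macroblock_mode("INTER", 5000, 2000))
```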
Operation of the encoder 100 in INTRA-coding mode will now be described. In INTRA-coding mode, the control manager 160 operates the switch 102 to accept video input from input line 118. The video signal input is received macroblock by macroblock and the blocks of luminance and chrominance values which make up each macroblock are passed to the DCT transformation block 104. Here a 2-dimensional discrete cosine transform is performed and a 2-dimensional array of DCT coefficients is formed for each block.
The DCT coefficients for each block are passed to the quantizer 106, where they are quantized using a quantization parameter QP. Selection of the quantization parameter QP is controlled by the control manager 160 via control line 115.
In more detail, quantization of the DCT coefficients is performed by dividing each coefficient value by the quantization parameter QP and rounding the result to the nearest integer. In this way the quantization process yields a set of quantized DCT coefficient values that have a reduced numerical precision compared with the coefficient values originally generated by DCT transformation block 104. Thus, in general, each of the quantized DCT coefficients may be represented by a smaller number of data bits than required to represent the corresponding coefficient before quantization. Furthermore, certain DCT coefficients are reduced to zero by the quantization process, thus reducing the number of coefficients that must be coded. Both these effects result in a reduction in the amount of data (i.e. data bits) required to represent the DCT coefficients for an image block. Thus, quantization provides a further mechanism by which the amount of data required to represent each image of the video sequence can be reduced. It also introduces an irreversible loss of information, which leads to a corresponding reduction in image quality. While this reduction in image quality may not always be desirable, quantization of DCT coefficient values does provide the possibility to adjust the number of bits required to encode a video sequence to take into account e.g. the bandwidth available for transmission of the encoded sequence or the desired quality of the coded video. More specifically, by increasing the value of QP used to quantize the DCT coefficients, a lower quality, but more compact representation of the video sequence can be created. Conversely, by reducing the value of QP, a higher quality but less compressed encoded bit-stream can be formed.
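The quantization rule described above (divide by QP, round to the nearest integer) can be sketched as follows. The coefficient values are made up for illustration, the function name is hypothetical, and rounding ties away from zero is an assumption, since the text does not specify how ties are handled.

```python
import math


def quantize(coeffs, qp):
    """Divide each coefficient by QP and round to the nearest integer.

    Ties are rounded away from zero (an assumption of this sketch).
    """
    return [int(math.copysign(math.floor(abs(c) / qp + 0.5), c)) for c in coeffs]


# Made-up coefficient values, largest first as is typical after a DCT.
coeffs = [800.0, -37.0, 12.0, 3.0, -2.0, 1.0]
for qp in (4, 8, 32):
    levels = quantize(coeffs, qp)
    print(qp, levels, "zeros:", levels.count(0))
```

As QP grows, the surviving values shrink in magnitude and more of them become zero, which is the trade-off between compactness and quality described above.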
The quantized DCT coefficients for each block are passed from the quantizer 106 to the video multiplex coder 170, as indicated by line 125 in FIG. 3. The video multiplex coder 170 orders the quantized transform coefficients for each block using a zigzag scanning procedure, thereby converting the two-dimensional array of quantized transform coefficient values into a one-dimensional array. Typically, the video multiplex coder 170 next represents each non-zero quantized coefficient in the one-dimensional array by a pair of values, referred to as level and run, level being the value of the quantized coefficient and run being the number of consecutive zero-valued coefficients preceding the coefficient in question. The run and level values are further compressed using entropy coding. For example, a method such as variable length coding (VLC) may be used to produce a set of variable length codewords representative of each (run, level) pair.
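The zigzag scan and the formation of (run, level) pairs can be sketched as follows. A 4×4 block is used to keep the example short, the function names are hypothetical, and the entropy coding step is omitted. Trailing zeros after the last non-zero coefficient produce no pair in this sketch; how the end of a block is signalled is not covered by the text.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals in turn, alternating direction on each one.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def run_level_pairs(block):
    """Convert a 2-D block of quantized coefficients into (run, level) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1                    # count zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    return pairs


quantized = [
    [50, 0, 0, 0],
    [-2, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(run_level_pairs(quantized))
```

Sixteen coefficient values are reduced to three pairs, because the scan visits the low-frequency positions first and the long runs of zeros are represented by their length only.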
Once the (run, level) pairs have been entropy (e.g. variable length) coded, the video multiplex coder 170 further combines them with control information, also entropy coded, for example, using a variable length coding method appropriate for the kind of information in question, to form a single compressed bit-stream of coded image information 135. It is this bit-stream, including the variable length codewords representative of the (run, level) pairs and control information relating, among other things, to the quantization parameter QP used for quantizing the DCT coefficients, which is transmitted from the encoder.
A locally decoded version of the macroblock is also formed in the encoder 100. This is done by passing the quantized transform coefficients for each block, output by quantizer 106, through inverse quantizer 108 and applying an inverse DCT transform in inverse transformation block 110. Inverse quantization is performed by reversing the quantization operation performed in quantizer 106. More specifically, inverse quantizer 108 attempts to recover the original DCT coefficient values for a given image block by multiplying each quantized DCT coefficient value by quantization parameter QP. Because of the rounding operation performed as part of the quantization process in quantizer 106, it is generally not possible to recover the original DCT coefficient values exactly. This results in a discrepancy between the recovered DCT coefficient values and those originally produced by DCT transformation block 104 (hence the irreversible loss of information referred to above).
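The discrepancy introduced by the quantization round trip can be illustrated with a minimal sketch. The coefficient values and function names are hypothetical, and ties are rounded away from zero as an assumption of the sketch.

```python
import math


def quantize(c, qp):
    """Divide by QP and round to the nearest integer (ties away from zero)."""
    return int(math.copysign(math.floor(abs(c) / qp + 0.5), c))


def dequantize(level, qp):
    """Inverse quantization: multiply the quantized value by QP."""
    return level * qp


qp = 8
original = [800.0, -37.0, 12.0, 3.0]
levels = [quantize(c, qp) for c in original]
recovered = [dequantize(level, qp) for level in levels]
errors = [c - r for c, r in zip(original, recovered)]
print(levels, recovered, errors)
```

Only coefficients that happen to be exact multiples of QP are recovered exactly. With this rounding rule the magnitude of the error for any coefficient does not exceed QP/2.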
The operations performed by inverse quantizer 108 and inverse transformation block 110 yield a reconstructed array of pixel values for each block of the macroblock. The resulting decoded image data is input to combiner 112. In INTRA-coding mode, switch 114 is set so that the input to the combiner 112 via switch 114 is zero. In this way, the operation performed by combiner 112 is equivalent to passing the decoded image data unaltered.
As subsequent macroblocks of the current frame are received and undergo the previously described encoding and local decoding steps in blocks 104, 106, 108, 110 and 112, a decoded version of the INTRA-coded frame is built up in frame store 120. When the last macroblock of the current frame has been INTRA-coded and subsequently decoded, the frame store 120 contains a completely decoded frame, available for use as a prediction reference frame in coding a subsequently received video frame in INTER-coded format. The flag indicating INTRA- or INTER-coded format is provided on line 122.
Operation of the encoder 100 in INTER-coding mode will now be described. In INTER-coding mode, the control manager 160 operates switch 102 to receive its input from line 117, which comprises the output of combiner 116. The combiner 116 receives the video input signal macroblock by macroblock from input 101. As combiner 116 receives the blocks of luminance and chrominance values which make up the macroblock, it forms corresponding blocks of prediction error information. The prediction error information represents the difference between the block in question and its prediction, produced in motion compensated prediction block 150. More specifically, the prediction error information for each block of the macroblock comprises a two-dimensional array of values, each of which represents the difference between a pixel value in the block of luminance or chrominance information being coded and a decoded pixel value obtained by forming a motion-compensated prediction for the block, according to the procedure described below.
The prediction error information for each block of the macroblock is passed to DCT transformation block 104, which performs a two-dimensional discrete cosine transform on each block of prediction error values to produce a two-dimensional array of DCT transform coefficients for each block.
The transform coefficients for each prediction error block are passed to quantizer 106 where they are quantized using a quantization parameter QP, in a manner analogous to that described above in connection with operation of the encoder in INTRA-coding mode. Again, selection of the quantization parameter QP is controlled by the control manager 160 via control line 115. The accuracy of prediction error coding can be adjusted depending on the available bandwidth and/or required quality of the coded video. In a typical Discrete Cosine Transform (DCT) based system this is done by varying the quantization parameter (QP) used in quantizing the DCT coefficients to a specific accuracy.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are passed from quantizer 106 to video multiplex coder 170, as indicated by line 125 in FIG. 3. As in INTRA-coding mode, the video multiplex coder 170 orders the transform coefficients for each prediction error block using a zigzag scanning procedure and then represents each non-zero quantized coefficient as a (run, level) pair. It further compresses the (run, level) pairs using entropy coding, in a manner analogous to that described above in connection with INTRA-coding mode. Video multiplex coder 170 also receives motion vector information (described in the following) from motion field coding block 140 via line 126 and control information (e.g. including an indication of the quantization parameter QP) from control manager 160. It entropy codes the motion vector information and control information and forms a single bit-stream of coded image information 135, comprising the entropy coded motion vector, prediction error and control information. The indication, qz, of the quantization parameter QP is provided to video multiplex coder 170 via line 124.
The quantized DCT coefficients representing the prediction error information for each block of the macroblock are also passed from quantizer 106 to inverse quantizer 108. Here they are inverse quantized, in a manner analogous to that previously described in connection with operation of the encoder in INTRA-coding mode. In INTER-coding mode, the quality of the encoded video bit-stream and the number of bits required to represent the video sequence can be adjusted by varying the degree of quantization applied to the DCT coefficients representing the prediction error information.
The resulting blocks of inverse quantized DCT coefficients are applied to inverse DCT transform block 110, where they undergo inverse DCT transformation to produce locally decoded blocks of prediction error values. The locally decoded blocks of prediction error values are then input to combiner 112. In INTER-coding mode, switch 114 is set so that the combiner 112 also receives predicted pixel values for each block of the macroblock, generated by motion-compensated prediction block 150. The combiner 112 combines each of the locally decoded blocks of prediction error values with a corresponding block of predicted pixel values to produce reconstructed image blocks and stores them in frame store 120.
As subsequent macroblocks of the video signal are received from the video source and undergo the previously described encoding and decoding steps in blocks 104, 106, 108, 110, 112, a decoded version of the frame is built up in frame store 120. When the last macroblock of the frame has been processed, the frame store 120 contains a completely decoded frame, available for use as a prediction reference frame in encoding a subsequently received video frame in INTER-coded format.
Formation of a prediction for a macroblock of the current frame will now be described. Any frame encoded in INTER-coded format requires a reference frame for motion-compensated prediction. This means, necessarily, that when encoding a video sequence, the first frame to be encoded, whether it is the first frame in the sequence, or some other frame, must be encoded in INTRA-coded format. This, in turn, means that when the video encoder 100 is switched into INTER-coding mode by control manager 160, a complete reference frame, formed by locally decoding a previously encoded frame, is already available in the frame store 120 of the encoder. In general, the reference frame is formed by locally decoding either an INTRA-coded frame or an INTER-coded frame.
The first step in forming a prediction for a macroblock of the current frame is performed by motion estimation block 130. The motion estimation block 130 receives the blocks of luminance and chrominance values which make up the current macroblock of the frame to be coded via line 128. It then performs a block matching operation in order to identify a region in the reference frame which corresponds substantially with the current macroblock. In order to perform the block matching operation, motion estimation block 130 accesses reference frame data stored in frame store 120 via line 127. More specifically, motion estimation block 130 performs block-matching by calculating difference values (e.g. sums of absolute differences) representing the difference in pixel values between the macroblock under examination and candidate best-matching regions of pixels from a reference frame stored in the frame store 120. A difference value is produced for candidate regions at all possible positions within a predefined search region of the reference frame and motion estimation block 130 determines the smallest calculated difference value. The offset between the macroblock in the current frame and the candidate block of pixel values in the reference frame that yields the smallest difference value defines the motion vector for the macroblock in question.
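The block matching operation can be sketched as an exhaustive search using the sum of absolute differences (SAD). The function names, the 4×4 block size, the small synthetic reference frame and the (dx, dy) ordering of the motion vector are all assumptions made for the example; candidate positions that fall outside the reference frame are simply skipped.

```python
def sad(cur, ref, top, left):
    """Sum of absolute differences between cur and the ref region at (top, left)."""
    n = len(cur)
    return sum(abs(cur[y][x] - ref[top + y][left + x])
               for y in range(n) for x in range(n))


def full_search(cur, ref, mb_top, mb_left, search_range):
    """Try every offset in the search region; return (motion_vector, best_sad)."""
    n = len(cur)
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = mb_top + dy, mb_left + dx
            if top < 0 or left < 0 or top + n > len(ref) or left + n > len(ref[0]):
                continue                     # candidate lies outside the frame
            cost = sad(cur, ref, top, left)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost


# Synthetic 8x8 reference frame in which every pixel value is distinct.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
# The current block sits at row 2, column 2 but shows content found at row 3, column 1.
cur = [[ref[3 + y][1 + x] for x in range(4)] for y in range(4)]
motion_vector, cost = full_search(cur, ref, mb_top=2, mb_left=2, search_range=2)
print(motion_vector, cost)
```

The search recovers the displacement of one pixel to the left and one pixel down, with a difference value of zero because the synthetic block matches the reference exactly.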
Once the motion estimation block 130 has produced a motion vector for the macroblock, it outputs the motion vector to the motion field coding block 140. The motion field coding block 140 approximates the motion vector received from motion estimation block 130 using a motion model comprising a set of basis functions and motion coefficients. More specifically, the motion field coding block 140 represents the motion vector as a set of motion coefficient values which, when multiplied by the basis functions, form an approximation of the motion vector. Typically, a translational motion model having only two motion coefficients and basis functions is used, but motion models of greater complexity may also be used.
The motion coefficients are passed from motion field coding block 140 to motion compensated prediction block 150. Motion compensated prediction block 150 also receives the best-matching candidate region of pixel values identified by motion estimation block 130 from frame store 120. Using the approximate representation of the motion vector generated by motion field coding block 140 and the pixel values of the best-matching candidate region from the reference frame, motion compensated prediction block 150 generates an array of predicted pixel values for each block of the macroblock. Each block of predicted pixel values is passed to combiner 116 where the predicted pixel values are subtracted from the actual (input) pixel values in the corresponding block of the current macroblock, thereby forming a set of prediction error blocks for the macroblock.
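For the simple translational case, the formation of the predicted block and of the prediction error block can be sketched as follows. The function names, the (dx, dy) vector convention and the synthetic data are assumptions of the sketch; the current block is given a uniform brightness change of 2 so that the prediction error is non-zero.

```python
def predict_block(ref, top, left, mv, n):
    """Copy the n x n reference region displaced by mv from position (top, left)."""
    dx, dy = mv
    return [[ref[top + dy + y][left + dx + x] for x in range(n)] for y in range(n)]


def prediction_error(cur, pred):
    """Subtract predicted pixel values from the actual (input) pixel values."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur, pred)]


ref = [[10 * y + x for x in range(8)] for y in range(8)]
# Content displaced by (-1, +1) relative to position (2, 2), made brighter by 2.
cur = [[ref[3 + y][1 + x] + 2 for x in range(4)] for y in range(4)]
pred = predict_block(ref, top=2, left=2, mv=(-1, 1), n=4)
residual = prediction_error(cur, pred)
print(residual)
```

The prediction error block contains only the small brightness difference, which is much cheaper to transform and quantize than the pixel values themselves.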
Operation of the video decoder 200, shown in FIG. 4, will now be described. The decoder 200 comprises a video multiplex decoder 270, which receives an encoded video bit-stream 135 from the encoder 100 and demultiplexes it into its constituent parts, an inverse quantizer 210, an inverse DCT transformer 220, a motion compensated prediction block 240, a frame store 250, a combiner 230, a control manager 260, and an output 280.
The control manager 260 controls the operation of the decoder 200 in response to whether an INTRA- or an INTER-coded frame is being decoded. An INTRA/INTER trigger control signal, which causes the decoder to switch between decoding modes is derived, for example, from picture type information associated with each compressed video frame received from the encoder. The INTRA/INTER trigger control signal is extracted from the encoded video bit-stream by the video multiplex decoder 270 and is passed to control manager 260 via control line 215.
Decoding of an INTRA-coded frame is performed macroblock-by-macroblock. The video multiplex decoder 270 separates the encoded information for the blocks of the macroblock from possible control information relating to the macroblock in question. The encoded information for each block of an INTRA-coded macroblock comprises variable length codewords representing the VLC coded level and run values for the non-zero quantized DCT coefficients of the block. The video multiplex decoder 270 decodes the variable length codewords using a variable length decoding method corresponding to the encoding method used in the encoder 100 and thereby recovers the (run, level) pairs. It then reconstructs the array of quantized transform coefficient values for each block of the macroblock and passes them to inverse quantizer 210. Any control information relating to the macroblock is also decoded in the video multiplex decoder using an appropriate decoding method and is passed to control manager 260. In particular, information relating to the level of quantization applied to the transform coefficients (i.e. quantization parameter QP) is extracted from the encoded bit-stream by video multiplex decoder 270 and provided to control manager 260 via control line 217. The control manager, in turn, conveys this information to inverse quantizer 210 via control line 218. Inverse quantizer 210 inverse quantizes the quantized DCT coefficients for each block of the macroblock according to the control information relating to quantization parameter QP and provides the now inverse quantized DCT coefficients to inverse DCT transformer 220. The inverse quantization operation performed by inverse quantizer 210 is identical to that performed by inverse quantizer 108 in the encoder.
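The reconstruction of a block of quantized coefficients from decoded (run, level) pairs, followed by inverse quantization, can be sketched as follows. Variable length decoding is omitted, the function names are hypothetical, and a 4×4 block is used for brevity.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def block_from_run_level(pairs, n):
    """Rebuild the n x n array of quantized coefficients from (run, level) pairs."""
    order = zigzag_order(n)
    block = [[0] * n for _ in range(n)]
    pos = 0
    for run, level in pairs:
        pos += run                  # skip the zero-valued coefficients
        r, c = order[pos]
        block[r][c] = level
        pos += 1
    return block


def inverse_quantize(block, qp):
    """Multiply each quantized coefficient by QP."""
    return [[level * qp for level in row] for row in block]


quantized = block_from_run_level([(0, 50), (1, -2), (5, 1)], n=4)
coefficients = inverse_quantize(quantized, qp=16)
print(quantized)
print(coefficients)
```

The decoder must use the same scan order as the encoder. Positions not addressed by any pair remain zero.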
Inverse DCT transformer 220 performs an inverse DCT transform on the inverse quantized DCT coefficients for each block of the macroblock to form a decoded block of image information comprising reconstructed pixel values. The reconstructed pixel values for each block of the macroblock are passed via combiner 230 to the video output 280 of the decoder where, for example, they can be provided to a display device (not shown). The reconstructed pixel values for each block of the macroblock are also stored in frame store 250. Because motion-compensated prediction is not used in the encoding/decoding of INTRA-coded macroblocks, control manager 260 controls combiner 230 to pass each block of pixel values as such to the video output 280 and frame store 250. As subsequent macroblocks of the INTRA-coded frame are decoded and stored, a decoded frame is progressively assembled in the frame store 250 and thus becomes available for use as a reference frame for motion compensated prediction in connection with the decoding of subsequently received INTER-coded frames.
INTER-coded frames are also decoded macroblock by macroblock. The video multiplex decoder 270 receives the encoded video bit-stream 135 and separates the encoded prediction error information for each block of an INTER-coded macroblock from encoded motion vector information and possible control information relating to the macroblock in question. As explained in the foregoing, the encoded prediction error information for each block of the macroblock typically comprises variable length codewords representing the level and run values for the non-zero quantized transform coefficients of the prediction error block in question. The video multiplex decoder 270 decodes the variable length codewords using a variable length decoding method corresponding to the encoding method used in the encoder 100 and thereby recovers the (run, level) pairs. It then reconstructs an array of quantized transform coefficient values for each prediction error block and passes them to inverse quantizer 210. Control information relating to the INTER-coded macroblock is also decoded in the video multiplex decoder 270 using an appropriate decoding method and is passed to control manager 260. Information relating to the level of quantization (QP) applied to the transform coefficients of the prediction error blocks is extracted from the encoded bit-stream and provided to control manager 260 via control line 217. The control manager, in turn, conveys this information to inverse quantizer 210 via control line 218. Inverse quantizer 210 inverse quantizes the quantized DCT coefficients representing the prediction error information for each block of the macroblock according to the control information relating to quantization parameter QP and provides the now inverse quantized DCT coefficients to inverse DCT transformer 220. Again, the inverse quantization operation performed by inverse quantizer 210 is identical to that performed by inverse quantizer 108 in the encoder. 
The INTRA/INTER flag is provided in line 215.
The inverse quantized DCT coefficients representing the prediction error information for each block are then inverse transformed in the inverse DCT transformer 220 to yield an array of reconstructed prediction error values for each block of the macroblock.
The encoded motion vector information associated with the macroblock is extracted from the encoded video bit-stream 135 by video multiplex decoder 270 and is decoded. The decoded motion vector information thus obtained is passed via control line 225 to motion compensated prediction block 240, which reconstructs a motion vector for the macroblock using the same motion model as that used to encode the INTER-coded macroblock in encoder 100. The reconstructed motion vector approximates the motion vector originally determined by motion estimation block 130 of the encoder. The motion compensated prediction block 240 of the decoder uses the reconstructed motion vector to identify the location of a region of reconstructed pixels in a prediction reference frame stored in frame store 250. The region of pixels indicated by the reconstructed motion vector is used to form a prediction for the macroblock in question. More specifically, the motion compensated prediction block 240 forms an array of pixel values for each block of the macroblock by copying corresponding pixel values from the region of pixels identified in the reference frame. These blocks of pixel values, derived from the reference frame, are passed from motion compensated prediction block 240 to combiner 230 where they are combined with the decoded prediction error information. In practice, the pixel values of each predicted block are added to corresponding reconstructed prediction error values output by inverse DCT transformer 220. In this way, an array of reconstructed pixel values for each block of the macroblock is obtained. The reconstructed pixel values are passed to the video output 280 of the decoder and are also stored in frame store 250.
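The prediction and combination steps described above can be illustrated with a short sketch. It assumes a purely translational motion model with whole-pixel motion vectors, a referenced region that lies entirely inside the reference frame, and 8-bit pixel values clipped to the range 0 to 255. The function name `reconstruct_block` and its parameters are hypothetical and are not taken from the text.

```python
def reconstruct_block(reference, top, left, motion_vector, residual):
    """Form a predicted block from the reference frame and add the residual.

    reference     : 2-D list of reconstructed pixel values (the reference frame)
    top, left     : position of the current block in the frame
    motion_vector : (dx, dy) displacement, in whole pixels (assumption)
    residual      : 2-D list of reconstructed prediction error values
    """
    dx, dy = motion_vector
    size = len(residual)
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            # Copy the pixel from the displaced region of the reference frame.
            predicted = reference[top + dy + r][left + dx + c]
            # Combine prediction and prediction error, then clip to 8 bits.
            row.append(max(0, min(255, predicted + residual[r][c])))
        block.append(row)
    return block


if __name__ == "__main__":
    ref = [[10 * r + c for c in range(4)] for r in range(4)]
    print(reconstruct_block(ref, 0, 0, (1, 2), [[1, -1], [0, 300]]))
```

The clipping step is a common practical safeguard and is an assumption here; the text itself only states that predicted values and prediction error values are added.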
As subsequent macroblocks of the INTER-coded frame are decoded and stored, a decoded frame is progressively assembled in the frame store 250 and thus becomes available for use as a reference frame for motion-compensated prediction of other INTER-coded frames.
As described above, typical video encoding and decoding systems (commonly referred to as video codecs) are based on motion compensated prediction and prediction error coding. Motion compensated prediction is obtained by analyzing and coding motion between video frames and reconstructing image segments using the motion information. Prediction error coding is used to code the difference between motion compensated image segments and corresponding segments of the original image. The accuracy of prediction error coding can be adjusted depending on the available bandwidth and required quality of the coded video. In a typical Discrete Cosine Transform (DCT) based system this is done by varying the quantization parameter (QP) used in quantizing the DCT coefficients to a specific accuracy.
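The trade-off between accuracy and bit-rate controlled by the quantization parameter can be illustrated with a simple uniform quantizer. The sketch below is a simplification made under stated assumptions: the mapping from QP to an actual step size is not given in this text, so a hypothetical `step` value stands in for it, and rounding is by truncation toward zero.

```python
def quantize(coeffs, step):
    """Map transform coefficients to integer levels (truncating toward zero)."""
    return [int(c / step) for c in coeffs]


def dequantize(levels, step):
    """Scale integer levels back to approximate coefficient values."""
    return [level * step for level in levels]


if __name__ == "__main__":
    coeffs = [100, -37, 12, 5, -3, 0]
    for step in (4, 16):  # hypothetical step sizes standing in for two QP values
        levels = quantize(coeffs, step)
        approx = dequantize(levels, step)
        nonzero = sum(1 for level in levels if level != 0)
        print(step, levels, approx, nonzero)
```

With the coarser step, fewer non-zero levels remain to be coded, at the cost of a larger reconstruction error. The same step must be used for quantization and inverse quantization, which is why the decoder needs to know the QP used by the encoder.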
It should be noted that, in order to stay in synchronization with the encoder, the decoder has to know the exact value of the QP used in the coded video sequence. Typically, the QP value is transmitted once per slice, leading to an increase in the number of bits needed to encode the image. (As previously explained, a slice contains part of the image and is coded independently from other slices in order to avoid propagation of possible transmission errors inside the picture.) For example, if the coding of a single QP value takes 6 bits and 20 images, each divided into 10 slices, are transmitted every second, 1.2 kbps is spent on the QP information alone.
Prior art solutions (for example, the H.26L video coding recommendation presented in the document by T. Wiegand, “Joint Model Number 1”, Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002) code the picture/slice QP parameters independently with a fixed or variable length code. This leads to an increased transmission bit-rate as described above. More specifically, according to H.26L Joint Model Number 1, the quantization parameter value QP used in quantizing the DCT coefficient values is typically indicated in the encoded bit-stream at the beginning of each picture (see T. Wiegand, “Joint Model Number 1”, Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, Section 3.3.1). If the macroblocks within a frame are arranged into slices then the QP value is also indicated at the beginning of each slice of the frame (e.g. in an optional slice header portion of the encoded bit-stream). In both cases, the QP value is indicated as such or is coded using an appropriate variable length coding scheme. As stated above, it should be realized that this scheme is very costly in terms of the number of bits required to represent the quantization parameter information, particularly in situations where frames are divided into many slices and/or the bandwidth available for transmission of the encoded video sequence is low. This is a particularly significant problem in mobile video applications in which the encoded video bit-stream is transmitted over a radio communications link. In this situation, the bandwidth available for transmission of the encoded video bit-stream may be as low as 20 kbits/s and the QP information included in the bit-stream may represent a significant proportion of the overall available bandwidth.
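The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the example values from the text (6 bits per QP value, 20 images per second, 10 slices per image, and a 20 kbits/s channel). The function names are introduced here for illustration only, and the calculation assumes exactly one QP value per slice and no additional picture-level or macroblock-level QP signalling.

```python
def qp_overhead_bps(bits_per_qp, images_per_second, slices_per_image):
    """Bits per second spent on QP values when one QP is sent per slice."""
    return bits_per_qp * images_per_second * slices_per_image


def overhead_fraction(overhead_bps, channel_bps):
    """Share of the channel bit-rate consumed by the QP information."""
    return overhead_bps / channel_bps


if __name__ == "__main__":
    overhead = qp_overhead_bps(6, 20, 10)          # 1200 bits/s, i.e. 1.2 kbps
    share = overhead_fraction(overhead, 20_000)    # about 0.06 of the channel
    print(overhead, share)
```

Under these assumptions the QP information alone occupies roughly 6% of a 20 kbits/s channel, before any picture-level or macroblock-level QP signalling is counted.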
Furthermore, according to H.26L, the value of QP may optionally be varied at the macroblock level by inserting a quantizer change parameter (Dquant) in the portion of the encoded bit-stream representative of the macroblock in question (see T. Wiegand, “Joint Model Number 1”, Doc. JVT-A003, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, January 2002, Section 3.4.7). This leads to a further increase in the amount of information in the encoded bit-stream that is devoted to the indication of QP related information.
In view of the foregoing, it should be realized that there is a significant need for an improved mechanism for indicating information relating to quantization parameter values in video coding systems.