1. Field of the Invention
The present invention relates to coded block pattern (CBP) encoding and decoding in a video signal encoding and decoding system, and more particularly to a CBP encoding/decoding apparatus and method wherein one of a plurality of variable length coding (VLC)/variable length decoding (VLD) tables stored in a memory is selectively applied for encoding and decoding a coded block pattern of a macroblock according to the number of blocks having object pixels within the macroblock, which is detected using shape information.
2. Description of Related Art
Generally, compressive encoding and decoding of video signals not only allows transmission of video information via low-rate channels but also reduces the memory required for storing the video. Compression encoding and decoding techniques are therefore very important to the multimedia industry, which requires applications such as storage and transmission of video.
Standardization of information compression methods is required for compatibility of information and for expansion of the multimedia industry, and video standards have been established for various applications. Representative standards for video encoding and decoding are H.261, recommended by ITU-T (International Telecommunication Union-Telecommunication Standardization Sector, formerly CCITT) for transmitting video information for video phone and video conferencing via ISDN (Integrated Services Digital Network); H.263, recommended by ITU-T for transmitting video information via PSTN (Public Switched Telephone Network); MPEG (Moving Picture Experts Group)-1, recommended by ISO/IEC JTC 1/SC29/WG11 (International Organization for Standardization/International Electrotechnical Commission Joint Technical Committee 1/Sub Committee 29/Working Group 11) for storing video in digital storage media (DSM); and MPEG-2 for high definition digital broadcasting such as EDTV (Enhanced Definition Television) and HDTV (High Definition Television).
Compressive encoding of still image signals has also been standardized; JPEG (Joint Photographic Experts Group), recommended by ISO/IEC JTC1/SC29/WG1, is a representative standard.
Such conventional video signal coding methods encode a rectangular frame or a whole picture and are thus called frame-based coding. Frame-based coding encodes the texture information (e.g., luminance and chrominance) of all picture elements (pels or pixels) forming the frame.
Recently, however, there has been increasing demand for multimedia products and services that can code and transmit or manipulate only a particular region (or object) that a user is interested in or needs, instead of coding and transmitting the whole frame.
In response to this trend, object-based coding, which encodes an image in units of arbitrarily shaped regions, has been actively studied as an alternative to frame-based coding, which encodes the whole frame.
FIG. 1 and FIG. 2 show examples of prior art test images used for explaining object-based coding. FIG. 1 shows a frame including a picture of two children playing with a ball in a certain space (background). Object-based coding is appropriate for this case when only the video information of the object composed of the children and the ball needs to be transmitted. Namely, only the texture information values of the pixels forming the children and the ball are encoded and transmitted. Here, the region including the children and the ball is called an object and the other region, excluding the object, is called a background.
For compressively encoding the image shown in FIG. 1 using object-based coding, an encoder and a decoder should equally recognize which pixels of the whole frame represent the children and the ball and which pixels represent the background. Such information is called shape information. The encoder should efficiently encode the shape information and transmit it to the decoder so that the decoder can recognize it. The largest difference between the frame-based encoder/decoder and the object-based encoder/decoder is that the object-based encoder/decoder includes a shape information encoder/decoder.
FIG. 2 shows shape information when only the children and ball are considered as an object among the video information. The pixels forming the children and ball have bright values and the pixels forming the background have dark values.
In order to discriminate the pixels forming the object from the pixels forming the background, as shown in FIG. 2, the pixels are represented by shape information having predetermined values according to their respective regions. This is called a binary mask. For example, all the pixels belonging to the background have a value "0" and all the pixels belonging to the object have a value "255", so that each pixel has one of the two values "0" and "255". For object-based coding, shape information for identifying and discriminating object pixels and background pixels among all the pixels forming a whole picture is required. Each of the object pixels has texture information.
Instead of the binary mask, the shape information can be represented by a contour indicating the boundary between the background and the object. The two representations are interchangeable. Contour extraction is carried out to express the binary mask as contour information. Alternatively, contour filling is carried out to obtain the binary mask from the contour information. To encode the shape information and transmit it to the decoder with a small number of bits, an effective shape information coding method is required. This shape information coding method is not related to the present invention, so its detailed description is omitted.
Representative frame-based coding methods are H.261 and H.263 recommended by ITU-T, MPEG-1 and MPEG-2 by ISO/IEC JTC1/SC29/WG11, and JPEG by ISO/IEC JTC1/SC29/WG1. Representative object-based coding methods are MPEG-4 recommended by ISO/IEC JTC1/SC29/WG11 and JPEG2000 by ISO/IEC JTC1/SC29/WG1.
A conventional video signal coding method that is widely used around the world is transform coding. Transform coding converts video signals into transform coefficients (or frequency coefficients) so that transmission of high frequency components is suppressed and signals of low frequency components are transmitted. This method has the advantage of increasing the compression ratio with little loss in picture quality. The discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST), and Walsh-Hadamard transform (WHT) have been developed for transform coding.
Among these transform methods, the DCT is excellent at compacting video signal energy into low-frequency components. Compared with the other transform methods, the DCT provides high picture quality with only a small number of low frequency coefficients and has a fast algorithm. Due to these advantages, the DCT is the most generally used transform and is employed in video coding standards such as H.261, H.263, MPEG-1, MPEG-2, MPEG-4, and JPEG.
Conventional frame-based coding divides a frame into macroblocks, each 16 pixels in height and width (hereinafter this size is expressed as 16*16), and carries out the coding in macroblock units. Namely, motion estimation, motion compensation, and coding type decisions are carried out in macroblock units. The coding type determines whether to encode the input video signal itself or the motion compensation error signal of the macroblock. A macroblock corresponding to the former is called an intra macroblock and a macroblock corresponding to the latter is called an inter macroblock.
According to conventional techniques, the DCT is performed on the input signal selected in accordance with the coding type, and the coding type and the resulting transform coefficients are transmitted. Here, the macroblock is divided into blocks of 8*8 and the DCT is performed in block units.
FIG. 3 is part of the prior art and shows the relationship between a macroblock and blocks. As shown in FIG. 3, the macroblock comprises four blocks, Y1, Y2, Y3, and Y4. The transform coefficients of the blocks are quantized. The quantized coefficients, arranged in two-dimensional blocks, are rearranged in one dimension through scanning and then variable length coded for transmission to a receiver.
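The scanning step can be illustrated with a short sketch. The text does not name the scan order; the zigzag order used below is an assumption (it is the order commonly used with the 8*8 DCT), and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order


def scan(block):
    """Rearrange a square 2-D block of quantized coefficients into a 1-D list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

With this order, low-frequency coefficients come first, so the trailing part of the one-dimensional list tends to consist of zeros after quantization.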
The transform coefficients are classified into a DC coefficient and AC coefficients. The DC coefficient represents the average value of the input block signal, and its meaning differs according to the coding type of the corresponding input macroblock, that is, inter or intra. For the inter macroblock, the motion compensation error is coded, so the DC coefficient has a small value in many cases. For the intra macroblock, the input video signal is coded, so the DC coefficient is the average value of the input video signal and is regarded as very important information. For these reasons, many coding methods distinguish the DC information from the AC information and transmit the DC information in greater detail. On the other hand, the DC coefficient may be coded without being distinguished from the AC coefficients, on the grounds that the DC coefficient does not have a large value in inter coding.
Variable length coding (VLC) is used for transmitting the AC coefficients. Once the transform coefficients are arranged in one dimension by scanning, each non-zero AC coefficient is two-dimensionally variable length coded as a combination of its distance from the previous non-zero AC coefficient and its own magnitude. After the last non-zero coefficient in a block, an end of block (EOB) signal is transmitted. Alternatively, three-dimensional VLC is employed for a combination of three pieces of information: the distance from the previous non-zero AC coefficient; the magnitude of the coefficient itself; and LAST information indicating whether the coefficient is the last non-zero one in the block.
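The two symbol formats described above can be sketched as follows. This is only an illustration of how the events are formed before a VLC table is applied; the function names are hypothetical, and carrying the sign together with the magnitude in a single "level" value is an assumption made for brevity.

```python
def run_level_2d(ac):
    """Form (run, level) events from scanned AC coefficients, then append EOB.

    'run' is the number of zeros since the previous non-zero coefficient.
    """
    events, run = [], 0
    for coeff in ac:
        if coeff == 0:
            run += 1
        else:
            events.append((run, coeff))
            run = 0
    return events + ["EOB"]


def run_level_3d(ac):
    """Form (last, run, level) events; last = 1 marks the final non-zero value."""
    pairs = run_level_2d(ac)[:-1]  # drop the EOB marker, LAST replaces it
    return [(int(i == len(pairs) - 1), run, level)
            for i, (run, level) in enumerate(pairs)]


scanned = [5, 0, 0, -2, 1, 0, 0, 0]
two_d = run_level_2d(scanned)    # [(0, 5), (2, -2), (0, 1), 'EOB']
three_d = run_level_3d(scanned)  # [(0, 0, 5), (0, 2, -2), (1, 0, 1)]
```

In the three-dimensional form no separate EOB symbol is needed, because the LAST flag already tells the decoder where the block ends.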
If all the quantized coefficients of a block (the AC and DC coefficients, or only the AC coefficients if the DC coefficient is separately encoded) have a value "0", no data is transmitted for that block. In this case, the encoder transmits to the decoder information indicating whether or not each block has transmitted data. This information is transmitted once per macroblock by combining the information on the four blocks of the macroblock. The combined information is called a coded block pattern. Each macroblock comprises four blocks and each block either has or does not have transmitted data, so the coded block pattern has 16 possible cases.
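As a minimal sketch of how the four per-block flags combine into one pattern, the following function builds a 4-bit value from the quantized coefficients of Y1 to Y4. The function name is hypothetical, and placing Y1 in the most significant bit is an assumption that matches the left-to-right digit order used for FIG. 4.

```python
def coded_block_pattern(blocks):
    """Combine four blocks (Y1..Y4) into a 4-bit coded block pattern.

    Each element of 'blocks' is the list of quantized coefficients of one
    block; a block contributes a 1 bit if any coefficient is non-zero.
    """
    cbp = 0
    for coeffs in blocks:
        cbp = (cbp << 1) | int(any(c != 0 for c in coeffs))
    return cbp


macroblock = [[0, 0, 0], [3, 0, 0], [0, 0, 0], [0, -1, 0]]  # Y1, Y2, Y3, Y4
pattern = format(coded_block_pattern(macroblock), "04b")    # "0101"
```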
The second column of FIG. 4 shows the 16 cases of the coded block pattern of an intra macroblock. Starting from the leftmost one, the four digits, each "0" or "1", respectively indicate the non-existence or existence of data for the blocks Y1, Y2, Y3, and Y4 of FIG. 3. Here, a "0" indicates the non-existence of coded data while a "1" indicates the existence of coded data. For example, "0101" means that the blocks Y2 and Y4 contain transmitted data and the blocks Y1 and Y3 do not. The third column of FIG. 4 shows the cases of an inter macroblock.
Although 4-bit fixed length coding (FLC) can be applied to each coded block pattern, since the coded block pattern has 16 cases, VLC is applied to reduce the number of bits generated. In other words, the code length is assigned according to how often each case occurs: the more frequently a case occurs, the fewer bits its code is assigned, and the less frequently it occurs, the more bits its code is assigned. The fifth column of FIG. 4 shows the codes used in H.263 and the fourth column shows the number of bits of each code.
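The saving over 4-bit FLC can be illustrated numerically. The sketch below derives code lengths with Huffman's algorithm, which is a stand-in technique and not the procedure by which the H.263 table was designed, and the pattern frequencies are purely hypothetical values chosen so that one pattern dominates.

```python
import heapq


def huffman_lengths(freqs):
    """Return {symbol: code length in bits} of a Huffman code for 'freqs'."""
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol under the merged node gains one bit
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


# Hypothetical counts: pattern 15 ("1111") in 40 of 100 macroblocks,
# each of the other 15 patterns in 4 of 100.
freqs = {p: 4 for p in range(15)}
freqs[15] = 40
lengths = huffman_lengths(freqs)
average_bits = sum(freqs[p] * lengths[p] for p in freqs) / sum(freqs.values())
```

Under these assumed frequencies the dominant pattern receives the shortest code and the average code length falls below the 4 bits that FLC would spend on every macroblock.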
The frequency of occurrence of a coded block pattern can differ according to the coding method. In H.263, the intra macroblock and the inter macroblock have different frequencies of occurrence of the coded block patterns, so the VLC tables for coded block patterns are set differently for the intra macroblock and the inter macroblock. Notably, the frequencies of occurrence of an intra coded block pattern and an inter coded block pattern are similar when the two patterns are bitwise complements (1's complements) of each other. According to the table shown in FIG. 4, the coded block pattern "1111" occurs most frequently in the case of the intra macroblock and the coded block pattern "0000" occurs most frequently in the case of the inter macroblock. In the case of the inter macroblock, the motion compensation error is coded, so it often happens that there is no transmitted data when motion estimation is accurate or the quantization interval is large. In the case of the intra macroblock, the input video signal is coded, so most blocks have transmitted data when the video signals are not uniform. The coded block pattern "0110" of the intra macroblock and the coded block pattern "1001" of the inter macroblock have the lowest frequency of occurrence.
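Because complementary intra and inter patterns behave alike, both coding types can address the same table row. The following sketch assumes that the row index equals the intra pattern read as a binary number and that the complementary relation is a bitwise inversion of the four bits (consistent with "1111" and "0000" being paired); the function name is hypothetical.

```python
def cbp_table_index(cbp, is_intra):
    """Map a 4-bit coded block pattern to a row of the shared VLC table.

    Intra patterns index the table directly; inter patterns are inverted
    bit by bit first, so that e.g. inter "0000" shares a row with intra "1111".
    """
    return cbp if is_intra else (~cbp & 0xF)
```

For example, `cbp_table_index(0b1111, True)` and `cbp_table_index(0b0000, False)` both select row 15, so a single stored table serves both coding types.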
Although different coded block patterns are assigned to each code according to the coding type, the same table is used, so the same amount of memory is used regardless of the coding type.
The object-based coding also performs the coding such as DCT in block units and determines the coded block pattern in macroblock units. However, if the VLC table for coded block patterns used in the frame-based coding is directly applied to the object-based coding, a decrease in coding efficiency occurs.
FIG. 5 is part of the prior art and shows an example of an input signal in object-based coding. MB1, MB2, MB3, and MB4 respectively indicate the top left, top right, bottom left, and bottom right macroblocks, and each macroblock comprises the blocks Y1, Y2, Y3, and Y4. The oval-shaped, hatched part indicates the group of pixels belonging to an object (object pixels). Three blocks of MB1, namely Y2, Y3, and Y4, each include at least one object pixel. The block Y1 of MB1 contains no object pixel and therefore has no transmitted data. Accordingly, in the case that MB1 is an intra macroblock, the cases corresponding to indexes 8-15 shown in the first column of FIG. 4 do not occur. The VLC table of FIG. 4 can still be used, but this is inefficient. MB3 has only the block Y2 as a block including an object pixel. In the case that MB3 is an intra macroblock, the blocks Y1, Y3, and Y4 do not have transmitted data, and the 14 cases other than those corresponding to indexes 0 and 4 of FIG. 4 do not occur. The table of FIG. 4 can also be used here, but this is inefficient.
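The index ranges quoted for MB1 and MB3 can be checked with a short enumeration. The sketch assumes, as the examples above imply, that the table index of an intra macroblock equals its 4-bit pattern with Y1 as the most significant bit; the function name is hypothetical.

```python
def feasible_cbp_indexes(has_object):
    """List the coded block patterns that can occur in a macroblock.

    'has_object' holds four booleans for Y1..Y4. A block without object
    pixels never carries data, so its bit is always 0.
    """
    mask = 0
    for flag in has_object:
        mask = (mask << 1) | int(flag)
    # Keep only patterns that set no bit outside the object-block mask.
    return [cbp for cbp in range(16) if (cbp & ~mask) == 0]


mb1 = feasible_cbp_indexes([False, True, True, True])    # indexes 0..7
mb3 = feasible_cbp_indexes([False, True, False, False])  # indexes 0 and 4
```

A macroblock with k object blocks has only 2^k feasible patterns, which is why a 16-entry table wastes code space whenever k is less than four.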
As illustrated, the conventional coded block pattern encoding uses one VLC table for coded block patterns that is used in the frame-based coding even though macroblocks have different numbers of blocks where an object is present, thereby reducing coding efficiency.
Accordingly, the present invention is directed to a coded block pattern encoding/decoding apparatus and method that substantially obviate one or more of the limitations and disadvantages of the related art.
In one embodiment of the present invention, as embodied and broadly described, a coded block pattern encoding apparatus comprises: a variable length coding (VLC) table selection unit for detecting, based upon incoming shape information, the blocks of a macroblock to be coded in which object pixels are present, and for generating a control signal for selecting one of a plurality of VLC tables according to the number of blocks in which the object is present; and a coded block pattern encoding unit for selecting one of the plurality of VLC tables according to the control signal output from the VLC table selection unit and encoding quantized coefficients with the selected VLC table to provide a coded block pattern.
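The selection step of this embodiment can be sketched as follows, under stated assumptions: the shape information is taken to be a 16*16 binary mask in which a non-zero value marks an object pixel, and the tables are passed in as a dictionary keyed by the number of object blocks. The function names and the dictionary layout are hypothetical, and the contents of the tables themselves are not shown here.

```python
def count_object_blocks(mask16):
    """Count the 8*8 blocks of a 16*16 mask that contain an object pixel."""
    count = 0
    for top in (0, 8):
        for left in (0, 8):
            if any(mask16[r][c]
                   for r in range(top, top + 8)
                   for c in range(left, left + 8)):
                count += 1
    return count


def select_vlc_table(mask16, tables):
    """Pick the table for this macroblock, or None if it has no object block.

    'tables' maps an object-block count (1..4) to a VLC table.
    """
    return tables.get(count_object_blocks(mask16))
```

A decoder that receives the same shape information can run the same count and therefore select the same table without any extra signaling.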
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.