1. Field of the Invention
The present invention relates to image signal compression coding and decoding, and more particularly to an apparatus for and a method of coding image signals wherein, upon scanning transform coefficients of an input image signal in a shape-adaptive transform coding, only segments containing such shape-adaptive transform coefficients are scanned, thereby reducing the quantity of data to be transmitted. The present invention also relates to an apparatus for and a method of decoding image signals wherein, upon decoding a bitstream encoded in the above-mentioned manner, an inverse scanning of the transform coefficients is carried out, taking into consideration only the segments containing those transform coefficients.
2. Description of the Prior Art
Compression coding and decoding of image signals make it possible to transmit image information while reducing the memory capacity required to store image signals. Thus, such compression coding and decoding techniques are very important in multimedia industries involving applications such as storage and transmission of image signals. Meanwhile, standardization of information compression schemes has become necessary for the expansion of multimedia industries and for information compatibility. To this end, various image standardization schemes associated with a variety of applications have been proposed. For example, representative image coding and decoding standardization schemes include H.261 of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T, the successor of CCITT) for video phone or video conference services using integrated services digital networks (ISDN), H.263 of ITU-T for transmission of video information using public switched telephone networks (PSTN), MPEG-1 proposed by the Moving Picture Experts Group (MPEG) of International Standardization Organization/International Electrotechnical Commission Joint Technical Committee 1/Sub Committee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11) for storage media, and MPEG-2 for high quality digital broadcasting associated with high definition televisions (HDTV) and enhanced digital televisions (EDTV). Standardization of compressive coding of still image signals has also been carried out by, for example, the Joint Photographic Coding Experts Group (JPEG) of ISO/IEC JTC1/SC29/WG1.
In most conventional image signal coding schemes, the entire region of a rectangular frame or picture is encoded. Such schemes are called “frame-based coding”. In such image signal coding schemes, the texture information of all pixels included in a frame, namely, luminance and chrominance, is encoded and transmitted.
Recently, there has been increasing demand for multimedia products having functions of coding or manipulating only particular regions (objects) of a frame that are of interest or needed to the user, without coding the entire region of the frame. To this end, active research has recently been conducted on object-based coding schemes adapted to encode only arbitrarily shaped regions of a frame, as a substitute for frame-based coding schemes adapted to encode the entire region of a frame. FIGS. 1 and 2 illustrate test images for explanation of such an object-based coding scheme, respectively. FIG. 1 is a frame showing two children playing with a ball in an arbitrary space (background). Where only the information of the image associated with the children and ball is to be encoded and transmitted, coding of such information can be achieved using the object-based coding scheme; that is, only the texture information values of pixels associated with the children and ball are encoded and transmitted. In this case, the regions respectively associated with the children and ball are designated to be an object, whereas the remaining region of the picture other than the object is considered to be a background.
For coding of the picture shown in FIG. 1 using the object-based coding scheme, all pixels included in the frame should be classified, in both an encoder and a decoder, into those associated with the children and ball and those associated with the background. This information is referred to as the shape information of the object. In order to allow the decoder to recognize the shape information, the encoder should efficiently encode the shape information and then transmit it to the decoder. For this reason, the most remarkable difference of the object-based encoder and decoder from the frame-based encoder and decoder is that they include a shape information encoder and a shape information decoder, respectively.
FIG. 2 shows the shape information included in the image information where only the children and ball are considered to be an object. In this case, the pixels associated with the children and ball have shape information bearing a bright value, whereas the pixels associated with the background have shape information bearing a dark value. Such shape information, in which pixels are assigned different values to distinguish the pixels associated with the object from those associated with the background, is called a “binary mask”. Shape information may also be expressed by a contour indicative of the boundary between the background and the object. A transformation can be made between shape information in the form of a binary mask and shape information in the form of a contour. That is, shape information in the form of a binary mask can be converted into contour information by carrying out a contour extraction. On the other hand, a contour filling is carried out to obtain a binary mask from contour information.
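A contour extraction of the kind mentioned above can be sketched as follows. This is an illustrative Python sketch, not part of any standardized scheme; it marks as contour pixels those object pixels of a binary mask that have at least one 4-connected background neighbor.

```python
def extract_contour(mask):
    """Mark object pixels that touch the background (4-connectivity).

    `mask` is a list of rows of 0/1 values; 1 denotes an object pixel.
    Returns a same-sized mask in which only contour pixels are 1.
    """
    h, w = len(mask), len(mask[0])

    def is_background(y, x):
        # Pixels outside the frame count as background.
        return y < 0 or y >= h or x < 0 or x >= w or mask[y][x] == 0

    contour = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and any(
                is_background(y + dy, x + dx)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
            ):
                contour[y][x] = 1
    return contour
```

For a 3×3 mask that is entirely object, for example, every pixel except the center lies on the contour. The reverse conversion (contour filling) would reconstruct the interior from such a boundary.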
Representative examples of frame-based coding schemes include H.261 and H.263 of ITU-T, MPEG-1 and MPEG-2 of ISO/IEC JTC1/SC29/WG11, and JPEG of ISO/IEC JTC1/SC29/WG1, all being standardized schemes. On the other hand, representative examples of object-based coding schemes include MPEG-4 of ISO/IEC JTC1/SC29/WG11 and JPEG2000 of ISO/IEC JTC1/SC29/WG1.
Transform coding is the most widely used coding method among well-known compressive coding schemes for image signals. In such transform coding, an image signal is transformed into transform coefficients (frequency coefficients), and mainly low-frequency components are transmitted while transmission of high-frequency components is suppressed. This scheme has the advantage of a high compression ratio while minimizing the degradation of picture quality. Examples of such transform coding schemes include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the discrete sine transform (DST), and the Walsh-Hadamard transform (WHT).
Of such transform schemes, DCT provides a superior compaction of image signal energy into low-frequency components. In other words, DCT provides superior picture quality over other transform schemes in spite of using only a reduced number of low-frequency transform coefficients. In addition, fast algorithms exist for DCT. By virtue of such advantages, DCT has been used in various image coding standardization schemes such as H.261, H.263, MPEG-1, MPEG-2, MPEG-4, and JPEG.
Research on such transform coding schemes has been made with respect to image signals in blocks, each consisting of a set of pixels (picture elements, or pels) with a certain size. In accordance with the transform coding schemes developed, a rectangular frame is divided into a plurality of blocks having the same size, and a transform coding is then carried out for each block. In the case of an object-based coding scheme, only texture information included in objects is encoded, as compared to a frame-based coding scheme in which the texture information of all pixels included in a rectangular frame is completely encoded. In such an object-based coding scheme, accordingly, it is required to conduct a transform coding only for the image signals of those pixels of blocks associated with the object. FIG. 3 illustrates an object having an arbitrary shape in a frame divided into a plurality of blocks. In FIG. 3, each square region is indicative of one block, and the dark region is indicative of the set of pixels associated with the object. In FIG. 4, the transparent blocks correspond to blocks of FIG. 3 which are not to be encoded because they include no object pixel. The black blocks of FIG. 4 are indicative of blocks which are to be transformed by one of the known transform coding schemes because all pixels thereof are object pixels. The gray blocks of FIG. 4 are indicative of blocks each including both object pixels and non-object pixels, thereby requiring a transform coding only for the texture information of a part of the pixels thereof. In the following description, blocks corresponding to such gray blocks are referred to as “boundary blocks”. This scheme, in which a transform coding is conducted not for all pixels of each square block, but for a part of the pixels included in each square block, is called a “shape-adaptive transform coding”.
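The three-way block classification of FIG. 4 can be sketched as follows; this is an illustrative Python sketch (the function and label names are chosen for illustration only).

```python
def classify_block(shape_block):
    """Classify an M x N block of a binary shape mask (1 = object pixel).

    Returns "transparent" (no object pixels; the block is skipped),
    "opaque" (all object pixels; an ordinary block transform applies),
    or "boundary" (mixed; a shape-adaptive transform is required).
    """
    object_pixels = sum(sum(row) for row in shape_block)
    total_pixels = len(shape_block) * len(shape_block[0])
    if object_pixels == 0:
        return "transparent"
    if object_pixels == total_pixels:
        return "opaque"
    return "boundary"
```

Only the blocks classified as "boundary" need the shape-adaptive treatment described below; the other two cases reduce to skipping or to conventional block-based transform coding.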
In a representative shape-adaptive transform coding scheme, each block, which is a coding unit, consists of 8×8 pixels. Namely, blocks include 8 lines per block and 8 pixels per line. In accordance with the scheme, the texture signals of object pixels to be encoded are processed by a one-dimensional DCT in a vertical direction and then in a horizontal direction.
Referring to FIG. 5A, an 8×8 block is illustrated which has an object region to be encoded. In FIG. 5A, the gray pixels are pixels associated with objects. For a shape-adaptive DCT coding of the texture signals of object pixels to be encoded, a re-arrangement of pixels is carried out by vertically shifting those object pixels to the upper border of the block, thereby filling that border, as shown in FIG. 5B. In this state, a one-dimensional DCT is performed in a vertical direction (indicated by thick lines in FIG. 5) for the texture information of each column including object pixels. As a result, transform coefficients of the one-dimensional DCT are generated, as shown in FIG. 5C. The solid circles in FIG. 5C denote positions of mean values of the vertical one-dimensional DCT, namely, direct current (DC) values, respectively. After completing the vertical one-dimensional DCT, a re-arrangement is conducted again by shifting the resulting coefficients to the left border of the block, as shown in FIG. 5D. Thereafter, a one-dimensional DCT is performed in a horizontal direction for the transform coefficients included in each of the rows which comprise at least one transform coefficient, as shown in FIG. 5E. FIG. 5F shows the positions of the transform coefficients after completing the one-dimensional DCT in both the vertical and horizontal directions. This transform, in which a one-dimensional DCT is carried out successively in the vertical and horizontal directions, is called a “shape-adaptive DCT”. It should be noted that the positions of transform coefficients resulting from the shape-adaptive DCT may not coincide with those of the input object pixels (or input shape information) and that the positions of those transform coefficients are determined only based on the input shape information. The number of transform coefficients resulting from the shape-adaptive DCT is equal to the number of object pixels, as in the conventional block-wise DCT schemes.
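The two-stage procedure of FIGS. 5A to 5F can be sketched as follows. This is an illustrative Python sketch: the orthonormal DCT-II normalization used here is an assumption for illustration, whereas standardized shape-adaptive DCT variants specify their own scaling.

```python
import math

def dct_1d(x):
    """One-dimensional DCT-II with orthonormal scaling (illustrative)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        c = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(c * s)
    return out

def sa_dct(mask, texture):
    """Shape-adaptive DCT sketch for a square block (8 x 8 in the text).

    mask[y][x] == 1 marks object pixels; texture holds their values.
    Returns a same-sized array whose upper-left region holds the
    coefficients, as in FIG. 5F; empty segments stay None.
    """
    size = len(mask)
    # Step 1: shift object pixels to the top of each column (FIG. 5B),
    # then apply a 1-D DCT of the column's own length (FIG. 5C).
    cols = [[texture[y][x] for y in range(size) if mask[y][x]]
            for x in range(size)]
    cols = [dct_1d(c) if c else [] for c in cols]
    # Step 2: shift the column results to the left of each row (FIG. 5D),
    # then apply a 1-D DCT of the row's own length (FIG. 5E).
    rows = [[col[y] for col in cols if len(col) > y] for y in range(size)]
    rows = [dct_1d(r) if r else [] for r in rows]
    out = [[None] * size for _ in range(size)]
    for y, r in enumerate(rows):
        for x, v in enumerate(r):
            out[y][x] = v
    return out
```

As noted above, the number of coefficients produced equals the number of object pixels, and their positions depend only on the shape information, not on the texture values.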
FIG. 6 illustrates a block diagram of a conventional shape-adaptive image signal coding apparatus (texture information encoding part 24) which utilizes the above-mentioned shape-adaptive DCT. This apparatus 24 receives, as inputs thereof, shape information on a block basis having an M×N size (M and N being integers larger than zero) and texture information of object pixels in the form of a block having the same size as the shape information block. In the case of FIGS. 5A to 5F, a block of M=8 and N=8 is input. The apparatus 24 generates an output in the form of a bitstream. The output bitstream is transmitted to a receiver. Alternatively, the bitstream may be transmitted to a multiplexer (MUX) so that it is multiplexed with other signals, for example, bitstreams of shape information.
A shape-adaptive DCT part 10 first performs a shape-adaptive DCT on the input texture information of object pixels, based on the input shape information, thereby outputting transform coefficients which are positioned at the upper left portion of an associated block, as shown in FIG. 5F. For these transform coefficients, a quantization is conducted for data compression in a quantization part 11. As a result, quantized transform coefficients are output from the quantization part 11. The transform coefficients from the quantization part 11 are then transmitted to a scanning part 12 which, in turn, carries out a scanning procedure for arranging the received transform coefficients into a one-dimensional array. Various scanning methods applicable to blocks of M=8 and N=8 are illustrated in FIGS. 7A to 7C. FIG. 7A illustrates the most widely used zig-zag scanning order. In FIG. 7A, the numerals indicated at respective portions of the block are indicative of the scanning order of the corresponding transform coefficients upon arranging those transform coefficients in a one-dimensional array. The scanning methods of FIGS. 7A to 7C can be selectively used in accordance with the characteristics of the input image signals to be subjected to a transform coding. These scanning methods are also used in the MPEG-2 and MPEG-4 schemes. Blocks which are processed by the shape-adaptive DCT may include segments containing no transform coefficient, as shown in FIG. 5F. In accordance with conventional schemes, a scanning operation is carried out for the entire segment of a block in a sequential manner according to a well-known, predetermined scanning order while setting the transform coefficient values of segments containing no transform coefficient to “zero”. That is, all segments (64 transform coefficients in the case of an 8×8 block) are sequentially scanned using one of the known scanning methods, irrespective of whether or not the segments to be scanned contain transform coefficients.
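The conventional sequential scanning described above can be sketched as follows. This illustrative Python sketch generates a zig-zag order of the kind shown in FIG. 7A (the column-first variant consistent with the coefficient sequence quoted for FIG. 8A) and emits a value of zero for every segment containing no transform coefficient, represented here by None.

```python
def zigzag_order(n=8):
    """Generate the (row, col) zig-zag scanning order for an n x n block."""
    order = []
    for d in range(2 * n - 1):
        # Anti-diagonal d holds positions with row + col == d;
        # successive diagonals alternate traversal direction.
        diag = [(r, d - r) for r in range(n) if 0 <= d - r < n]
        order.extend(diag if d % 2 == 0 else reversed(diag))
    return order

def scan_block(coeffs):
    """Conventional scan: every segment is visited in zig-zag order,
    and segments containing no transform coefficient (None) emit zero."""
    return [coeffs[r][c] if coeffs[r][c] is not None else 0
            for r, c in zigzag_order(len(coeffs))]
```

Note that all n*n segments are always emitted, which is precisely the behavior whose inefficiency for shape-adaptive blocks is discussed below.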
The resultant transform coefficients arranged in a one-dimensional array are subjected to a variable-length coding in a variable-length coding part 13. The resultant signal in the form of a bitstream is then transmitted to the MUX or receiver.
The variable-length coding part 13 receives, as an input thereof, the DCT coefficients output from the scanning part 12 and carries out a variable-length coding for the transform coefficients arranged in a one-dimensional array in a sequential manner.
Now, the variable-length coding of transform coefficients will be described in conjunction with H.263, which is representative of conventional variable-length coding schemes. In accordance with this scheme, EVENTs of transform coefficients with non-zero values are first derived. For the derived EVENTs, corresponding bitstreams are then sought from a given variable-length coding (VLC) table. Thereafter, the sought codes are sequentially output. Each EVENT consists of a combination of three kinds of information, namely, a LAST indicative of whether or not the transform coefficient being currently encoded is the last non-zero transform coefficient, a RUN indicative of the number of successive zero transform coefficients preceding the current non-zero coefficient, and a LEVEL indicative of the magnitude of the current transform coefficient.
Such EVENTs, which are combined and defined as mentioned above, can be entropy-coded through a variable-length coding such as Huffman coding or arithmetic coding. That is, fewer bits are allocated for EVENTs exhibiting a higher occurrence probability, whereas more bits are allocated for EVENTs exhibiting a lower occurrence probability. Accordingly, it is possible to achieve coding of EVENTs using a considerably reduced number of bits, as compared to a fixed-length coding (FLC).
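The derivation of (LAST, RUN, LEVEL) EVENTs described above can be sketched as follows; this is an illustrative Python sketch rather than the normative H.263 procedure.

```python
def derive_events(scanned):
    """Derive (LAST, RUN, LEVEL) EVENTs from a 1-D array of scanned
    transform coefficients.

    RUN counts the successive zero coefficients preceding each non-zero
    coefficient; LAST is 1 only for the final non-zero coefficient.
    """
    nonzero_positions = [i for i, v in enumerate(scanned) if v != 0]
    events, run = [], 0
    for i, v in enumerate(scanned):
        if v == 0:
            run += 1
        else:
            last = 1 if i == nonzero_positions[-1] else 0
            events.append((last, run, v))
            run = 0
    return events
```

Each resulting triple would then be looked up in the VLC table (Table 1) to obtain its code word, with the sign of LEVEL transmitted as the trailing bit s.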
A part of the VLC table used for H.263 is illustrated in Table 1.
TABLE 1
INDEX   LAST   RUN   LEVEL   BITS   VLC CODE
  0      0      0      1       3    10s
  1      0      0      2       5    1111s
  2      0      0      3       7    0101 01s
  3      0      0      4       8    0010 111s
  4      0      0      5       9    0001 1111s
  5      0      0      6      10    0001 00101s
  6      0      0      7      10    0001 0010 0s
  7      0      0      8      11    0000 1000 01s
  8      0      0      9      11    0000 1000 00s
  9      0      0     10      12    0000 0000 111s
 10      0      0     11      12    0000 0000 110s
 11      0      0     12      12    0000 0100 000s
 12      0      1      1       4    110s
 13      0      1      2       7    0101 00s
 14      0      1      3       9    0001 1110s
 15      0      1      4      11    0000 0011 11s
 16      0      1      5      12    0000 0100 001s
 17      0      1      6      13    0000 0101 0000s
 18      0      2      1       5    1110s
 19      0      2      2       9    0001 1101s
 20      0      2      3      11    0000 0011 10s
 21      0      2      4      13    0000 0101 0001s
 22      0      3      1       6    0110 1s
 23      0      3      2      10    0001 0001 1s
 24      0      3      3      11    0000 0011 01s
In Table 1, the first column represents indexes for distinguishing EVENTs from one another. The second column represents LASTs: LAST=0 denotes that the coefficient to be coded is not the last non-zero transform coefficient, whereas LAST=1 denotes that the coefficient is the last one. The third column represents RUNs. The fourth column represents LEVELs indicative of the values of transform coefficients. The fifth column represents the number of bits generated for each EVENT. The last column represents the bitstream generated for each EVENT. In the last column, “s” is indicative of the sign of each LEVEL: s=0 indicates that the associated LEVEL is a positive number, whereas s=1 indicates that the associated LEVEL is a negative number.
With regard to the occurrence probability of EVENTs, it should be noted that EVENTs exhibit a lower occurrence probability at greater RUN values and a higher occurrence probability at smaller RUN values. Where the variable-length coding is conducted taking RUNs into consideration, accordingly, more bits are allocated for an EVENT with a greater RUN value, whereas fewer bits are allocated for an EVENT with a smaller RUN value. Referring to the VLC table of H.263 shown in Table 1, such a feature can be clearly understood. This will be exemplarily described in conjunction with the EVENTs (INDEX=1, 13, 19) respectively having RUNs of 0, 1, and 2, along with LAST=0 and LEVEL=2. For the EVENT bearing INDEX=19 (LAST=0, RUN=2, and LEVEL=2), nine bits are allocated. Seven bits are allocated for the EVENT bearing INDEX=13 (LAST=0, RUN=1, and LEVEL=2). On the other hand, five bits are allocated for the EVENT bearing INDEX=1 (LAST=0, RUN=0, and LEVEL=2). Thus, more bits are allocated at a greater RUN value under the condition in which the LAST and LEVEL values are fixed.
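The effect of RUN on code length can be read directly from the three Table 1 entries just cited. The Python sketch below merely records those entries in a lookup keyed by (LAST, RUN, LEVEL); the code lengths are taken verbatim from the table.

```python
# Code lengths (in bits) from Table 1 for INDEX 1, 13, and 19.
VLC_BITS = {
    (0, 0, 2): 5,  # INDEX 1:  LAST=0, RUN=0, LEVEL=2
    (0, 1, 2): 7,  # INDEX 13: LAST=0, RUN=1, LEVEL=2
    (0, 2, 2): 9,  # INDEX 19: LAST=0, RUN=2, LEVEL=2
}

def bits_for_event(last, run, level):
    """Look up the VLC code length for an EVENT present in the excerpt."""
    return VLC_BITS[(last, run, level)]
```

With LAST and LEVEL fixed, each increment of RUN costs two additional bits in this excerpt, which quantifies why needless increases in RUN degrade coding efficiency.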
Known shape-adaptive DCT coding schemes may use the above-mentioned transform coefficient VLC method or VLC table. This will be exemplarily described in conjunction with FIGS. 8A to 8E. FIG. 8A shows the result obtained after carrying out a shape-adaptive DCT and quantization for an image signal in the form of an 8×8 block. In FIG. 8A, the dark segments correspond to the object pixels of the block, respectively. Accordingly, the dark segments correspond to segments containing transform coefficients generated due to the object pixels. The remaining bright segments are indicative of segments containing no transform coefficient, thereby being set with a transform coefficient value of “0”. In each dark segment, “Xij” represents a transform coefficient positioned at an i-th position in a horizontal direction and a j-th position in a vertical direction in the 8×8 block. When all transform coefficients of the block are scanned in a zig-zag scanning order, they are arranged in the order of X11, X12, X21, X31, X22, X13, X14, 0, X32, X41, X51, X42, 0, 0, X15, X16, 0, 0, 0, X52, X61, X71, 0, 0, . . . , and 0. If the transform coefficient X71 is not zero, then the variable-length coding can be carried out only for the segments positioned within the region defined by the thick solid line of FIG. 8B. FIG. 8C illustrates the case in which the two transform coefficients X32 and X42 are zero whereas the remaining transform coefficients have non-zero values, respectively. When the transform coefficient X41 is encoded in this case, its LAST value is zero because it is not the last one of the non-zero transform coefficients in the block. In this case, its LEVEL and RUN values are X41 and 2, respectively. With regard to the transform coefficient X41, the increase in RUN value caused by the transform coefficient X32 having a value of zero is reasonable, because the zero value of the transform coefficient X32 is generated in accordance with the signal information of the associated object.
However, the transform coefficient value of zero existing between the transform coefficients X14 and X32 is a value not generated by the transform of object information, but assigned to a segment containing no transform coefficient. Accordingly, transmission of such information is unnecessary and, rather, results in an increase in RUN value. As a result, there is a disadvantage in that an increased number of bits is generated. This causes a reduction of the coding efficiency of the shape-adaptive DCT coding.
Although the problems involved in the conventional shape-adaptive DCT coding, which is a representative shape-adaptive transform coding scheme, have been described, other shape-adaptive transform coding schemes also have the same problems. That is, the conventional shape-adaptive transform coding schemes have a problem in that unnecessary values of zero are added in the scanning procedure.
Meanwhile, a variety of VLC schemes for carrying out a VLC for transform coefficients have been used. Although the VLC table of H.263 has been exemplified in the above description, a VLC table consisting of a combination of only RUN and LEVEL is used in the cases of JPEG, MPEG-1, and MPEG-2. In this case, however, the above-mentioned problem, namely, a reduction of coding efficiency, also occurs because an increase in RUN value results in an increase in the number of coding bits.