There is explosive demand for scalable encoding methods for images, still pictures and moving pictures alike. In particular, users want to obtain, manage and modify image information through mobile telecommunication services, which allow anyone to communicate with anyone, anywhere and at any time using image information, and through information household appliances connected to various kinds of computers such as laptops, palmtop computers, PDAs and so forth, both of which have been brought about by the introduction of the wireless Internet.
Therefore, diverse image information household appliances, such as IMT-2000 video phones and HDTV sets, will appear in the market, and the decoding capability and information transmission environment of these appliances will differ from one another, because the properties and application environments differ according to the kind of terminal.
What needs to be considered here is how to transmit a moving picture suitable for each terminal. For instance, if encoding is matched to a low-quality decoder, a user with a high-quality decoder will receive a low-quality image on an expensive decoder, which no one wants. That is, a user with a high-quality decoder should be able to obtain a high-quality image, while even a user with a low-quality decoder should still receive an image of reasonable quality.
To address this problem, MPEG-4 (Moving Picture Experts Group-4) is designed to provide various levels of image quality according to the environment and performance of the receiving terminal. For example, when the receiving terminal has high computing power and the delivery layers, e.g., wireless, ATM, LAN, etc., are in good condition, it can receive and display a high-quality moving picture. However, when its computing power and delivery lines are not in good condition, it cannot receive the high-quality image. To accommodate both cases, MPEG-4 is designed to perform scalable coding.
Scalable coding is a method in which the encoding part generates and transmits scalable bitstreams so that the receiving part can obtain an image at various qualities, from low to high. That is, if the bitstreams are scalable, a low-performance receiving terminal receives and displays the basic-quality image bitstream encoded in the base layer, while a high-performance receiving terminal also receives and displays the high-quality image bitstream encoded in the enhancement layer.
A scalable coding method largely consists of a base layer and an enhancement layer. The base layer of the encoding part transmits basic moving picture information, and the enhancement layer transmits additional information for providing image quality beyond the basic quality, so that the receiving part can combine this information with the information from the base layer and decode a high-quality image.
Therefore, the receiving part decodes the image information of the two transmitted layers in accordance with the computing power of the receiving terminal and the condition of the delivery layers. If a decoder does not have sufficient decoding capability for all of the information transmitted through the delivery layers, it decodes only the information of the base layer, which guarantees the minimum image quality, while the information of the enhancement layer is discarded without being decoded. Meanwhile, a high-quality receiving apparatus can take in the information of all layers and obtain a high-quality image. In this way, using the scalable coding method, images satisfying both users with high-quality decoders and users with low-quality decoders can be transmitted.
The present scalable coding methods are classified into two types: spatial scalable coding and temporal scalable coding. Spatial scalable coding improves the spatial resolution step by step, while temporal scalable coding increases the number of images shown per unit time along the time axis (for example, 10 Hz→30 Hz; in the case of TV broadcasting, 30 frames/sec). To perform scalable coding, MPEG-4 forms one or more enhancement layers and transmits the bitstreams to the receiving part. In moving picture coding using one enhancement layer, the base layer encodes and transmits an image of low spatial and temporal resolution, while the enhancement layer additionally encodes and transmits the image information needed to achieve improved resolution on top of the image information transmitted from the base layer.
The conventional scalable coding method described above is designed for delivery layers in a relatively stable and good condition. That is, an image frame can be restored only when the receiving part receives all of the bitstreams transmitted from the enhancement layers. If the condition of the delivery layers changes (i.e., the bitstream bandwidth that the delivery layers can accommodate changes; a delivery layer such as the Internet changes the bandwidth allocated to users according to external factors such as the number of Internet users) and not all of the bitstreams from the enhancement layer are received, the corresponding image cannot be restored normally. In this case, the receiving part must request retransmission from the transmitting part, give up image restoration until all of the bitstreams are received, or perform transmission error concealment by using the previous frame image.
In the wired/wireless Internet, it frequently happens that image bitstreams are not transmitted fast enough to keep up with real time, due to the unstable condition of the delivery layers. In short, to restore the transmitted image in real time even when the bandwidth changes because of unstable delivery layer conditions, as happens in the wired/wireless Internet, the receiving part must be able to restore the image in real time using the portion of the image bitstream received up to that point, even though it has not received all of the bitstreams. One example of such a method is the fine granular scalability (FGS) method proposed by MPEG-4 and established as a Draft International Standard.
The fine granular scalable coding method makes it possible to restore a transmitted image from the bitstreams received up to a given point, even when the receiving part does not receive all of the bitstreams encoded and transmitted by the base layer encoder and the enhancement layer encoder, for instance, when the delivery layer is unstable and changes suddenly, as in the wired/wireless Internet, so that the bandwidth allocated to users changes while scalable coding is being performed. It is designed to remedy the shortcoming of the conventional scalable coding method, which assumes a stable delivery layer: there, an image can be restored only after all bitstreams are received, causing delay in receiving the image, and retransmission must be requested or transmission error concealment performed when a transmission error occurs.
In order for the receiving part to restore the transmitted image efficiently from only part of the image bitstream, the fine granular coding method transmits the image bitstream on a bit-plane basis when the transmitting part produces and transmits an image of improved quality relative to the base layer image. That is, it is similar to the conventional scalable coding method in that, when transmitting the bitstreams needed for the enhancement layer from the transmitting part to the receiving part, it improves the quality of the transmitted image by sending the difference between the original image and the image transmitted from the base layer. However, even when the bandwidth of the delivery layers changes suddenly and not all of the bits needed for image restoration have been received, this method can still restore an image, to some extent, from the bitstreams received up to that point, by dividing the image information into bit-planes, transmitting the most significant bit (MSB) plane first, then the next most significant bit plane, and so on.
For instance, suppose there is an image value of 25 to be transmitted. Expressed in binary, it becomes “11001,” which consists of five bit-planes. To transmit this information per bit-plane, the transmitting part first notifies the receiving part that the transmitted information is composed of five bit-planes. When the information is then transmitted from the most significant bit (MSB) to the least significant bit (LSB) one bit at a time, after the transmission of the first MSB is completed, the receiving part knows that the transmitted value is at least 16 (10000), and after the transmission of the second bit, it knows that a value of at least 24 (11000) is being transmitted. If no further bitstream can be transmitted to the receiving part due to the bandwidth restriction of the delivery layer, the receiving part can still reconstruct the value 24, an approximation of what was originally supposed to be transmitted, by using the bits (11000) received up to that point.
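The progressive reconstruction described above can be sketched in a few lines. This is a minimal illustration (not taken from the MPEG-4 standard; the function names are invented) of MSB-first bit-plane transmission: the value 25 (“11001”) is sent one bit at a time, and the receiver can rebuild an approximation from whatever prefix it has received.

```python
def to_bitplanes(value, num_planes):
    """Split a value into its bits, ordered from MSB to LSB."""
    return [(value >> (num_planes - 1 - i)) & 1 for i in range(num_planes)]

def reconstruct(bits, num_planes):
    """Rebuild an approximation from the bit-planes received so far."""
    result = 0
    for i, bit in enumerate(bits):
        result |= bit << (num_planes - 1 - i)
    return result

num_planes = (25).bit_length()                  # 5 bit-planes for 25
planes = to_bitplanes(25, num_planes)           # [1, 1, 0, 0, 1]
partial = reconstruct(planes[:2], num_planes)   # 24 after two bits (11000)
full = reconstruct(planes, num_planes)          # 25 once all planes arrive
```

Note that each additional bit-plane received halves the maximum reconstruction error, which is what gives the method its fine granularity.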
The fine granular scalable coding method used in MPEG-4 considers a situation where the bandwidth of the delivery layer may change at any time. The structure of the basic fine granular scalable coding method is shown in FIG. 1A.
FIG. 1A is a structural diagram of the conventional basic fine granular scalability (FGS) coding method. As illustrated in the figure, it has a base layer and a fine granular scalability layer as an enhancement layer. The base layer adopts the conventional MPEG-4 encoding method as it is, without any modification. It is distinctive in that it seeks to increase coding efficiency only in the base layer, without considering any method for increasing coding efficiency in the FGS layer, i.e., the enhancement layer, because the delivery layer would have to be taken into account to do so.
As shown, spatial scalability adopts the structure of FIG. 1A, while for temporal scalability the structures of FIGS. 1B and 1C are adopted.
FIG. 1B shows a structural diagram of the conventional fine granular scalability (FGS) coding method with two enhancement steps, FGS and FGST (fine granular scalability temporal), and FIG. 1C shows a structural diagram of the conventional FGS coding method with one enhancement step in which FGS and FGST are integrated.
Here, FGST (fine granular scalability temporal) carries out motion estimation and compensation to increase coding efficiency. However, this also considers a method for increasing coding efficiency in the base layer only.
FIG. 2A shows the structure of an encoder, i.e., the transmitting part, of a fine granular scalable coding method used in the MPEG-4 Draft International Standard.
FIG. 2A is a structural diagram depicting an encoder of the conventional fine granular scalability (FGS) coding method.
As shown in the drawing, the base layer uses the MPEG-4 image encoding method as it is, without any modification. The image encoding method used in the base layer compresses image data along the spatial and temporal axes by performing discrete cosine transform (DCT), quantization (Q), motion estimation (ME), motion compensation (MC), inverse quantization (Q−1) and inverse discrete cosine transform (IDCT); implements entropy coding according to symbol occurrence probability by performing variable length coding; and transmits the base layer bitstream generated during encoding to the delivery layer by means of a transmission buffer.
As shown in the drawing, the FGS encoding of the enhancement layer is performed through the procedures of obtaining the residue between the original image and the image restored in the base layer, performing discrete cosine transform (DCT), performing bit-plane shift, finding the maximum value, and performing bit-plane variable length encoding (bit-plane VLC).
In the procedure of obtaining the residue, the residue is calculated as the difference between the original image and the image restored in the base layer, i.e., the image that has passed through Q−1 and IDCT and been clipped, as shown in the drawing.
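The residue computation can be illustrated with a simplified sketch. Here coarse scalar quantization stands in for the base layer's full DCT/Q/Q−1/IDCT chain (the step size of 16 and the pixel values are invented for illustration); the residue is the difference between the original pixels and the clipped base-layer reconstruction.

```python
def clip(value, low=0, high=255):
    """Clip a pixel value into the displayable range."""
    return max(low, min(high, value))

def base_layer_roundtrip(pixel, step=16):
    """Coarse quantize/dequantize, standing in for the base layer chain."""
    level = round(pixel / step)      # quantization (Q)
    return clip(level * step)        # inverse quantization + clipping

original = [23, 130, 250]
restored = [base_layer_roundtrip(p) for p in original]   # [16, 128, 255]
residues = [o - r for o, r in zip(original, restored)]   # [7, 2, -5]
```

The residues can be negative, so the enhancement layer must carry sign information along with the bit-planes of the magnitudes.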
In the discrete cosine transform procedure, the image-domain residues obtained above are transformed into the DCT domain using a block-unit (8×8) DCT.
Here, if a particular block is to have optionally higher quality, its values have to be transmitted before anything else; for this purpose, a bit-plane shift may optionally be performed. This is defined as selective enhancement, which is carried out in the bit-plane shift procedure.
In the procedure of finding the maximum value, the maximum absolute value among all of the discrete cosine transformed coefficients is obtained. The maximum value is used to calculate the number of bit-planes required to transmit the corresponding image frame.
In the bit-plane variable length encoding procedure, the 64 DCT coefficients obtained on a block basis are placed into an array in zigzag scan order for each bit-plane, each entry being the 0 or 1 value of the corresponding bit of a DCT coefficient, and each array is run-length encoded according to a variable length code table (VLC table).
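The scan just described can be sketched as follows. This is an illustrative sketch only: the coefficient values are invented, and the actual MPEG-4 bit-plane VLC, which run-length codes these 0/1 arrays against a VLC table, is omitted.

```python
def zigzag_indices(n=8):
    """(row, col) pairs of an n x n block in JPEG-style zigzag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def bitplane(coeffs, plane):
    """Bit `plane` (0 = LSB) of each coefficient's absolute value."""
    return [(abs(c) >> plane) & 1 for c in coeffs]

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 21, -6, 3    # a few nonzero DCT coefficients
scanned = [block[r][c] for r, c in zigzag_indices()]
num_planes = max(abs(c) for c in scanned).bit_length()  # 5 planes, since |21| = 10101
msb_plane = bitplane(scanned, num_planes - 1)           # 1 only where |c| >= 16
```

The MSB plane is mostly zeros with only a few ones, which is why run-length coding of each plane is effective.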
FIG. 2B shows the structure of a decoder, i.e., the receiving part, of a fine granular scalable coding method used in the MPEG-4 Draft International Standard.
FIG. 2B is a structural diagram depicting a decoder of the conventional fine granular scalability (FGS) coding method.
As illustrated in the drawing, the decoding of the transmitted bitstreams, which are divided into the base layer and the enhancement layer and delivered through the delivery layers, is performed in reverse of the encoding depicted in FIG. 2A.
In the base layer, the MPEG-4 image decoding method is used as it is, without any modification. After the bitstream is input to the base layer, the image transmitted from the base layer is restored by conducting variable length decoding (VLD), performing inverse quantization (Q−1), carrying out inverse discrete cosine transform (IDCT) on the corresponding values, adding them to the motion compensation (MC) values, and clipping the resulting values to the range 0 to 255.
In the enhancement layer of the fine granular scalable coding method, the decoding of the bitstreams transmitted to the enhancement layer is performed in reverse of the encoding. First, bit-plane VLD is performed on the input enhancement bitstream, and if a block was selected for optionally higher image quality, a bit-plane shift may be performed.
Block-based (8×8) inverse discrete cosine transform (IDCT) is performed on the values obtained by conducting bit-plane VLD and the optional shift, and the image transmitted from the enhancement layer is restored. This image is then combined with the image decoded in the base layer, and the sum values are clipped to the range 0 to 255, finally restoring the improved image.
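This final combining step can be sketched as follows (the pixel values are invented for illustration): the base-layer image and the decoded enhancement-layer residue are summed, and the sums are clipped back into the displayable 0 to 255 range.

```python
def combine_and_clip(base_pixels, residues):
    """Add enhancement residues to base pixels and clip to 0..255."""
    return [max(0, min(255, b + e)) for b, e in zip(base_pixels, residues)]

base = [16, 128, 255, 0]       # image decoded in the base layer
residue = [7, 2, 5, -3]        # residues restored in the enhancement layer
final = combine_and_clip(base, residue)   # [23, 130, 255, 0]
```

The clipping guards against sums that fall outside the pixel range, such as 255 + 5 or 0 − 3 above.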
The problems of the conventional technique described above are as follows.
The scalable coding method conventionally used in encoding moving pictures is designed for conditions where the delivery layers are relatively stable. A corresponding image frame can be restored only when all of the bitstreams transmitted from the enhancement layer of the transmitting part are received at the receiving part. If the condition of the delivery layers changes suddenly, for instance, if the bandwidth that the delivery layer can accommodate changes or, in a delivery layer such as the Internet, the bandwidth allocated to users changes due to external factors like the number of Internet users, and not all of the bitstreams from the enhancement layer are received, the image cannot be restored and displayed properly. Therefore, there is the shortcoming of having to request retransmission from the transmitting part, abort image restoration until all bitstreams are received, or perform transmission error concealment by using the previous frame image.
Meanwhile, to remedy this shortcoming of the conventional scalable coding method, which assumes a stable delivery layer, images transmitted from the transmitting part to the receiving part should be restorable in real time even when the bandwidth changes due to unstable delivery layers such as the wired/wireless Internet. One method for this is the fine granular scalability (FGS) method, which restores a transmitted image in real time by using the image bitstreams received up to that point when the receiving part does not receive the whole bitstream. However, to make an image restorable from only part of the whole bitstream, only methods that maximize coding efficiency within the base layer can be used; methods that increase image coding efficiency between enhancement layers do not work.
Moving picture coding methods using DCT, which are widely used in JPEG (Joint Photographic Experts Group), H.263, MPEG and so forth, code and transmit images on a macroblock and 8×8 block basis. Here, the encoding and decoding of every image frame or video object plane (VOP) begin from the macroblock, or block, at the upper-left of the image and proceed successively to the one at the bottom-right. In this invention, this is referred to as the normal scan order, which is illustrated in FIG. 3A.
The normal scan order is the scan order that must necessarily be used to restore an image normally at the receiving part. It employs methods such as motion estimation and compensation and DC value estimation for increasing coding efficiency between the base layer and the enhancement layer, or between enhancement layers.
When this scan order is applied to a scalable coding method that makes it possible to restore an image from only part of the received bitstream, only some of the macroblocks or blocks at the upper part of the image are decoded, and the restored image is displayed on the screen of the receiving part as illustrated in FIG. 3B. The black blocks are decoded blocks, while the white blocks are those that have not yet been decoded.
That is, the bitstreams transmitted from the base layer, together with the partial bitstreams received from the enhancement layer and decoded, produce an improved image at the receiving part. As depicted in FIG. 3B, if only the upper part of the image data is received and decoded from the enhancement layer, the restored image is improved only in the part where decoding was performed in the enhancement layer. However, there is a shortcoming: if the improved part of the restored image is an area to which viewers pay no attention, such as the background or anything other than the face of an actor, this process of receiving and restoring the bitstreams of the enhancement layer becomes useless.
Meanwhile, as shown in FIG. 4, the conventional method for coding images and moving pictures with subband coding, such as wavelet coding, also uses the normal scan order, conducting encoding and decoding on a pixel basis within each subband, from the image data of the upper-left pixels toward the bottom-right pixels. When this method is applied to a scalable coding method that restores an image from partial bitstreams, the pixel values up to the last subband position received are decoded, and the restored image is displayed on the screen of the receiving part. That is, the bitstream transmitted from the base layer is received and combined with the data decoded in the enhancement layer to generate an improved image at the receiving part. Here, if the data of the upper part of the image are received and decoded, the restored image shows improved quality in the part whose image data are decoded in the enhancement layer, as marked in FIG. 4. But there is the shortcoming that if the improved part of the restored image is an area to which viewers pay little or no attention, such as the background or anything other than the faces of actors, this process of receiving and restoring the enhancement layer bitstream becomes useless because viewers do not notice the improvement.