Demand is growing rapidly for scalable encoding as a method for encoding images, including still images and moving pictures. In particular, people want to obtain, manage, and modify image data through mobile telecommunication services that let anyone communicate with anyone else, anywhere and at any time, using image data, and through information household appliances connected to various kinds of computers such as laptops, palmtop computers, and PDAs, all of which have emerged with the introduction of the wireless Internet. Accordingly, diverse image data appliances such as IMT-2000 video phones and HDTV will appear on the market, and their decoding ability and information transmission environments will differ from one another, because the properties and application environment vary with the kind of terminal.
What must be considered here is how to transmit a moving picture in a manner suited to the reception environment of each terminal. For instance, if encoding is tailored to a low quality decoder, a user with a high quality decoder will receive only the low quality image on an expensive decoder, which no one wants. That is, a user with a high quality decoder should be able to obtain a high quality image, and even a user with a low quality decoder should receive an image of reasonable quality. For example, when the receiving terminal has high computing power and the delivery layers, e.g., wireless, ATM, LAN, etc., are in good condition, it can receive and display a high quality moving picture. However, when its computing power or the delivery layers are not in good condition, it cannot receive the high quality image.
To address this problem, Moving Picture Experts Group-4 (MPEG-4) is designed to provide an image at various levels of image quality according to the environment and performance of the terminal on the receiving end.
Scalable encoding is a method in which the encoding part generates and transmits scalable bit streams so that the receiving end can receive the image at various qualities, from low to high. That is, if the bit streams are scalable, a low-performance receiving terminal receives and displays image bit streams of basic quality, which have been encoded in the base layer, while a high-performance receiving terminal receives and displays high quality image bit streams, which have been encoded in the enhancement layer.
The scalable encoding method largely consists of a base layer and an enhancement layer. The base layer of the encoding part transmits basic moving picture data, and the enhancement layer transmits data that provide an image of advanced quality on top of the basic-quality moving picture data, so that the receiving end can combine the data from the base layer with the data from the enhancement layer and decode them into a high quality image.
Therefore, the receiving end decodes the image data transmitted from the two layers in accordance with the computing power of the receiving terminal and the condition of the delivery layers. If a decoder cannot decode all the data transmitted through the delivery layers, it decodes only the data from the base layer, which is the layer that guarantees the minimum image quality, and the data from the enhancement layer are left undecoded and discarded. Meanwhile, a high-quality receiving terminal can handle the data from all layers and obtain high quality images. Accordingly, the scalable encoding method makes it possible to deliver images that satisfy both users with a high quality decoder and users with a low quality decoder.
A conventional scalable encoding method is designed for the case where the delivery layers are in a relatively stable and good condition. That is, an image frame can be restored completely only when the receiving end receives all the bit streams transmitted from the enhancement layer. If the condition of the delivery layers changes, that is, if the bit stream bandwidth they can accommodate changes (delivery layers such as the Internet change the bandwidth assigned to each user according to external factors such as the number of Internet users), and not all of the bit streams from the enhancement layer are received, the corresponding image frame cannot be restored normally. In this case, the receiving end must request retransmission from the transmitting part, give up image restoration until all the bit streams are received, or conceal the transmission error by using the preceding frame image.
On the wired/wireless Internet, it frequently happens that image bit streams cannot be transmitted fast enough to keep up with real time because the delivery layer condition is unstable. In short, to restore the transmitted image in real time even when the bandwidth changes, the receiving end must be able to restore the image using only the part of the image bit streams received so far, even though it has not received all of them. One example of such a method is the fine granular scalability (FGS) method proposed in MPEG-4 and established as a draft international standard.
The FGS encoding method makes it possible to restore a transmitted image from the bit streams received so far when the receiving end does not receive all the bit streams encoded in and transmitted from the base layer encoder and the enhancement layer encoder, for instance, when the delivery layer is unstable and changes suddenly, as the wired/wireless Internet does, so that the bandwidth assigned to users changes during the scalable encoding. It is designed to compensate for the shortcoming of the conventional scalable encoding method, which assumes a stable delivery layer.
So that the receiving end can restore an image efficiently from only part of the image bit streams, the transmitting end transmits the image bit streams on a bit-plane basis when it forms and transmits an image of improved quality based on the image transmitted from the base layer. That is, the FGS method is similar to the conventional scalable encoding method in that, when the bit streams needed for the enhancement layer are transmitted from the transmitting part to the receiving end, it improves the quality of the transmitted image by sending the difference between the original image and the image transmitted from the base layer. However, with the FGS method, even if the bandwidth of the delivery layers changes suddenly and not all the bits needed for image restoration have been received, an image can be restored by using the bit streams received so far. According to this method, the image data to be transmitted are divided into bit-planes. The bit-plane of the most significant bit (MSB) is transmitted with top priority, then the next most significant bit-plane is transmitted, and the process is repeated down to the least significant bit-plane.
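As an illustration of the bit-plane ordering described above, the following minimal Python sketch splits a few residue magnitudes into bit-planes, most significant plane first, and rebuilds an approximation from only the planes that have arrived. The function names `to_bit_planes` and `from_bit_planes` and the sample values are assumptions made for illustration; they are not taken from the MPEG-4 specification, and sign handling and entropy coding are omitted.

```python
def to_bit_planes(values, num_planes):
    """Split non-negative integers into bit-planes, most significant plane first."""
    return [[(v >> p) & 1 for v in values]
            for p in range(num_planes - 1, -1, -1)]


def from_bit_planes(planes, num_planes, count):
    """Rebuild `count` values from the leading (most significant) planes received."""
    values = [0] * count
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # weight of the i-th received plane
        values = [v | (bit << shift) for v, bit in zip(values, plane)]
    return values


# Hypothetical residue magnitudes for four coefficients (4 bit-planes suffice).
residues = [13, 2, 7, 0]
planes = to_bit_planes(residues, 4)

# Only the two most significant planes arrived before the bandwidth dropped.
partial = from_bit_planes(planes[:2], 4, len(residues))
print(partial)   # [12, 0, 4, 0]

# All four planes arrived.
full = from_bit_planes(planes, 4, len(residues))
print(full)      # [13, 2, 7, 0]
```

Each additional plane halves the worst-case error of every value, which is why a truncated stream still yields a usable, coarser approximation rather than an undecodable frame.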
FIG. 1A is a block diagram illustrating a structure of a conventional FGS encoder, and FIG. 1B is a block diagram illustrating a structure of a conventional FGS decoder. As depicted in the drawing, the base layer of the FGS encoder defined in the MPEG-4 international standards adopts the MPEG-4 image encoding method.
The FGS encoder includes discrete cosine transform (DCT) units, a bit-plane shifting unit, a maximum value calculating unit, a bit-plane-based variable length coding (VLC) unit, a quantization (Q) unit, a variable length coding (VLC) unit, a motion compensation (MC) unit, an inverse quantization (Q−1) unit, an inverse discrete cosine transform (IDCT) unit, a motion estimation (ME) unit, a frame memory, and a clipping unit.
In the image encoding method, image data are compressed in the spatial and temporal directions through the DCT unit, quantization unit, ME unit, MC unit, inverse quantization unit, and IDCT unit. Entropy encoding is then carried out by performing VLC, which exploits the skew in symbol occurrence probabilities, and the base layer bit stream is thus transmitted.
As shown in the drawing, the FGS encoding of the enhancement layer proceeds by obtaining the residue between the original image and the image restored in the base layer, performing DCT, performing the bit-plane shift, finding the maximum value, and performing bit-plane VLC.
In the procedure of obtaining the residue, the residue is calculated as the difference between the original image and the image that is restored after being encoded in the base layer. The latter is the restored image that has passed through the inverse quantization unit (Q−1), the IDCT unit, and the clipping unit in the drawing.
The DCT unit transforms the image-domain residue obtained in the above procedure into the DCT domain by using a block-based (8×8) DCT.
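The residue and transform steps can be sketched as follows. This is a minimal pure-Python illustration that assumes an orthonormal 8×8 DCT-II; the function names `residue_block` and `dct_2d` and the sample pixel values are hypothetical, and a real encoder would use a fast integer-friendly transform rather than this direct summation.

```python
import math

N = 8  # block size used by the block-based DCT


def residue_block(original, reconstructed):
    """Pixel-wise difference between the original block and the base layer reconstruction."""
    return [[o - r for o, r in zip(orow, rrow)]
            for orow, rrow in zip(original, reconstructed)]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block, computed by direct summation."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


# Hypothetical flat block: the base layer reconstructed 100 where the original was 104.
original = [[104] * N for _ in range(N)]
reconstructed = [[100] * N for _ in range(N)]

coefficients = dct_2d(residue_block(original, reconstructed))
print(round(coefficients[0][0], 6))  # 32.0 (all energy lands in the DC coefficient)
```

A constant residue of 4 concentrates in the single DC term, which is the property that makes coding the residue in the DCT domain compact.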
If a particular block is to be given higher quality, the values of that block should be transmitted ahead of everything else, and to this end a bit-plane shift may optionally be performed. This is defined as selective enhancement, and it is performed in the bit-plane shifting unit.
The maximum value calculating unit calculates the maximum among the absolute values of all the values that have gone through DCT. The obtained maximum value is used to calculate the maximum number of bit-planes needed to transmit the corresponding image frame.
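The interaction between the optional bit-plane shift and the maximum value calculation can be sketched as below. This is an illustrative sketch only: the names `shift_block` and `max_bit_planes`, the block labels, and the shift amount of 2 are assumptions, and the way the shift is signaled in an actual bit stream is not shown.

```python
def shift_block(coefficients, shift):
    """Raise one block's magnitudes by `shift` bit-planes, keeping each sign."""
    return [(-1 if c < 0 else 1) * (abs(c) << shift) for c in coefficients]


def max_bit_planes(blocks):
    """Bit-planes needed to represent the largest absolute coefficient in the frame."""
    max_abs = max((abs(c) for block in blocks for c in block), default=0)
    return max_abs.bit_length()


# Hypothetical quantized DCT residues for two blocks of one frame.
background = [9, -3, 0, 1]
face = [5, -2, 1, 0]

print(max_bit_planes([background, face]))     # 4, since 9 is 1001 in binary

# Selective enhancement: lift the face block by two bit-planes so that its
# bits are reached earlier in an MSB-first transmission.
boosted = shift_block(face, 2)
print(boosted)                                # [20, -8, 4, 0]
print(max_bit_planes([background, boosted]))  # 5, since 20 is 10100 in binary
```

After the shift, the most significant plane of the frame contains only bits of the boosted block, so those bits are the first to reach the receiver; the decoder would undo the shift after bit-plane decoding.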
The bit-plane VLC unit arranges the 64 DCT coefficients obtained for each block (for each bit-plane, the bit of a DCT coefficient belonging to that plane: 0 or 1) into a matrix in zigzag scan order. Each matrix is run-length encoded according to the VLC table.
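A minimal sketch of the zigzag scan and run-length step for one bit-plane is given below. It assumes symbols of the form (RUN, EOP), where RUN counts the zeros preceding a 1 and EOP marks the last 1 in the plane, with a separate marker for an all-zero plane; this symbol format and the names `zigzag_order`, `plane_bits`, and `run_eop_symbols` are assumptions for illustration, and the mapping of symbols to codewords in the VLC table is not reproduced.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked upward to the right, odd ones downward to the left.
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order


def plane_bits(block, plane, order):
    """Bits of one bit-plane of the coefficient magnitudes, in scan order."""
    return [(abs(block[r][c]) >> plane) & 1 for r, c in order]


def run_eop_symbols(bits):
    """Run-length symbols (zeros before each 1, end-of-plane flag) for one plane."""
    ones = [i for i, b in enumerate(bits) if b]
    if not ones:
        return ["ALL_ZERO"]
    symbols, previous = [], -1
    for i in ones:
        symbols.append((i - previous - 1, int(i == ones[-1])))
        previous = i
    return symbols


# Hypothetical block with three non-zero coefficients in its first column.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[2][0] = 5, 3, 4

order = zigzag_order()
print(run_eop_symbols(plane_bits(block, 2, order)))  # [(0, 0), (2, 1)]
print(run_eop_symbols(plane_bits(block, 1, order)))  # [(2, 1)]
print(run_eop_symbols(plane_bits(block, 3, order)))  # ['ALL_ZERO']
```

Because high bit-planes are mostly zeros, a handful of short symbols typically describes all 64 positions of a plane.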
As illustrated in FIG. 1B, the structure of a conventional FGS decoder defined in the MPEG-4 Draft International Standard is divided into the base layer and the enhancement layer. The bit streams transmitted from the delivery layers are decoded in the reverse order of the encoding process of the encoder depicted in FIG. 1A.
In the base layer, the MPEG-4 image decoding method is used as it is, without any modification. The FGS decoder includes a bit-plane variable length decoding (VLD) unit, a bit-plane shifting unit, inverse discrete cosine transform (IDCT) units, clipping units, a VLD unit, an inverse quantization (Q−1) unit, a motion compensation (MC) unit, and a frame memory. After the bit stream is inputted into the base layer, the image transmitted from the base layer is restored by performing VLD, performing inverse quantization, carrying out IDCT on the resulting values, adding them to the MC values, and clipping the results to the range from 0 to 255.
In the enhancement layer adopting the FGS encoding method, the bit streams transmitted to the enhancement layer are decoded in the reverse order of the encoding process of the encoder. First, bit-plane VLD is performed on the inputted enhancement bit stream, and if location information on a block to be given higher image quality has been transmitted, the bit-plane shift may optionally be performed.
Subsequently, the IDCT unit performs block-based (8×8) IDCT on the values obtained through the bit-plane VLD and the optional shift, thereby restoring the image transmitted from the enhancement layer. The result is then added to the image decoded in the base layer, and the sums are clipped to values between 0 and 255 to finally restore the image with improved quality.
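The final summation and clipping step at the decoder can be sketched as follows. The sketch assumes that the enhancement residue has already been returned to the pixel domain by the IDCT; the names `clip` and `add_enhancement` and the sample values are hypothetical.

```python
def clip(value, low=0, high=255):
    """Limit a reconstructed pixel value to the displayable 8-bit range."""
    return max(low, min(high, value))


def add_enhancement(base, residue):
    """Add the decoded enhancement residue to the base layer image and clip each pixel."""
    return [[clip(b + r) for b, r in zip(base_row, residue_row)]
            for base_row, residue_row in zip(base, residue)]


# Hypothetical 2 x 2 excerpt of a base layer image and its enhancement residue.
base = [[250, 10], [128, 0]]
residue = [[10, -20], [3, 0]]

print(add_enhancement(base, residue))  # [[255, 0], [131, 0]]
```

Clipping matters because a residue decoded from a truncated set of bit-planes is only approximate, so the sum can fall outside the range from 0 to 255.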
Here, so that an image can be restored from whatever bit streams have been received so far, only methods that maximize the encoding efficiency of the base layer may be used; methods that enhance the encoding efficiency of the enhancement layer may not.
FIG. 2A is an exemplary view illustrating a conventional raster scan order in an image and moving picture encoding method using DCT, and FIG. 2B is an exemplary view applying the conventional raster scan order to the scalable encoding method.
In image encoding methods using DCT, which are commonly used in Joint Photographic Experts Group (JPEG), H.263, and MPEG, image data are encoded and transmitted on a macro block basis or on an 8×8 block basis. Here, the encoding and decoding of every image frame or video object plane (VOP) begin from the macro block, or block, at the top-left of the image and proceed successively to the one at the bottom-right. In this invention, this is referred to as Raster Scan Order, which is illustrated in FIG. 2A.
The raster scan order is the scan order that must be used in order to apply, to the conventional image or moving picture processing method, a method for enhancing encoding efficiency between the base layer and the enhancement layer, or between enhancement layers.
When the raster scan order is applied to a scalable encoding method that makes it possible to restore an image from only the bit streams received so far, only some of the macro blocks or blocks in the upper part are decoded, and the restored image is displayed on the screen of the receiving end as illustrated in FIG. 2B. The black blocks are decoded blocks, while the white blocks are those not yet decoded.
That is, when the receiving end restores an improved image from the bit streams transmitted to the base layer and the part of the enhancement layer bit streams that has been received and decoded, as depicted in FIG. 2B, if only the upper part of the image data is received and decoded in the enhancement layer, the restored image is improved only in the part where enhancement layer decoding has been performed. However, this method has the shortcoming that the improved part of the restored image may lie where viewers do not pay attention, such as the background or anything other than the face of an actor, so that receiving and restoring the enhancement layer bit streams becomes useless.
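The partial coverage produced by the raster scan order can be illustrated with a short sketch. The frame size of 3×4 blocks, the count of six decoded blocks, and the names `raster_order` and `enhanced_map` are hypothetical; the sketch only marks which blocks would have been enhanced when the enhancement bit stream is cut off.

```python
def raster_order(rows, cols):
    """Block positions from the top-left to the bottom-right, row by row."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def enhanced_map(rows, cols, blocks_decoded):
    """Mark enhanced blocks with '#' and blocks left at base quality with '.'."""
    done = set(raster_order(rows, cols)[:blocks_decoded])
    return ["".join("#" if (r, c) in done else "." for c in range(cols))
            for r in range(rows)]


# Only six enhancement layer blocks arrived before the stream was truncated.
for line in enhanced_map(3, 4, 6):
    print(line)
# ####
# ##..
# ....
```

Whatever the content of the frame, the enhanced blocks always cluster at the top, so a region of interest lower in the picture stays at base layer quality.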