1. Field of the Invention
The present invention relates to video signal coding and decoding, and more particularly to an apparatus and method for coding and decoding video images at a low transfer rate.
2. Description of the Related Art
The international standards for conventional video coding include the Joint Photographic Experts Group (JPEG) standard for still picture coding/decoding, the Moving Picture Experts Group (MPEG) standards for motion picture coding/decoding, and H.261 and H.263 for low transfer rate video coding/decoding. In response to an increasing demand for video communication over the existing Public Switched Telephone Network (PSTN), extensive studies have been made on low bit rate video coding.
For example, H.263, a recommendation of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T, formerly the CCITT), employs a motion-compensated hybrid Differential Pulse Code Modulation/Discrete Cosine Transform (DPCM/DCT) coding method suitable for very low bit rate video transmission in video telephone systems. In particular, the video coding method involves DCT transformation and quantization of an input digital video signal; restoration of the quantized video signal to detect its difference from the original video signal and thereby estimate motion; and control of the quantization step to attain a desired bit rate.
FIG. 1 is a block diagram showing a conventional DCT-based motion picture coding apparatus. Video coding is generally divided into Intra (I) frame coding and Predictive (P) frame coding. If the input video is an I frame, it is passed unchanged to the DCT unit 101. For a P frame, the difference between the motion-estimated data and the current input is output to the DCT unit 101 through the subtractor 110.
The DCT unit 101 removes the correlation in the data by a two-dimensional coordinate transform. The DCT unit 101 decomposes the input frame into blocks and transforms each picture block from the spatial domain to the frequency domain. As a result, the energy of the DCT coefficients tends to be concentrated in the low frequency band, and the coefficients are then quantized at the quantizing (Q) unit 102. Quantization uses parameters such as a weight matrix and a quantization scale code: the weight matrix gives the weight of each DCT coefficient, and the quantization scale code determines the quantization step.
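The transform and quantization steps above can be sketched as follows. This is a minimal illustration, assuming an orthonormal 8×8 DCT-II evaluated directly from its definition and a simplified quantizer, level = round(coefficient / (weight × scale)); the names `dct2`, `quantize`, `weight` and `scale` are hypothetical, and the exact MPEG or H.263 quantization rules differ in detail.

```python
import math

N = 8  # block size of the 8x8 block DCT


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, evaluated directly from the definition."""
    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, weight, scale):
    """Simplified quantizer (hypothetical): level = round(coeff / (weight[u][v] * scale))."""
    return [[int(round(coeffs[u][v] / (weight[u][v] * scale))) for v in range(N)]
            for u in range(N)]


if __name__ == "__main__":
    # A smooth ramp block: nearly all energy lands in a few low-frequency coefficients.
    ramp = [[x + 2 * y for y in range(N)] for x in range(N)]
    weight = [[16] * N for _ in range(N)]  # hypothetical flat weight matrix
    levels = quantize(dct2(ramp), weight, scale=2)
    print([(u, v) for u in range(N) for v in range(N) if levels[u][v] != 0])
```

A larger `scale` (the quantization scale code) widens every step uniformly, while the weight matrix lets individual frequencies be quantized more or less coarsely.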
After quantization, each coefficient is output to an entropy coding unit 103, which performs Variable Length Coding (VLC). With VLC, frequently occurring values are represented by fewer bits and rarely occurring values by more bits, which reduces the total number of bits transmitted to the channel 104. The quantized data is also dequantized at a dequantizing unit 105 and then inverse transformed at an Inverse Discrete Cosine Transform (IDCT) unit 106. An adder 107 sums the motion-estimated data from a motion prediction unit 109 and the IDCT output, and stores the sum in a frame memory 108.
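The VLC principle (frequent values get short codewords) can be illustrated with a Huffman code built from symbol frequencies. This is only a sketch: standards such as H.263 use predefined VLC tables rather than building a code for each picture, and the helper names `huffman_code_lengths` and `total_bits` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code over the symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them gets one bit deeper.
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def total_bits(symbols, lengths):
    """Number of bits needed to send the symbol sequence with the given code lengths."""
    return sum(lengths[s] for s in symbols)


if __name__ == "__main__":
    # Quantized levels are dominated by zeros and small magnitudes.
    data = [0] * 50 + [1] * 20 + [-1] * 20 + [5] * 5 + [-7] * 5
    lengths = huffman_code_lengths(data)
    fixed = len(data) * (len(lengths) - 1).bit_length()  # 3 bits per symbol for 5 symbols
    print(total_bits(data, lengths), fixed)  # 190 300
```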
Consecutive pictures along the time axis usually differ mainly by the motion of a person or object near the center of the image. Based on this observation, the motion prediction unit 109 eliminates redundancy by replacing unchanged or similarly moving portions of the picture with the corresponding portions of the previous picture, which greatly reduces the amount of data to be transmitted.
When the adder 107 stores the summed value in the frame memory 108, the stored data forms the previous picture, against which the motion prediction unit 109 estimates the motion of the currently input picture. Motion estimation searches for the most similar blocks between the previous and current pictures, and a motion vector (MV) represents the displacement between them. The motion vectors are transmitted via the channel 104 together with the VLC-coded transform coefficients. The motion vectors are also variable-length coded at the entropy coding unit 103 to maximize coding efficiency.
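The search for the most similar block can be sketched as full-search block matching with a sum-of-absolute-differences (SAD) cost. The SAD criterion, the 8×8 block size and the ±4 search range are assumptions chosen for illustration; the text does not specify the matching criterion, and `sad` and `full_search` are hypothetical names.

```python
import random


def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(bs) for x in range(bs))


def full_search(cur, ref, bx, by, bs=8, search=4):
    """Exhaustive block matching: return the (dx, dy) within +/-search with the lowest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block would leave the reference frame.
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    # Current frame: reference content moved so that cur[y][x] == ref[y + 2][x - 1].
    cur = [[ref[min(y + 2, 31)][max(x - 1, 0)] for x in range(32)] for y in range(32)]
    print(full_search(cur, ref, 12, 12))  # (-1, 2)
```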
The MVs must first be obtained for the motion prediction unit 109 to perform motion estimation. Up to four MVs may be produced per macroblock; because transmitting them directly would require a large number of bits, only the difference between the current and previous MV is variable-length coded for transmission. In motion estimation, the motion prediction unit 109 uses forward and backward predicted blocks and involves two types of motion-compensated frames.
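Differential MV coding, as described above, can be sketched as follows. Following the text, each vector is predicted from the previous one (the first from (0, 0)); H.263 itself forms the predictor from the median of neighboring vectors, so this is a simplification. The function names are hypothetical.

```python
def mv_differences(mvs):
    """Replace each motion vector by its difference from the previous vector.
    The first vector is predicted from (0, 0)."""
    prev = (0, 0)
    diffs = []
    for mv in mvs:
        diffs.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return diffs


def mv_reconstruct(diffs):
    """Decoder side: accumulate the differences to recover the motion vectors."""
    prev = (0, 0)
    mvs = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        mvs.append(prev)
    return mvs


if __name__ == "__main__":
    mvs = [(3, 1), (3, 1), (4, 1), (4, 2)]
    print(mv_differences(mvs))  # [(3, 1), (0, 0), (1, 0), (0, 1)]
```

Because neighboring vectors are usually similar, the differences cluster around zero, which is what makes them cheap to variable-length code.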
A P frame is obtained by forward motion-compensated prediction and is used to predict the next P frame. A P frame can also serve as a reference for the forward and backward predictions of a bidirectionally predicted (B) frame, whereas a B frame itself is not used to predict other frames. An I frame, on the other hand, is the reference image for compression coding. The original signal of an I frame is therefore input directly to the DCT and quantization steps, so that redundancy is eliminated in the spatial direction only.
The first frame is generally coded as an I frame, and when a transmission packet loss occurs, the transmitter may send an I frame at any other time at the receiver's request. The I frame is used in motion estimation for the P and B frames, so I-frame coding also determines the coding efficiency of the following P and B frames. In particular, the background portion of the P frame images is not coded again until the next I frame is coded or a scene change occurs. As a result, I-frame coding influences image quality, and a better I-frame coding result yields better coding of the subsequent P frames.
For P-frame coding, two coding steps are applied to the input frames: motion-compensated prediction coding, which exploits the high correlation between adjacent frames, and displaced frame difference (DFD) coding of the estimation error remaining after motion compensation. The DFD is the output of the subtractor 110, i.e., the difference between the current frame and the previous frame displaced by the motion vector. DFD coding accounts for the bulk of the P-frame bits.
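A DFD block, as defined above, is the current block minus the previous-frame block displaced by the motion vector; the decoder adds it back. A minimal sketch, assuming integer motion vectors (no sub-pixel interpolation) and hypothetical function names:

```python
def displaced_frame_difference(cur, prev, bx, by, mv, bs=8):
    """DFD block: current block at (bx, by) minus the previous-frame block displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx] for x in range(bs)]
            for y in range(bs)]


def reconstruct(dfd, prev, bx, by, mv, bs=8):
    """Decoder side: add the DFD back onto the displaced previous-frame block."""
    dx, dy = mv
    return [[dfd[y][x] + prev[by + y + dy][bx + x + dx] for x in range(bs)]
            for y in range(bs)]


if __name__ == "__main__":
    prev = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
    # Current frame: previous content shifted by (dx, dy) = (2, 1), zero outside.
    cur = [[prev[y + 1][x + 2] if y + 1 < 16 and x + 2 < 16 else 0 for x in range(16)]
           for y in range(16)]
    no_mc = displaced_frame_difference(cur, prev, 4, 4, (0, 0))
    with_mc = displaced_frame_difference(cur, prev, 4, 4, (2, 1))
    # Motion compensation removes the residual energy of a pure translation.
    print(sum(abs(v) for row in no_mc for v in row), sum(abs(v) for row in with_mc for v in row))
```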
In most standardized coding, the DFD is coded in the same manner as I frames, which ignores the different characteristics of a natural image and a DFD image. A DFD image has less spatial correlation and contains much more mid- and high-frequency energy than a natural image, which consists mainly of smooth regions. The DFD image therefore has lower energy compaction efficiency than the natural image, which reduces the efficiency of run-length coding with conventional zigzag scanning.
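Zigzag scanning with run-length coding can be sketched as below: coefficients are read along anti-diagonals from low to high frequency, and each nonzero level is emitted with the count of zeros preceding it. This sketch ends each block with a separate "EOB" symbol for clarity; H.263 instead codes (LAST, RUN, LEVEL) events. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(pos):
        r, c = pos
        s = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (row increasing),
        # even diagonals bottom-left to top-right (row decreasing).
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length(block):
    """(run, level) pairs for the nonzero coefficients in zigzag order, then "EOB".
    `run` counts the zero coefficients skipped before each nonzero level."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


if __name__ == "__main__":
    smooth = [[0] * 8 for _ in range(8)]
    smooth[0][0], smooth[1][0], smooth[0][2] = 5, -2, 1
    print(run_length(smooth))  # [(0, 5), (1, -2), (2, 1), 'EOB']
    # A DFD-like block with energy at higher frequencies produces long runs.
    dfd = [[0] * 8 for _ in range(8)]
    dfd[3][4], dfd[6][1], dfd[7][7] = 2, -1, 1
    print(run_length(dfd))
```

The second block shows the inefficiency described above: scattered mid- and high-frequency coefficients produce long zero runs that each cost bits.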
Furthermore, at low bit rates very few DCT coefficients can be coded, and those that are coded are quantized with large quantization steps, resulting in blocking and ringing artifacts.
Another issue in motion picture coding is bit rate control at the bit rate control unit 111. Because the number of bits generated depends on fixed quantization parameters and on the coefficients being coded, repetitive coding is required to meet a specific target bit rate. Even where such control is possible, accurate bit rate control is hard to achieve.
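The repetitive coding needed to meet a target rate can be sketched as a search over the quantization parameter (QP), here over the H.263 range 1 to 31. To keep the sketch self-contained, a hypothetical bit model `estimate_bits` stands in for actually re-running the coder; the step size 2 × QP and the Exp-Golomb-like cost per nonzero level are assumptions for illustration only.

```python
def estimate_bits(coeffs, qp):
    """Hypothetical bit model standing in for a real encoder pass.

    Each coefficient is quantized with step size 2 * qp (truncation); every
    nonzero level is charged 2 * bit_length(level) + 1 bits, an Exp-Golomb-like
    length, and zero levels cost nothing."""
    bits = 0
    for c in coeffs:
        level = int(abs(c) // (2 * qp))
        if level:
            bits += 2 * level.bit_length() + 1
    return bits


def choose_qp(coeffs, target_bits, lo=1, hi=31):
    """Smallest qp in [lo, hi] whose estimated bit count meets the target.

    Binary search replaces trying every qp; it is valid because estimate_bits
    never increases as qp grows."""
    if estimate_bits(coeffs, hi) > target_bits:
        return hi  # even the coarsest quantizer overshoots the budget
    while lo < hi:
        mid = (lo + hi) // 2
        if estimate_bits(coeffs, mid) <= target_bits:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    coeffs = [300, -120, 64, 40, -33, 20, 12, 9, 5, 3]
    for target in (90, 40, 20):
        qp = choose_qp(coeffs, target)
        print(target, qp, estimate_bits(coeffs, qp))
```

Each probe corresponds to one trial encoding, which is why this style of control is costly and, with a single scalar QP, still only approximate.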
I-frame coding controls the bit rate simply by adjusting the quantization step size (quantization parameter × 2). At low bit rates the large quantization step increases the DCT quantization error, and this error carries over into the subsequent coding of P frames, including motion estimation and compensation. As a result, overall coding performance deteriorates.
Moreover, when transmitting at a low bit rate (12-48 kbps), I-frame coding takes 40-70% of the total transmitted bits. Efficient I-frame coding is therefore essential to improving overall coding performance. With the increasing demand for more efficient video coding, the Embedded Zerotree Wavelet (EZW) concept was recently introduced, giving rise to many studies on embedded zerotree coding for still picture compression.
EZW provides much better rate-distortion performance than existing DCT-based coders such as JPEG. Embedded zerotree coding is widely used for coding wavelet-transformed coefficients: a zerotree structure of the wavelet coefficients is used to code position and amplitude information in order of priority, yielding a bit stream ordered by significance.
The EZW method offers excellent compression performance, scalability in resolution and image quality, and accurate bit rate control, all with a simple algorithm. It produces a good-quality image at any given bit rate and allows easy bit rate control, and the bit stream remains decodable even when its transmission is suddenly interrupted at an arbitrary point. The primary features of embedded zerotree coding are interband prediction of the positions of significant coefficients, which exploits the self-similarity of the wavelet transform, and successive approximation quantization (SAQ), which refines the amplitudes of the wavelet coefficients in successive steps. The embedded zerotree method is outlined below.
Using a wavelet transform, an input image is first decomposed into subbands of different resolutions. The low frequency components of the original image are grouped into the coarsest subband, while the high frequency components are grouped into the remaining subbands. Each coefficient in every subband except the finest ones is related to the coefficients of the same orientation at the corresponding spatial position in the next finer subband.
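The subband decomposition can be sketched with the simplest wavelet, an averaging (unnormalized) Haar filter, applied to rows and then columns and repeated on the low-pass quadrant. Haar is chosen only to keep the sketch short; practical wavelet coders usually use longer filters. The function names are hypothetical.

```python
def haar_1d(v):
    """One level of an averaging Haar transform: pairwise averages, then pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    diff = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def haar_2d_level(img):
    """Transform rows, then columns. The result holds the low-pass band in the
    top-left quadrant and the three detail bands in the other quadrants."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def wavelet_decompose(img, levels):
    """Dyadic decomposition: repeatedly transform the top-left (low-pass) quadrant in place."""
    out = [row[:] for row in img]
    size_h, size_w = len(img), len(img[0])
    for _ in range(levels):
        sub = [row[:size_w] for row in out[:size_h]]
        t = haar_2d_level(sub)
        for y in range(size_h):
            out[y][:size_w] = t[y]
        size_h //= 2
        size_w //= 2
    return out


if __name__ == "__main__":
    img = [[(x * y) % 17 for x in range(16)] for y in range(16)]
    pyr = wavelet_decompose(img, 3)
    # After 3 levels the coarsest (low-pass) subband is the top-left 2 x 2 corner.
    print([row[:2] for row in pyr[:2]])
```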
A coefficient in a coarser subband is referred to as a "parent," and the set of coefficients at the corresponding position and orientation in the next finer subband is called its "children." A parent node in the lowest frequency band has three children, one in each of the three orientations. EZW builds a zerotree data structure from this parent-children relationship.
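The parent-children relationship can be expressed as index arithmetic on a pyramid stored in place (low-pass band in the top-left corner). This sketch assumes that layout and a dyadic decomposition: a root in the low-pass band has three children, one per orientation, every other non-finest coefficient has a 2×2 group of children, and finest-scale coefficients have none. The names `children` and `descendants` are hypothetical.

```python
def children(r, c, height, width, ll_h, ll_w):
    """Children of coefficient (r, c) in an in-place dyadic pyramid of size
    height x width whose coarsest low-pass band is the top-left ll_h x ll_w corner."""
    if r < ll_h and c < ll_w:
        # Root in the low-pass band: one child in each of the three coarsest
        # detail subbands, at the same relative position.
        return [(r, c + ll_w), (r + ll_h, c), (r + ll_h, c + ll_w)]
    if 2 * r >= height or 2 * c >= width:
        return []  # finest-scale coefficient: no children
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, height, width, ll_h, ll_w):
    """All children, grandchildren, ... of (r, c): the tree a zerotree root stands for."""
    out, stack = [], children(r, c, height, width, ll_h, ll_w)
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children(*node, height, width, ll_h, ll_w))
    return out


if __name__ == "__main__":
    # 8 x 8 pyramid after three levels: the low-pass band is the single coefficient (0, 0).
    print(children(0, 0, 8, 8, 1, 1))  # [(0, 1), (1, 0), (1, 1)]
    print(len(descendants(0, 0, 8, 8, 1, 1)))  # 63
```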
The zerotree structure assumes that if a wavelet coefficient at a coarse scale is smaller than a given threshold, its children are also likely to be smaller than the threshold. This zerotree structure is very similar to the zigzag scanning and End Of Block (EOB) concepts generally used to code DCT coefficients; for example, EZW likewise scans the coefficients band by band.
Specifically, the children of a given parent are scanned only after the neighboring parents within the parent's band have been scanned. Each coefficient is compared with the current threshold. If the absolute value of a coefficient is larger than the threshold, the coefficient is coded as either a positive or a negative significance symbol. A zerotree root symbol is used to code an insignificant parent whose children form a zerotree, i.e., all of its descendants have values below the threshold. An isolated zero symbol is used to code an insignificant coefficient that has at least one descendant greater than the threshold.
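One dominant (significance) pass over a pyramid can be sketched as follows, assuming the in-place layout and dyadic parent-children relation above. Simplifications, all assumptions of this sketch: it covers only the first pass, it uses the common test |c| ≥ T, it scans scale by scale in raster order (which visits every parent before its children, though it interleaves the three orientations), and it emits a plain "Z" for insignificant finest-scale coefficients, which have no children. All names are hypothetical.

```python
def children(r, c, height, width, ll_h, ll_w):
    """Children of (r, c) in an in-place dyadic pyramid (low-pass band = top-left ll_h x ll_w)."""
    if r < ll_h and c < ll_w:
        return [(r, c + ll_w), (r + ll_h, c), (r + ll_h, c + ll_w)]
    if 2 * r >= height or 2 * c >= width:
        return []  # finest scale
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def descendants(r, c, height, width, ll_h, ll_w):
    out, stack = [], children(r, c, height, width, ll_h, ll_w)
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(children(*node, height, width, ll_h, ll_w))
    return out


def level_of(r, c, ll_h, ll_w):
    """0 for the low-pass band, k for the k-th detail scale (1 = coarsest)."""
    k, h, w = 0, ll_h, ll_w
    while r >= h or c >= w:
        h, w, k = 2 * h, 2 * w, k + 1
    return k


def dominant_pass(coeffs, threshold, ll_h, ll_w):
    """Return [((r, c), symbol)] with symbols POS, NEG, ZTR (zerotree root), IZ (isolated zero), Z."""
    height, width = len(coeffs), len(coeffs[0])
    order = sorted(((r, c) for r in range(height) for c in range(width)),
                   key=lambda p: (level_of(p[0], p[1], ll_h, ll_w), p[0], p[1]))
    skip = set()  # descendants of a zerotree root are implied and not coded
    symbols = []
    for r, c in order:
        if (r, c) in skip:
            continue
        v = coeffs[r][c]
        if abs(v) >= threshold:
            symbols.append(((r, c), "POS" if v > 0 else "NEG"))
            continue
        desc = descendants(r, c, height, width, ll_h, ll_w)
        if not desc:
            symbols.append(((r, c), "Z"))
        elif all(abs(coeffs[y][x]) < threshold for y, x in desc):
            symbols.append(((r, c), "ZTR"))
            skip.update(desc)
        else:
            symbols.append(((r, c), "IZ"))
    return symbols


if __name__ == "__main__":
    coeffs = [[40, 3, 20, 1],
              [2, -5, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0]]
    for pos, sym in dominant_pass(coeffs, 16, 1, 1):
        print(pos, sym)
```

In this example the zerotree roots at (1, 0) and (1, 1) each stand for four coefficients that are never coded individually, while (0, 1) must be an isolated zero because its child 20 is significant.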
EZW further refines the coefficients found to be significant using SAQ. Quantizing the wavelet coefficients with SAQ yields an embedded bit stream ordered by bit significance. Spatial grouping and quantization of the data are possible in EZW because wavelet coefficients carry both frequency and spatial information.
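SAQ can be sketched in its bit-plane form: thresholds halve on every pass, a coefficient becomes significant at the first threshold it reaches, and each later pass sends one refinement bit for it. This sketch assumes integer coefficients and power-of-two thresholds, and it sends significant positions as plain indices instead of zerotree-coding the significance map. The names `saq_encode` and `saq_decode` are hypothetical.

```python
def saq_encode(values, num_passes):
    """Bit-plane successive-approximation quantization of integer coefficients.

    Pass k uses threshold t = t0 >> k, where t0 is the largest power of two
    not exceeding max|v|. Each pass records:
      * refinement bits: bit t of |v| for every coefficient that was already
        significant before this pass (in the order they became significant);
      * newly significant coefficients: (index, is_negative) with |v| >= t.
    """
    peak = max(abs(v) for v in values)
    if peak == 0:
        return 0, []
    t0 = 1 << (peak.bit_length() - 1)
    order, seen, stream = [], set(), []
    for k in range(num_passes):
        t = t0 >> k
        if t == 0:
            break  # every bit of the integer magnitudes has been sent
        refinements = [(abs(values[i]) & t) != 0 for i in order]
        newly = [(i, v < 0) for i, v in enumerate(values) if i not in seen and abs(v) >= t]
        for i, _ in newly:
            order.append(i)
            seen.add(i)
        stream.append((t, newly, refinements))
    return t0, stream


def saq_decode(n, t0, stream):
    """Reconstruct n coefficients from any prefix of the pass stream."""
    mag, neg, order = [0] * n, [False] * n, []
    last_t = t0
    for t, newly, refinements in stream:
        for i, bit in zip(order, refinements):
            if bit:
                mag[i] += t
        for i, is_neg in newly:
            mag[i], neg[i] = t, is_neg
            order.append(i)
        last_t = t
    # Known bits give a lower bound; add the midpoint of the still-unknown lower bits.
    return [0 if m == 0 else (-1 if s else 1) * (m + (last_t - 1) / 2)
            for m, s in zip(mag, neg)]


if __name__ == "__main__":
    values = [57, -37, 12, -3, 0, 21]
    t0, stream = saq_encode(values, 6)
    for k in range(1, len(stream) + 1):
        # Decoding a longer prefix of the embedded stream refines the estimate.
        print(k, saq_decode(len(values), t0, stream[:k]))
```

Truncating the stream after any pass still gives every coefficient an estimate within the last threshold, which is the embedded property described above.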
Another embedded coder with excellent compression performance is Set Partitioning in Hierarchical Trees (SPIHT), which uses an improved zerotree coding method. SPIHT eliminates inter-coefficient redundancy by updating the significant nodes as the threshold decreases. SPIHT is more efficient than EZW because its improved zerotree structure exploits the fact that significant coefficients are concentrated mainly in the lowest frequency band.
Although both EZW and SPIHT may be effective for coding high-resolution still pictures of 512×512, they are inappropriate for low bit rate video compression, which is usually performed at a relatively low resolution of 176×144. For example, in the Quarter Common Intermediate Format (QCIF), coding efficiency drops significantly because the wavelet characteristics deteriorate and, with them, the zerotree coding efficiency. Also, applying EZW to DFD coding yields lower overall efficiency than the DCT-based method, because the wavelet pyramid decomposition does not reflect the reduced energy compaction efficiency of the upper- and mid-level high frequency components.
Moreover, the DCT is a single transform that is relatively well defined, even allowing for slight variations, whereas wavelets come in many types, including but not limited to orthogonal wavelets, biorthogonal wavelets, wavelet packets, and multiwavelets. Accordingly, many combinations of video coders are possible, which makes selecting and designing a wavelet-based video coding and decoding apparatus difficult.
Furthermore, a wavelet coder may be incompatible with existing video coders, considering that most image and video compression standards, such as JPEG, MPEG (MPEG-1, MPEG-2 and MPEG-4), H.261 and H.263, use the 8×8 block DCT to code I frames and the errors relative to the motion-compensated image, i.e., the Displaced Frame Difference (DFD).
Accordingly, an object of the present invention is to solve at least the problems and disadvantages of the related art.
An object of the present invention is to provide a DCT-based video coding/decoding apparatus and method that retains the advantages of embedded video coders.
Another object of the present invention is to provide a video coding/decoding apparatus and method that enhance zerotree coding efficiency while remaining compatible with existing DCT-based coders.
A further object of the present invention is to provide a DCT-based embedded video coding/decoding apparatus and method suitable for transmitting images at a low transfer rate.
A still further object of the present invention is to provide a DCT-based embedded video coding/decoding apparatus and method that enhance the coding efficiency of error signals relative to a motion-compensated image.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained as particularly pointed out in the appended claims.
To achieve the objects and in accordance with the purposes of the invention, as embodied and broadly described herein, a video coding method according to the present invention includes the steps of: (1) decomposing an input frame into a plurality of blocks and transforming each block from a spatial domain to a frequency domain; (2) sorting and rearranging the frequency band-based coefficients transformed in step (1) in order of significance priority, based on how much of the information required for image reproduction each coefficient contains; and (3) coding information concerning the position and amplitude of the coefficients rearranged in step (2) in order of priority based on significance, and outputting bit streams arranged according to significance.
A video coding apparatus of the present invention includes a transforming unit for decomposing an input frame into a plurality of blocks and transforming each block from a spatial domain to a frequency domain; a rearranging unit for sorting and rearranging the transformed coefficients in order of significance priority, based on how much of the information required for image reproduction each coefficient contains; and a zerotree coding unit for coding information concerning the position and amplitude of the rearranged coefficients in order of priority based on significance and outputting bit streams arranged according to significance.
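The text does not specify how the block-DCT coefficients are rearranged. One plausible arrangement, assumed here purely for illustration, gathers coefficient (u, v) of every 8×8 block into one contiguous band, so that the DC coefficients of all blocks sit together in the top-left corner and higher frequencies lie further out, a layout that mimics a wavelet subband pyramid. Sorting by significance itself is not shown, and `rearrange_dct_blocks` and `restore_dct_blocks` are hypothetical names.

```python
def rearrange_dct_blocks(coeffs, n=8):
    """Group coefficient (u, v) of every n x n DCT block into one band.

    coeffs: H x W array of block-DCT coefficients laid out block by block
    (H and W are assumed to be multiples of n). In the result, band (u, v)
    occupies rows u*bh .. u*bh+bh-1 and columns v*bw .. v*bw+bw-1, where
    bh = H // n and bw = W // n are the numbers of blocks."""
    h, w = len(coeffs), len(coeffs[0])
    bh, bw = h // n, w // n
    out = [[0] * w for _ in range(h)]
    for by in range(bh):
        for bx in range(bw):
            for u in range(n):
                for v in range(n):
                    out[u * bh + by][v * bw + bx] = coeffs[by * n + u][bx * n + v]
    return out


def restore_dct_blocks(bands, n=8):
    """Inverse of rearrange_dct_blocks (decoder side)."""
    h, w = len(bands), len(bands[0])
    bh, bw = h // n, w // n
    out = [[0] * w for _ in range(h)]
    for by in range(bh):
        for bx in range(bw):
            for u in range(n):
                for v in range(n):
                    out[by * n + u][bx * n + v] = bands[u * bh + by][v * bw + bx]
    return out


if __name__ == "__main__":
    # 16 x 16 frame = 2 x 2 blocks; label each coefficient by its raster index.
    coeffs = [[r * 16 + c for c in range(16)] for r in range(16)]
    bands = rearrange_dct_blocks(coeffs)
    # The four block DC terms (raster indices 0, 8, 128, 136) now form the top-left 2 x 2 band.
    print([row[:2] for row in bands[:2]])  # [[0, 8], [128, 136]]
```

For a QCIF frame (176×144) this gives 22×18 blocks, so each of the 64 bands would be an 18-row by 22-column array; a zerotree parent-child relation could then be defined between bands in the same way as between wavelet subbands.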