1. Field of the Invention
The present invention relates generally to video encoding and decoding for moving and still pictures and more specifically to multi-dimensional-scalable video compression and decompression of high resolution moving and still pictures.
2. Description of the Related Art
As high definition television begins to make its way into the market, the installed base of existing television systems and video storage systems that operate at reduced definition must not be ignored. Several techniques are available to address the complex problem of differing resolutions and standards. One of these techniques, scalable video coding, provides for two or more resolutions simultaneously in the video coding scheme to support both the installed base of standard resolution systems and new systems with higher resolution.
One scalable video coding technique is spatial scalability, which seeks to provide two or more coded bit streams that permit the transmission or storage of a lower resolution and a higher resolution image. One stream, a lower resolution encoded image stream, contains the lower resolution image data and the other stream, an encoded difference image stream, contains the data needed for forming a higher resolution image when combined with the lower resolution image. An encoded image stream is a time sequence of frame pictures or field pictures, some of which may be difference frames, that are encoded in accordance with a particular standard such as JPEG, MPEG-1, MPEG-2 or MPEG-4 or other similar standard. A source image stream is a time-ordered sequence of frame pictures or field pictures F1-Fn, each containing a number of pixel blocks, that are presented to an encoder for coding or generated from a decoder for viewing.
FIG. 1 shows a standard MPEG-2 encoder 10, which is modified in FIG. 3 to become a spatial scalable codec. In FIG. 1, the standard encoder 10 has a first adder (subtractor) 12 that receives an input frame sequence Fn and a predicted frame sequence P′n and forms the difference between the two, (Fn−P′n). A discrete cosine transform (DCT) coder 14 next transforms the difference (Fn−P′n) into the frequency domain to generate (Fn−P′n)T. A quantizer (Q) 16 receives the difference and quantizes the difference values to generate (Fn−P′n)TQ, and a variable length coder (VLC) 18 entropy encodes the result to create the output bit stream (Fn−P′n)TQE.
To generate a predicted frame sequence P′n, a local decoder loop is used (where primed symbols indicate a decoded or reconstructed signal). The predicted frame P′n can be either a forward or a forward and backward predicted frame. The local decoder starts at an inverse quantizer (IQ) 20, which receives (Fn−P′n)TQ and forms a sequence of transformed difference frames (F′n−P′n)T. An inverse DCT coder 22 receives the transformed difference frames (F′n−P′n)T and generates the reconstructed difference sequence (F′n−P′n), following which a second adder 24 sums the reconstructed difference sequence (F′n−P′n) with the predicted frame P′n, so that the output of the adder is a reconstructed frame sequence F′n. A frame store (FS) 26 captures the reconstructed frame sequence F′n and produces a delayed frame sequence F′n−1. A motion estimator (ME) block 28 receives the original frame sequence Fn and the delayed frame sequence F′n−1 from the local decoder loop and compares the two to estimate any motion or change between the frame sequences in the form of displaced blocks. The ME generates motion vectors mVn, which store information about the displacement of blocks between F′n and F′n−1. A motion compensation predictor (MCP) 30 receives the motion vectors and the delayed frame sequence F′n−1 and generates the predicted frame P′n, which completes the loop.
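The block comparison performed by the motion estimator can be illustrated with a minimal sketch. It assumes one-dimensional frames, a full search over a small window, and a sum-of-absolute-differences (SAD) matching cost; the function name, block size, and search range are hypothetical choices for illustration and are not taken from the encoder of FIG. 1.

```python
def estimate_motion(current, reference, block=4, search=2):
    """Return one displacement per block of `current`.

    `current` plays the role of the input frame Fn and `reference` the
    delayed reconstructed frame F'(n-1). Full search with a SAD cost is
    an assumed matching strategy, not one stated in the text.
    """
    vectors = []
    for start in range(0, len(current), block):
        target = current[start:start + block]
        best_shift, best_sad = 0, None
        for shift in range(-search, search + 1):
            lo = start + shift
            if lo < 0 or lo + len(target) > len(reference):
                continue  # candidate block falls outside the reference frame
            candidate = reference[lo:lo + len(target)]
            sad = sum(abs(a - b) for a, b in zip(target, candidate))
            if best_sad is None or sad < best_sad:
                best_shift, best_sad = shift, sad
        vectors.append(best_shift)
    return vectors


reference = [0, 0, 5, 9, 5, 0, 0, 0]   # F'(n-1)
current = [0, 0, 0, 5, 9, 5, 0, 0]     # Fn: same content moved right by one sample
vectors = estimate_motion(current, reference)
```

Each returned value is the offset into the reference frame at which the best-matching block was found. Candidates that would read outside the reference frame are skipped, so blocks at the frame edge see a smaller search window.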
The encoding process starts without any initial prediction, i.e., P′n=0, which permits the frame store FS 26 to develop a first stored frame F′n=F′n−1. On the next input frame, a prediction P′n is made by the MCP 30 and the encoder begins to generate encoded, quantized, transformed and motion compensated frame difference sequences.
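The encoder loop of FIG. 1, including its start-up with P′n=0, can be sketched as follows. This is a simplified illustration under stated assumptions: frames are one-dimensional lists, motion is assumed to be zero so that P′n is simply the stored frame F′n−1, quantization uses a single uniform step, and the variable length coder is omitted. All function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II (stands in for DCT coder 14)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


def idct(c):
    """Inverse of dct() (stands in for inverse DCT coder 22)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def encode_sequence(frames, step=4.0):
    """Return (quantized levels per frame, reconstructed frames F'n)."""
    stored = None                      # frame store 26, holds F'(n-1)
    all_levels, all_recon = [], []
    for frame in frames:
        # Assumed zero-motion prediction: P'n = F'(n-1), and P'n = 0 at start-up.
        pred = [0.0] * len(frame) if stored is None else stored
        diff = [f - p for f, p in zip(frame, pred)]              # adder 12
        levels = [round(c / step) for c in dct(diff)]            # DCT 14, Q 16
        # Local decoder loop: IQ 20, inverse DCT 22, adder 24.
        recon_diff = idct([q * step for q in levels])
        recon = [d + p for d, p in zip(recon_diff, pred)]
        stored = recon                                           # update frame store
        all_levels.append(levels)
        all_recon.append(recon)
    return all_levels, all_recon


frames = [[10, 20, 30, 40], [12, 22, 32, 42]]
levels, recon = encode_sequence(frames, step=1.0)
```

Because the prediction is formed from the reconstructed frame rather than the original, the encoder's stored frame matches what a decoder would hold, and quantization error does not accumulate from frame to frame.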
FIG. 2 shows an MPEG-2 decoder 32. The decoder is similar to the local decoder loop of the encoder in FIG. 1. The encoded bit stream (Fn−P′n)TQE and encoded motion vectors are decoded by the IVLC block 34. The motion vectors are sent directly from the IVLC block to the motion compensation prediction block (MCP) 36. The transformed and quantized image stream is then inverse quantized by the IQ block 38 and then transformed back to the spatial domain by the IDCT block 40 to create the reconstructed difference image stream (F′n−P′n). To recover a representation of the original image stream F′n, the predicted frames P′n must be added, in the summation block 42, to the recovered difference image stream. These predicted frames P′n are formed by applying the recovered motion vectors, in the motion compensation prediction block 36, to a frame store 44, which creates a F′n−1 stream from the original image stream Fn. To get the decoder started, an image stream without P′n is decoded. This allows the frame store to obtain the F′n image and to store it for use in subsequent predictions.
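The summation and motion compensation stages of the decoder can be sketched as below. The sketch assumes that the difference frames have already passed through the IVLC, IQ and IDCT blocks, that frames are one-dimensional, and that each motion vector is a valid in-range offset for one fixed-size block; the function names are hypothetical.

```python
def motion_compensate(reference, vectors, block=4):
    """Build P'n by copying displaced blocks from the stored frame F'(n-1).

    Assumes every vector keeps its block inside the reference frame.
    """
    predicted = []
    for index, shift in enumerate(vectors):
        start = index * block + shift
        predicted.extend(reference[start:start + block])
    return predicted


def decode_sequence(differences, vectors_per_frame, block=4):
    """Recover F'n from reconstructed difference frames and motion vectors."""
    stored = None                      # frame store 44
    output = []
    for diff, vectors in zip(differences, vectors_per_frame):
        if stored is None:
            predicted = [0] * len(diff)            # start-up: no P'n available
        else:
            predicted = motion_compensate(stored, vectors, block)   # MCP 36
        frame = [d + p for d, p in zip(diff, predicted)]             # summation 42
        stored = frame
        output.append(frame)
    return output


differences = [
    [0, 0, 5, 9, 5, 0, 0, 0],      # first frame, sent without prediction
    [0, 0, -5, -4, 0, 0, 0, 0],    # residual left after motion compensation
]
decoded = decode_sequence(differences, [None, [0, -1]])
```

The first frame is decoded without a prediction, which fills the frame store; the second frame is then rebuilt from its residual plus blocks fetched from the stored frame at the signalled offsets.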
FIG. 3 shows a prior art system 48 for encoding an image stream with spatial-scalable video coding. This system includes a spatial decimator 50 that receives the source image stream and generates a lower resolution image stream from the source image stream, a lower layer encoder 52 that receives the lower resolution image stream and encodes a bit stream for the lower layer using an encoder similar to that of FIG. 1, a spatial interpolator 54 that receives a decoded lower layer image stream from the lower layer encoder and generates a spatially interpolated image stream and an upper layer encoder 56, similar to that of FIG. 1, which receives the source image stream and the spatially interpolated image stream to generate the upper layer image stream. Finally, a multiplexor 58 is included to combine the lower and upper layer streams into a composite stream for subsequent transmission or storage.
The spatial decimator 50 reduces the spatial resolution of a source image stream to form the lower layer image stream. For example, if the source image stream is 1920 by 1080 luminance pixels, the spatial decimator may reduce the image to 720 by 480 luminance pixels. The lower layer encoder 52 then encodes the lower resolution image stream according to a specified standard such as MPEG-2, MPEG-4 or JPEG depending on whether motion or still pictures are being encoded. Internally, the lower layer encoder 52 also creates a decoded image stream and this image stream is sent to the spatial interpolator 54 which approximately reproduces the source video stream. Next, the upper layer encoder 56 encodes a bit stream based on the difference between source image stream and the spatially interpolated lower layer decoded image stream, or the difference between the source image stream and a motion compensated predicted image stream derived from the upper layer encoder or some weighted combination of the two. The goal is to choose either the motion compensated predicted frames or the spatially interpolated frames (or a weighted combination thereof) to produce a difference image stream that has the smallest error energy.
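The choice among the spatially interpolated prediction, the motion compensated prediction, and a weighted combination can be sketched as follows. The sketch assumes one-dimensional frames, decimation by averaging, interpolation by sample repetition, a lossless lower layer, and a fixed blending weight; practical systems would use proper filters, and all names here are hypothetical.

```python
def decimate(samples, factor=2):
    """Spatial decimator: average each group of `factor` samples (assumed filter)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples), factor)]


def interpolate(samples, factor=2):
    """Spatial interpolator: repeat each sample `factor` times (assumed filter)."""
    return [s for s in samples for _ in range(factor)]


def error_energy(source, prediction):
    """Sum of squared differences between the source and a prediction."""
    return sum((s - p) ** 2 for s, p in zip(source, prediction))


def choose_prediction(source, spatial_pred, temporal_pred, weight=0.5):
    """Pick the prediction with the smallest error energy; return (name, residual)."""
    blended = [weight * a + (1.0 - weight) * b
               for a, b in zip(spatial_pred, temporal_pred)]
    candidates = {
        "spatial": spatial_pred,
        "temporal": temporal_pred,
        "weighted": blended,
    }
    best = min(candidates, key=lambda name: error_energy(source, candidates[name]))
    residual = [s - p for s, p in zip(source, candidates[best])]
    return best, residual


source = [10, 12, 20, 22]
lower = decimate(source)               # lower layer image (coding assumed lossless)
spatial = interpolate(lower)           # spatially interpolated lower layer
mode, residual = choose_prediction(source, spatial, temporal_pred=[0, 0, 0, 0])
```

In this example the upper layer has no useful temporal prediction, so the spatially interpolated frame gives the smallest error energy and only a small residual remains to be coded.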
A spatial-scalable system, such as above, can offer both a standard television resolution of 720 by 480 pixels and a high definition resolution of 1920 by 1080 pixels. Also, scalability coding has other desirable characteristics such as interoperability of different video systems, improved World-Wide Web browser viewing of compressed images, and error-resiliency over noisy communication systems. However, scalability does not come without a cost compared to a single layer coding of the same image size. Typically, current image coding schemes, such as the scalable system described above, require a higher bit rate compared to single layer coding at the same picture quality, especially when interlaced pictures are involved. This higher bit rate tends to favor acceptance of single layer image coding.
Therefore, there is a need for a scalable image coding system and method that substantially reduces the bit rate of a multi-dimensional-scalable encoded image stream so that existing standard definition receivers can receive a standard quality image without incurring a cost at the receiver and high definition receivers can receive high quality images that are better than high definition images encoded as a single layer.
The present invention is directed towards the above-mentioned need. A method for encoding a source image stream to produce a lower-layer encoded image stream and an upper-layer encoded image stream includes the following steps. First, a down-converted source image stream is generated and encoded to create the lower-layer encoded image stream. Next, the lower-layer encoded image stream is decoded and then up-converted. The up-converted image stream is then processed in a non-linear fashion to generate a non-linear processed image stream. Next, a plurality of upper-layer difference streams is formed based on the source image stream and the non-linear processed image stream, and an upper-layer difference stream is selected for encoding. Finally, the selected upper-layer difference stream is encoded to form the upper-layer encoded image stream. In one embodiment the upper-layer difference stream includes a motion-compensated difference stream. This motion compensated difference stream is derived from prediction frames that may include.
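The order of the encoding steps can be sketched as a small pipeline. Everything concrete in this sketch is an assumption made for illustration: the non-linear process is not specified in this passage, so a three-tap median filter is used purely as a placeholder; the lower-layer codec is an identity function; selection is by smallest residual energy, a criterion this passage does not state; and the final encoding of the selected difference stream is omitted. All names are hypothetical.

```python
def energy(values):
    """Sum of squares of a residual."""
    return sum(v * v for v in values)


def encode_two_layers(source, motion_pred, down, up, nonlinear,
                      lower_encode, lower_decode):
    """Return (lower-layer stream, selected mode, selected difference stream)."""
    lower_bits = lower_encode(down(source))          # down-convert, encode lower layer
    processed = nonlinear(up(lower_decode(lower_bits)))   # decode, up-convert, process
    # Plurality of candidate upper-layer difference streams.
    candidates = {
        "nonlinear": [s - p for s, p in zip(source, processed)],
        "motion": [s - m for s, m in zip(source, motion_pred)],
    }
    # Assumed selection rule: keep the candidate with the smallest energy.
    mode = min(candidates, key=lambda name: energy(candidates[name]))
    return lower_bits, mode, candidates[mode]


# Placeholder building blocks (hypothetical, not the patent's actual operators).
def down2(x):
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def up2(x):
    return [v for v in x for _ in range(2)]


def median3(x):
    padded = [x[0]] + list(x) + [x[-1]]              # replicate the edge samples
    return [sorted(padded[i:i + 3])[1] for i in range(len(x))]


def identity(x):
    return list(x)


lower_bits, mode, upper_diff = encode_two_layers(
    [10, 12, 20, 22], [0, 0, 0, 0], down2, up2, median3, identity, identity)
```

Passing the operators in as functions keeps the sketch neutral about how down-conversion, up-conversion and the non-linear process are actually realized. Returning the selected mode alongside the difference stream reflects that a decoder needs to know which kind of difference stream it has received.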
A method, in accordance with the present invention, for decoding a layered, encoded image stream to produce a lower-layer image stream and an upper-layer image stream includes the following steps. First, the layered encoded image stream is de-multiplexed into an upper-layer encoded image stream and a lower-layer encoded image stream. The lower-layer encoded image stream is decoded to provide a lower-layer image stream, which is then up-converted and processed to form a non-linear processed image stream. The composition of the upper-layer encoded image stream is then determined, and the upper-layer encoded image stream is decoded to provide the upper-layer image stream based on the determined composition of the encoded upper-layer image stream, at least one composition of the encoded upper-layer image stream requiring the non-linear processed image stream to decode the upper-layer encoded image stream.
This approach has all the merits of spatial scalability coding algorithms such as interoperability, easy database browsing and indexing and error-resiliency. An advantage of the present invention is that a smaller bandwidth is required to send the lower layer encoded image stream and the upper-layer encoded image stream compared to current coders or, equivalently, higher quality images can be sent for a given bandwidth compared to current coders using the same bandwidth.