Many applications, such as multi-point video teleconferencing, the windowing of displays for workstations, video communications on asynchronous transfer mode ("ATM") networks, and broadcast high-definition television, can benefit if video at various resolutions can be derived from an encoded bitstream. The simplest method of achieving this is the simulcast technique, in which multiple independently coded replicas of a video sequence (each scaled to a different resolution) are simultaneously transmitted. In this approach, scaling is performed prior to the compression/decompression of each replica, and each resolution scale is then independently encoded and assigned a portion of the total available transmission bandwidth. As a result of this independent encoding and transmission, the simulcast technique requires a wide bandwidth.
A more bandwidth-efficient alternative to simulcasting is scalable video encoding. Frequency scaling is a low-complexity method of scalable video encoding in which a single video signal is transmitted to multiple receivers, each of which decodes images at a resolution determined by the particular signal decoding scheme it employs. A specific encoding method to which frequency scaling can be easily applied uses discrete cosine transform ("DCT") blocks of original or prediction error pixels to derive blocks of frequency domain coefficients. Various subsets of these frequency domain coefficients can be used to generate different resolution scales for a given image. Such encoding may be implemented using a slightly modified version of the encoders disclosed in the International Standards Organization Committee Draft 11172-2, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s," November 1991, and in "Video Coding Using the MPEG-1 Compression Standard," A. Puri, Proceedings of the Society for Information Display 92, May 1992, Boston.
The transform coding portion of these encoders computes the DCT as a function of the energy in the blocks it receives from an original image, or from the difference between the original and its prediction from a previously decoded image. Assuming an 8×8 picture element ("pixel") basic block size, each 8×8 pixel block of a picture is transformed into an 8×8 block of DCT coefficients. These coefficients are then quantized and scanned so that they may be converted into a one dimensional sequence. An entropy coder within the video encoder compresses the one dimensional sequence to a series of number pairs called run/levels. Within a run/level, the first number represents the number of zero coefficients between the previous non-zero coefficient and the next non-zero coefficient in the one dimensional sequence, and the second number represents the quantized value (i.e., quantization index) of that next non-zero coefficient. The number of zero coefficients is called the length of the "run of zeros". For example, the one dimensional sequence of quantized coefficients 1 1 0 0 5 0 0 0 6 3 0 0 4 0 0 0 . . . 0 results in the run/level sequence (0, 1) (0, 1) (2, 5) (3, 6) (0, 3) (2, 4) (EOB). EOB, or End-of-Block, indicates that all of the remaining quantized coefficients are zeros. Prior to transmission, these pairs, including the (EOB), are encoded using variable-length codes ("VLCs") consisting of a sequence of binary digits obtained from a code table which is optimized to assign shorter codes to the pairs occurring most frequently in typical video images. In this way, long runs of zeros (which are common in motion compensated transform based image coding) can be efficiently coded. When the transmitted signal is received, or the stored signal is retrieved, the VLCs are decoded and the run/level pairs are recovered. The non-zero quantized coefficients are then inverse quantized, the zero coefficients are reconstituted, and an inverse DCT ("IDCT") is performed on the entire block.
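The run/level conversion described above can be sketched in a few lines of Python. This is only an illustration of the pairing rule applied to the example sequence; the function names `run_level_encode` and `run_level_decode` are hypothetical, the EOB is represented by a plain string, and the variable-length coding stage is omitted.

```python
def run_level_encode(coeffs):
    """Convert a 1-D list of quantized coefficients into run/level pairs."""
    pairs = []
    run = 0  # zeros seen since the previous non-zero coefficient
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # all remaining coefficients are zeros
    return pairs


def run_level_decode(pairs, length):
    """Rebuild the 1-D coefficient list, reconstituting the zeros."""
    coeffs = []
    for item in pairs:
        if item == "EOB":
            break
        run, level = item
        coeffs.extend([0] * run)
        coeffs.append(level)
    coeffs.extend([0] * (length - len(coeffs)))  # zeros implied by EOB
    return coeffs


# The example sequence from the text, padded with zeros to 64 coefficients.
sequence = [1, 1, 0, 0, 5, 0, 0, 0, 6, 3, 0, 0, 4] + [0] * 51
pairs = run_level_encode(sequence)
print(pairs)  # [(0, 1), (0, 1), (2, 5), (3, 6), (0, 3), (2, 4), 'EOB']
```

Decoding the pairs with `run_level_decode(pairs, 64)` returns the original 64-entry sequence, which mirrors the decoder step in which the zero coefficients are reconstituted.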
Frequency scaling allows for efficient encoding of video in a manner which facilitates the derivation of various resolutions at the decoder by using different IDCT block sizes. It can be shown that if a decoder applies the IDCT to an upper left corner sub-block of every received block of coefficients, it can generate lower resolution images. For example, if a 2×2 sub-block of an encoded 8×8 block is decoded, image resolution will be reduced by a factor of four both vertically and horizontally (referred to as f-scale 2 frequency scaling). In this manner, frequency scalability can be implemented at the decoder without any modification to the encoder. However, because of the VLCs used for coding run/level pairs, there is no way to detect the end of a block (EOB), or the beginning of a new block, unless the entire block is decoded. As a result, in order to recover a sub-block, the variable-length word decoder must operate as though a full resolution image were being recovered, which requires very fast, expensive circuitry even to decode low resolution images. Using unique marker codes to separate sub-blocks is not feasible because of the high overhead caused by such codes.
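The sub-block decoding idea can be illustrated with a small pure-Python sketch. It assumes an orthonormal DCT-II and rescales the retained coefficients by n/N (here 2/8) so that average brightness is preserved; that normalization, and the helper names `dct2`, `idct2`, and `decode_subblock`, are assumptions made for illustration and are not taken from the cited encoders.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D forward DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """2-D inverse DCT of a square coefficient block."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def decode_subblock(coeffs, n):
    """Decode only the upper-left n x n coefficients with an n x n IDCT."""
    full = len(coeffs)
    # Assumed rescaling by n/full so the mean pixel value is unchanged.
    sub = [[coeffs[r][c] * n / full for c in range(n)] for r in range(n)]
    return idct2(sub)


# An 8x8 ramp block decoded at f-scale 2 yields a 2x2 block.
block = [[float(8 * r + c) for c in range(8)] for r in range(8)]
coeffs = dct2(block)
low = decode_subblock(coeffs, 2)
```

With this normalization, the 2×2 result has the same mean as the original 8×8 block, and `decode_subblock(coeffs, 8)` reproduces the full-resolution block.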
An existing method for obtaining a frequency scalable bitstream is to encode the coefficients from sub-blocks separately in a layered structure and multiplex them as slices of various layers in the bitstream. In this method, the lowest resolution layer keeps the basic structure of the bitstream hierarchy, but fewer DCT coefficients are included with each block (such as four coefficients for f-scale 2). The remaining coefficients for each block in the slice are sent in slave slices, which are separated by independently identifiable bit patterns called slave slice start codes. One problem with this approach is that every coded block in each scale has an EOB associated with it to mark the last non-zero coefficient included in that scale. As a result, instead of sending a single EOB for each coded block as in non-scalable coding, multiple EOBs are sent, which decreases the efficiency of the frequency scalable coding.
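The EOB duplication can be made concrete with a rough sketch. It assumes three layers (f-scale 2, 4, and 8), a conventional zigzag order, and that each layer carries only the coefficients not already sent in a lower layer, with its own EOB; the slice and start-code syntax is not modeled, and all function names are hypothetical.

```python
def zigzag_order(size=8):
    """(row, col) cells of a size x size block in conventional zigzag order."""
    cells = [(r, c) for r in range(size) for c in range(size)]

    def key(cell):
        r, c = cell
        d = r + c  # index of the anti-diagonal
        return (d, r if d % 2 else -r)

    return sorted(cells, key=key)


def run_level(seq):
    """Run/level pairs for a 1-D sequence, terminated by an EOB marker."""
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")
    return pairs


def layered_encode(block, scales=(2, 4, 8)):
    """Encode each layer's new coefficients separately (assumed layering)."""
    order = zigzag_order(len(block))
    layers, prev = [], 0
    for n in scales:
        # Cells inside the n x n sub-block but outside the previous one.
        cells = [(r, c) for r, c in order if prev <= max(r, c) < n]
        layers.append(run_level([block[r][c] for r, c in cells]))
        prev = n
    return layers


# A block with two non-zero coefficients: the DC term and scan position 4.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 5
block[2][0] = 3

whole = run_level([block[r][c] for r, c in zigzag_order(8)])
layers = layered_encode(block)
print(whole)   # [(0, 5), (2, 3), 'EOB']
print(layers)  # [[(0, 5), 'EOB'], [(0, 3), 'EOB'], ['EOB']]
```

Under these assumptions the non-scalable coding sends one EOB for the block, while the layered coding sends three.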
Another problem with this solution is the overhead it introduces over and above the overhead caused by the slave slice start codes. The arrows in FIG. 1A show a typical zigzag scan of an 8×8 coefficient block (100), where each of the numbers corresponds to the location of the coefficient in the one dimensional sequence derived by the scan operation. FIG. 1B shows the zigzag scanning pattern applied to 2×2 (101), 4×4 (102), and 8×8 (103) coefficient sub-blocks used with f-scale 2, f-scale 4, and f-scale 8 layers respectively. If coefficients 3 and 4 are zeros, they will be coded as part of a single run of zeros within the bitstream if the 8×8 block is scanned as a whole (FIG. 1A). However, in existing methods of frequency scalable block encoding they must be coded separately, as they are each in a different coefficient sub-block. While there are only three such breaks associated with an f-scale 2 sub-block (between the coefficients within the solid-line ovals of FIG. 1B), there are seven associated with an f-scale 4 sub-block (between the coefficients within the dotted-line ovals). Each break in a run of zeros introduced by the sub-block divisions leads to increased coding overhead. While this overhead may be reduced by using separate VLC tables optimized for each frequency scale, these separate tables introduce additional complexity.
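The break counts quoted above can be checked with a short sketch that walks a conventional 8×8 zigzag scan and counts how often consecutive scan positions fall on opposite sides of a sub-block boundary. The scan order generated here is assumed to match FIG. 1A, and the function names are hypothetical.

```python
def zigzag_order(size=8):
    """(row, col) cells of a size x size block in conventional zigzag order."""
    cells = [(r, c) for r in range(size) for c in range(size)]

    def key(cell):
        r, c = cell
        d = r + c  # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (d, r if d % 2 else -r)

    return sorted(cells, key=key)


def count_run_breaks(n, size=8):
    """Count scan steps that cross the boundary of the n x n sub-block."""
    inside = [r < n and c < n for r, c in zigzag_order(size)]
    return sum(1 for a, b in zip(inside, inside[1:]) if a != b)


print(count_run_breaks(2))  # 3
print(count_run_breaks(4))  # 7
```

For the 2×2 sub-block the crossings occur between scan positions 3 and 4, 4 and 5, and 5 and 6, which agrees with the example of coefficients 3 and 4 falling in different sub-blocks.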