It is reasonable to expect that in the future a wide range of quality video services like High Definition TV (HDTV) will be available together with Standard Definition TV (SDTV), and video services of lower quality such as videophone and videoconference. Multimedia documents containing video will most probably not only be retrieved over computer networks, but also over telephone lines, Integrated Services Digital Network (ISDN), Asynchronous Transfer Mode (ATM), or even mobile networks.
The transmission over several types of links or networks with different bit rates and varying traffic load will require an adaptation of the bit rate to the available channel capacity. The main constraint of the systems is that the decoding of any level below the one associated with the transmitted format should not need the complete decoding of the transmitted source.
In order to maximise the integration of these various quality video services, a single coding scheme which can provide an unlimited range of video services is desirable. Such a coding scheme would enable users of services of different quality to communicate with each other. For example, a subscriber to only a lower quality video service should be capable of decoding and reconstructing a digitally transmitted higher quality video signal, albeit at the lower quality service level to which he subscribes. Similarly, a higher quality service subscriber should be capable of decoding and reconstructing a digitally transmitted lower quality video signal although, of course, its subjective quality will be no better than that of the transmitted signal.
The problem therefore is associated with the way in which video will be transmitted to subscribers with different requirements (picture quality, processing power, memory requirements, resolution, bandwidth, frame rate, etc.). The following points summarise the requirements: satisfy users having different bandwidth requirements; satisfy users having different computational power; adapt frame rate, resolution and compression ratio to user preferences and available bandwidth; adapt frame rate, resolution and compression ratio to network abilities; short delay; and conform with standards, if required.
One solution to the problem of satisfying the different requirements of the receivers is the design of scalable bitstreams. In this form of scalability, there is usually no direct interaction between a transmitter and a receiver. Usually, the transmitter is able to make a bit stream which consists of various layers which can be used by receivers with different requirements in resolution, bandwidth, frame rate, memory or computational complexity. If new receivers are added which do not have the same requirements as the existing ones, then the transmitter has to be re-programmed to accommodate the requirements of the new receivers. Briefly, in bit stream scalability, the abilities of the decoders must be known in advance.
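The layered structure described above can be illustrated with a minimal sketch in which each receiver takes the longest run of layers, starting from the base layer, that fits its channel. The `Layer` class, the function name and the layer bit rates are hypothetical values chosen for illustration only; they are not taken from any particular scalable codec.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    kbit_per_s: float  # additional rate contributed by this layer (assumed value)

def layers_for_receiver(layers, capacity_kbit_per_s):
    """Return the leading layers whose cumulative rate fits the capacity."""
    chosen, total = [], 0.0
    for layer in layers:
        # Layers are ordered base first; stop at the first one that does not fit.
        if total + layer.kbit_per_s > capacity_kbit_per_s:
            break
        chosen.append(layer)
        total += layer.kbit_per_s
    return chosen

# Hypothetical three-layer stream prepared in advance by the transmitter.
stream = [Layer("base", 24.0), Layer("enh1", 40.0), Layer("enh2", 64.0)]

pstn_layers = [l.name for l in layers_for_receiver(stream, 28.8)]
isdn_layers = [l.name for l in layers_for_receiver(stream, 128.0)]
```

The sketch also shows the limitation noted above: a receiver whose capacity lies below the base layer rate obtains no layers at all unless the transmitter is reconfigured with a different layer structure.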
A different solution to the problem is the use of transcoders. A transcoder accepts a received data stream encoded according to a first coding scheme and outputs an encoded data stream encoded according to a second coding scheme. If one had a decoder which operates according to a second coding scheme then such a transcoder would allow reception of the transmitted signal encoded according to the first coding scheme without modifying the original encoder.
One situation that often arises, especially in multiparty conferences, is that a particular receiver has a different bandwidth and/or different computational capabilities. For example, in a multipoint communication with participants connected through ISDN and Public Switched Telephone Network (PSTN), the bandwidth can vary from 28.8 kbit/s (PSTN) to more than 128 kbit/s (ISDN). Since video transmitted at bit rates as high as 128 kbit/s cannot be transferred over PSTN lines, video transcoding has to be implemented in the Multipoint Control Unit (MCU) or Gateway.
This transcoding might have to reduce the spatial resolution of the video in order to fit into the bandwidth of a particular receiver. For example, an ISDN subscriber might be transmitting video in Common Intermediate Format (CIF) (288×352 pixels), while a PSTN subscriber might be able to receive video only in Quarter Common Intermediate Format (QCIF) (144×176 pixels). Another example is when a particular receiver does not have the computational power to decode at a particular resolution and therefore a reduced resolution video has to be transmitted to that receiver. Additionally, transcoding of HDTV to SDTV requires a resolution reduction.
For example, the transcoder could be used to convert a 128 kbit/s video signal in CIF format conforming to ITU-T standard H.261, received from an ISDN video terminal, into a 28.8 kbit/s video signal in QCIF format conforming to ITU-T standard H.263 for transmission over a telephone line.
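The figures in this example can be checked with a short calculation. The sketch below compares the reduction in pixel count from CIF to QCIF with the reduction in bit rate from 128 kbit/s to 28.8 kbit/s. It assumes, purely for illustration, that the frame rate is unchanged; the variable names are hypothetical.

```python
# Frame sizes as (width, height) in pixels.
CIF = (352, 288)
QCIF = (176, 144)

def pixel_count(fmt):
    width, height = fmt
    return width * height

# Factor by which the number of pixels per frame drops (2 in each dimension).
pixel_ratio = pixel_count(CIF) / pixel_count(QCIF)

# Factor by which the channel bit rate drops from ISDN to PSTN.
rate_ratio = 128.0 / 28.8

# Bits available per pixel per second at an unchanged frame rate (assumption).
bits_per_pixel_isdn = 128_000 / pixel_count(CIF)
bits_per_pixel_pstn = 28_800 / pixel_count(QCIF)
```

Under this assumption the pixel count drops by a factor of 4 while the bit rate drops by a factor of about 4.44, so the resolution reduction accounts for most, but not all, of the required rate reduction.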
It should also be noted that many scalable video coding systems require the use of both 8×8 and 4×4 DCTs. For example, in L. H. Kieu and K. N. Ngan, “Cell-loss concealment techniques for layered video codecs in an ATM network”, IEEE Trans. on Image Processing, Vol. 3, No. 5, pp. 666-677, September 1994, a scalable video coding system is described in which the base layer has lower resolution compared to the enhancement layer. In that system, an 8×8 DCT is applied to each of the 8×8 blocks of the image and the enhancement layer is compressed using the 8×8 DCT. The base layer uses the 4×4 low-frequency coefficients out of the 8×8 DCT coefficients of each block of the enhancement layer and is compressed using only 4×4 DCTs. This, however, is not beneficial, since a 4×4 DCT usually results in reduced performance compared to the 8×8 DCT and it also requires that encoders and decoders be able to handle 4×4 DCTs/IDCTs.
The traditional method of downsampling an image consists of two steps, see J. Bao, H. Sun, T. C. Poon, “HDTV down conversion decoder”, IEEE Trans. on Consumer Electronics, Vol. 42, No. 3, pp. 402-410, August 1996. First, the image is filtered by an anti-aliasing low pass filter. Then the filtered image is downsampled by a desired factor in each dimension. For a DCT-based compressed image, the above method implies that the compressed image has to be recovered to the spatial domain by inverse DCT and then undergo the procedure of filtering and downsampling. If the image is to be compressed and transmitted again, this requires an extra forward DCT after the downsampling stage. This can be the case when the downsampling takes place in a Multipoint Control Unit (MCU) in order to satisfy the requirements and bandwidth of a particular receiver, or in scalable video coding schemes.
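The pixel-domain procedure described above can be sketched as follows. This is a minimal illustration, assuming an orthonormal DCT, a downsampling factor of 2 in each dimension and a 2×2 box average standing in for the anti-aliasing filter; the function names are hypothetical and are not taken from the cited reference.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th basis function."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2-D DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT of a square block of coefficients."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def downsample_via_pixel_domain(blocks):
    """blocks: 2x2 grid of 8x8 DCT blocks; returns one 8x8 DCT block."""
    # Step 1: inverse DCT of every block, assembled into 16x16 pixels.
    pixels = [[0.0] * 16 for _ in range(16)]
    for by in range(2):
        for bx in range(2):
            spatial = idct2(blocks[by][bx])
            for y in range(8):
                for x in range(8):
                    pixels[8 * by + y][8 * bx + x] = spatial[y][x]
    # Steps 2 and 3: 2x2 box filter (crude low-pass) and decimation by 2.
    small = [[(pixels[2 * y][2 * x] + pixels[2 * y][2 * x + 1]
               + pixels[2 * y + 1][2 * x] + pixels[2 * y + 1][2 * x + 1]) / 4.0
              for x in range(8)] for y in range(8)]
    # Step 4: extra forward DCT so the result can be re-encoded.
    return dct2(small)

def constant_block(value):
    return [[float(value)] * 8 for _ in range(8)]

# Four flat 8x8 blocks with different grey levels, given as DCT coefficients.
grid = [[dct2(constant_block(10)), dct2(constant_block(20))],
        [dct2(constant_block(30)), dct2(constant_block(40))]]
out = downsample_via_pixel_domain(grid)
recon = idct2(out)
```

The sketch makes the cost visible: four inverse transforms, a filtering pass and one forward transform are needed for every output block.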
In a different method, which works in the compressed domain, the operations of filtering and downsampling are combined in the DCT domain. This is done by discarding the high-frequency DCT coefficients and applying an inverse DCT of smaller size to the remaining coefficients in order to reconstruct the reduced resolution image. For example, one can keep the 4×4 low-frequency coefficients out of the 8×8 and perform a 4×4 IDCT on these coefficients in order to reduce the resolution by a factor of 2 in each dimension. This does not result in significant compression gains and additionally requires that receivers be able to handle 4×4 DCTs.
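A minimal sketch of this compressed-domain approach is given below. It assumes an orthonormal DCT, for which the retained coefficients must be scaled by 4/8 = 0.5 so that the mean brightness is preserved when the transform size changes; the function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th basis function."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2-D DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT of a square block of coefficients."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def downsample_block_dct_domain(coeffs8):
    """Keep the 4x4 low-frequency corner of an 8x8 DCT block and invert it."""
    # Discarding the other 48 coefficients acts as the low-pass filter.
    # The factor 0.5 compensates for the smaller transform (orthonormal DCT assumed).
    low = [[0.5 * coeffs8[u][v] for v in range(4)] for u in range(4)]
    return idct2(low)

# A flat 8x8 block of grey level 100 should map to a flat 4x4 block of 100.
block = [[100.0] * 8 for _ in range(8)]
small = downsample_block_dct_domain(dct2(block))
```

No spatial-domain filter is applied; the only arithmetic is a scaling and a 4×4 inverse transform, which is why the receiver must support 4×4 IDCTs.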
Furthermore, this method results in a significant amount of block edge effects and distortions, due to the poor approximations introduced by simply discarding higher order coefficients. The above method would be more useful if one had 16×16 DCT blocks and kept the low-frequency 8×8 DCT coefficients in order to obtain the downsampled image. However, most standard image and video compression methods, such as JPEG, H.261, MPEG1, MPEG2 and H.263, segment the images into blocks of 8×8 pixels and apply the DCT to these blocks. Therefore, only 8×8 DCTs are available. One way to compute the 16×16 DCT coefficients is to apply the inverse DCT to each of the 8×8 blocks and reconstruct the image. The DCT can then be applied to blocks of size 16×16, and the 8×8 low-frequency coefficients out of the 16×16 DCT coefficients of each block can be kept, if a resolution reduction by a factor of 2 in each dimension is required.
This, however, requires complete decoding (performing 8×8 IDCTs) and re-transforming by performing 16×16 DCTs (16×16 DCT hardware would be required). If, instead, one could compute the 8×8 low-frequency coefficients out of the 16×16 DCT coefficients by using only 8×8 transformations, then this method would be faster and would also perform better than the one that uses the 4×4 out of the 8×8. It would also mean that computation of DCTs of size 16×16 is avoided and memory requirements are reduced. Furthermore, U.S. Pat. No. 5,107,345 describes an adaptive DCT scheme used in coding. The scheme uses 2×2, 4×4, 8×8 and 16×16 DCTs in order to obtain a flexible bit rate which can be modified according to the available transmission capacity. Our scheme provides a fast computation for this adaptive scheme.
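For reference, the indirect route discussed above (8×8 IDCTs followed by a 16×16 DCT and retention of the 8×8 low-frequency coefficients) can be sketched as follows. This illustrates only the baseline that requires complete decoding, not the fast computation itself. It assumes an orthonormal DCT, for which the retained coefficients are scaled by 8/16 = 0.5; the function names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th basis function."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """Forward 2-D DCT of a square block."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2-D DCT of a square block of coefficients."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def downsample_via_16x16_dct(blocks):
    """blocks: 2x2 grid of 8x8 DCT blocks; returns one 8x8 DCT block."""
    # Complete decoding: four 8x8 IDCTs rebuild the 16x16 pixel area.
    pixels = [[0.0] * 16 for _ in range(16)]
    for by in range(2):
        for bx in range(2):
            spatial = idct2(blocks[by][bx])
            for y in range(8):
                for x in range(8):
                    pixels[8 * by + y][8 * bx + x] = spatial[y][x]
    # Re-transform with a 16x16 DCT (the costly step the text seeks to avoid).
    big = dct2(pixels)
    # Keep the 8x8 low-frequency corner, scaled for the smaller transform size.
    return [[0.5 * big[u][v] for v in range(8)] for u in range(8)]

def constant_block(value):
    return [[float(value)] * 8 for _ in range(8)]

grid = [[dct2(constant_block(10)), dct2(constant_block(20))],
        [dct2(constant_block(30)), dct2(constant_block(40))]]
out = downsample_via_16x16_dct(grid)

flat_grid = [[dct2(constant_block(50)) for _ in range(2)] for _ in range(2)]
flat_recon = idct2(downsample_via_16x16_dct(flat_grid))
```

The output is already an 8×8 DCT block, so a standard 8×8 decoder can reconstruct the half-resolution image without any 4×4 transform.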