The present invention relates to transcoders for conversion of signals between a first and a second coding scheme. The transcoders according to the invention are particularly suitable for converting two- and three-dimensional images and video signals.
There are many occasions when it is necessary to transmit moving picture television over long distances via a transmission link. Broadcast quality television requires in excess of 100 Mbit/s when transmitted in digital form, which is expensive and requires links of high bandwidth. An acceptable degree of degradation in the picture quality can be introduced in order to reduce the information content being transmitted. Additionally or alternatively, compression coding techniques may be used which take advantage of the high degree of spatial and temporal redundancy in the video signals being encoded.
There are also other applications where compression coding techniques are used. For example, video conference signals can be compressed down to a bit rate of a few hundred kbit/s, whereas videophone-quality pictures including sound can be compressed down to less than 64 kbit/s.
Redundancy reduction techniques assume that there is a spatial and/or temporal correlation between neighbouring pixels or blocks of pixels. The details of correlation are encoded as well as the differences between the assumptions and the actual pixels or blocks. Typically each frame of an image to be coded comprises an array of picture elements (pixels) which are divided into blocks of N×M pixels.
Predictive coding exploits the assumption that a value within a frame is related to some neighbouring values, in the same or a different frame, and the value may therefore be calculated at the receiver instead of being transmitted. It is then only necessary to transmit the prediction error arising from such an assumption. For instance the first pixel of a frame may be transmitted exactly whilst each subsequent pixel is transmitted as a difference from its predecessor. In more complex schemes the prediction may be found by a combination of a number of pixels.
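The simplest case described above, in which the first pixel is sent exactly and each later pixel is sent as a difference from its predecessor, can be sketched as follows. This is only an illustrative sketch; the function names `dpcm_encode` and `dpcm_decode` and the sample row are assumptions introduced here and do not come from the specification.

```python
def dpcm_encode(pixels):
    """Keep the first pixel exactly, then store each pixel as a difference from its predecessor."""
    if not pixels:
        return []
    residuals = [pixels[0]]
    for previous, current in zip(pixels, pixels[1:]):
        residuals.append(current - previous)
    return residuals


def dpcm_decode(residuals):
    """Rebuild the pixels at the receiver by accumulating the prediction errors."""
    pixels = []
    for index, residual in enumerate(residuals):
        pixels.append(residual if index == 0 else pixels[-1] + residual)
    return pixels


row = [100, 102, 103, 103, 101]      # hypothetical pixel values of one line
encoded = dpcm_encode(row)           # [100, 2, 1, 0, -2]
decoded = dpcm_decode(encoded)       # identical to `row`
```

The prediction errors after the first value are small in magnitude, which is what allows them to be coded with fewer bits.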
Transform coding exploits the correlation of pixel magnitudes within a frame by transforming the magnitudes into another set of values, many of which are expected to be relatively small and which can therefore be coded using fewer bits. The most common form of transform coding uses the Discrete Cosine Transform (DCT). A block of N×M pixels is transformed into an array of N×M transform coefficients. The resulting array of coefficients is then quantised by dividing each coefficient by a quantisation factor. The quantised coefficients may be coded by a variable length code, for instance a Huffman code.
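The transform and quantisation steps can be illustrated with a direct, unoptimised sketch. It assumes an orthonormal two-dimensional DCT-II and a single uniform quantisation factor for the whole block; practical coders use fast algorithms and standard-specific quantisers, and the names `dct_2d` and `quantise` are introduced here purely for illustration.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x M block, computed directly from the definition."""
    n, m = len(block), len(block[0])

    def scale(k, size):
        return math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)

    coeffs = [[0.0] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            total = 0.0
            for x in range(n):
                for y in range(m):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * m)))
            coeffs[u][v] = scale(u, n) * scale(v, m) * total
    return coeffs


def quantise(coeffs, factor):
    """Divide every coefficient by one quantisation factor and round to an integer."""
    return [[int(round(c / factor)) for c in row] for row in coeffs]


flat_block = [[10] * 4 for _ in range(4)]        # hypothetical 4 x 4 block of equal pixels
quantised = quantise(dct_2d(flat_block), 8)      # only the DC term survives
```

For a block of equal pixels all the energy falls into the single DC coefficient, so every other quantised coefficient is zero and the block is cheap to code with a variable length code.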
Another coding technique is motion compensation, in which a picture is divided into blocks of pixels. Each block of the current frame is compared with the corresponding block of a reference frame, which may be a previous and/or a subsequent frame, and with regions shifted in position from that block. The region of the reference frame which the block most closely resembles is then identified.
The vector difference in position between the identified region and the block in question is termed a motion vector and is used to shift the identified region of the reference frame into the position of the relevant block in the current frame. Motion vectors are generated for all the blocks of the current frame and these are used to derive a predicted frame from the reference frame. The differences between the current and predicted frame are, on average, smaller than those between the current and reference frame and can be encoded using fewer bits. A decoder which already has the reference frame stored can thus reproduce the current frame using the motion vectors and the difference values. A signal may be coded using any of the aforementioned coding techniques, either separately or in combination.
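The search for the best matching region can be sketched as an exhaustive block-matching search. The matching criterion used here, the sum of absolute differences, is an assumption: the text only requires the region that the block "most closely resembles". The helper names, the 8 x 8 test frame and the search range are likewise illustrative.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` and a shifted region of `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def find_motion_vector(cur, ref, bx, by, size, search):
    """Try every integer shift within +/- `search` pixels and keep the cheapest one."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # skip regions that fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Hypothetical reference frame, and a current block copied from it with a shift of (2, 1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        cur[2 + y][2 + x] = ref[2 + y + 1][2 + x + 2]

vector, cost = find_motion_vector(cur, ref, bx=2, by=2, size=2, search=2)  # (2, 1), cost 0
```

The nested loop over every candidate shift, repeated for every block, is the reason why recomputing motion vectors dominates the running time of a conventional transcoder.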
Furthermore, it is reasonable to expect that in the future a wide range of quality video services like HDTV, etc. will be available together with the lower quality video services such as the video-phone and video-conference services. Multimedia documents containing video will most probably not only be retrieved over computer networks, but also over telephone lines, ISDN, ATM, or even mobile networks. The transmission over several types of links or networks with different bit rates and varying traffic load will require an adaptation of the bit rate to the available channel capacity. A main constraint on the systems is that the decoding of any level below the one associated with the transmitted format should not need the complete decoding of the transmitted source.
To maximise the integration of these various quality video services, a single coding scheme which can provide an unlimited range of video services is desirable. Such a coding scheme would enable users of different qualities to communicate with each other. For example, a subscriber to only a lower quality video service should be capable of decoding and reconstructing a digitally transmitted higher quality video signal, albeit at the lower quality service level to which he subscribes. Similarly, a higher quality service subscriber should be capable of decoding and reconstructing a digitally transmitted lower quality video although of course its subjective quality will be no better than its transmitted quality.
The problem therefore is associated with the way in which video will be transmitted to subscribers with different requirements (picture quality, processing power, memory requirements, resolution, bandwidth, frame rate, etc.). The following points summarise the requirements:
satisfy users having different bandwidth requirements,
satisfy users having different computational power,
adapt frame rate, resolution and compression ratio according to user preferences and available bandwidth,
adapt frame rate, resolution and compression ratio according to network abilities,
short delay, and
conform with standards, if required.
One solution to the problem of satisfying the different requirements of the receivers is the design of scalable bitstreams. In this form of scalability, there is usually no direct interaction between transmitter and receiver. Usually, the transmitter is able to make a bit stream which consists of various layers which can be used by receivers with different requirements in resolution, bandwidth, frame rate, memory or computational complexity. If new receivers are added which do not have the same requirements as the previous ones, then the transmitter has to be re-programmed to accommodate the requirements of the new receivers. Briefly, in bit stream scalability, the abilities of the decoders must be known in advance.
Furthermore, the design of a scalable bitstream can result in a higher number of bits compared to a single bit-stream for achieving a similar quality. A scalable bit stream also requires very computationally powerful coders, which may consist of a number of coders equal to the number of different receivers.
A different solution to the problem is the use of transcoders. A transcoder accepts a received data stream encoded according to a first coding scheme and outputs an encoded data stream encoded according to a second coding scheme. If one had a decoder which operated according to a second coding scheme then such a transcoder would allow reception of the transmitted signal encoded according to the first coding scheme without modifying the original encoder. For example, the transcoder could be used to convert a 128 kbit/s video signal conforming to ITU-T standard H.261, received from an ISDN video terminal, into a 28.8 kbit/s signal conforming to ITU-T standard H.263 for transmission over a telephone line.
Most of the known transcoders decode video signals according to a first coding scheme into an uncompressed video signal which is then encoded by an encoder according to a second coding scheme to output a new compressed data stream. Thus a full decoding operation is carried out to reconstruct the original signal and then the reconstructed signal is encoded to provide a new coded data stream according to the second coding scheme. For coding methods involving motion compensation, new motion vectors have to be generated for the signal encoded according to the new coding scheme and this accounts for a large proportion of time for conventional transcoders.
Various transcoder architectures for video signals have been described recently in literature. The research has mainly concentrated on rate transcoding, i.e. transcoding from a certain bit rate to a lower one without changing the resolution.
Furthermore, the International patent application WO 95/29561 discloses a transcoder which extracts motion vectors from an incoming, received data stream and passes them to the data stream of the encoding part of the transcoder, thereby avoiding recalculation of the motion vectors.
Although the transcoder described in the above-cited International patent application seems to be well suited for rate reduction, it will not work when the encoder of the transcoder has to encode the decoded video sequence at a different spatial resolution (for example CIF and QCIF). This is because the transcoder disclosed in WO 95/29561 applies a difference operation to two video signals of different spatial resolution, one originating from the decoding side of the transcoder and one from the encoding side.
Furthermore, the International patent application WO 95/29561 does not deal with the problem of how a change in spatial resolution can be implemented efficiently. Therefore, the transcoder described in WO 95/29561 is only suitable for a rate reduction, i.e. for use with coding schemes having the same spatial resolution.
Moreover, the results given in WO 95/29561 do not hold when the transmitter uses a different motion accuracy from that used by the receiver. For example, when the transmitter uses the H.261 algorithm with integer pel accuracy and the receiver uses the H.263 algorithm with half pel accuracy, a refinement of the motion vectors has to be implemented. This problem is not addressed in WO 95/29561, nor is the problem of changing the temporal resolution.
The published European patent application EP 0 687 112 A2 discloses an image conversion apparatus for converting spatial or temporal resolution. The apparatus can also scale motion compensation information. This is performed by means of interpolating a central value from the mean, mode and median of target blocks and surrounding blocks.
Also, the published European patent application EP 0 690 392 A1 addresses the problem of rate conversion. However, no other reformation is performed, such as resolution reduction. In addition, EP 0 690 392 A1 is applied to MPEG compressed signals.
It is an object of the invention to provide a transcoder which can be used for bit rate modification and resolution (spatial and/or temporal) modification and which has a simpler structure than existing ones.
It is also an object of the present invention to provide a transcoder which overcomes the problems associated with the conversion of resolution as outlined above and which also makes use of the computational reduction obtained by the extraction of the motion vectors, and which hence would be suitable for use when transcoding between coding schemes having different resolutions, for instance a first coding scheme having a resolution of 352×288 pixels (CIF) and a second coding scheme having a resolution of 176×144 pixels (QCIF).
It is another object of the present invention to provide a transcoder and a method for implementing a change in resolution both in the spatial and in the DCT domain.
It is yet another object of the invention to provide a transcoder and a method for fast algorithms for the DCT to be used for changing the resolution in the DCT domain.
These objects and others are obtained with a transcoder architecture comprising a decoder for decoding a video signal encoded according to a first coding scheme employing motion compensation techniques and an encoder for encoding the decoded video signal according to a second coding scheme where the second coding scheme changes the resolution (spatial and/or temporal) and the bit rate of the incoming video signal.
According to a second aspect of the invention, the motion compensation information, for example in the form of motion vectors, in the incoming video signal is extracted and, if necessary, after proper scaling and refinement, passed directly to the encoding part of the transcoder and output in the output data stream.
According to a third aspect of the invention, the encoder part of the transcoder implements the resolution reduction of the incoming video in the frequency domain, thereby having reduced computational complexity compared to the encoder that would work in the spatial domain and would require filtering operations.
According to a fourth aspect of the invention, the transcoder can utilise special variable length coders (VLC) and scanning operations better suited to the block sizes and resolutions used. The decision on which coding is to be used can be based on negotiations with the receiver, i.e. checking whether the receiver can accept the VLCs that the transcoder proposes to use.
According to a fifth aspect of the invention, the transcoder utilises special algorithms for the computation of the Discrete Cosine Transform (DCT), here termed pruning DCT, which make it possible to compute only those DCT coefficients required for the transcoding operation.
According to a sixth aspect of the invention, both the undersampling and the oversampling (interpolation) of frames (images) are done in the DCT domain using special DCT algorithms.
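The idea behind pruning and DCT-domain undersampling can be illustrated by a sketch that computes only the low-frequency corner of a block's DCT and then applies a smaller inverse DCT to it. This is one well-known way of reducing resolution in the DCT domain and is not necessarily the fast algorithm of the invention; the orthonormal DCT-II, the square block, the gain factor and all function names are assumptions made for this example.

```python
import math


def dct_1d_pruned(x, keep):
    """First `keep` orthonormal DCT-II coefficients of x; the others are never computed."""
    n = len(x)
    out = []
    for k in range(keep):
        s = sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct_1d(c):
    """Inverse of the orthonormal DCT-II for a coefficient vector of length len(c)."""
    n = len(c)
    return [sum(c[k] * math.sqrt((1.0 if k == 0 else 2.0) / n)
                * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for k in range(n))
            for i in range(n)]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def downsample_block(block, keep):
    """Reduce a square n x n block to keep x keep pixels via its low-frequency coefficients."""
    n = len(block)
    rows = [dct_1d_pruned(r, keep) for r in block]                        # horizontal pass
    coeffs = transpose([dct_1d_pruned(c, keep) for c in transpose(rows)])  # vertical pass
    gain = keep / n                                                       # preserves mean brightness
    coeffs = [[v * gain for v in r] for r in coeffs]
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(c) for c in transpose(rows)])


flat = [[50] * 8 for _ in range(8)]     # hypothetical 8 x 8 block
small = downsample_block(flat, 4)       # 4 x 4 block, every value close to 50
```

Because only `keep` of the `n` coefficients are computed in each pass, the forward transform costs proportionally less than a full DCT, and no separate spatial low-pass filter is needed before decimation.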
According to a seventh aspect of the invention, the spatial resolution modification can be implemented in the spatial domain.
According to an eighth aspect of the invention, the transcoder can refine the motion vectors provided from the decoder of the transcoder. For example, if the motion estimation at the transmitter is performed using integer pel accuracy and half pel accuracy is required to be implemented at the encoder in the transcoder, the encoder can utilise the existing motion vectors and refine their accuracy.
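A refinement from integer pel to half pel accuracy can be sketched as a search over the nine half pel positions surrounding the incoming vector, rather than a full new search. The sketch assumes plain bilinear averaging for the half pel samples and a sum of absolute differences as the cost; real coders apply standard-specific rounding, and the function names and test data are hypothetical.

```python
def sample_half_pel(ref, x2, y2):
    """Reference value at (x2 / 2, y2 / 2); coordinates are non-negative half pel units."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0


def refine_to_half_pel(cur, ref, bx, by, size, mv):
    """Take an integer pel vector `mv` and return the best nearby vector in half pel units."""
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for ddy in (-1, 0, 1):
        for ddx in (-1, 0, 1):
            hx, hy = 2 * mv[0] + ddx, 2 * mv[1] + ddy
            x2, y2 = 2 * bx + hx, 2 * by + hy
            if x2 < 0 or y2 < 0:
                continue
            if (x2 + 2 * size - 1) // 2 > width - 1 or (y2 + 2 * size - 1) // 2 > height - 1:
                continue  # candidate region would leave the reference frame
            cost = sum(abs(cur[by + j][bx + i] - sample_half_pel(ref, x2 + 2 * i, y2 + 2 * j))
                       for j in range(size) for i in range(size))
            if best_cost is None or cost < best_cost:
                best, best_cost = (hx, hy), cost
    return best, best_cost


# Hypothetical data: the true displacement is 1.5 pixels to the right, i.e. (3, 0) in half pels.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[0.0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        cur[2 + j][2 + i] = sample_half_pel(ref, 2 * (2 + i) + 3, 2 * (2 + j))

refined, cost = refine_to_half_pel(cur, ref, bx=2, by=2, size=2, mv=(1, 0))  # (3, 0), cost 0.0
```

Only nine candidates are evaluated per block, which is why reusing and refining the incoming vectors is far cheaper than estimating motion again from scratch.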
According to a ninth aspect of the invention, the transcoder scales the motion vectors so that they can be used efficiently when the resolution is modified.
According to a tenth aspect of the invention, the transcoder combines four incoming motion vectors so as to produce one motion vector per macroblock during the re-encoding process.
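As an illustration of scaling and combining, consider a halving of the spatial resolution, where four incoming macroblocks map onto one outgoing macroblock. The sketch below halves each vector and takes a component-wise median of the four. The choice of the median is purely an assumption for this example, since the passage above does not state the combination rule; a mean or a weighted choice would fit the same interface. All names are hypothetical.

```python
def scale_vector(mv, factor):
    """Scale a motion vector (dx, dy) by the ratio of the output to the input resolution."""
    return (mv[0] * factor, mv[1] * factor)


def median_of_four(values):
    """Median of exactly four numbers: the mean of the two middle values."""
    ordered = sorted(values)
    return (ordered[1] + ordered[2]) / 2.0


def combine_vectors(vectors, factor=0.5):
    """Merge the four vectors of neighbouring macroblocks into one scaled vector."""
    if len(vectors) != 4:
        raise ValueError("expected the vectors of four neighbouring macroblocks")
    scaled = [scale_vector(v, factor) for v in vectors]
    return (median_of_four([v[0] for v in scaled]),
            median_of_four([v[1] for v in scaled]))


incoming = [(4, 2), (6, 2), (4, 0), (20, -2)]   # hypothetical vectors, one of them an outlier
combined = combine_vectors(incoming)             # (2.5, 0.5)
```

A median keeps the outlier vector (20, -2) from dominating the result, which a plain mean would not; the combined vector can then serve as the starting point for a small local refinement.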
According to an eleventh aspect of the invention, the transcoder has means for passing and refining macroblock type information from the decoder (of the transcoder) to the encoder (of the transcoder).
According to another aspect of the invention, the transcoder refines the motion vectors in a small area so that they can be used efficiently when the resolution is modified.
According to another aspect of the invention, the transcoder can be used to perform dynamic spatial resolution modification, i.e. change of spatial resolution from frame to frame according to the complexity of the sequence and the available bandwidth.
According to another aspect of the invention, the transcoder can be used to perform temporal resolution reduction, i.e. frame rate reduction. It can also be used to perform a combination of modification of spatial resolution, temporal resolution and dynamic resolution.
The invention provides a solution to the problem of transferring video signals to receivers with different requirements and abilities (compression and decompression algorithm, bandwidth available, computational power, frame rate requirements, resolution requirements, etc.). It provides transcoder architectures that can transcode any incoming bitstream that represents video sequences of a certain resolution (spatial and/or temporal) and compression ratio to video sequences of a resolution (spatial and/or temporal) and compression ratio that best suits the requirements and abilities of a particular receiver. The invention can be used to modify the resolution and/or compression ratio of the incoming video signal in order to satisfy the resolution, bandwidth and computational requirements of a particular receiver.