Methods for encoding and correspondingly decoding image information have been known for many years. Such methods are of significance in DVD, mobile telephone digital image transmission, digital cable television and digital satellite television. In consequence, there exists a range of encoding and corresponding decoding techniques, some of which have become internationally recognised standards such as MPEG-2.
Since 1997, the Video Coding Experts Group (VCEG) of the International Telecommunication Union (ITU) has been working on a new video coding standard known internationally as H.26L. In late 2001, the Moving Picture Experts Group (MPEG) of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) and VCEG decided to work together as a Joint Video Team (JVT) in order to create a single technical design; the design is expected to be officially approved in 2003 by the ITU-T as “Recommendation H.264” and by ISO/IEC as “International Standard 14496-10” (MPEG-4 Part 10) Advanced Video Coding (AVC).
Principal objectives of the H.264/AVC standardization have been to improve video compression efficiency significantly and to provide a “network-friendly” video representation addressing both conversational and non-conversational applications; conversational applications relate to telephony, whereas non-conversational applications relate to storage, broadcast and streaming of communication data. At present, the H.264/AVC standard is broadly recognized as achieving these objectives; moreover, H.264/AVC is also being considered for adoption by several other technical and standardization bodies dealing with video applications, for example the DVB-Forum and the DVD Forum.
Both software and hardware implementations of H.264/AVC encoders and decoders are becoming available.
Other forms of video encoding and decoding are also known. For example, U.S. Pat. No. 5,917,609 describes a hybrid waveform and model-based image signal encoder and corresponding decoder. In this encoder and decoder, an original image signal is waveform-encoded and decoded so as to approximate the waveform of the original signal as closely as possible after compression. To compensate for the resulting loss, a noise component of the signal, namely a signal component which is lost by the waveform encoding, is model-based encoded and separately transmitted or stored. In the decoder, the noise is regenerated and added to the waveform-decoded image signal. The encoder and decoder described in U.S. Pat. No. 5,917,609 are especially pertinent to compression of medical X-ray angiographic images, where loss of noise leads a cardiologist or radiologist to conclude that the corresponding images are distorted. However, this encoder and decoder are to be regarded as specialist implementations not necessarily complying with any established or emerging image encoding and decoding standards.
Referring again to the emerging H.264 standard, this standard utilizes principles of spatial scalability similar to those known from existing standards such as MPEG-2. Application of these principles makes it possible to encode a video sequence in two or more layers arranged in sequence from a highest layer to a lowest layer, each layer using a spatial resolution which is equal to or less than the spatial resolution of the next higher layer. The layers are mutually related in such a manner that a higher layer, often referred to as an “enhancement layer”, represents a difference between original images in the video sequence and a lower encoded layer after that lower layer has been locally decoded and scaled up to a spatial resolution corresponding to the original images. FIG. 1 shows a scheme for generating data corresponding to such an enhancement layer.
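The layer relationship just described can be written out as a short derivation. The notation is introduced here for illustration only and is not taken from the standard: x_k is the k-th original frame, S_down and S_up are the scaling-down and scaling-up operators, E_1 and D_1 are a base-layer encoder and its matching decoder, and E_2 and D_2 are an enhancement-layer encoder and decoder.

```latex
\begin{aligned}
b_k &= \mathcal{E}_1\big(S_{\downarrow}(x_k)\big) && \text{(base layer)}\\
r_k &= x_k - S_{\uparrow}\big(\mathcal{D}_1(b_k)\big) && \text{(difference signal)}\\
e_k &= \mathcal{E}_2(r_k) && \text{(enhancement layer)}\\
\hat{x}_k &= S_{\uparrow}\big(\mathcal{D}_1(b_k)\big) + \mathcal{D}_2(e_k) && \text{(receiver reconstruction)}
\end{aligned}
```

If the enhancement layer were coded losslessly, so that D_2(E_2(r_k)) = r_k, substituting the second line into the fourth gives x̂_k = x_k; with a lossy second encoder, only its own coding error remains in the reconstruction.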
In FIG. 1, there is shown a known composite encoder indicated generally by 10. The encoder 10 comprises a scaling-down function 20, a first H.264 encoder 30, a local H.264 decoder 40, a scaling-up function 50, a difference function 60 and a second H.264 encoder 70. A video signal input IP is provided for inputting pixel image data. The input IP is coupled to a non-inverting input (+) of the difference function 60 and to an input of the scaling-down function 20. A scaled-down output of the scaling-down function 20 is coupled to an input of the first encoder 30. A first principal encoded output of the first encoder 30 is arranged to provide a base layer output BLOP. Moreover, a second local encoded output of the first encoder 30 is coupled to an input of the local H.264 decoder 40, whose decoded output is coupled to an input of the scaling-up function 50. Furthermore, a scaled-up output of the scaling-up function 50 is coupled to an inverting input (−) of the difference function 60. A difference output of the difference function 60 is coupled to an input of the second encoder 70. An encoded output from the second encoder 70 is arranged to provide an enhancement layer output ELOP. The composite encoder 10 is regarded as a multi-layer encoder because input image data presented at the input IP are represented in a plurality of encoded outputs, for example at the BLOP and ELOP outputs, each output corresponding to a “layer”.
The composite encoder 10 is susceptible to being implemented in software, hardware, or a mixture of both software and hardware. Moreover, the scaling-down function 20 and the scaling-up function 50 are preferably arranged to have matched and mutually inverse image scaling characteristics. Furthermore, the first encoder 30 and the local decoder 40 are preferably arranged to provide matched but inverse characteristics. Additionally, the first and second encoders 30, 70 are preferably endowed with mutually similar encoding characteristics.
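The source does not specify the scaling filters. As a minimal sketch, assuming 2x2 block averaging for the scaling-down function and pixel replication for the scaling-up function (both hypothetical choices), the following shows what "matched and mutually inverse" can and cannot mean in practice: scaling down an up-scaled image recovers it exactly, but scaling up a down-scaled image cannot restore the detail that averaging removed.

```python
"""Assumed matched scaling pair: 2x2 averaging down, pixel replication up."""

Image = list[list[float]]


def scale_down(img: Image) -> Image:
    """Hypothetical scaling-down function: average each 2x2 block (even sizes only)."""
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def scale_up(img: Image) -> Image:
    """Hypothetical scaling-up function: replicate each pixel into a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


low = [[1, 2], [3, 4]]
# Down-scaling undoes up-scaling exactly for this pair...
print(scale_down(scale_up(low)))  # [[1.0, 2.0], [3.0, 4.0]]
# ...but up-scaling cannot restore detail removed by down-scaling.
full = [[0, 2], [4, 6]]
print(scale_up(scale_down(full)))  # [[3.0, 3.0], [3.0, 3.0]]
```

The lost detail in the second case is exactly the kind of scaling deviation that the operational description of the encoder 10 sets aside.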
Operation of the composite encoder 10 will now be described with reference to FIG. 1.
An input stream of pixel data corresponding to a sequence of images is provided at the input IP of the encoder 10. The stream is passed on a frame-by-frame basis to the non-inverting input (+) of the difference function 60 and also to the scaling-down function 20. A scaled-down version of the input IP provided from the scaling-down function 20 is presented to the first encoder 30, which encodes it to provide the base layer output BLOP. Moreover, the first encoder 30 also provides a similar encoded output to the local decoder 40, which reconstitutes the scaled-down version of the input presented to the first encoder 30. The reconstituted version is then passed via the scaling-up function 50 to the inverting input (−) of the difference function 60. The difference function 60 thereby provides at its output, presented to an input of the second encoder 70, an error signal corresponding to errors introduced by the combination of the first encoder 30 and its associated decoder 40, ignoring deviations introduced by the scaling functions 20, 50. This error signal is encoded to give rise to the enhancement layer output ELOP.
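The per-frame data flow of FIG. 1 can be sketched as a toy model. This is not an H.264 implementation: uniform quantization with hypothetical step sizes `base_step` and `enh_step` stands in for the encoders 30, 70 and the local decoder 40, and 2x2 averaging and pixel replication stand in for the scaling functions 20, 50. All function names are illustrative.

```python
"""Toy two-layer spatially scalable encoder mirroring the FIG. 1 data flow.

Assumptions (not from the source): frames are 2-D lists with even dimensions;
uniform quantization stands in for H.264 coding; simple filters for scaling.
"""

Image = list[list[float]]
Levels = list[list[int]]


def scale_down(img: Image) -> Image:
    """Stand-in for scaling-down function 20: average each 2x2 block."""
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, len(img[0]), 2)]
        for r in range(0, len(img), 2)
    ]


def scale_up(img: Image) -> Image:
    """Stand-in for scaling-up function 50: replicate each pixel into 2x2."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode(img: Image, step: float) -> Levels:
    """Stand-in for an H.264 encoder: uniform quantization to integer levels."""
    return [[round(v / step) for v in row] for row in img]


def decode(levels: Levels, step: float) -> Image:
    """Stand-in for the local decoder 40: inverse quantization."""
    return [[q * step for q in row] for row in levels]


def composite_encode(ip: Image, base_step: float, enh_step: float):
    """Return (BLOP, ELOP) for one frame."""
    blop = encode(scale_down(ip), base_step)           # first encoder 30
    prediction = scale_up(decode(blop, base_step))     # decoder 40 + function 50
    residual = [[a - b for a, b in zip(row_a, row_b)]  # difference function 60
                for row_a, row_b in zip(ip, prediction)]
    elop = encode(residual, enh_step)                  # second encoder 70
    return blop, elop


frame = [[10, 12, 50, 52], [11, 13, 51, 53],
         [200, 202, 90, 91], [201, 203, 92, 93]]
blop, elop = composite_encode(frame, base_step=8, enh_step=1)
print(blop)  # [[1, 6], [25, 11]]
print(elop)  # [[2, 4, 2, 4], [3, 5, 3, 5], [0, 2, 2, 3], [1, 3, 4, 5]]
```

Note that in this toy model the residual carries both the base-layer quantization error and the detail lost by the scaling pair, because nothing here removes the scaling deviations that the description above ignores.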
If the BLOP and ELOP outputs are conveyed via a transmission medium to a receiver, the receiver can decode them using one or more decoders similar in operating characteristics to the local decoder 40 and then combine the resulting decoded BLOP and ELOP signals, the decoded BLOP signal being scaled up to the original resolution before combination. It is thereby feasible to reconstitute the input IP at the receiver with enhanced accuracy, because encoding and decoding errors are susceptible to being compensated at the receiver by the ELOP signal.
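The receiver side can be sketched under the same toy assumptions: inverse uniform quantization stands in for the H.264 decoders, and pixel replication for the up-scaling. The step sizes, function names and the sample layer data are hypothetical illustration values, not taken from the source.

```python
"""Toy receiver combining decoded base and enhancement layers."""

Image = list[list[float]]
Levels = list[list[int]]


def scale_up(img: Image) -> Image:
    """Assumed up-scaler: replicate each pixel into a 2x2 block."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def decode(levels: Levels, step: float) -> Image:
    """Assumed stand-in for an H.264 decoder: inverse uniform quantization."""
    return [[q * step for q in row] for row in levels]


def reconstruct(blop: Levels, elop: Levels, base_step: float, enh_step: float) -> Image:
    """Up-scale the decoded base layer and add the decoded enhancement layer."""
    base = scale_up(decode(blop, base_step))  # base layer at full resolution
    enh = decode(elop, enh_step)              # decoded difference signal
    return [[b + e for b, e in zip(row_b, row_e)] for row_b, row_e in zip(base, enh)]


blop = [[1, 6], [25, 11]]
elop = [[2, 4, 2, 4], [3, 5, 3, 5], [0, 2, 2, 3], [1, 3, 4, 5]]
print(scale_up(decode(blop, 8))[0])  # base layer only: [8, 8, 48, 48]
print(reconstruct(blop, elop, 8, 1)[0])  # with enhancement: [10, 12, 50, 52]
```

A receiver that ignores the ELOP stream still obtains a coarse full-size picture from the base layer alone; adding the enhancement layer compensates for the base-layer errors.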
However, the inventors have appreciated that the ELOP output typically has a relatively high spatial-frequency, noise-like characteristic, which constitutes demanding material for a video encoder such as an H.26L encoder; in the following, the term “noise-like” is to be construed to refer to a relative lack of spatial correlation concurrently with a significant part of signal energy being distributed at higher spatial frequencies. Therefore, it is not uncommon in practice for the quantity of data used to encode a given part of the enhancement layer to exceed the quantity of data needed to encode the corresponding part of the original image. Such a high data quantity requirement for encoding the enhancement layer signal ELOP potentially represents a problem which the present invention seeks to address.
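The definition of "noise-like" above has two parts: weak spatial correlation and a large share of energy at high spatial frequencies. The sketch below measures each with a simple proxy of our own choosing, not one given in the source: the lag-1 horizontal pixel correlation, and the share of energy in the detail bands of a one-level 2-D Haar transform. The two test images are synthetic: a smooth ramp resembling natural image content, and a checkerboard as an extreme high-frequency pattern.

```python
import math

Image = list[list[float]]


def lag1_correlation(img: Image) -> float:
    """Pearson correlation between horizontally adjacent pixels.

    Near 1: strongly correlated (smooth) content; near 0 or negative: noise-like.
    """
    xs = [row[c] for row in img for c in range(len(row) - 1)]
    ys = [row[c + 1] for row in img for c in range(len(row) - 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def detail_energy_fraction(img: Image) -> float:
    """Share of energy outside the LL band of a one-level 2-D Haar transform.

    The orthonormal 2x2 Haar transform preserves energy, so for each block
    detail energy = block energy - LL**2, with LL = (a + b + c + d) / 2.
    Assumes even image dimensions.
    """
    total = detail = 0.0
    for r in range(0, len(img), 2):
        for c in range(0, len(img[0]), 2):
            block = (img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
            energy = sum(v * v for v in block)
            ll = sum(block) / 2
            total += energy
            detail += energy - ll * ll
    return detail / total if total else 0.0


smooth = [[float(r + c) for c in range(8)] for r in range(8)]
checker = [[(-1.0) ** (r + c) for c in range(8)] for r in range(8)]
print(round(lag1_correlation(smooth), 3), round(detail_energy_fraction(smooth), 3))  # 1.0 0.008
print(round(lag1_correlation(checker), 3), round(detail_energy_fraction(checker), 3))  # -1.0 1.0
```

An enhancement-layer residual would be expected to sit much closer to the checkerboard end of both measures than typical picture content does, which is why it costs the encoder so many bits.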