1. Field of the Invention
The present invention relates to techniques for encoding/transcoding digital video sequences.
2. Background of the Invention
With the advent of new media, video compression is increasingly being applied. In a video broadcast environment, a variety of channels and supports exist, associated with a variety of standards for content encoding and decoding.
Of all the standards available, MPEG (a well-known acronym for Moving Picture Experts Group) is nowadays adopted worldwide for quite different applications.
An example is the transmission of video signals both for standard television (SDTV) and high definition television (HDTV). HDTV demands bit rates up to 40 Mbit/s. MPEG is thus widely used for Set-Top-Box and DVD applications.
Another example is transmission over an error-prone channel with a very low bitrate (down to 64 Kbit/s), such as the Internet and third-generation wireless communication terminals.
One of the basic blocks of an encoding scheme such as MPEG is the quantizer. This is a key block in the entire encoding scheme because the quantizer is where the original information is partially lost, as a result of spatial redundancy being removed from the images. The quantizer also introduces the so-called “quantization error”, which must be minimized, especially when a re-quantization step takes place, as is the case, inter alia, when a compressed stream is to be re-encoded for a different platform, channel, storage medium, etc.
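The notion of quantization error can be illustrated with a minimal sketch. It assumes a plain uniform scalar quantizer with rounding to the nearest level; the function names, the sample coefficients and the step size are hypothetical, and the weighting matrices of an actual MPEG quantizer are not modeled.

```python
def quantize(coeff: float, step: float) -> int:
    """Map a coefficient to an integer level (uniform quantizer, assumed)."""
    sign = -1 if coeff < 0 else 1
    # Round the magnitude to the nearest multiple of the step.
    return sign * int(abs(coeff) / step + 0.5)


def dequantize(level: int, step: float) -> float:
    """Reconstruct the coefficient value represented by a level."""
    return level * step


# Hypothetical coefficient values and step size, for illustration only.
coeffs = [-37.0, -4.2, 0.0, 3.0, 12.5, 101.0]
step = 8.0

levels = [quantize(c, step) for c in coeffs]
recon = [dequantize(lv, step) for lv in levels]
errors = [c - r for c, r in zip(coeffs, recon)]

for c, lv, r, e in zip(coeffs, levels, recon, errors):
    print(f"coeff={c:7.1f} level={lv:3d} recon={r:7.1f} error={e:5.1f}")
```

With this kind of quantizer the magnitude of the error never exceeds half the step size, so a coarser step saves bits at the cost of a larger error.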
Another important block, common to both encoding and transcoding systems, is the rate control: this block is responsible for checking the actual output bit-rate generated and correspondingly adjusting the quantization level to meet the output bitrate requirements as needed.
The MPEG video standard is based on a video compression procedure that exploits the high degree of spatial and temporal correlation existing in natural video sequences.
As shown in the block diagram of FIG. 1, an input video sequence is subject to frame reorder at 10 and then fed to a motion estimation block 12 associated with an anchor frames buffer 14. Hybrid DPCM/DCT coding removes temporal redundancy using inter-frame motion estimation. The residual error images generated at 16 are further processed via a Discrete Cosine Transform (DCT) at 18, which reduces spatial redundancy by de-correlating the pixels within a block and concentrating the energy of the block into a few low order coefficients. Finally, scalar quantization (Quant) performed at 20 and variable length coding (VLC) carried out at 22 produce a bitstream with good statistical compression efficiency.
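The energy-concentrating behaviour of the DCT can be checked numerically. The following sketch assumes an orthonormal 8x8 DCT-II computed directly from its definition and a hypothetical, smoothly varying pixel block; it is an illustration only and is not drawn from any particular encoder.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed from the definition."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [v][u] order


# Hypothetical smooth 8x8 block (a gentle ramp), standing in for natural video.
block = [[100 + 4 * x + 2 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)

pixel_energy = sum(p * p for row in block for p in row)
coeff_energy = sum(c * c for row in coeffs for c in row)
# Energy held by the four lowest-order coefficients (top-left 2x2 corner).
low_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
fraction = low_energy / coeff_energy

print(f"share of energy in 4 of 64 coefficients: {fraction:.4f}")
```

Because the transform is orthonormal, the total energy is unchanged; only its distribution over the coefficients changes, which is what makes the subsequent quantization and variable length coding effective.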
Due to the intrinsic structure of MPEG, the final bit-stream is produced at a variable and unconstrained bitrate; hence, in order to control it or when the output channel requires a constant bitrate, an output buffer 24 and a feedback bitrate controller block 26, which defines the granularity of scalar quantization, must be added.
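The feedback between output buffer and quantizer can be sketched as a toy model. Everything here is an assumption made for illustration: the linear mapping from buffer fullness to quantizer scale, the rule that the bits produced for a picture equal a "complexity" value divided by the scale, and all names and numbers. The 1 to 31 range mirrors the MPEG-2 quantiser scale code; this is not the rate control of any specific encoder.

```python
def next_qscale(fullness: float, buffer_size: float) -> int:
    """Choose a quantizer scale in 1..31 from the buffer fullness (assumed rule)."""
    ratio = min(max(fullness / buffer_size, 0.0), 1.0)
    return 1 + round(30 * ratio)  # fuller buffer -> coarser quantization


def simulate(complexities, channel_bits_per_frame, buffer_size):
    """Run the feedback loop over a list of hypothetical picture complexities."""
    fullness = buffer_size / 2  # start with a half-full buffer
    history = []
    for complexity in complexities:
        q = next_qscale(fullness, buffer_size)
        produced = complexity / q  # toy model: coarser scale, fewer bits
        # The buffer fills with coded bits and drains at the channel rate.
        fullness = max(0.0, fullness + produced - channel_bits_per_frame)
        history.append((q, produced, fullness))
    return history


# Hypothetical sequence: ten easy pictures followed by ten harder ones.
history = simulate([400_000] * 10 + [1_200_000] * 10,
                   channel_bits_per_frame=40_000,
                   buffer_size=400_000)
for q, produced, fullness in history:
    print(f"qscale={q:2d} produced={produced:9.0f} buffer={fullness:9.0f}")
```

The design point is that the controller never looks at the picture content directly; it reacts only to the buffer state, which is what allows a variable-rate coder to feed a constant-rate channel.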
In the block diagram of FIG. 1, reference number 28 designates a multiplexer adapted for feeding the buffer 24 with either the VLC coded signals or signals derived from the motion estimation block 12, while references 30, 32, and 39 designate an inverse quantizer, an inverse DCT (IDCT) module and a summation node included in the loop encoder to feed the anchor frames buffer 14.
All of the foregoing is well known to those of skill in the art, thus making a more detailed explanation unnecessary under the circumstances.
The MPEG standard defines the syntax and semantics of the output bit-stream OS and the functionality of the decoder. However, the encoder is not strictly standardized: any encoder that produces a valid MPEG bitstream is acceptable.
Motion estimation is used to evaluate similarities among successive pictures, in order to remove temporal redundancy, i.e. to transmit only the difference between successive pictures. In particular, block matching motion estimation (BM-ME) is a common way of extracting the existing similarities among pictures and is the technique selected by the MPEG-2 standard.
Recently, adapting multimedia content to client devices has become more and more important, and this expands the range of transformations to be effected on the media objects.
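Block matching can be sketched as an exhaustive search. The example below assumes the sum of absolute differences (SAD) as the matching cost and a small square search window; both choices, as well as the function names and the synthetic frames, are hypothetical and serve only to show the principle.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, cx, cy, n, search):
    """Return the (dx, dy) in the search window with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate block falls outside the reference
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 16x16 reference picture in which every pixel value is distinct.
ref = [[y * 16 + x for x in range(16)] for y in range(16)]
# The current picture shows the same content displaced by a known offset
# (indices are clamped at the picture borders).
true_dx, true_dy = 2, -1
cur = [[ref[min(max(y + true_dy, 0), 15)][min(max(x + true_dx, 0), 15)]
        for x in range(16)] for y in range(16)]

mv, cost = full_search(cur, ref, cx=6, cy=6, n=4, search=3)
print("motion vector:", mv, "SAD:", cost)
```

Once the best match is found, only the motion vector and the (ideally small) residual between the two blocks need to be coded.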
General access to multimedia contents can be provided in two basic ways.
The first is storing, managing, selecting, and delivering different versions of the media objects (images, video, audio, graphics and text) that comprise the multimedia presentations.
The second is manipulating the media objects “on the fly”, by using, for example, methods for text-to-speech translation, image and video transcoding, media conversion, and summarization.
Multimedia content delivery thus can be adapted to the wide diversity of client device capabilities in communication, processing, storage and display.
In either of the basic ways considered in the foregoing, the need arises to convert a compressed signal into another compressed signal format. A device that performs such an operation is called a transcoder. Such a device could be placed in a network to help relay transmissions between different bit rates or could be used as a pre-processing tool to create the various versions of the media objects possibly needed, as mentioned in the foregoing.
For example, a DVD movie MPEG-2 encoded at 8 Mbit/s at standard definition (Main Profile at Main Level) may be selected by a user wishing to watch it using a portable wireless device equipped with a CIF display. To permit this, the movie must be MPEG-2 decoded, the picture resolution changed from standard definition to CIF, and the result MPEG-4 encoded. The resulting bitstream at, e.g., 64 Kbit/s is thus adapted to be transmitted over a limited-bandwidth, error-prone channel, received by the portable device and MPEG-4 decoded for display. The issue is therefore to cleverly adapt the bitrate and the picture resolution of a compressed data stream compliant with a certain video standard (e.g. MPEG-2) to another one (e.g. MPEG-4).
A widely adopted procedure is to decode the incoming bitstream, optionally to down-sample the decoded images to generate a sequence with a reduced picture size, and then re-encode the sequence with a new encoder configured to achieve the required bitrate.
Alternative methods have been developed as witnessed, e.g. by EP-A-1 231 793, EP-A-1 231 794 or European patent application No. 01830589.6. These and similar systems are adapted to work directly in the DCT domain, incorporating the decoder and the encoder, and re-utilizing useful information available (like motion vectors, for example).
These systems are adapted to remove unnecessary redundancies present in the system. In any case, a de-quantization step followed by a re-quantization step (jointly called a “requantizer”) is usually required, together with an output rate control function.
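Why the requantizer deserves attention can be seen from a small numerical sketch. It assumes uniform scalar quantizers with rounding to the nearest level; the function names, the coefficient value and the two step sizes are hypothetical. The sketch compares quantizing an original coefficient directly with the coarser step against requantizing a level that was already quantized with a finer step.

```python
def quantize(coeff: float, step: float) -> int:
    """Uniform scalar quantizer with rounding to the nearest level (assumed)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)


def requantize(level: int, step_in: float, step_out: float) -> int:
    """De-quantize with the incoming step, then quantize with the new step."""
    return quantize(level * step_in, step_out)


# Hypothetical values: original coefficient, incoming step, outgoing step.
coeff, step_in, step_out = 5.0, 8.0, 12.0

# Path 1: the original coefficient quantized directly with the coarser step.
direct_recon = quantize(coeff, step_out) * step_out
direct_err = abs(coeff - direct_recon)

# Path 2: the transcoder only sees the level produced with the finer step.
level_in = quantize(coeff, step_in)
requant_recon = requantize(level_in, step_in, step_out) * step_out
requant_err = abs(coeff - requant_recon)

print("direct error:", direct_err, "requantization error:", requant_err)
```

In this instance the two-stage path lands on a different reconstruction level than direct quantization would, so its error with respect to the original coefficient is larger; the errors of the two stages can accumulate rather than cancel.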