1. Field of the Invention
The present invention relates to the processing of bitstreams encoded according to the MPEG standard.
The MPEG (Moving Picture Experts Group) standard proposes a set of algorithms dedicated to the compression of sequences of digital (audio/video) signals. The specification is concerned less with the use of these tools in the encoding phase than with the way of interpreting the syntax of the encoded bitstream and the use of said tools during decoding (i.e., when carrying out decompression). The techniques used are based on the reduction of the spatial and temporal redundancy of the sequence.
2. Description of the Related Art
In general, according to the MPEG standard, reduction in spatial redundancy is obtained by independently compressing the individual images, using a discrete cosine transform (DCT), quantization and Huffman coding.
Reduction in temporal redundancy is obtained by exploiting the correlation that exists between successive and/or temporally close images in the sequence. As an approximation, it is assumed that each portion of an image can be expressed locally as the translation of a portion of a previous and/or subsequent image in the sequence.
For this purpose, the MPEG standard envisages three types of images, indicated by I (Intra-Coded Frame), P (Predicted Frame), and B (Bidirectionally Predicted Frame).
The images I are encoded in an altogether independent way; the images P are encoded with respect to a previous image I or P in the sequence; finally, the images B are encoded with respect to two images of an I type or P type, one preceding and the other following in the sequence.
A typical succession of images may be as follows: IBBPBBPBBIB . . . .
This is the order in which the images are displayed, but since each image P is encoded with respect to the preceding image I or P, and each image B is encoded with respect to the preceding and following image I or P, it is necessary for the decoder to receive the images P before the image B, and the images I before the image P. Consequently, the order of transmission of the images will be IPBBPBBIBB . . . .
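By way of illustration only, the re-ordering from display order to transmission order described above may be sketched as follows. The function name display_to_transmission_order is hypothetical, and the sketch assumes that pictures are identified solely by their type letter.

```python
def display_to_transmission_order(display):
    """Reorder picture types from display order to transmission order.

    Each B picture is held back until the following I or P picture
    (the reference it depends on) has been emitted.
    """
    out = []
    pending_b = []
    for picture in display:
        if picture == "B":
            pending_b.append(picture)
        else:
            out.append(picture)      # the I or P reference goes first
            out.extend(pending_b)    # then the B pictures that depend on it
            pending_b = []
    out.extend(pending_b)            # trailing B pictures with no later reference yet
    return "".join(out)


print(display_to_transmission_order("IBBPBBPBBI"))
```

Applied to the display sequence IBBPBBPBBI, the sketch yields IPBBPBBIBB, in agreement with the transmission order given above.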
The images are processed by the encoder in a sequential way in the order indicated, and subsequently sent to a decoder which decodes them and re-orders them, so enabling their subsequent display. To encode an image B it is necessary for the encoder to maintain the images I and P to which the image B refers (encoded and then decoded previously) in a special memory referred to as “frame memory”, and this operation requires an appropriate amount of memory.
The above methodology finds a valid example of implementation in the MPEG 2 and MPEG 4 standards.
In this connection, the diagram of FIG. 1 illustrates, in the form of a block diagram, the typical structure of a video MPEG encoder.
The system, designated as a whole by 10, comprises, in the first place, a module 11 designed to carry out filtering of the chrominance (chroma) component of the video signal, so as to pass from the 4:2:2 format to the 4:2:0 format. Basically, the module 11 contains a low-pass filter, which operates on the chrominance component, replacing each pixel with a weighted sum of the surrounding pixels that are set on the same column, multiplied by appropriate coefficients. This enables the subsequent sub-sampling by two to obtain a halved vertical definition of the chrominance.
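The vertical filtering and sub-sampling performed in the module 11 may be sketched, purely as an example, as follows. The weights (1, 2, 1), the clamping at the image borders and the function name chroma_422_to_420 are assumptions made for the sketch; the filter coefficients are not specified here.

```python
def chroma_422_to_420(plane, taps=(1, 2, 1)):
    """Filter a chroma plane along each column, then keep every second row.

    plane: list of rows, each a list of integer chroma samples.
    taps:  hypothetical symmetric filter weights (not taken from the standard).
    """
    height = len(plane)
    width = len(plane[0])
    half = len(taps) // 2
    norm = sum(taps)
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0
            for k, weight in enumerate(taps):
                yy = min(max(y + k - half, 0), height - 1)  # clamp at the borders
                acc += weight * plane[yy][x]
            row.append((acc + norm // 2) // norm)           # rounded normalization
        filtered.append(row)
    return filtered[::2]                                    # vertical sub-sampling by two
```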
The reference number 12 designates a frame-ordering module made up of one or more frame memories. The module 12 is designed to supply at output the frames in the encoding order required by the syntax of the MPEG standard.
For example, if the input sequence is IBBPBBP, etc., the order at output will be IPBBPBB . . . .
As has already been explained, I (Intra-Coded Picture) is a frame and/or a half-frame that still contains its temporal redundancy; P (Predicted Picture) is a frame and/or a half-frame the temporal redundancy of which with respect to a preceding image I or P (which has been previously encoded/decoded) has been removed; B (Bidirectionally Predicted Picture) is a frame and/or half-frame the temporal redundancy of which with respect to the preceding image I and the subsequent image P (or else, the preceding image P and the subsequent image P, or again, the preceding image P and the subsequent image I) has been removed. In both the P case and the B case, the reference images I and P are to be considered already encoded/decoded.
The reference number 13 designates the module for estimating motion, i.e., the block that is able to remove the temporal redundancy of the images P and B.
It is to be recalled that the above block works only on the most energetic component (and hence one that is rich in information) of the images that make up the sequence to be encoded, such as the luminance sequence.
One of the important concepts for carrying out encoding is the estimation of the motion, and the MPEG standard is based upon the considerations specified below.
A set of pixels of an image frame may be found in the subsequent image at a position obtained by translating the position that the set occupied in the previous frame.
Suppose, for example, that this set of pixels is a square of 16×16 pixels. This set of data, together with the color information associated to it, is usually referred to as “macroblock”.
Of course, the changes in position of the objects may expose to the filming camera parts that were previously not seen, as well as modifications in the shapes of the objects themselves (for example, as a result of a zooming function, etc.).
The family of algorithms that are able to identify and associate the said portions of images is referred to as “estimation of motion”. This association makes it possible to calculate the portion of difference image, thus removing the redundant temporal information and rendering the subsequent process of compression by means of a DCT, quantization and entropy coding more effective.
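One member of this family, given here only as an example, is exhaustive block matching with a sum-of-absolute-differences (SAD) criterion. The sketch below assumes luminance frames stored as lists of rows; the names sad and full_search, the search range and the SAD criterion itself are illustrative choices and are not prescribed by the MPEG standard.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, size=16, search=4):
    """Return (dx, dy, cost) of the best match in ref for the block of cur at (cx, cy).

    The vector (dx, dy) points from the block position in the current
    image to the matching position in the reference image.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue                      # candidate falls outside the reference
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The difference between the current block and the block found in the reference image is the prediction error that is subsequently transformed and quantized.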
The reference number 14 designates a module or block that implements, on the signal coming from an adder node 23 (which will be explained in greater detail later), the DCT according to the MPEG standard. The image I and the images P and B, considered as error images, are divided into 8×8 blocks Y, U, V, on which DCT transformation is applied.
The reference number 15 designates a quantizer module (Q). Here the 8×8 block resulting from DCT transformation is divided, element by element, by a matrix referred to as “quantization matrix”, so as to reduce, more or less drastically, the size in number of bits of the DCT coefficients. In this case, the tendency is to remove the information associated with the higher frequencies, which are less visible to the human eye. The result is re-ordered and sent to the subsequent block, designated by 16, which implements the run-length coding (RLC) and the variable-length coding (VLC).
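The transform and quantization steps performed in the modules 14 and 15 may be sketched as follows. This is a simplified illustration: it uses an orthonormal floating-point DCT and a plain element-by-element division, whereas the exact arithmetic (rounding, special handling of the DC coefficient) is defined by the standard and is not reproduced here. The names dct_2d and quantize and the parameter mquant are assumptions of the sketch.

```python
import math

N = 8  # block size used by the DCT


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an 8x8 block (list of rows)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for y in range(N):
                for x in range(N):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, matrix, mquant=1):
    """Divide each coefficient by its matrix entry times a global scale factor."""
    return [[round(coeffs[u][v] / (matrix[u][v] * mquant)) for v in range(N)]
            for u in range(N)]
```

A larger matrix entry or a larger mquant yields smaller quantized values and hence more zero coefficients, at the price of a larger reconstruction error.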
In particular, RLC aims at taking into account the fact that the code words at output from the quantizer module 15 tend to contain zero coefficients in a more or less high number, followed by non-zero values. The zero values, which precede the first non-zero value are counted, and this count constitutes the first portion of a word, the second portion of which is the non-zero coefficient. This method of packeting data is defined as “run-length coding”.
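The packeting into (run, value) pairs described above may be sketched as follows, assuming that the quantized coefficients have already been re-ordered into a one-dimensional list. The function names are hypothetical, and the handling of the trailing zeros (simply dropped here) stands in for the end-of-block signalling of an actual bitstream.

```python
def run_length_encode(coeffs):
    """Turn a list of quantized coefficients into (run_of_zeros, value) pairs."""
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                  # count the zeros preceding the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    return pairs                      # trailing zeros are not represented


def run_length_decode(pairs, length):
    """Rebuild the coefficient list, padding with zeros up to the given length."""
    coeffs = []
    for run, value in pairs:
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * (length - len(coeffs)))
    return coeffs


print(run_length_encode([64, 0, 0, -3, 0, 5, 0, 0]))
```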
The result thus obtained undergoes VLC “variable-length coding”, also known as Huffman coding.
This type of coding takes into account the fact that some pairs of values are more likely to occur than others. The more likely pairs are coded with very short words (2/3/4 bits), whereas the less likely pairs are coded with longer words. Statistically, the number of bits produced at output is smaller than the number of bits at input, or rather the number of bits that there would be if the said coding were not carried out.
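The principle may be illustrated by constructing a Huffman table from the observed frequencies of a set of pairs. It should be noted that this is an illustration of the principle only: the MPEG standard uses predefined code tables, whereas the sketch below builds its table from sample data. The function name build_huffman_table is hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_table(symbols):
    """Return {symbol: bit string}, with shorter codes for more frequent symbols."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate case: a single symbol
        return {symbol: "0" for symbol in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)      # two least likely subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


pairs = [(0, 1)] * 8 + [(1, 1)] * 4 + [(0, 2)] * 2 + [(3, -1)] + [(5, 4)]
table = build_huffman_table(pairs)
total_bits = sum(len(table[p]) for p in pairs)
print(total_bits, "bits, against", 3 * len(pairs), "with fixed 3-bit words")
```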
In order to be able to construct the final syntax envisaged by the MPEG standard, the data generated by the variable-length encoder (output from the module 16), the quantization matrices, the vectors of motion (output from the module 13), and other syntactic elements are sent to an assembler module, designated as a whole by 17 and comprising a multiplexer 17a and a buffer 17b. 
The limit size of the buffer is specified by the standard itself and cannot be exceeded.
The quantization block 15 ensures that the said limit is respected, rendering the process of division of the DCT coefficients more or less drastic according to how close the buffer is to being filled and according to the energy of the 8×8 source block taken upstream of the process of estimation of motion and DCT transformation.
The reference numbers 18 and 19 designate two modules that basically implement a feedback loop to the estimation-of-motion function represented by the module 13.
In particular, the module designated by 18 performs on the data undergoing quantization in the module 15 an inverse-quantization function.
The signals thus obtained undergo inverse DCT (IDCT) in the module 19. In practice, the DCT function is inverted and applied to the 8×8 block at output from the process of inverse quantization. The function performed in the module 19 enables passage from the domain of spatial frequencies to the pixel domain, obtaining at output:
the decoded frame (half-frame) I that is to be stored in an appropriate frame memory for subsequent removal of temporal redundancy, with respect thereto, from the subsequent images P and B; and
the decoded prediction error frame (half-frame) P and B which is added to the information previously removed during the step of estimation of motion; in the P case, this resulting sum, stored in an appropriate frame memory, is used during the process of estimation of motion for the subsequent images P and B.
The above is performed in the module designated, as a whole, by 20, where the frame memories are usually distinct from the re-ordering memories.
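The reason for which the encoder decodes its own output may be illustrated by a one-dimensional stand-in in which each sample is predicted from the previously reconstructed sample. This is a deliberately simplified model, assumed for the sketch only: the DCT and the estimation of motion are omitted, a single variable plays the role of the frame memory, and the function names are hypothetical.

```python
def encode_closed_loop(samples, step):
    """Quantize prediction errors, predicting from the reconstructed value."""
    levels = []
    reference = 0                            # stands in for the frame memory
    for sample in samples:
        residual = sample - reference        # prediction error
        level = round(residual / step)       # quantization (transform omitted)
        levels.append(level)
        reference += level * step            # inverse quantization and re-addition
    return levels


def decode(levels, step):
    """Rebuild the samples exactly as the encoder's feedback loop did."""
    out = []
    reference = 0
    for level in levels:
        reference += level * step
        out.append(reference)
    return out


samples = [10, 12, 15, 15, 40, 38]
decoded = decode(encode_closed_loop(samples, 4), 4)
print(decoded)
```

Since the encoder predicts from the same reconstructed values that the decoder will have available, the quantization error of each sample remains bounded by half the quantization step and does not accumulate from one sample to the next.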
The reference number 21 designates the rate-control module which interacts for this purpose with the output of the module 14 and the output of the buffer 17b, supplying a corresponding control signal mQuant to the module 15.
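The rate-control law is left to the implementer of the encoder. Purely as an example, a linear law relating the control signal mQuant to the filling of the buffer 17b and to the energy of the source block may be sketched as follows. The function name choose_mquant, the linear mapping, the parameter block_activity and the range 1 to 31 are all assumptions of the sketch.

```python
def choose_mquant(buffer_bits, buffer_size, block_activity=1.0, lo=1, hi=31):
    """Return a quantizer scale that grows as the output buffer fills up.

    block_activity is a hypothetical multiplier (> 1 for high-energy blocks,
    which tolerate coarser quantization; < 1 for flat blocks).
    """
    fullness = buffer_bits / buffer_size          # 0.0 (empty) .. 1.0 (full)
    base = lo + fullness * (hi - lo)
    value = round(base * block_activity)
    return max(lo, min(hi, value))                # keep within the allowed range
```

With an empty buffer the sketch returns the finest quantization, whereas with a buffer close to its limit it returns the coarsest one, so that fewer bits are produced.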
Finally, the reference numbers 22 and 23 designate two adder nodes in which the following are respectively added:
the output of the IDCT module 19 and the output, designated by 24, on which the data relating to the motion vectors are transferred from the module 20 to the estimation-of-motion module 13; and
the output of the re-ordering module 12 and the output of the module 20, and this in view of supply to the module 14 which implements the DCT function.
The foregoing obviously corresponds to altogether current know-how for persons skilled in the sector, a know-how which is here recalled merely for purposes of reference.
The same also applies to the structure of an MPEG decoder as represented in FIG. 2.
In the above-mentioned figure it is possible to note that the said decoder, designated as a whole by 30, in the first place carries out, in a module designated by 31, detection of the so-called “headers” in the framework of the MPEG-encoded bitstream and the subsequent accumulation of the data received within a buffer 32 designed to absorb any discontinuities in the said stream.
The module 33 is responsible for performing the functions of demultiplexing, inverse VLC decoding, and inverse decoding of the run-level pairs in view of forwarding of the data thus obtained to a module 34. Here, under the control of the signal mQuant supplied by the module 33 itself on a line 35, the inverse-quantization function (IQ) is performed.
The signal thus obtained is then passed on to a module 36 which performs the inverse DCT function, the aim being to proceed, in an adder node 37, to reconstruction of the output signal according to the signal generated by the motion-compensation node 38, which receives, from the module 33, the data regarding the motion vectors on a line 39. In the node 37 also the prediction error is calculated for decoding the subsequent images P and B (line 40).
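The reconstruction carried out in the adder node 37 may be sketched as follows for a single block. The sketch assumes integer-pixel motion vectors, 8-bit samples clipped to the range 0 to 255 and frames stored as lists of rows; the function names are hypothetical.

```python
def motion_compensate(ref, x, y, mv, size=8):
    """Fetch from the reference image the block displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[y + dy + j][x + dx + i] for i in range(size)]
            for j in range(size)]


def reconstruct_block(prediction, residual):
    """Add the decoded prediction error to the prediction, clipping to 8 bits."""
    return [[max(0, min(255, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


ref = [[10 * row + col for col in range(4)] for row in range(4)]
prediction = motion_compensate(ref, 1, 1, (1, -1), size=2)
print(reconstruct_block(prediction, [[1, -5], [250, 0]]))
```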
It may therefore be stated that the processes illustrated in FIGS. 1 and 2 are two concurrent processes cascaded together.
In the actual use of the MPEG standard it is therefore possible to transmit (or record) films, or, in general, video sequences on a variety of channels and media, each of which has its own characteristics of capacity, speed and cost.
For example, the distribution of a film starting from the master recording may take place on a DVD medium, via satellite, via radio antenna, or via cable.
The band available for transmission may therefore be different from the one envisaged in the step of encoding of the video sequence according to the MPEG standard.
Consider, for example, encoding a 6-Mbit/s sequence according to the MPEG 2 standard.
If the attempt were made to use a 384-kbit/s UMTS channel, the transmission would in general be impossible.
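The size of the mismatch in this example may be quantified with a simple calculation. The function name required_reduction is hypothetical, and the figures are those given above.

```python
def required_reduction(source_bps, channel_bps):
    """Factor by which the bitrate must be reduced to fit the channel."""
    return source_bps / channel_bps


# 6 Mbit/s source against a 384 kbit/s channel
print(required_reduction(6_000_000, 384_000))
```

The bitstream would therefore have to be reduced by a factor of more than fifteen before it could be carried on such a channel.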
The same problem arises also at the level of the decoders which in general are not able to decode bitstreams in compliance with an MPEG specification that is different according to type, profile and level from that for which the decoders themselves were prepared.
With regard to MPEG 2 and MPEG 4 standards, there thus emerges the problem of ensuring that a bitstream encoded according to a given standard should be convertible into a new bitstream encoded according to a different standard and/or for channels with different bitrates so as to enable re-adaptation to the characteristics of the transmission medium and/or the decoding system.
In particular, it is possible to have combinations of use in which the encoder operates according to the MPEG 2 standard, whilst the decoding (or transmission) function is carried out not only according to the MPEG 2 standard, but also possibly according to the MPEG 4 standard, and, in a dual way, situations in which the encoding is carried out according to the MPEG 4 standard, whilst decoding and transmission is carried out not only with the MPEG 4 standard, but also with the MPEG 2 standard.
There thus exists the need to be able to modify the bitrate, resolution, and syntax of an MPEG bitstream generated following upon encoding of the source with bitrate B1 so as to give rise to a stream having syntax and resolution identical to or different from the starting ones, the said second stream having a bitrate B2, where B2 may be smaller than, greater than, or equal to B1.
There may then also arise the need to modify the horizontal and vertical dimensions and/or the resolution of the encoded image.
In order to achieve the above target, in the prior art there has already been proposed the solution of proceeding by decoding the MPEG bitstream, then proceeding to the change of horizontal and/or vertical resolution on the decoded signal, and then to the subsequent re-encoding of the latter using an MPEG encoder.
This solution is in actual fact highly complex from the computational point of view, also on account of the numerous different possible combinations, in view of the fact that the input and output bitstreams may be either MPEG 2 or MPEG 4.
To clarify the above concept further, reference may be made to the diagram of FIG. 3, which is a schematic illustration of a solution for MPEG transcoding performed according to the known art.
On the assumption of operating on an input bitstream IS encoded according to the MPEG 2 or MPEG 4 standard, the reference number 50 designates a decoder that carries out a transformation of the MPEG bitstream (it is irrelevant whether specification 2 or specification 4) into decoded images ID, which are a sequence of frames.
The reference number 60 designates a module that is able to carry out a possible change of resolution on the basis of a classic technique which employs finite impulse response (FIR) filters.
The FIR filter in question performs a transformation based upon the availability of a certain number N of pixels for each component of luminance and chrominance of the image. These pixels are multiplied by appropriate weights, and the results are accumulated and divided by the sum of said weights. Finally, some of the pixels are not transmitted in the resulting image, depending upon the scaling factor of the chosen resolution.
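The operation just described may be sketched for a single row of pixels as follows. The weights, the clamping at the ends of the row, the integer division and the function name fir_downscale_row are assumptions of the sketch and are not taken from any particular implementation.

```python
def fir_downscale_row(pixels, weights, factor):
    """Weighted average over N neighbouring pixels, then keep one pixel in 'factor'."""
    half = len(weights) // 2
    total = sum(weights)
    filtered = []
    for i in range(len(pixels)):
        acc = 0
        for k, weight in enumerate(weights):
            j = min(max(i + k - half, 0), len(pixels) - 1)   # clamp at the row ends
            acc += weight * pixels[j]
        filtered.append(acc // total)         # divide by the sum of the weights
    return filtered[::factor]                 # pixels not kept are simply discarded


print(fir_downscale_row([0, 0, 0, 0, 100, 100, 100, 100], (1, 2, 1), 2))
```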
The signal that has undergone change of resolution in the module 60 is then fed to an MPEG encoder 70 which is able to generate a syntax in conformance with the MPEG 2 standard or MPEG 4 standard in view of the transmission schematically represented in T.
Starting from an encoded bitstream with arbitrary bitrate B1, it is always possible to obtain an encoded bitstream with bitrate B2 by simply connecting the output of the decoder 50 to the input of the change-of-resolution block 60. The output from the latter is then connected to the input of the encoder 70, programmed to encode at a bitrate B2 expressed in Mbit/s.
The block designated by 80 is simply a switch, which is there to indicate the fact that the change-of-resolution operation is in itself optional, so that, in the case where it is not necessary to proceed to the change of resolution, the sequence of frames ID may be directly fed to the encoder 70 without undergoing change of resolution.
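The cascade of FIG. 3 may be summarized by the following sketch, in which the decoder 50, the change-of-resolution module 60 and the encoder 70 are represented by callables supplied by the caller. The function transcode and its parameters are hypothetical; the optional resize argument plays the role of the switch 80.

```python
def transcode(bitstream, decode, encode, resize=None):
    """Full decode, optional change of resolution, full re-encode."""
    frames = decode(bitstream)                     # decoder 50: bitstream IS -> frames ID
    if resize is not None:                         # switch 80: resolution change is optional
        frames = [resize(frame) for frame in frames]   # module 60
    return encode(frames)                          # encoder 70: frames -> new bitstream
```

Every frame passes through a complete decoding process and a complete encoding process, irrespective of how little the output stream is required to differ from the input stream.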
Finally, downstream of transmission (it is to be recalled that, for the purposes of the present invention, here the term “transmission” also includes recording on a physical medium, such as a DVD) the MPEG (re)coded signal is fed to a decoder 90 which is able to read and decode the bitstream received according to a syntax in conformance with the MPEG standard (either MPEG 2 or MPEG 4) in view of the generation of an output video sequence OS.
If the block diagrams of FIGS. 1 and 2 are borne in mind, it will be immediately realized that the sequence of processes illustrated in FIG. 3 presents a decidedly high computational complexity.
The transcoding operation represented in the diagram of FIG. 3 entails, in fact, as far as the decoder 50 is concerned, the execution of the following steps:
inverse Huffman coding;
inverse run-length coding;
inverse quantization;
inverse discrete cosine transform;
motion compensation;
filtering; and
change of resolution (where envisaged).
For the encoder 70, the following operations become necessary:
pre-processing;
estimation of motion;
calculation of prediction error;
cosine transform;
quantization;
run-length coding;
Huffman coding;
inverse quantization;
inverse discrete cosine transform; and
motion compensation.
Finally, for the receiving decoder, the following operations must be carried out:
inverse Huffman coding;
inverse run-length coding;
inverse quantization;
inverse discrete cosine transform; and
motion compensation.
The computational cost lies almost entirely in the estimation of motion, followed by the direct and inverse cosine transforms and motion compensation. Quantization and the (direct and inverse) run-length and Huffman codings constitute, instead, a contribution smaller than the previous ones to the overall cost.
The quality of the resulting output sequence OS derives, instead, from the information content of the quantized coefficients. This depends upon the implementation of the encoder (the decoder is uniquely defined by ISO/IEC 13818-2 Directives for the MPEG 2 standard and by ISO/IEC 14496-2 Directives for the MPEG 4 standard), upon the effectiveness of its estimator of motion, and upon the quality and precision of the rate control.