1. Field of the Invention
The present invention relates to multimedia signal processing, or more specifically, to bandwidth scaling of a compressed video stream.
2. Discussion of the Prior Art
Converting a previously compressed video bit stream to a lower bit rate through transcoding allows the bit rate of the coded video bit stream to be adjusted dynamically to suit the constraints of various channels.
The principal goal of any transcoding system is to achieve a higher level of compression than that of the original coding system while consuming less processing power. More compression requires better motion estimation, more quantization, or both. However, more quantization lowers the quality of the video stream, so the only remaining option seems to be to improve the motion estimation. Yet the better the motion estimation, the more processing power is needed. Thus, it seems almost impossible to achieve both goals simultaneously.
If one carefully analyzes the situation, one would recognize that the motion estimation is performed before DCT and quantization are done, so in the case when input and output pictures have the same spatial resolution, the original motion vectors would remain optimal if they were optimal in the first place. The problem is to make sure that all of them are 100% optimal. Usually they are not.
In the prior art, one way to deal with this problem is to improve the original vectors by classification and refinement. In this technique, the original motion vectors are classified according to some criteria in order to decide which of them are good enough to be reused subject to a small refinement, and which of them are to be replaced completely.
If this is the case, as discussed in the paper “Motion Vector Refinement for High-Performance Transcoding” by J. Young, Ming-Ting Sun, and Chia-Wen Lin in the IEEE Transactions on Multimedia, Vol. 1, No. 1, March 1999, on page 30, the processing power is saved because it is used only for processing of a small subset of all motion vectors. In this paper, the optimality of an original motion vector is evaluated by performing the refinement scheme. In the refinement scheme, the optimal motion vector is obtained by refining the incoming motion vector within a small range around the incoming motion vector and by calculating how much gain it achieves in terms of MAD (mean absolute difference). However, this is a highly subjective step because such gain depends on the amount of motion in the video source, and no specific value can be used as a natural threshold.
What is needed is to perform the classification by comparing the motion vectors with one another. Indeed, if a set of neighboring vectors points in the same direction, this suggests that those motion vectors correlate with a physical moving object that was found by the original encoder. Therefore, these vectors can be considered optimal with a higher degree of probability than otherwise.
To address the shortcomings of the available art, the present invention discloses a new optimization scheme that uses the most relevant original motion vectors recovered from the original video stream in order to obtain, in real time, the optimized and most relevant motion vectors for the reconstructed video stream.
One aspect of the present invention is directed to a method of bandwidth scaling of a compressed video stream. In one embodiment, the method comprises the main steps (A) and (B).
At step (A), an original previously compressed video stream image having a first level of compression including a first level of quantization is decompressed. The original video stream comprises a set of original motion pictures, wherein each original video stream image comprises a set of original macro blocks further comprising a set of I source pictures, a set of P source pictures, and a set of B source pictures. The decompressed video stream image comprises a set of decompressed motion pictures, wherein each decompressed video stream image includes a set of decompressed macro blocks further comprising a set of I decompressed pictures, a set of P decompressed pictures, and a set of B decompressed pictures.
More specifically, the step (A) further includes the following substeps. At the first substep (A1), a set of original motion vectors for each P source picture and each B source picture is recovered and saved. At substep (A2), a reconstructed original video stream is recovered. The reconstructed original video stream differs from the original video stream by the amount of information lost during the original compression process of the original video stream. The reconstructed original video stream, comprising a set of reconstructed original macro blocks, further comprises a set of I reconstructed source (RS) pictures, a set of P reconstructed source (RS) pictures, and a set of B reconstructed source (RS) pictures.
At step (B), the decompressed video stream image is re-compressed to create a re-compressed video stream image having a second level of compression including a second level of quantization. The re-compressed video stream image comprises a set of re-compressed motion pictures. The re-compressed video stream image comprises a set of re-compressed macro blocks further comprising a set of I destination pictures, a set of P destination pictures, and a set of B destination pictures. In the preferred embodiment, the second level of compression is higher than the first level of compression, and the second level of quantization is stronger than the first level of quantization.
The step (B) further comprises the following substeps. At the first substep (B1), the set of recovered and saved original motion vectors is processed for each P source picture and each B source picture in order to create a set of usable source motion vectors for each P destination picture and each B destination picture.
At substep (B2), an interframe redundancy is removed from each P reconstructed source (RS) picture and from each B reconstructed source (RS) picture by using the set of usable source motion vectors. In each I (RS) picture, the values of a set of pixels are independently provided. In each P (RS) picture, only the incremental changes in each pixel value from a preceding I (RS) picture or a preceding P (RS) picture are coded. In each B (RS) picture, a set of pixel values is coded with respect to both an earlier I (RS) or P (RS) picture and a later I (RS) or P (RS) picture.
Next, at substep (B3), the intraframe redundancy is removed by performing a 2-dimensional discrete cosine transform (DCT) on a plurality of 8×8 values matrices to map the spatial luminance or chrominance values into the frequency domain.
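The transform of substep (B3) can be illustrated with a minimal sketch. It assumes the orthonormal DCT-II scaling and a direct (slow) summation; the function name `dct_2d` is hypothetical, and a practical encoder would use a fast factorization instead.

```python
import math

def dct_2d(block):
    """Naive 2-D DCT-II of a square block, orthonormal scaling (assumed)."""
    n = len(block)

    def alpha(k):
        # Normalization factor: the k == 0 (DC) basis function is scaled differently.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# A flat 8x8 block has no spatial variation, so only the DC term is non-zero.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

For the flat block, all of the energy ends up in `coeffs[0][0]`, which is why smooth picture areas compress well after this step.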
At the next substep (B4), a quantization process having the second level of quantization of each DCT coefficient is performed by weighting each element of each 8×8 matrix in accordance with its chrominance or luminance type and its frequency.
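A minimal sketch of the weighting in substep (B4) follows. The weight values, the `scale` parameter, and the names `make_weights` and `quantize` are illustrative assumptions only; they are not the tables of any particular coding standard and are not taken from the present description.

```python
def make_weights(kind, n=8):
    """Hypothetical weight matrix: weights grow with frequency (u + v),
    and chrominance is weighted more coarsely than luminance."""
    base = 8 if kind == "luma" else 16
    return [[base + 2 * (u + v) for v in range(n)] for u in range(n)]

def quantize(coeffs, kind, scale=1):
    """Divide each DCT coefficient by its weight times a global scale, then round.
    A larger scale corresponds to a stronger level of quantization."""
    weights = make_weights(kind, len(coeffs))
    return [[int(round(c / (weights[u][v] * scale)))
             for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]

# Example coefficients: a large DC term, one low-frequency and one high-frequency AC term.
example = [[0.0] * 8 for _ in range(8)]
example[0][0] = 1024.0
example[0][1] = 57.0
example[7][7] = 10.0

luma_q = quantize(example, "luma")
chroma_q = quantize(example, "chroma")
strong_q = quantize(example, "luma", scale=2)
```

In this sketch the small high-frequency coefficient is rounded to zero in every case, and doubling `scale` halves the retained DC level, which is the sense in which the second level of quantization is stronger than the first.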
At substep (B5), a run length coding for each weighted element of each 8×8 matrix is performed. The run length coding is a lossless process wherein each 8×8 matrix is represented as an ordered list of a “DC” value and alternating pairs of a non-zero “AC” value and a length of zero elements following the non-zero “AC” value.
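The representation described in substep (B5) can be sketched as follows. The sketch assumes that the 64 quantized values have already been placed in a one-dimensional scan order (the scan order is not specified here), and it records any zeros that directly follow the DC value in a separate field, `leading_zeros`; that field and the function names are assumptions of the sketch.

```python
def rle_encode(scanned):
    """Encode a scanned block as (DC, leading_zeros, [(AC value, zeros after it), ...])."""
    dc, ac = scanned[0], scanned[1:]
    leading_zeros = 0
    while leading_zeros < len(ac) and ac[leading_zeros] == 0:
        leading_zeros += 1
    pairs = []
    i = leading_zeros
    while i < len(ac):
        value = ac[i]          # a non-zero AC value
        i += 1
        run = 0
        while i < len(ac) and ac[i] == 0:
            run += 1           # count the zeros that follow it
            i += 1
        pairs.append((value, run))
    return dc, leading_zeros, pairs

def rle_decode(dc, leading_zeros, pairs):
    """Rebuild the scanned block exactly, showing that the coding is lossless."""
    out = [dc] + [0] * leading_zeros
    for value, run in pairs:
        out.append(value)
        out.extend([0] * run)
    return out

scanned = [64, 0, 5, 0, 0, -3] + [0] * 58   # 64 values of one 8x8 block
encoded = rle_encode(scanned)
restored = rle_decode(*encoded)
```

Here the 64 values collapse to a DC value, one leading-zero count, and two pairs, and decoding restores the original list unchanged.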
Finally, at substep (B6), an entropy encoding scheme for each (RS) video stream is performed in order to further compress the representations of each DC block coefficient and each AC value-run length pair using variable length codes. Thus, each original de-compressed video stream is re-compressed by using the set of usable source motion vectors.
In the preferred embodiment of the present invention, the set of N1 motion vectors that substantially points out to a camera movement within at least one P/B source picture is determined by global frame processing of the set of all saved original motion vectors for each P source picture and each B source picture.
In one embodiment of the global frame processing, in the set N1 of motion vectors, for each pair comprising a first motion vector from the set of N1 motion vectors and a second motion vector from the set of N1 motion vectors, a distance between the first motion vector and the second motion vector is checked to determine whether it is less than a first predetermined value. In an alternative embodiment, in the subset N1 of motion vectors, a distance between each motion vector and the median value (or, in another embodiment, the average value) of the motion vectors from the set N of motion vectors is checked to determine whether it is less than the first predetermined value. N is an integer greater than or equal to N1, and N1 is an integer greater than or equal to the first predetermined number Nthreshold1: N ≥ N1 ≥ Nthreshold1; Nthreshold1 is an integer. In one embodiment, the camera movement is detected if the number N1 of motion vectors is greater than N/2.
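The median-based variant of the global frame processing can be sketched as follows. The sketch assumes a component-wise median and a Euclidean distance, neither of which is fixed by the description; the names `detect_camera_motion`, `max_dist`, and `n_threshold1` are hypothetical stand-ins for the first predetermined value and Nthreshold1.

```python
import math
from statistics import median

def detect_camera_motion(vectors, max_dist, n_threshold1=1):
    """Return (detected, close_vectors) for one picture.

    vectors: list of (dx, dy) motion vectors saved for the picture (the set N).
    close_vectors: the vectors lying within max_dist of the median vector (the set N1).
    """
    n = len(vectors)
    if n == 0:
        return False, []
    # Assumption: the median vector is taken component by component.
    mx = median(v[0] for v in vectors)
    my = median(v[1] for v in vectors)
    close = [v for v in vectors
             if math.hypot(v[0] - mx, v[1] - my) < max_dist]
    n1 = len(close)
    # Camera movement is declared when N1 >= Nthreshold1 and N1 > N / 2.
    return (n1 >= n_threshold1 and n1 > n / 2), close

panning = [(4, 0), (4, 1), (5, 0), (4, 0), (-7, 3)]
scattered = [(0, 0), (10, 10), (-10, 5), (3, -9)]
pan_detected, pan_vectors = detect_camera_motion(panning, max_dist=2)
scat_detected, scat_vectors = detect_camera_motion(scattered, max_dist=2)
```

In the first example four of the five vectors cluster around the median (4, 0), so camera movement is reported; in the second no vector lies close to the median and nothing is reported.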
In one embodiment, the set of N1 motion vectors that substantially points out to a camera movement within at least one P/B source picture is further optimized by performing a narrow search in a narrow search area adjacent to the reference area in the reference picture. In one embodiment, the maximum size of the narrow search area is determined by the size of 5×5 macro block area centered around the original motion vector. In an alternative embodiment, the maximum size of the narrow search area is determined by the size of 7×7 macro block area centered around the original motion vector.
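A narrow search around an original motion vector can be sketched as follows. Two assumptions are made loudly: the matching criterion is the mean absolute difference (MAD), which the description mentions only in connection with the prior art, and the narrow search area is modeled as a grid of candidate displacements of `radius` pixels on each side of the original vector (`radius=2` gives a 5×5 grid, `radius=3` a 7×7 grid). The function and parameter names are hypothetical.

```python
def mad(cur, ref, bx, by, dx, dy, size):
    """Mean absolute difference between a block of the current picture at (bx, by)
    and the block of the reference picture displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (size * size)

def refine_vector(cur, ref, bx, by, mv, size=16, radius=2):
    """Test every displacement within `radius` of the original vector `mv`
    and return (best_vector, best_cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(mv[1] - radius, mv[1] + radius + 1):
        for dx in range(mv[0] - radius, mv[0] + radius + 1):
            # Skip candidates whose reference block falls outside the picture.
            if not (0 <= bx + dx and bx + dx + size <= width
                    and 0 <= by + dy and by + dy + size <= height):
                continue
            cost = mad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic 16x16 pictures: the current picture is the reference shifted by (2, 1).
ref = [[x * x + 3 * y * y + x * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
best = refine_vector(cur, ref, bx=4, by=4, mv=(1, 0), size=4, radius=2)
```

Starting from the slightly wrong vector (1, 0), the search evaluates at most 25 candidates and recovers the true displacement, which is far cheaper than searching the whole picture.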
In one embodiment, the set of remaining (N − N1) motion vectors is also optimized by performing a full search in a search area adjacent to the reference area in the reference picture in order to find an optimum motion vector that points out to an optimum matching macro block in the reference picture for each macro block in the P/B source picture. The size of the full search area depends on the amount of available processing power.
In the preferred embodiment of the present invention, the set of N2 motion vectors that substantially points out to at least one moving object within at least one P/B source picture is also determined. It is done by local frame processing of the set N of all saved original motion vectors for each P source picture and each B source picture.
In one embodiment of the local frame processing, in the subset of N2 motion vectors, for a pair comprising a first motion vector from the subset of N2 motion vectors and a second motion vector from the subset of N2 motion vectors, a distance between the first motion vector and the second motion vector is checked to determine whether it is less than a second predetermined value. If this is the case, the pair of motion vectors belongs to the subset of all Nmoving_object motion vectors. N2 is an integer greater than or equal to the second predetermined number Nthreshold2: N2 ≥ Nthreshold2; Nthreshold2 is an integer. By repeating this process, the subset of all Nmoving_object motion vectors that substantially points out to substantially all moving objects within at least one P/B source picture is recovered. Nmoving_object is an integer less than or equal to N.
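The pairwise check of the local frame processing can be sketched as follows. The sketch assumes that the pairs are formed from horizontally and vertically adjacent macro blocks and that the distance is Euclidean; the description fixes neither choice. The names `moving_object_vectors`, `max_dist`, and `n_threshold2` are hypothetical stand-ins for the second predetermined value and Nthreshold2.

```python
import math

def moving_object_vectors(grid, max_dist, n_threshold2=2):
    """Return the macro block positions whose vectors agree with a neighbor.

    grid[r][c] is the (dx, dy) motion vector of the macro block in row r, column c.
    """
    rows, cols = len(grid), len(grid[0])
    selected = set()
    for r in range(rows):
        for c in range(cols):
            # Assumption: each block is paired with its right and lower neighbors.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    a, b = grid[r][c], grid[nr][nc]
                    if math.hypot(a[0] - b[0], a[1] - b[1]) < max_dist:
                        selected.add((r, c))
                        selected.add((nr, nc))
    # Discard the result if fewer than Nthreshold2 vectors were found.
    return selected if len(selected) >= n_threshold2 else set()

grid = [
    [(3, 1), (3, 1), (-8, 5)],
    [(3, 2), (9, -9), (0, 7)],
]
object_blocks = moving_object_vectors(grid, max_dist=2)
```

In this example the three blocks in the upper-left corner carry nearly identical vectors and are grouped together, while the remaining vectors, which disagree with all of their neighbors, are left out and would be candidates for replacement.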