1. Field of the Invention
This invention relates to video formatting and the like. More specifically, the invention relates to efficient conversion between transmission or storage formats and editing formats. The invention is particularly focused on conversion between the DV and MPEG formats.
2. Background of the Invention
DV is an SMPTE (Society of Motion Picture and Television Engineers) standardized digital video format targeted at the acquisition and editing of professional quality video. A number of equipment vendors have adopted DV in their digital video products. Because MPEG is another popular digital video format, standardized by ISO (International Organization for Standardization), the industry has been making an effort to promote interoperability between DV and MPEG by developing a software-only, real-time transcoder between the two formats. A fundamental problem in transcoding between the DV and MPEG formats is bit number prediction. This problem arises in the development of I-frame-based MPEG-2 to DV transcoders as well as in DV encoding generally.
In both DV and MPEG, video image data is compressed by transforming it into the frequency domain using a discrete cosine transform (DCT) and by variable length coding (VLC) the transformed data. To best serve its target applications, DV is a DCT/VLC-based, I-frame-only, fixed-frame-length format. To best serve tape recording and playback and to provide synchronized transmission, each frame is further broken down into many “segments”, each consisting of five “macroblocks”. DV macroblocks are similar to MPEG macroblocks, with several exceptions, including that each DV segment has a fixed bit length while there is no such constraint in MPEG. Because adjacent macroblocks are strongly correlated and usually have similar coding complexity, the five macroblocks of each segment are scattered sparsely across the frame according to a deterministic pseudo-random pattern. This reduces the possibility of “bit overflow” or “bit vacancy”, which would be likely if the five macroblocks of a segment were close to each other and had similar complexity: such a segment would produce either a very large or a very small number of bits.
Because each DV segment has a fixed bit budget, if the coding process produces more bits than a segment can accommodate, some high frequency DCT coefficients will be discarded. To control the number of bits produced, each macroblock is assigned a quantization number. Quantization numbers fall in the range 0-15, with 15 corresponding to the finest quantizer and 0 to the coarsest. The quantization number is an index into a set of predetermined 8×8 “quantization matrices”, which hold the quantization step for each DCT coefficient.
A fundamental problem faced in the encoding/transcoding process is how to achieve the best picture quality given the fixed bit space assigned to each segment. Naturally it would be preferable to fully utilize this bit budget, as any unused bit space cannot be used by other segments and is therefore wasted. It is not appropriate, however, always to use the finest quantizer and produce the greatest number of bits, even though this strategy never wastes bit space. If all the VLC bits fit in the budget with the finest quantizer (quantization number = 15), then that is the optimal solution; otherwise, there are two directions in which to move within the range of possible quantizers: either (1) choose a finer quantizer and discard the high frequency bits that cannot fit, or (2) choose a coarser quantizer so that the high frequency bits fit, at the expense of increasing the quantization error of the lower frequency coefficients.
The solution to the above problem currently adopted by the majority of DV encoders in the industry is to select the quantization numbers that just fit all the bits into the given space; any finer quantizer would cause some coefficients to be discarded. Experiments show that this scheme produces better picture quality than always using the finest quantizer, although it can be shown that this solution is still not optimal.
At present, the quantization number is selected according to a fixed-space criterion, and the non-optimality of prior art quantization table selection results from this use of a fixed space for all bits. The selected quantization number N is such that, with quantization number N, all the bits fit in the fixed space, but with quantization number N+1 (if N is not the maximum possible number 15, which corresponds to the finest quantizer), some bits overflow. This criterion is referred to as the “best fit” criterion.
The best fit criterion does not necessarily produce the best picture quality for the fixed bit space. Without loss of generality, suppose that just one DCT block is to be coded into the given bit space, with the quantizer chosen by the “best fit” criterion. In this example, all of the AC coefficients of the DCT block can be written in a line according to their scan order. In this representation, the first line is the coefficient index, the second line is the coefficient value and the third line is the area number.
Suppose we have the following specific DCT block with coefficient values of 1 for coefficients 45 and 62.
If the quantization number QN equals M, the quantization step for area 3 is 1, and coefficient 62 must be discarded because of the limited space. If QN is M−1, however, the quantization step for that area becomes 2 (the quantization steps for the other areas are unchanged) and all the bits fit in the allocated space. According to the “best fit” criterion, therefore, QN should be M−1. Because of the larger step size when QN is M−1, however, both coefficients 45 and 62 become zero and are effectively discarded. When quantization number M is chosen, coefficient 45 is still encoded, although coefficient 62 is not. Clearly QN equal to M does better than M−1, and the “best fit” criterion does not provide the best possible result for this block.
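The comparison above can be replayed with a toy model. This is a minimal sketch under loudly stated assumptions: every non-zero quantized coefficient is assumed to cost a fixed, hypothetical number of bits (`COST_PER_COEF`), the space left for the block is assumed to hold exactly one such coefficient, and quantization is modelled as truncation toward zero. None of these values come from the DV standard; they only reproduce the situation described in the text.

```python
# Hypothetical cost model: every non-zero quantized coefficient costs
# COST_PER_COEF bits, and the space left for this block holds exactly one.
COST_PER_COEF = 6
BUDGET = 6


def quantize(value, step):
    """Quantize by truncation toward zero (an assumption for this sketch)."""
    q = abs(value) // step
    return q if value >= 0 else -q


def fits(coefs, step, budget=BUDGET):
    """True if every non-zero quantized coefficient can be coded."""
    nonzero = sum(1 for v in coefs.values() if quantize(v, step) != 0)
    return nonzero * COST_PER_COEF <= budget


def coded_indices(coefs, step, budget=BUDGET):
    """Code coefficients in scan order; drop the higher-frequency ones
    that no longer fit once the budget is exhausted."""
    coded, used = [], 0
    for index in sorted(coefs):
        if quantize(coefs[index], step) == 0:
            continue  # quantized to zero: nothing to code
        if used + COST_PER_COEF > budget:
            break  # out of space: remaining coefficients are discarded
        coded.append(index)
        used += COST_PER_COEF
    return coded


block = {45: 1, 62: 1}  # both coefficients lie in area 3
print("QN = M:  ", fits(block, 1), coded_indices(block, 1))
print("QN = M-1:", fits(block, 2), coded_indices(block, 2))
```

With step 1 (QN = M) the block does not fit, yet coefficient 45 survives; with step 2 (QN = M−1) the block fits, but no coefficient survives, which is exactly why "best fit" loses here.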
The typical method used to find the “best fit” quantizer is to try all the candidates one by one, beginning with the finest quantizer, until all of the quantized bits fit into the allotted space. Each try requires a quantization, a run-length scan and many VLC table look-ups to determine the number of bits that the VLC stage will produce. Given that each frame has hundreds of segments, and assuming a frame rate of 30 frames/second, this brute-force approach becomes a bottleneck in a real-time software implementation, or a bulky part of a hardware implementation of the encoder.
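The brute-force search can be sketched as a simple loop. In this sketch, `bits_for_qno` is a hypothetical callback that stands in for the expensive quantization, run-length scan and VLC length look-up performed on every try; the linear bit model in the usage lines is invented purely for illustration.

```python
from typing import Callable


def best_fit_qno(bits_for_qno: Callable[[int], int], budget: int) -> int:
    """Brute-force "best fit" search: try QNO 15 (finest) down to 0 and
    return the first quantization number whose bits fit the budget.

    bits_for_qno is a hypothetical callback standing in for the costly
    quantization + run-length scan + VLC length look-up of one try.
    """
    for qno in range(15, -1, -1):
        if bits_for_qno(qno) <= budget:
            return qno
    return 0  # nothing fits: fall back to the coarsest quantizer


# Hypothetical bit-count model: finer quantizers produce more bits.
trials = []


def toy_bits(qno: int) -> int:
    trials.append(qno)  # record each full coding pass that was needed
    return 1000 + 150 * qno


print(best_fit_qno(toy_bits, budget=2560), "after", len(trials), "trials")
```

Even in this toy case six complete coding passes are needed before a quantizer is accepted, and the accepted pass must then be redone to emit the real bits.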
The DV standard consists of two sub-standards: DV25 and DV50. DV25 is a 4:1:1 format for semi-professional quality video, while DV50 is a 4:2:2 format for professional, studio-quality video. Based on the DV standard, the industry has also developed the DVHD format for HDTV applications. The bit rates of DV25, DV50 and DVHD are 25 Mbps, 50 Mbps and 100 Mbps respectively. These formats are very similar to each other and can be understood through a description of DV50.
As in many video standards, in DV50 each frame is divided into 8×8 DCT blocks, and every four DCT blocks (two luminance 8×8 DCT blocks and two chrominance 8×8 DCT blocks) constitute a macroblock in the pattern illustrated in FIG. 1.
There are two DCT modes in DV: 88-DCT and 248-DCT. 88-DCT is the regular 8×8 DCT, which is also used in MPEG in “frame DCT” mode. The 248-DCT is intended for DCT blocks with relatively large “intra-frame” motion, that is, motion between the two interlaced fields that make up a video frame. This mode still uses an 8×8 DCT block, but each block consists of two 4×8 DCT transforms of two sub-blocks derived from the two fields.
As described above, DV50 is an I-frame only coding standard in which the smallest independent coding unit is a segment consisting of 5 macroblocks positioned sparsely according to a deterministic pattern. This spreading of the blocks allows for the efficient use of fixed bit space allocated for each segment.
Although there is a pattern according to which the bits of the five macroblocks are distributed, there is essentially no further limitation on the bit space for each macroblock within a segment. All bits share the fixed bit space assigned to the segment, which is 2560 bits for all AC coefficients, excluding the EOB (End of Block) bits. The constraint is that the sum of the coded bits of the five macroblocks must not exceed the segment's limit; otherwise some bits must be discarded.
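Because only the segment total is constrained, the budget check reduces to a sum. This minimal sketch uses the 2560-bit AC budget stated above; the function names are illustrative only.

```python
SEGMENT_AC_BUDGET = 2560  # bits for all AC coefficients of a segment, EOB excluded


def segment_fits(macroblock_bits, budget=SEGMENT_AC_BUDGET):
    """The five macroblocks share one pool: only their sum is constrained."""
    if len(macroblock_bits) != 5:
        raise ValueError("a DV segment has exactly five macroblocks")
    return sum(macroblock_bits) <= budget


def overflow_bits(macroblock_bits, budget=SEGMENT_AC_BUDGET):
    """Number of bits that would have to be discarded (0 if it fits)."""
    return max(0, sum(macroblock_bits) - budget)


# One macroblock may use far more than 2560 / 5 = 512 bits if the others use less.
print(segment_fits([900, 300, 400, 500, 400]))
```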
After the DCT stage, each DCT coefficient is multiplied by a predetermined weighting number. After transformation and weighting, each DCT block goes through two steps before it is run-length coded: (1) classification and (2) quantization. Based on some overall characteristics of the coefficients of a DCT block, a class number from 0 to 3 is selected. If the DCT block falls into class 3, its AC coefficients are divided by two (shifted right by one bit). This is called “scaling”. Both of these operations can essentially be viewed as part of the quantization process, but the terms of the standard are followed here.
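The class 3 scaling step can be sketched as follows. The sketch assumes the block is a flat list of already weighted coefficients with the DC term first, and it halves the magnitude while keeping the sign separately; that sign handling is an assumption of this sketch, since the text only says "shifted right by one bit". How the class number itself is chosen is not shown.

```python
def scale_for_class(block, class_number):
    """block[0] is the DC coefficient, block[1:] are the AC coefficients.

    For class 3, AC magnitudes are halved by a one-bit right shift; the
    sign is kept separately (an assumption of this sketch). Other classes
    leave the block unchanged. The DC coefficient is never scaled here.
    """
    if class_number not in range(4):
        raise ValueError("class number must be 0-3")
    if class_number != 3:
        return list(block)
    dc, ac = block[0], block[1:]
    scaled = [(abs(c) >> 1) * (1 if c >= 0 else -1) for c in ac]
    return [dc] + scaled


print(scale_for_class([100, 7, -7, 1, 0], 3))
```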
The next step is to choose the quantization number (QNO), which determines the quantization matrix to be used. The quantization matrix specifies the quantization step for each of the 64 coefficients in the DCT block. Instead of giving the quantization step for each coefficient explicitly, each DCT block is divided into 4 areas according to a predetermined pattern, as shown in FIG. 2. This pattern differs between 88-DCT and 248-DCT. All coefficients within an area share the same quantization step. Table 1 shows the quantization step for each area, for different classes and quantization numbers. For example, for a coefficient in area “2”, given class number “1” and quantization number “10”, the quantization step size is determined as follows. In Table 3, locate the quantization number “10” in the column corresponding to class “1”, and denote the row in which “10” resides as “row X”. Then find the column corresponding to area “2” and denote it as “column Y”. The number in row X and column Y (here “2”) is the desired quantization step.
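The row/column lookup just described can be sketched as a search over table rows. The table contents below are hypothetical placeholders, chosen only so that the worked example (class 1, quantization number 10, area 2 gives step 2) holds; they are not copied from the DV standard, and `TOY_TABLE` and `quantization_step` are illustrative names.

```python
from typing import Optional, Sequence, Tuple

# One row of a quantization-step table laid out as described above:
# (QNO listed for classes 0..3, or None) -> (step for areas 0..3).
Row = Tuple[Tuple[Optional[int], ...], Tuple[int, ...]]

# Hypothetical excerpt: placeholder values chosen so that the worked
# example (class 1, QNO 10, area 2 -> step 2) holds; NOT DV-standard data.
TOY_TABLE: Sequence[Row] = [
    ((9, 11, 12, 13), (1, 1, 1, 2)),
    ((8, 10, 11, 12), (1, 1, 2, 2)),
]


def quantization_step(table: Sequence[Row], class_number: int,
                      qno: int, area: int) -> int:
    for qnos_by_class, steps_by_area in table:
        if qnos_by_class[class_number] == qno:  # this is "row X"
            return steps_by_area[area]          # read at "column Y"
    raise KeyError(f"QNO {qno} not listed for class {class_number}")


print(quantization_step(TOY_TABLE, class_number=1, qno=10, area=2))
```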
Even though there are 16 quantization numbers, there are only 9 distinct quantization matrices. These are identified by the quantization matrix number (QMN).
The DC coefficient (coefficient index 0) does not belong to any area. It is coded with a fixed number of bits after the weighting.
Not every DCT block is assigned a separate quantization number. Instead, each macroblock has a quantization number. All four DCT blocks in that macroblock share that quantization number. Therefore, each DCT block has its own class number and a common quantization number, shared with other DCT blocks in the same macroblock. Thus for a segment, five quantization numbers are to be determined.
After quantization, a scan is performed on each DCT block to convert the two-dimensional 8×8 matrix into a one-dimensional vector of 64 coefficients. The scan also determines the (run, amplitude) combinations. “Run” is the number of consecutive zero coefficients scanned before a non-zero coefficient is encountered. “Amplitude” is the value of that non-zero coefficient. A quantized DCT block can thus be represented by a number of (run, amplitude) combinations; this is called “run-length” coding.
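The run-length step can be sketched as below. The sketch assumes the AC coefficients have already been reordered by the scan (the actual DV scan orders for 88-DCT and 248-DCT are not reproduced here) and leaves the DC coefficient out, since it is coded separately.

```python
def run_length(scanned_ac):
    """Turn scanned AC coefficients into (run, amplitude) pairs.

    Assumes the scan has already been applied; the DC coefficient is
    coded separately and is not part of the input.
    """
    pairs, run = [], 0
    for c in scanned_ac:
        if c == 0:
            run += 1  # count zeros preceding the next non-zero value
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are signalled by EOB, not by a pair


print(run_length([5, 0, 0, -2, 0, 1, 0, 0]))
```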
Finally, the (run, amplitude) combinations are coded into binary bits with a VLC table, which assigns prefix-free, variable length binary code words to those combinations.
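Counting VLC bits is then a table look-up per pair. The code lengths below are hypothetical placeholders, NOT the DV VLC table; they only follow the property, noted later in this section, that smaller runs and amplitudes receive shorter codes. The fixed `ESCAPE_LENGTH` for pairs missing from the toy table is likewise an assumption of this sketch and does not describe how DV handles such pairs.

```python
# Hypothetical code lengths in bits, keyed by (run, |amplitude|).
# Placeholders only: NOT the DV VLC table.
TOY_VLC_LENGTH = {(0, 1): 3, (0, 2): 4, (1, 1): 5, (2, 1): 6, (2, 2): 8}
ESCAPE_LENGTH = 16  # hypothetical length for pairs missing from the table


def vlc_bits(pairs):
    """Total code length of a list of (run, amplitude) pairs."""
    return sum(TOY_VLC_LENGTH.get((run, abs(amp)), ESCAPE_LENGTH)
               for run, amp in pairs)


print(vlc_bits([(0, 1), (2, -2), (1, 1)]))
```

Only the lengths are needed to decide whether a segment fits, which is why the prior art searches below look up code lengths without generating any actual bits.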
The entire DV encoding process, including quantization table selection, is illustrated in FIG. 3. Quantization table selection is the focus of the present invention.
One prior art scheme is used in industrial encoders. Within such an industrial DV encoder, quantization selection is carried out in two stages: uniform quantization number selection and fine adjustment.
In the uniform quantization number selection stage, the candidate quantizer sets all five quantization numbers for the five macroblocks in a segment to one single number. The process can be described as follows (TQNO: tentative quantization number; QNOi: the quantization number for the ith macroblock in the segment, i=1,2,3,4,5):
1. TQNO=15 (finest quantizer)
2. If TQNO equals zero, set QNOi=TQNO (i=1,2,3,4,5) and exit;
3. Quantize all DCT blocks (20 of them) with QNOi=TQNO (i=1,2,3,4,5)
4. Run-length scan and VLC codeword length lookup
5. If all the bits fit in the fixed space, or TQNO equals zero, exit.
6. TQNO=TQNO-1
7. Goto 2
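The uniform selection steps above can be sketched in a few lines. Here `segment_bits` is a hypothetical callback returning the total AC bits of the segment for a given list of five QNOs; it stands in for steps 3 and 4 (quantizing all 20 DCT blocks, the run-length scan and the VLC length look-up).

```python
from typing import Callable, List


def uniform_selection(segment_bits: Callable[[List[int]], int],
                      budget: int = 2560) -> List[int]:
    """Steps 1-7: lower one shared TQNO from 15 until the segment fits.

    segment_bits(qnos) is a hypothetical callback standing in for the
    quantization, run-length scan and VLC length look-up of all 20 blocks.
    """
    tqno = 15                                      # step 1
    while True:
        qnos = [tqno] * 5                          # QNOi = TQNO, i = 1..5
        if tqno == 0 or segment_bits(qnos) <= budget:
            return qnos                            # steps 2 and 5: exit
        tqno -= 1                                  # steps 6 and 7


# Hypothetical model: each macroblock needs 200 + 40 * QNO bits.
model = lambda qnos: sum(200 + 40 * q for q in qnos)
print(uniform_selection(model))
```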
After the uniform quantization number selection stage, all macroblocks in a segment have the same quantization number. The fine adjustment procedure is intended to use the bit space fully:
1. i=1;
2. if QNOi is less than 15, increase QNOi by 1; otherwise goto 5
3. Redo the quantization, run-length scan and VLC codebook look up for macroblock i;
4. If all the bits cannot fit in the fixed space, restore QNOi to its previous value (i.e., decrease it by 1).
5. Increase i by 1. If i is greater than 5, exit;
6. Goto 2;
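The fine adjustment pass can be sketched the same way, again with a hypothetical `segment_bits` callback standing in for step 3 (re-quantization, run-length scan and VLC length look-up).

```python
from typing import Callable, List, Sequence


def fine_adjustment(qnos: Sequence[int],
                    segment_bits: Callable[[List[int]], int],
                    budget: int = 2560) -> List[int]:
    """Steps 1-6: try one step finer for each macroblock in turn and keep
    the change only if the whole segment still fits."""
    qnos = list(qnos)
    for i in range(5):
        if qnos[i] < 15:
            qnos[i] += 1                        # step 2
            if segment_bits(qnos) > budget:     # steps 3 and 4
                qnos[i] -= 1                    # restore previous value
    return qnos


# Hypothetical model: each macroblock needs 200 + 40 * QNO bits.
model = lambda qnos: sum(200 + 40 * q for q in qnos)
print(fine_adjustment([7] * 5, model))
```

In this toy model the first four macroblocks can move one step finer while the fifth cannot, so the pass uses the remaining 160 bits exactly.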
This implementation assumes that the finest quantizer that can fit all the bits is the “optimal” quantizer.
That scheme is essentially an exhaustive search. Each search step goes through virtually the whole coding process, except that the VLC table look-up needs only the code lengths, so no actual bits are generated. This may bring no savings in complexity after the quantizer selection is finished because, to produce the coded data, the coding process that was used to select the quantizer must be repeated. That means one or more “test codings” as well as the actual coding are carried out for each segment, which is computationally intensive, especially for complex pictures.
A second scheme incorporates two optimizations of the previous approach. First, it does not assume that the encoding of each segment starts from the finest quantizer. Instead, it takes as the initial quantizer the one selected for the previous segment, which should be reasonably correlated with the current segment, and makes its selection among the numbers around that previous quantization number. Second, it uses the number of non-zero quantized coefficients to estimate whether all the bits will fit, instead of going through the whole coding process, although quantization is still needed. Because that estimate may not be robust, an adaptive adjustment is added to monitor and correct any large deviation.
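A minimal sketch of the second scheme's neighbourhood search follows, under explicit assumptions: `count_nonzero` is a hypothetical callback that quantizes the segment with a given QNO and counts non-zero coefficients, the bit estimate is an assumed linear model (`bits_per_coef` per non-zero coefficient), and the adaptive correction mentioned above is not modelled because its details are not given here.

```python
from typing import Callable


def neighbour_search(prev_qno: int,
                     count_nonzero: Callable[[int], int],
                     budget: int = 2560,
                     bits_per_coef: float = 5.0) -> int:
    """Start from the previous segment's QNO and move to neighbouring QNOs.

    count_nonzero(qno) is a hypothetical callback: quantize the segment with
    qno and count the non-zero coefficients (quantization is still needed).
    The estimate count * bits_per_coef is an assumed linear model; the
    adaptive correction of the original scheme is not modelled.
    """
    def fits(qno: int) -> bool:
        return count_nonzero(qno) * bits_per_coef <= budget

    qno = prev_qno
    if fits(qno):
        while qno < 15 and fits(qno + 1):  # try finer quantizers
            qno += 1
    else:
        while qno > 0 and not fits(qno):   # fall back to coarser ones
            qno -= 1
    return qno


# Hypothetical model: 300 + 30 * QNO non-zero coefficients.
print(neighbour_search(5, lambda q: 300 + 30 * q))
```

Starting near the previous answer shortens the search when neighbouring segments behave alike, but each step still quantizes the whole segment.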
As can be seen, the quantization selection scheme used in industrial encoders is very computationally intensive. The second scheme suggests some optimizations of the first, but it has two major drawbacks: (1) every search step still requires the quantization process, and (2) the adaptive adjustment offsets the effort to reduce computational intensity by adding extra complexity and introducing instability into the scheme.
According to the present invention, it is desired to present a scheme that simplifies the quantizer selection process by predicting the VLC bit number rather than computing the exact bit number. This scheme is used in the DV encoding module of the MPEG-to-DV software transcoder, with performance very close to that of one of the brute-force methods used in the industry today. The scheme reduces computational complexity considerably and hence leads to more efficient hardware and software implementations.
According to the present invention, a bit number prediction method for VLC coded DCT blocks is developed, and based on that, a new scheme for selecting the quantization number in DV encoding/transcoding is developed.
According to the present invention, a heuristic-based approach is used to convert video data formatted according to a first protocol to video data formatted according to a second protocol. A set of parameters is first predicted, and one of the parameters is set to a first initial value. New values are incrementally established for the parameter until a predetermined criterion has been reached.
In the preferred embodiment of the invention, the first protocol contains variable frame size formatted video information, such as MPEG, and the second protocol contains fixed frame size formatted video information using a DV format.
According to a particular aspect of the invention, a variable length coded bit number is predicted for a video frame. The predicted variable length bit number is used to optimize quantization on a frame-by-frame basis, and a resultant variable length coded bit number is used to generate a DV rendition of the video frame.
According to one aspect of the present invention, a method for constructing a prediction model includes extracting an initial parameter P, setting the parameter to a full first initial value, and setting an initial incremental value. The method is used for converting variable frame size formatted video information to fixed frame size formatted video information. For each DCT block, a new value of P (Pn) is established as a function of the previously established value of P (Pp). New values of P (Pn) are established until a predetermined criterion for incrementing has been reached.
According to a further aspect of the present invention, variable frame size formatted video information is converted to fixed frame size formatted video information. A quantizer number and a DCT coefficient are received and supplied to a correlation database. At least one parameter value is extracted from the DCT coefficient and provided to the correlation database. The correlation database is then output to a prediction model. The parameter is set to a full first initial value, and a new value of P (Pn) is set for the DCT block based on the previously established value of P (Pp).
The invention uses a parameter (P) that has a strong correlation with the number of bits needed to code the data. A second order polynomial is developed to approximate this correlation and is used to predict the number of bits. In the quantization number selection process of DV encoding/transcoding, this prediction can be used to determine whether all the bits of a segment will fit in the limited space for a specific quantization number. This replaces performing the actual quantization, run-length scan and VLC code length look-up for each segment.
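The prediction step can be sketched as a quadratic fit followed by a search that consults the fitted polynomial instead of running the coder. Assumptions, stated plainly: how P is defined is not given in this passage, so `p_for_qno` is a hypothetical callback that returns P for a given QNO; the fit uses ordinary least squares (via the 3×3 normal equations and Cramer's rule), which is a choice of this sketch rather than a method the text specifies; and the sample data in the usage lines are synthetic.

```python
from typing import Callable, Sequence, Tuple

Coeffs = Tuple[float, float, float]


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(ps: Sequence[float], bits: Sequence[float]) -> Coeffs:
    """Least-squares fit of bits ~ a + b*P + c*P**2 via the 3x3 normal
    equations, solved here with Cramer's rule."""
    if len(ps) != len(bits) or len(ps) < 3:
        raise ValueError("need at least three (P, bits) samples")
    s = [sum(p ** k for p in ps) for k in range(5)]  # sums of P^k
    t = [sum(y * p ** k for p, y in zip(ps, bits)) for k in range(3)]
    normal = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    d = _det3(normal)
    if d == 0:
        raise ValueError("P samples must take at least three distinct values")
    solution = []
    for col in range(3):
        m = [row[:] for row in normal]
        for r in range(3):
            m[r][col] = t[r]  # replace one column by the right-hand side
        solution.append(_det3(m) / d)
    a, b, c = solution
    return a, b, c


def predict_bits(coeffs: Coeffs, p: float) -> float:
    a, b, c = coeffs
    return a + b * p + c * p * p


def select_qno_by_prediction(p_for_qno: Callable[[int], float],
                             coeffs: Coeffs, budget: int = 2560) -> int:
    """Finest QNO whose *predicted* bit count fits; p_for_qno is a
    hypothetical callback computing the parameter P for a given QNO."""
    for qno in range(15, -1, -1):
        if predict_bits(coeffs, p_for_qno(qno)) <= budget:
            return qno
    return 0


# Synthetic samples drawn exactly from bits = 10 + 2P + 3P^2.
coeffs = fit_quadratic([0, 1, 2, 3, 4], [10, 15, 26, 43, 66])
print(coeffs, select_qno_by_prediction(lambda q: 2 * q, coeffs))
```

Once the polynomial has been fitted offline, each candidate QNO costs only the computation of P and one polynomial evaluation, instead of a full quantization, run-length scan and VLC length look-up.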
Currently the quantization number selection process is very computationally intensive and is done with a parallel hardware implementation in the Panasonic DV encoder. With this new scheme, the computational complexity of selecting the quantization number for DV encoding/transcoding is considerably reduced, and picture quality is also improved. This innovation can either (1) lower the cost and power consumption of a hardware implementation or (2) make a real-time, software-only MPEG to DV transcoder possible. The present invention attempts to eliminate both drawbacks of the prior art with a different, heuristic-based approach.
According to the invention, some easily computable parameters are determined. These parameters have a very strong correlation with the VLC bit number needed to code the data, and hence a relatively precise prediction model can be established from that correlation. The prediction model is desirably simple and does not require much computing power. The system is robust, which means that no changes to the model are needed to adapt it to different video content. The number of parameters should clearly be as small as possible, because more parameters imply more complex parameter prediction computation. Therefore, it is desirable to use just one such parameter.
Because it is not simple to build a model describing the VLC coding process, intuition and a trial-and-error method are used in the inventive approach according to a preferred embodiment. According to one aspect of the present invention, use is made of the fact that the VLC coding scheme assigns shorter codes to smaller runs and amplitudes. Based on that, several definitions of the parameter (P) are considered.