1. Field of the Invention
This invention relates to a method and apparatus for converting picture information. More particularly, it relates to a picture information conversion method and apparatus for use in receiving compressed picture information (bitstream), e.g., MPEG pictures compressed by orthogonal transform, such as discrete cosine transform, and motion compensation, over a broadcast satellite, cable TV or a network medium, such as the Internet, or in processing the bitstream on a recording medium, such as an optical disc or a magneto-optical disc.
2. Description of Related Art
There has so far been presented a picture information compression system, such as MPEG, which compresses the picture information by motion compensation, exploiting the redundancy proper to the picture information, with a view to handling the picture information as digital data and to high-efficiency transmission and storage of the information. Apparatus conforming to this picture information compression system is finding widespread use both in information distribution by, e.g., a broadcasting station and in information reception in homes.
In particular, the MPEG2 (ISO/IEC 13818-2) is defined as being a comprehensive picture encoding system applicable to both interlaced and progressive scanned pictures, to standard definition pictures, and to high-definition pictures.
That is, in the MPEG2 encoding compression system, codes of a bitrate of 4 to 8 Mbps are allocated to an interlaced scanned picture of a standard resolution with 720×480 pixels, and codes of a bitrate of 18 to 22 Mbps are allocated to a progressive scanned picture of a high resolution with 1920×1088 pixels, to realize a high compression factor and a high picture quality.
In light of the above, the MPEG2 is expected to remain in extensive professional and consumer use.
However, the MPEG2 is mainly intended for high picture quality encoding for broadcasting, and it is not adapted to a code rate lower than that of MPEG1, that is, it is not adapted to an encoding system with a higher compression rate.
On the other hand, it may be predicted that the need for an encoding system with a higher compression rate will continue to increase. In order to cope with this situation, standardization of the MPEG4 encoding system with a high compression rate is underway. For this picture encoding system, the international standardization was acknowledged in December 1998 as ISO/IEC 14496-2.
Meanwhile, there also exists the need for converting the MPEG2 compressed picture information, once encoded for digital broadcast, into compressed picture information (bitstream) of a lower code rate more amenable to processing on a portable terminal.
For accommodating these needs, there is presented a picture information converting apparatus (transcoder) in “Field-to-Frame Transcoding with Spatial and Temporal Downsampling” (Susie L. Wee, John G. Apostolopoulos, and Nick Feamster, ICIP 99; referred to below as reference 1).
As shown in FIG. 1, the picture information converting apparatus (transcoder) presented in this reference 1 is made up of a picture type decision unit 1, an MPEG2 picture information decoding unit (I/P picture) 2, a decimating unit 3, an MPEG4 picture information encoding unit (I/P-VOP) 4, a motion vector synthesis unit 5 and a motion vector detection unit 6.
This picture information converting apparatus is fed with the interlaced scanned MPEG2 compressed picture information (bitstream) made up of an intra-coded picture (I-picture) obtained on intra-frame coding, a forward predicted picture (P-picture) obtained on predictive coding by referring to a forward direction in the display sequence, and a bi-directionally coded picture (B-picture) obtained on predictive coding by referring to the forward and backward directions in the display sequence.
This MPEG2 compressed picture information (bitstream) is discriminated in the picture type decision unit 1 as to whether it is of an I/P picture or of a B-picture. Only the I/P picture is output to the next following MPEG2 picture information decoding unit (I/P picture) 2, while the B-picture is discarded.
Similarly to the processing in a routine MPEG2 picture information decoding unit, the processing in the MPEG2 picture information decoding unit (I/P picture) 2 decodes the MPEG2 compressed picture information (bitstream) into picture signals.
The pixel values output by the MPEG2 picture information decoding unit (I/P picture) 2 are input to the decimating unit 3, which decimates the pixel values by ½ in the horizontal direction and, in the vertical direction, leaves only one of the data of the first field and the data of the second field while discarding the other. By this decimation, there is produced a progressive scanned picture having a size equal to ¼ of the input picture information.
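The decimation performed by the decimating unit 3 may be illustrated by the following minimal Python sketch. The function name `decimate_frame`, the list-of-rows picture representation and the convention that even-numbered rows form the first field are assumptions made for illustration only. Plain subsampling is used here, without any low-pass filtering that an actual apparatus might apply.

```python
def decimate_frame(frame, keep_first_field=True):
    """Halve an interlaced frame horizontally and keep a single field.

    `frame` is a list of rows (lists of pixel values). Rows 0, 2, 4, ...
    are taken as the first field and rows 1, 3, 5, ... as the second
    field (an assumed convention). No low-pass filter is applied.
    """
    start = 0 if keep_first_field else 1
    field = frame[start::2]              # discard the other field
    return [row[::2] for row in field]   # keep every second pixel


# A 4-row by 8-pixel frame becomes 2 rows by 4 pixels, 1/4 of the samples.
frame = [[10 * r + c for c in range(8)] for r in range(4)]
small = decimate_frame(frame)
```

Because one field is dropped entirely, the output no longer has an interlaced structure and can be handled as a progressive scanned picture.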
The progressive scanned picture generated by the decimating unit 3 is encoded by the MPEG4 picture information encoding unit (I/P-VOP) 4 into an intra-frame-coded I-VOP and a P-VOP obtained on predictive coding by referring to the forward direction in the display sequence, and it is output as the MPEG4 compressed picture information (bitstream). Meanwhile, VOP means a video object plane and is equivalent to a frame in MPEG2.
The motion vector information in the input MPEG2 compressed picture information (bitstream) is mapped in the motion vector synthesis unit 5 into a motion vector for the as-decimated picture information. The motion vector detection unit 6 detects the high precision motion vector based on the motion vector value synthesized in the motion vector synthesis unit 5.
Reference 1 discusses a picture information converting apparatus for generating the MPEG4 compressed picture information (bitstream) having a size equal to ½×½ of the input MPEG2 compressed picture information (bitstream). That is, if the input MPEG2 compressed picture information (bitstream) conforms to the NTSC (National Television System Committee) standard, the output MPEG4 compressed picture information (bitstream) is of an SIF size (352×240 pixels).
Meanwhile, in the picture information converting apparatus shown in FIG. 1, the code rate control in the MPEG4 picture information encoding unit (I/P-VOP) 4 represents a significant factor in determining the picture quality of the MPEG4 compressed picture information. In the ISO/IEC 14496-2, there is no particular definition as to the code rate controlling system, so that each vendor may use a system deemed optimal from the viewpoint of the processing volume and the output picture quality, depending on the particular application. The system discussed in MPEG2 Test Model 5 (ISO/IEC JTC1/SC29/WG11 N0400) is hereinafter explained as a typical code rate controlling system.
The code rate control flow is now explained by referring to the flowchart of FIG. 2. At a first step S11, the picture information encoding unit (I/P-VOP) 4 allocates bits to each picture, with the target code rate (target bitrate) and the GOP (group of pictures) as input variables. It is noted that a GOP means a set of pictures accessible at random.
That is, at step S11, the picture information encoding unit (I/P-VOP) 4 distributes the bits to be allocated to each picture in the GOP based on the volume of bits allocated to the pictures not as yet encoded in the GOP, inclusive of the picture intended for allocation. This bit volume is referred to below as R. This distribution is repeated in the sequence of the encoded pictures in the GOP. In this case, code rate allocation to each picture is made using two assumptions as now explained.
It is first assumed that the product of an average quantization scale code used in encoding each picture and the volume of the codes generated is constant for each picture type as long as the picture displayed is not changed. Based on this supposition, variables Xi, Xp and Xb representing the picture complexity (global complexity measure) are updated, after encoding each picture, in accordance with the following equation (1) for the respective picture types:

$$
X_i = S_i \cdot Q_i, \qquad X_p = S_p \cdot Q_p, \qquad X_b = S_b \cdot Q_b \tag{1}
$$

It is noted that Si, Sp and Sb denote the volumes of the codes generated on picture encoding, and Qi, Qp and Qb are average quantization scale codes at the time of picture encoding. On the other hand, the initial values, in terms of the target bitrate bit_rate [bits/sec], are as indicated in the following equation (2):

$$
X_i = \frac{160 \times \mathrm{bit\_rate}}{115}, \qquad X_p = \frac{60 \times \mathrm{bit\_rate}}{115}, \qquad X_b = \frac{42 \times \mathrm{bit\_rate}}{115} \tag{2}
$$
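The initialization of equation (2) and the update of equation (1) may be sketched in Python as follows. The function names, the dictionary keyed by picture type and the numerical values are assumptions introduced purely for illustration.

```python
def initial_complexity(bit_rate):
    """Initial global complexity measures, per equation (2)."""
    return {
        "I": 160 * bit_rate / 115,
        "P": 60 * bit_rate / 115,
        "B": 42 * bit_rate / 115,
    }


def update_complexity(X, picture_type, generated_bits, avg_quant_scale):
    """Update the measure of the picture type just encoded, per equation (1)."""
    X = dict(X)  # leave the caller's dictionary untouched
    X[picture_type] = generated_bits * avg_quant_scale
    return X


X = initial_complexity(bit_rate=1_150_000)  # illustrative target bitrate
X = update_complexity(X, "P", generated_bits=40_000, avg_quant_scale=8.0)
```

Only the entry for the picture type just encoded changes; the measures of the other picture types retain their previous values until a picture of that type is encoded.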
Second, it is assumed that the overall picture quality is optimized at all times when the proportions Kp, Kb of the quantization scale codes of the P- and B-pictures, referenced to the quantization scale code of an I-picture, are of the values defined in the following equation (3):

$$
K_p = 1.0; \qquad K_b = 1.4 \tag{3}
$$

That is, the quantization scale code of a B-picture is set at all times so as to be 1.4 times the quantization scale codes of the I- and P-pictures. This is based on the supposition that, if the volume of the codes that can be saved in a B-picture by encoding the B-picture slightly more coarsely than the I- and P-pictures is added to the code volume of the I- and P-pictures, the I- and P-pictures can be improved in picture quality, so that the B-picture which refers to these also can be improved in picture quality.
From the above-mentioned two assumptions, the volumes of bits allocated to each picture of the GOP (Ti, Tp, Tb) are as indicated by the following equation (4):

$$
\begin{aligned}
T_i &= \max\left\{ \frac{R}{1 + \dfrac{N_p \cdot X_p}{X_i \cdot K_p} + \dfrac{N_b \cdot X_b}{X_i \cdot K_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_p &= \max\left\{ \frac{R}{N_p + \dfrac{N_b \cdot K_p \cdot X_b}{K_b \cdot X_p}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_b &= \max\left\{ \frac{R}{N_b + \dfrac{N_p \cdot K_b \cdot X_p}{K_p \cdot X_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\}
\end{aligned}
\tag{4}
$$

where Np and Nb denote the number of P- and B-pictures, respectively, not as yet encoded in the GOP.
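The bit allocation of equation (4) may be sketched in Python as follows. The function name `target_bits`, the dictionary `X` keyed by picture type and the numerical values are assumptions introduced for illustration. The example uses Nb = 0, as would apply to a stream containing only I- and P-pictures.

```python
def target_bits(picture_type, R, Np, Nb, X, bit_rate, picture_rate,
                Kp=1.0, Kb=1.4):
    """Target bit volume for the next picture, following equation (4)."""
    floor = bit_rate / (8 * picture_rate)  # lower bound common to all types
    if picture_type == "I":
        denom = 1 + Np * X["P"] / (X["I"] * Kp) + Nb * X["B"] / (X["I"] * Kb)
    elif picture_type == "P":
        denom = Np + Nb * Kp * X["B"] / (Kb * X["P"])
    elif picture_type == "B":
        denom = Nb + Np * Kb * X["P"] / (Kp * X["B"])
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    return max(R / denom, floor)


X = {"I": 160.0, "P": 60.0, "B": 42.0}  # illustrative complexity values
T_i = target_bits("I", R=1_000_000, Np=4, Nb=0, X=X,
                  bit_rate=960_000, picture_rate=30)
T_p = target_bits("P", R=1_000_000, Np=4, Nb=0, X=X,
                  bit_rate=960_000, picture_rate=30)
```

The second argument of max in equation (4) acts as a lower bound, so that a picture is never allocated fewer than bit_rate/(8×picture_rate) bits even when R is nearly exhausted.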
Based on the volume of the allocated codes thus found, the volume of bits R allocated to the pictures not as yet encoded in the GOP is updated in accordance with the following equation (5) each time a picture is encoded in accordance with steps S11 and S12:

$$
R = R - S_{i,p,b} \tag{5}
$$

On the other hand, in encoding the first picture of the GOP, R is updated in accordance with the equation (6):

$$
R = \frac{\mathrm{bit\_rate} \times N}{\mathrm{picture\_rate}} + R \tag{6}
$$

where N is the number of pictures in a GOP. The initial value of R at the beginning of a sequence is 0.
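The bookkeeping of R according to equations (5) and (6) may be sketched in Python as follows. The function names and the numerical values are assumptions introduced for illustration.

```python
def start_gop(R, bit_rate, N, picture_rate):
    """Equation (6): add the bit budget of a new GOP to the remainder R."""
    return bit_rate * N / picture_rate + R


def after_picture(R, generated_bits):
    """Equation (5): subtract the bits actually generated by a picture."""
    return R - generated_bits


R = 0  # initial value at the beginning of a sequence
R = start_gop(R, bit_rate=960_000, N=15, picture_rate=30)
R = after_picture(R, generated_bits=100_000)
```

Since equation (6) adds to the existing value of R rather than overwriting it, any surplus or deficit left over from one GOP is carried into the budget of the next GOP.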
Then, at step S12, the picture information encoding unit (I/P-VOP) 4 performs rate control using a virtual buffer. That is, at step S12, the picture information encoding unit (I/P-VOP) 4 finds the quantization scale code by macroblock-based feedback control, based on the occupancy of three types of virtual buffers set independently for the respective picture types, in order to make the volume of allocated bits for the respective pictures (Ti, Tp, Tb), found by the equation (4) at step S11, coincident with the actual volume of generated codes.
Before proceeding to the encoding of the jth macroblock, the occupancy volume of the virtual buffer is found by the following equation (7):

$$
\begin{aligned}
d_j^i &= d_0^i + B_{j-1} - \frac{T_i \times (j-1)}{\mathrm{MB\_cnt}} \\
d_j^p &= d_0^p + B_{j-1} - \frac{T_p \times (j-1)}{\mathrm{MB\_cnt}} \\
d_j^b &= d_0^b + B_{j-1} - \frac{T_b \times (j-1)}{\mathrm{MB\_cnt}}
\end{aligned}
\tag{7}
$$
It is noted that $d_0^i$, $d_0^p$, $d_0^b$ are initial occupancy volumes of the virtual buffers, $B_j$ is the volume of bits generated from the leading end of a picture up to the jth macroblock and MB_cnt is the number of macroblocks in one picture. The occupancy of the virtual buffer at the time of end of encoding of each picture ($d_{\mathrm{MB\_cnt}}^i$, $d_{\mathrm{MB\_cnt}}^p$, $d_{\mathrm{MB\_cnt}}^b$) is used as the initial value ($d_0^i$, $d_0^p$, $d_0^b$) of the occupancy of the virtual buffer for the next picture of the same picture type.
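The occupancy calculation of equation (7) is the same for all three picture types and may be sketched in Python as a single function. The function name, the parameter names and the numerical values are assumptions introduced for illustration.

```python
def buffer_occupancy(d0, bits_so_far, target, j, mb_cnt):
    """Equation (7): occupancy before encoding macroblock j (1-based).

    d0           initial occupancy for the picture type concerned
    bits_so_far  bits generated by macroblocks 1 to j-1
    target       allocated bit volume of the picture (Ti, Tp or Tb)
    mb_cnt       number of macroblocks in one picture
    """
    return d0 + bits_so_far - target * (j - 1) / mb_cnt


# Ten macroblocks encoded, 1200 bits spent against a pro-rata share of 1000.
d_11 = buffer_occupancy(d0=1000, bits_so_far=1200, target=33_000,
                        j=11, mb_cnt=330)
```

The occupancy rises when the macroblocks encoded so far have consumed more than their proportional share of the target, and falls when they have consumed less.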
The quantization scale code for the jth macroblock is then calculated in accordance with equation (8):

$$
Q_j = \frac{d_j \times 31}{r} \tag{8}
$$

where r is a variable controlling the feedback loop response, termed a reaction parameter, and is given by equation (9):

$$
r = 2 \times \frac{\mathrm{bit\_rate}}{\mathrm{picture\_rate}} \tag{9}
$$
Meanwhile, the initial values of the occupancy of the virtual buffers at the time of beginning of encoding are given by the equation (10):

$$
d_0^i = 10 \times \frac{r}{31}, \qquad d_0^p = K_p \cdot d_0^i, \qquad d_0^b = K_b \cdot d_0^i \tag{10}
$$
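Equations (8) to (10) may be sketched in Python as follows. The function names and the numerical values are assumptions introduced for illustration. The coefficient applied to the B-picture buffer is assumed here to be the Kb of equation (3).

```python
def reaction_parameter(bit_rate, picture_rate):
    """Equation (9): reaction parameter r."""
    return 2 * bit_rate / picture_rate


def initial_occupancy(r, Kp=1.0, Kb=1.4):
    """Equation (10): initial virtual buffer occupancy per picture type.

    Using Kb for the B-picture buffer is an assumption of this sketch.
    """
    d0_i = 10 * r / 31
    return {"I": d0_i, "P": Kp * d0_i, "B": Kb * d0_i}


def quant_scale(d_j, r):
    """Equation (8): quantization scale code from the buffer occupancy."""
    return d_j * 31 / r


r = reaction_parameter(bit_rate=930_000, picture_rate=30)
d0 = initial_occupancy(r)
Q_first = quant_scale(d0["I"], r)
```

With these initial values, the first macroblock of the first I-picture is assigned a quantization scale code of 10 irrespective of the bitrate, since the factor r cancels between equations (8) and (10).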
Finally, at step S13, the picture information encoding unit (I/P-VOP) 4 performs macroblock-based adaptive quantization taking visual characteristics into account. That is, at step S13, the picture information encoding unit (I/P-VOP) 4 varies the quantization scale code found at step S12 by a variable termed macroblock-based activity, in such a manner that quantization is performed finely in a monotonous pattern portion, where deterioration tends to be visually outstanding, and coarsely in a complex pattern portion, where deterioration is less likely to be outstanding.
The activity is given, using luminance signal pixel values of an original picture for four blocks in the frame DCT mode and four blocks in the field DCT mode, totaling eight blocks, by the following equation (11):

$$
\begin{aligned}
\mathrm{act}_j &= 1 + \min_{\mathrm{sblk}=1,\ldots,8} \left( \mathrm{var\_sblk} \right) \\
\mathrm{var\_sblk} &= \frac{1}{64} \sum_{k=1}^{64} \left( P_k - \bar{P} \right)^2 \\
\bar{P} &= \frac{1}{64} \sum_{k=1}^{64} P_k
\end{aligned}
\tag{11}
$$

where Pk is a pixel value in a luminance signal block of an original picture. The purpose of taking a minimum value in equation (11) is to refine the quantization if there is a monotonous pattern portion even in only a portion of the macroblock.
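The activity of equation (11) may be sketched in Python as follows. The function names are assumptions introduced for illustration, and the eight luminance blocks (four in the frame arrangement and four in the field arrangement) are assumed to be supplied by the caller as flat lists of 64 pixel values each.

```python
def block_variance(block):
    """Variance of the pixel values of one 8x8 luminance block."""
    n = len(block)
    mean = sum(block) / n
    return sum((p - mean) ** 2 for p in block) / n


def activity(blocks):
    """Equation (11): 1 plus the minimum variance over the eight blocks."""
    return 1 + min(block_variance(b) for b in blocks)


flat = [128] * 64        # monotonous block, variance 0
textured = [0, 2] * 32   # alternating values, mean 1, variance 1
act_mixed = activity([flat] + [textured] * 7)
act_textured = activity([textured] * 8)
```

A single flat block is sufficient to bring the activity of the whole macroblock down to its minimum value of 1, which is the effect of taking the minimum rather than the average of the block variances.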
The normalized activity Nactj, which assumes a value in a range from 0.5 to 2, is found by equation (12):

$$
\mathrm{Nact}_j = \frac{2 \times \mathrm{act}_j + \mathrm{avg\_act}}{\mathrm{act}_j + 2 \times \mathrm{avg\_act}} \tag{12}
$$

where avg_act is an average value of actj in the picture encoded directly previously.
The quantization scale code mquantj, which takes visual characteristics into account, is given based on the quantization scale code Qj obtained at step S12 in accordance with the following equation (13):

$$
\mathrm{mquant}_j = Q_j \times \mathrm{Nact}_j \tag{13}
$$
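The normalization of equation (12) and the modulation of equation (13) may be sketched in Python as follows. The function names and the numerical values are assumptions introduced for illustration.

```python
def normalized_activity(act_j, avg_act):
    """Equation (12): normalized activity, lying between 0.5 and 2."""
    return (2 * act_j + avg_act) / (act_j + 2 * avg_act)


def mquant(Q_j, act_j, avg_act):
    """Equation (13): modulate the scale code of step S12 by the activity."""
    return Q_j * normalized_activity(act_j, avg_act)


same = mquant(10, act_j=400, avg_act=400)    # average macroblock
busy = mquant(10, act_j=4000, avg_act=400)   # complex pattern portion
calm = mquant(10, act_j=1, avg_act=400)      # monotonous pattern portion
```

A macroblock whose activity equals avg_act keeps the quantization scale code found at step S12, a more complex macroblock is quantized more coarsely, and a monotonous macroblock is quantized more finely.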
The above-described code volume controlling system, defined in MPEG2 Test Model 5, is known to suffer from the following limitations, so that, in actual control, measures need to be taken against these limitations. The first limitation is that the first step S11 cannot cope with a scene change and that, after a scene change, the parameter avg_act used at step S13 takes on an incorrect value. The second limitation is that there is no assurance that the constraint condition of VBV (video buffer verifier), as provided in MPEG2 and MPEG4, can be met.
Meanwhile, execution of the equation (11) requires calculating the average value and the variance of the pixel values for every macroblock, thus necessitating voluminous processing operations. Moreover, avg_act in the equation (12) is not the average value in the current frame but the average value in the directly previous frame, which on occasion obstructs stable code rate control.