This invention relates to an image information conversion apparatus and an image information conversion method, and more particularly to an image information conversion apparatus and an image information conversion method which are used to receive, through network media such as a satellite broadcast, a cable television broadcast or the Internet or process, on a recording medium such as an optical disk or a magneto-optical disk, image information in the form of a bit stream compressed by orthogonal transform such as discrete cosine transform and motion compensation.
In recent years, an apparatus which complies with a method wherein image information is handled as digital data and the redundancy unique to image information is utilized to compress image information by orthogonal transform such as, for example, discrete cosine transform and motion compensation in order to allow transmission and storage of information with a high efficiency has been popularized in both of information distribution from a broadcasting station or the like and information reception by ordinary households.
Particularly, the MPEG(Moving Picture Experts Group)2 standardized by the MPEG is defined as a general purpose image coding system in the ISO/IEC 13818-2 and covers both of interlaced scan images and progressive scan images as well as standard resolution images and high resolution images. Therefore, it is expected that the MPEG2 be used by wide varieties of applications from professional applications to consumer applications in the future.
Where such an MPEG2 compression system as described above is used, realization of a high compression ratio and a good picture quality can be anticipated by allocating, to interlaced scan images of a standard resolution having, for example, 720×480 pixels, a code amount (hereinafter referred to as bit rate) of 4 to 8 Mbps or by allocating, to interlaced scan images of a high resolution having, for example, 1,920×1,088 pixels, a bit rate of 18 to 22 Mbps.
The MPEG2 is directed to high picture quality coding suitable principally for broadcasting, but is not ready for a coding system of a bit rate lower than, that is, of a compression ratio higher than, that of the MPEG1. However, from popularization of portable terminals, it has been expected that the need for a coding system of a higher compression ratio increases in the future. Therefore, the MPEG4 coding system has been standardized, and the image coding system of the MPEG4 was approved as international standards of the ISO/IEC 14496-2 in December 1998.
In order to process MPEG2 image compression information (hereinafter referred to as MPEG2 bit stream) coded once so as to be suitable for digital broadcasting on a portable terminal or the like, it is demanded to convert the MPEG2 bit stream into MPEG4 image compression information (hereinafter referred to as MPEG4 bit stream) of a lower bit rate.
An image information conversion apparatus (transcoder) which satisfies the demand is disclosed in Susie J. Wee, John G. Apostolopoulos and Nick Feamster, “Field-to-Frame Transcoding with Spatial and Temporal Downsampling”, ICIP '99 (hereinafter referred to as document 1). The image information conversion apparatus mentioned is shown in FIG. 5.
Referring to FIG. 5, the image information conversion apparatus 101 shown includes a picture type discrimination section 111, an MPEG2 image information (I picture and P picture) decoding section 112, a reduction section 113, a video memory 114, an MPEG4 image information (I/P-VOP) coding section 115, a motion vector synthesis section 116, and a motion vector detection section 117. It is to be noted that the VOP (Video Object Plane) in the MPEG4 corresponds to the frame in the MPEG2.
The picture type discrimination section 111 receives frame data of an interlaced scan MPEG2 bit stream as an input thereto and discriminates whether the data of each frame belongs to an I picture (intra-image coded picture) or a P picture (forward predictive coded picture), or to a B picture (bi-directionally predictive coded picture). The picture type discrimination section 111 outputs only the I picture and P picture data to the MPEG2 image information decoding section 112 of the following stage.
The MPEG2 image information decoding section 112 executes processing similar to that of an ordinary MPEG2 image information decoding section. However, since data regarding B pictures are discarded by the picture type discrimination section 111, it is only required for the MPEG2 image information decoding section 112 to have a function of decoding only I/P pictures.
The reduction section 113 receives pixel values from the MPEG2 image information decoding section 112 and performs processing of reducing the pixel values to ½ in the horizontal direction and discarding data of one of the first and second fields in the vertical direction while leaving data of the other field to produce a progressive scan image having a size of ¼ that of the inputted image information.
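The reduction described above can be sketched as follows. This is a minimal illustration, not the method of document 1: the function name `reduce_frame`, the representation of a frame as a list of pixel rows, the choice to keep the first field, and the use of a simple average of horizontally adjacent pixels are all assumptions, since the text does not specify the down-sampling filter.

```python
def reduce_frame(frame):
    """Return a progressive image 1/4 the size of an interlaced frame.

    frame: list of rows, each row a list of pixel values (hypothetical layout).
    Vertical direction: keep only the first field (even-numbered rows).
    Horizontal direction: average each pair of neighbouring pixels (assumed filter).
    """
    first_field = frame[0::2]  # discard the second field
    return [
        [(row[x] + row[x + 1]) // 2 for x in range(0, len(row) - 1, 2)]
        for row in first_field
    ]


frame = [
    [0, 2, 4, 6],      # first field
    [1, 1, 1, 1],      # second field (discarded)
    [10, 20, 30, 40],  # first field
    [9, 9, 9, 9],      # second field (discarded)
]
print(reduce_frame(frame))  # [[1, 5], [15, 35]]
```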
If the MPEG2 bit stream inputted from the MPEG2 image information decoding section 112 represents images compliant with the standards of the NTSC (National Television System Committee), that is, interlaced scan images of 720×480 pixels and 30 Hz, then the images after the reduction by the reduction section 113 have a size of 360×240 pixels. However, in order to allow processing in units of macro blocks when the MPEG4 image information coding section 115 in a succeeding stage performs coding, the pixel numbers in both the horizontal and vertical directions must be multiples of 16. Accordingly, the reduction section 113 further performs supplementation or discarding of pixels to satisfy the requirement. In particular, in the specific case described above, eight pixel columns, for example, at the right end or the left end in the horizontal direction are discarded so that the image has a size of 352×240 pixels.
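The size arithmetic above can be checked with a short sketch. The function name `reduced_size` is hypothetical, and the sketch assumes that surplus pixels are always discarded; the text also allows supplementation, which is not shown here.

```python
def reduced_size(width, height, block=16):
    """Size after 1/2 x 1/2 reduction, cropped to a multiple of the macro block size."""
    half_w = width // 2   # horizontal reduction to 1/2
    half_h = height // 2  # one of the two fields is discarded
    # Discard surplus pixels so that both dimensions are multiples of `block`.
    return half_w - half_w % block, half_h - half_h % block


print(reduced_size(720, 480))  # (352, 240)
```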
The progressive scan image produced by the reduction section 113 is stored into the video memory 114 and then undergoes coding processing by the MPEG4 image information coding section 115, and is outputted as an MPEG4 bit stream.
Motion vector information in the inputted MPEG2 bit stream is supplied to the motion vector synthesis section 116, by which it is mapped to motion vectors for the image information after the reduction.
The motion vector detection section 117 detects motion vectors of a high degree of accuracy based on the motion vector values synthesized by the motion vector synthesis section 116.
The image information conversion apparatus 101 disclosed in document 1 produces an MPEG4 bit stream of progressive scan images having a size of ½×½ that of an inputted MPEG2 bit stream. For example, where the inputted MPEG2 bit stream complies with the NTSC standards, the MPEG4 bit stream to be outputted has the SIF size (352×240 pixels). The image information conversion apparatus 101 can convert the inputted MPEG2 bit stream also into an image of any other image size, for example, the QSIF (176×112 pixels) size which is a size of approximately ¼×¼ in the example described above, by modifying the operation of the reduction section 113.
Further, the image information conversion apparatus 101 performs, as a process by the MPEG2 image information decoding section 112, a decoding process using all of eighth-order discrete cosine transform coefficients in the inputted MPEG2 bit stream for the horizontal and vertical directions or a decoding process using only low-frequency components from among eighth-order discrete cosine transform coefficients only for the horizontal direction or for both of the horizontal and vertical directions thereby to reduce the arithmetic operation amount for the decoding process and the video memory capacity while suppressing the picture quality deterioration to the minimum.
In the image information conversion apparatus 101 shown in FIG. 5, the code amount control of the MPEG4 image information coding section 115 is a significant factor in determining the picture quality of an MPEG4 bit stream. In the ISO/IEC 14496-2, the system for code amount control is not specifically prescribed, and each vendor can use a system which is considered optimum from the point of view of the arithmetic operation amount and the output picture quality in accordance with an application to be used. In the following, a system prescribed in the MPEG2 Test Model 5 (ISO/IEC JTC1/SC29/WG11 N0400) as a representative code amount control system is described.
For the code amount control, bit distribution to each picture is performed as a first step using a target code amount (target bit rate) and a GOP (Group of Pictures) configuration as input variables. The GOP signifies a group of a plurality of pictures of different types arrayed in accordance with certain specifications. Then, rate control is performed using a virtual buffer, whereafter adaptive quantization for each macro block is performed finally taking a visual characteristic into consideration. The operation of the code amount control is illustrated in FIG. 6.
Referring to FIG. 6, first in step S101, the MPEG4 image information coding section 115 distributes an allocation bit amount to each picture in a GOP in accordance with a bit amount (hereinafter represented by R) to be allocated to those pictures which are not coded as yet, including the picture that is the object of allocation. This distribution is repeated in the coding order of the pictures in the GOP. In this instance, the code amount allocation to each picture is performed based on the following two assumptions.
First, it is assumed that the product of the average quantization scale code used for coding each picture and the generated code amount is constant for each picture type unless the screen changes. Therefore, after each picture is coded, the variables $X_i$, $X_p$ and $X_b$ (global complexity measures), each representative of the complexity of the screen, are updated for the individual picture types in accordance with the following expressions (1) to (3):

$$X_i = S_i \cdot Q_i \tag{1}$$

$$X_p = S_p \cdot Q_p \tag{2}$$

$$X_b = S_b \cdot Q_b \tag{3}$$

where $S_i$, $S_p$ and $S_b$ are the generated code bit amounts upon picture coding, and $Q_i$, $Q_p$ and $Q_b$ are the average quantization scale codes upon picture coding. The variables $X_i$, $X_p$ and $X_b$ have initial values represented by the following expressions (4) to (6), respectively, using the target code amount (target bit rate) bit_rate [bits/sec]:

$$X_i = 160 \times \mathrm{bit\_rate} / 115 \tag{4}$$

$$X_p = 60 \times \mathrm{bit\_rate} / 115 \tag{5}$$

$$X_b = 42 \times \mathrm{bit\_rate} / 115 \tag{6}$$
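As an illustration of expressions (1) to (6), the complexity measures can be kept in a small table indexed by picture type. The names `initial_complexity` and `update_complexity` and the dictionary layout are assumptions made for this sketch only.

```python
# Numerators of expressions (4) to (6) for I, P and B pictures.
INITIAL_FACTOR = {"I": 160, "P": 60, "B": 42}


def initial_complexity(bit_rate):
    """Initial values of Xi, Xp and Xb per expressions (4) to (6)."""
    return {ptype: factor * bit_rate / 115 for ptype, factor in INITIAL_FACTOR.items()}


def update_complexity(X, ptype, generated_bits, avg_qscale):
    """After coding a picture of type `ptype`, apply X = S * Q (expressions (1) to (3))."""
    X[ptype] = generated_bits * avg_qscale
    return X


X = initial_complexity(1_150_000)
print(X)  # {'I': 1600000.0, 'P': 600000.0, 'B': 420000.0}
update_complexity(X, "P", generated_bits=50_000, avg_qscale=8)
print(X["P"])  # 400000
```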
Secondly, it is assumed that the picture quality of the entire image is always optimized when the ratios $K_p$ and $K_b$ of the quantization scale codes of P and B pictures to the quantization scale code of an I picture have the values defined by the following expression (7):

$$K_p = 1.0; \quad K_b = 1.4 \tag{7}$$
In particular, the quantization scale code of a B picture is always 1.4 times that of I and P pictures. Here, it is supposed that, if a B picture is coded more coarsely than I and P pictures and the code amount thereby saved is added to that of an I or P picture, then the picture quality of the I or P picture is improved, and the picture quality of a B picture which refers to the I or P picture is improved as well.
From the two assumptions specified above, the allocation bit amounts ($T_i$, $T_p$, $T_b$) to the different pictures of the GOP have values given by the following expressions (8) to (10), respectively:

$$T_i = \max\left\{ \frac{R}{1 + \dfrac{N_p \cdot X_p}{X_i \cdot K_p} + \dfrac{N_b \cdot X_b}{X_i \cdot K_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \tag{8}$$

$$T_p = \max\left\{ \frac{R}{N_p + \dfrac{N_b \cdot K_p \cdot X_b}{K_b \cdot X_p}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \tag{9}$$

$$T_b = \max\left\{ \frac{R}{N_b + \dfrac{N_p \cdot K_b \cdot X_p}{K_p \cdot X_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \tag{10}$$

where $N_p$ and $N_b$ are the numbers of P and B pictures which are not coded in the GOP as yet.
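Expressions (8) to (10) can be written as a single function of the picture type. This is a sketch only: the name `target_bits`, the argument order and the dictionary `X` holding the complexities under the keys "I", "P" and "B" are assumptions of this example.

```python
def target_bits(ptype, R, X, Np, Nb, bit_rate, picture_rate, Kp=1.0, Kb=1.4):
    """Allocation bit amount Ti, Tp or Tb per expressions (8) to (10).

    R      : bits remaining for the pictures of the GOP not coded as yet
    X      : dict of complexities, e.g. {"I": Xi, "P": Xp, "B": Xb}
    Np, Nb : numbers of P and B pictures not coded in the GOP as yet
    """
    lower_bound = bit_rate / (8 * picture_rate)
    if ptype == "I":
        denom = 1 + Np * X["P"] / (X["I"] * Kp) + Nb * X["B"] / (X["I"] * Kb)
    elif ptype == "P":
        denom = Np + Nb * Kp * X["B"] / (Kb * X["P"])
    elif ptype == "B":
        denom = Nb + Np * Kb * X["P"] / (Kp * X["B"])
    else:
        raise ValueError("ptype must be 'I', 'P' or 'B'")
    return max(R / denom, lower_bound)


X = {"I": 160.0, "P": 60.0, "B": 42.0}
# Denominator for the I picture: 1 + 4*60/160 + 10*42/(160*1.4) = 4.375
print(target_bits("I", 4375.0, X, Np=4, Nb=10, bit_rate=8000, picture_rate=25))
```

With these illustrative numbers the I picture receives approximately 1000 bits; when R is exhausted, the second argument of the max operation keeps the target from falling below bit_rate/(8×picture_rate).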
Based on the allocated code amounts determined in this manner, each time a picture is coded in steps S101 and S102, the bit amount R to be allocated to the non-coded pictures in the GOP is updated in accordance with the following expression (11):

$$R = R - S_{i,p,b} \tag{11}$$

where $S_{i,p,b}$ denotes the generated code bit amount $S_i$, $S_p$ or $S_b$ of the picture just coded.
On the other hand, when the first picture in the GOP is to be coded, the bit amount R is updated in accordance with the following expression (12):

$$R = \frac{\mathrm{bit\_rate} \times N}{\mathrm{picture\_rate}} + R \tag{12}$$

where N is the number of pictures in the GOP. The initial value of the bit amount R at the start of a sequence is 0.
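The bookkeeping of expressions (11) and (12) amounts to two small updates of R. The function names `start_gop` and `after_picture` are hypothetical and serve only to illustrate the arithmetic.

```python
def start_gop(R, bit_rate, N, picture_rate):
    """Expression (12): add the bit budget of a new GOP of N pictures to the carry-over R."""
    return bit_rate * N / picture_rate + R


def after_picture(R, generated_bits):
    """Expression (11): subtract the bits actually generated by the picture just coded."""
    return R - generated_bits


R = 0.0  # initial value at the start of a sequence
R = start_gop(R, bit_rate=4_000_000, N=15, picture_rate=30)
print(R)  # 2000000.0
R = after_picture(R, generated_bits=300_000)
print(R)  # 1700000.0
```

Because the remainder of one GOP is carried into expression (12) for the next, a GOP that overshoots its budget reduces the budget of the following GOP, and vice versa.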
In step S102, in order to make the allocation bit amounts ($T_i$, $T_p$, $T_b$) to the pictures determined in accordance with the expressions (8) to (10) in step S101 and the actual generated code amounts coincide with each other, quantization scale codes are determined by feedback control in units of macro blocks based on the capacities of three different virtual buffers set independently of each other for the individual picture types. First, prior to coding of the j-th macro block, the occupation amounts of the virtual buffers are determined in accordance with the following expressions (13) to (15):

$$d_j^i = d_0^i + B_{j-1} - \frac{T_i \times (j-1)}{\mathrm{MB\_cnt}} \tag{13}$$

$$d_j^p = d_0^p + B_{j-1} - \frac{T_p \times (j-1)}{\mathrm{MB\_cnt}} \tag{14}$$

$$d_j^b = d_0^b + B_{j-1} - \frac{T_b \times (j-1)}{\mathrm{MB\_cnt}} \tag{15}$$

where $d_0^i$, $d_0^p$ and $d_0^b$ are the initial occupation amounts of the virtual buffers, $B_j$ is the generated bit amount from the top of the picture to the j-th macro block, and MB_cnt is the number of macro blocks in one picture. The occupation amounts ($d_{\mathrm{MB\_cnt}}^i$, $d_{\mathrm{MB\_cnt}}^p$, $d_{\mathrm{MB\_cnt}}^b$) of the virtual buffers upon ending of coding of the individual pictures are used as the initial values ($d_0^i$, $d_0^p$, $d_0^b$) of the virtual buffer occupation amounts for the next pictures of the respective types.
Then, the quantization scale code $Q_j$ for the j-th macro block is calculated in accordance with the following expression (16):

$$Q_j = \frac{d_j \times 31}{r} \tag{16}$$

where r is a variable called the reaction parameter, which is used to control the response of the feedback loop and is given by the following expression (17):

$$r = 2 \times \frac{\mathrm{bit\_rate}}{\mathrm{picture\_rate}} \tag{17}$$
The initial values of the virtual buffers at the start of coding are given by the following expressions (18) to (20):

$$d_0^i = 10 \times \frac{r}{31} \tag{18}$$

$$d_0^p = K_p \cdot d_0^i \tag{19}$$

$$d_0^b = K_b \cdot d_0^i \tag{20}$$
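The feedback loop of expressions (13) to (20) can be sketched as below. The function names are hypothetical, and the sketch covers only what the expressions state; any clipping of the resulting quantization scale code to its legal range is outside the text and is not modeled.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Expression (17)."""
    return 2 * bit_rate / picture_rate


def initial_occupancy(r, Kp=1.0, Kb=1.4):
    """Expressions (18) to (20): initial virtual buffer occupation amounts."""
    d0_i = 10 * r / 31
    return {"I": d0_i, "P": Kp * d0_i, "B": Kb * d0_i}


def macroblock_qscale(d0, bits_so_far, target, j, mb_cnt, r):
    """Expressions (13) to (16) for the j-th macro block (j starts at 1).

    d0          : initial occupation amount of the buffer for this picture type
    bits_so_far : B_(j-1), bits generated by macro blocks 1 .. j-1
    target      : allocation bit amount Ti, Tp or Tb of the current picture
    Returns (d_j, Q_j).
    """
    d_j = d0 + bits_so_far - target * (j - 1) / mb_cnt
    return d_j, d_j * 31 / r


r = reaction_parameter(bit_rate=4_650_000, picture_rate=30)
d0 = initial_occupancy(r)
print(r, d0["I"])  # 310000.0 100000.0
print(macroblock_qscale(d0["I"], bits_so_far=0, target=300_000, j=1, mb_cnt=330, r=r))
# (100000.0, 10.0)
```

If the macro blocks coded so far have produced more bits than their share of the target, d_j grows and the quantization becomes coarser; if they have produced fewer, it becomes finer.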
In step S103, the quantization scale codes determined in step S102 are modified with a variable called activity for each macro block so that they may be quantized finely at a flat portion at which deterioration can be visually observed comparatively conspicuously but may be quantized roughly at a complicated pattern portion at which deterioration can be visually observed comparatively less conspicuously.
The activity is given by the following expression (21) using brightness signal pixel values of the original picture for a total of 8 blocks, including 4 blocks of the frame discrete cosine transform mode and 4 blocks of the field discrete cosine transform mode:

$$\begin{aligned} \mathrm{act}_j &= 1 + \min_{sblk = 1, \dots, 8} (\mathrm{var\_sblk}) \\ \mathrm{var\_sblk} &= \frac{1}{64} \sum_{k=1}^{64} \left( P_k - \bar{P} \right)^2 \\ \bar{P} &= \frac{1}{64} \sum_{k=1}^{64} P_k \end{aligned} \tag{21}$$

where $P_k$ is the brightness signal intra-block pixel value of the original image. The reason why the minimum value is taken in the expression (21) is that finer quantization is intended even where a flat portion is included in only part of the macro block.
Further, a normalized activity $\mathrm{Nact}_j$ whose value ranges from 0.5 to 2 is determined in accordance with the following expression (22):

$$\mathrm{Nact}_j = \frac{2 \times \mathrm{act}_j + \mathrm{avg\_act}}{\mathrm{act}_j + 2 \times \mathrm{avg\_act}} \tag{22}$$

where avg_act is the average value of the activity $\mathrm{act}_j$ of the picture coded last.
A quantization scale code $\mathrm{mquant}_j$ with the visual characteristic taken into consideration is determined in accordance with the following expression (23) based on the quantization scale code $Q_j$ determined in step S102:

$$\mathrm{mquant}_j = Q_j \times \mathrm{Nact}_j \tag{23}$$
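Expressions (21) to (23) can be illustrated as follows. The function names are hypothetical, and each of the eight sub-blocks is assumed to be given as a flat list of 64 brightness values; how the frame-mode and field-mode blocks are extracted from the macro block is not shown.

```python
def block_variance(pixels):
    """var_sblk of expression (21) for one block of brightness values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def activity(sub_blocks):
    """act_j of expression (21): 1 plus the smallest variance among the sub-blocks."""
    return 1 + min(block_variance(block) for block in sub_blocks)


def normalized_activity(act, avg_act):
    """Nact_j of expression (22)."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def mquant(Q, act, avg_act):
    """Expression (23): quantization scale code adjusted for the visual characteristic."""
    return Q * normalized_activity(act, avg_act)


flat = [128] * 64             # flat block, variance 0
textured = [0, 255] * 32      # strongly textured block
blocks = [flat] + [textured] * 7
print(activity(blocks))       # 1.0, because one flat block is enough
print(mquant(10.0, act=activity(blocks), avg_act=400.0))
```

In this example a single flat sub-block pulls the activity down to 1, so the macro block is quantized with roughly half the quantization scale code determined in step S102.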
Incidentally, as recited in “Theoretical Analysis of the MPEG Compression Efficiency and Application thereof to the Code Amount Control”, Shingaku Giho, IE-95, DSP95-10, May 1995 (hereinafter referred to as document 2), the code amount control system defined in the MPEG2 Test Model 5 does not always provide a good picture quality in an MPEG2 image coding section.
In document 2, the following system is proposed particularly as a technique for providing an optimum code amount distribution for each of frames in a GOP.
Where $N_I$, $N_P$ and $N_B$ are the numbers of those I, P and B pictures in a GOP which are not coded as yet and the code amounts to be applied to them are represented by $R_I$, $R_P$ and $R_B$, respectively, such a fixed rate condition as given by the following expression (24) is satisfied:

$$R = N_I \cdot R_I + N_P \cdot R_P + N_B \cdot R_B \tag{24}$$
Where the quantization step sizes of the individual frames are represented by $Q_I$, $Q_P$ and $Q_B$ and m is an order number for coordinating a quantization step size and a reproduction error variance with each other, that is, if it is assumed that minimization of the average of the quantization step sizes raised to the m-th power minimizes the reproduction error variance, then an optimum code amount distribution for each frame in the GOP is given by minimizing the expression (25) given below:

$$\frac{N_I \cdot Q_I^m + N_P \cdot Q_P^m + N_B \cdot Q_B^m}{N_I + N_P + N_B} \tag{25}$$
It is to be noted that the average quantization scale Q and the code amount R of each frame are coordinated with each other through the complexity X of the frame, which serves as an intermediate variable and is used also in the MPEG2 Test Model 5, as given by the following expression (26):

$$Q \cdot R^{\alpha} = X \tag{26}$$
Accordingly, by calculating such code amounts $R_I$, $R_P$ and $R_B$ as minimize the expression (25) using Lagrange's method of undetermined multipliers, taking the expression (26) into consideration under the restrictive condition of the expression (24), such values as given by the following expressions (27) to (29) are determined as the optimum code amounts $R_I$, $R_P$ and $R_B$, respectively:

$$R_I = \frac{R}{1 + N_P \cdot \left( \dfrac{X_P}{X_I} \right)^{\frac{m}{1 + m\alpha}} + N_B \cdot \left( \dfrac{X_B}{X_I} \right)^{\frac{m}{1 + m\alpha}}} \tag{27}$$

$$R_P = \frac{R}{N_P + N_B \cdot \left( \dfrac{X_B}{X_P} \right)^{\frac{m}{1 + m\alpha}}} \tag{28}$$

$$R_B = \frac{R}{N_B + N_P \cdot \left( \dfrac{X_P}{X_B} \right)^{\frac{m}{1 + m\alpha}}} \tag{29}$$
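Expressions (27) to (29) share one pattern: each picture type receives a code amount proportional to its complexity raised to the power m/(1+mα), scaled so that the fixed rate condition (24) holds. The sketch below uses that equivalent form, with the count of a picture type that is already coded set to 0; the function name `optimal_bits` and the dictionary layout are assumptions of this example, not part of document 2.

```python
def optimal_bits(R, X, N, m, alpha):
    """Per-picture code amounts R_I, R_P, R_B in the form of expressions (27) to (29).

    R : bits available for the pictures of the GOP not coded as yet
    X : complexities, e.g. {"I": X_I, "P": X_P, "B": X_B}
    N : numbers of pictures of each type not coded as yet
    """
    exponent = m / (1 + m * alpha)
    weight = {t: X[t] ** exponent for t in X}
    total = sum(N[t] * weight[t] for t in X)
    return {t: R * weight[t] / total for t in X}


X = {"I": 160.0, "P": 60.0, "B": 42.0}
N = {"I": 1, "P": 4, "B": 10}
bits = optimal_bits(1_000_000, X, N, m=1.0, alpha=1.0)
print(bits)
# The fixed rate condition (24) is satisfied by construction:
print(sum(N[t] * bits[t] for t in N))
```

With N_I = 1 the value for the I picture coincides with expression (27); with N_I = 0 the values for P and B pictures coincide with expressions (28) and (29).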
Where $\alpha = 1$, the expressions (27) to (29) and the expressions (8) to (10) given hereinabove for the code amount control system defined in the MPEG2 Test Model 5 have the following relationship. In particular, from the expressions (27) to (29), the parameters $K_p$ and $K_b$ for code amount control are adaptively calculated in accordance with the following expression (30) based on the complexities $X_I$, $X_P$ and $X_B$ of each frame:

$$K_p = \left( \frac{X_I}{X_P} \right)^{\frac{1}{1+m}}; \quad K_b = \left( \frac{X_I}{X_B} \right)^{\frac{1}{1+m}} \tag{30}$$
In document 2, it is disclosed that a good picture quality is obtained by setting the value of 1/(1+m) of the expression above to 0.6 to 1.2.
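The adaptive parameters can be computed directly from the complexities. The sketch below takes expression (30) as written and passes the exponent 1/(1+m) as a single argument, since document 2 is said to recommend a range for that exponent rather than for m; the function name `adaptive_ratios` is hypothetical.

```python
def adaptive_ratios(X, exponent):
    """Kp and Kb per expression (30), where `exponent` stands for 1/(1+m)."""
    Kp = (X["I"] / X["P"]) ** exponent
    Kb = (X["I"] / X["B"]) ** exponent
    return Kp, Kb


X = {"I": 160.0, "P": 40.0, "B": 20.0}
print(adaptive_ratios(X, exponent=1.0))  # (4.0, 8.0)
print(adaptive_ratios(X, exponent=0.6))
```

Unlike the fixed values of expression (7), these ratios change whenever the complexities are updated after a picture is coded.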
However, when the image information conversion apparatus 101 described above with reference to FIG. 5 performs code amount control using the technique defined in the MPEG2 Test Model 5, it cannot cope with a variation in complexity caused by a scene change or the like occurring within a GOP. It is therefore difficult to perform the code amount control stably, which sometimes results in picture quality deterioration.
Thus, another image information conversion apparatus is proposed and is shown in FIG. 7. Referring to FIG. 7, the image information conversion apparatus 102 shown includes, in addition to the components of the image information conversion apparatus 101 described hereinabove with reference to FIG. 5, a compression information analysis section 118, an information buffer 119, a complexity calculation section 120 and an MPEG4 image information coding section 121. Detailed description of the common components to those of the image information conversion apparatus 101 of FIG. 5 is omitted herein to avoid redundancy.
The compression information analysis section 118 analyzes an average value Q over an entire frame of the quantization scale used for decoding processing and a total code amount (bit number) B allocated to the frame in the MPEG2 bit stream and sends necessary information to the information buffer 119.
The information buffer 119 stores the generated code amounts (bit numbers) and the average quantization scales of the I/P pictures of the MPEG2 bit stream.
The complexity calculation section 120 calculates an estimated value of the complexity X for each VOP of MPEG4 image compression information (hereinafter referred to as MPEG4 bit stream) from the information Q and B of each frame stored in the information buffer 119 in accordance with the expression (20) given hereinabove.
The average value Q over the entire frame of the quantization scale used for the decoding processing by the compression information analysis section 118 and the total code amount (bit number) B allocated to the frame in the MPEG2 bit stream are stored into the information buffer 119.
The complexity calculation section 120 calculates the complexity X of each frame stored in the information buffer 119 from the information Q and B for the frame in accordance with the following expression (31):

$$X = Q \cdot B \qquad (31)$$
The complexities X of the frames calculated in accordance with the expression (31) above are buffered for one GOV and then sent as a parameter for code amount control to the MPEG4 image information coding section 121. Therefore, a delay for one GOV is required. This delay is implemented using the video memory 114 serving as a delay buffer.
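The calculation of the expression (31) for the frames of one GOV can be sketched as follows. This is a minimal sketch and not the implementation of the complexity calculation section 120; the names `FrameStats` and `gov_complexities` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameStats:
    """Per-frame values read from the MPEG2 bit stream (hypothetical container)."""
    picture_type: str  # "I", "P" or "B"
    avg_q: float       # average quantization scale Q over the entire frame
    bits: int          # total code amount (bit number) B allocated to the frame


def gov_complexities(frames: List[FrameStats]) -> List[float]:
    """Return X = Q * B for every frame of one GOV, per expression (31)."""
    return [frame.avg_q * frame.bits for frame in frames]
```

Because the whole list is needed before the first VOP is coded, the caller has to hold the decoded frames for one GOV, which corresponds to the delay described above.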
The following describes how the complexity X of each frame in the GOV, calculated in accordance with the expression (31), is used by the MPEG4 image information coding section 121. It is to be noted that the following description also takes into consideration a case wherein the apparatus does not include the picture type discrimination section 111 and does not perform conversion of the frame rate.
The parameters Kp and Kb determined in accordance with the expression (30) represent that the ratios of the ideal quantization scales Qp_ideal and Qb_ideal for a P-VOP and a B-VOP to the ideal average quantization scale Qi_ideal for an I-VOP are given by the following expression (32):

$$\frac{Q_{p\_ideal}}{Q_{i\_ideal}} = K_p; \quad \frac{Q_{b\_ideal}}{Q_{i\_ideal}} = K_b \qquad (32)$$
In the MPEG2 Test Mode 15, the parameters Kp and Kb are not calculated adaptively as in the expression (30), but such fixed values as given by the expression (7) are used therefor.
From the expressions (30) and (32), where the complexities of an arbitrary VOP 1 and another arbitrary VOP 2 are represented by X1 and X2 and the ideal quantization scales are represented by Q1_ideal and Q2_ideal, respectively, then the following expression (33) is obtained:

$$\frac{Q_{2\_ideal}}{Q_{1\_ideal}} = \left( \frac{X_1}{X_2} \right)^{\frac{1}{1+m}} \equiv K(X_1, X_2) \qquad (33)$$
However, where it is desired to use fixed values as given by the expression (7) as in the MPEG2 Test Mode 15, the following expression (34) should be used in place of the expression (33) above:

$$K(X_1, X_2) \equiv \begin{cases} K_p & (1 = \text{I-VOP},\ 2 = \text{P-VOP}) \\ K_b & (1 = \text{I-VOP},\ 2 = \text{B-VOP}) \\ \dfrac{K_b}{K_p} & (1 = \text{P-VOP},\ 2 = \text{B-VOP}) \\ \dfrac{K_p}{K_b} & (1 = \text{B-VOP},\ 2 = \text{P-VOP}) \\ 1 & (\text{when 1 and 2 are the same type of VOP}) \end{cases} \qquad (34)$$
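The case selection of the expression (34) can be sketched as a small lookup. The function name `k_fixed` is hypothetical, and the fixed values of Kp and Kb are passed in as parameters rather than hard-coded, since the expression (7) is not reproduced here. Combinations that the expression (34) does not list, such as VOP 1 being a P-VOP and VOP 2 being an I-VOP, are rejected in this sketch.

```python
def k_fixed(type1: str, type2: str, k_p: float, k_b: float) -> float:
    """Return K for VOP types type1 and type2 per expression (34).

    type1 and type2 are "I", "P" or "B"; k_p and k_b are the fixed
    parameters Kp and Kb supplied by the caller.
    """
    if type1 == type2:
        return 1.0
    table = {
        ("I", "P"): k_p,
        ("I", "B"): k_b,
        ("P", "B"): k_b / k_p,
        ("B", "P"): k_p / k_b,
    }
    if (type1, type2) not in table:
        raise ValueError(f"combination {type1}->{type2} is not listed in expression (34)")
    return table[(type1, type2)]
```

For example, with the illustrative values Kp = 1.0 and Kb = 1.4, `k_fixed("P", "B", 1.0, 1.4)` evaluates Kb/Kp.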
Here, it is assumed that the total code amount (bit number) allocated to the VOPs not yet coded in a GOV is represented by R and the total code amount R is allocated as R1, R2, . . . , Rn to the VOPs. In this instance, the relational expression given as the following expression (35) is satisfied by the total code amount R and the allocated code amounts R1, R2, . . . , Rn:

$$R = R_1 + R_2 + \cdots + R_n \qquad (35)$$
Among the average quantization scale Qk, allocated code amount Rk and complexity Xk of an arbitrary VOPk, the relationship represented by the following expression (36) is satisfied:

$$X_k = Q_k \cdot R_k \qquad (36)$$
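Here, by transforming the expression (35) taking the expression (36) into consideration, the following expression (37) is obtained:

$$\begin{aligned} R_1 &= \frac{R}{\dfrac{R_1 + R_2 + \cdots + R_n}{R_1}} = \frac{R}{1 + \dfrac{R_2}{R_1} + \cdots + \dfrac{R_n}{R_1}} \\ &= \frac{R}{1 + \dfrac{Q_1}{Q_2} \cdot \dfrac{X_2}{X_1} + \cdots + \dfrac{Q_1}{Q_n} \cdot \dfrac{X_n}{X_1}} \\ &= \frac{R}{1 + \dfrac{1}{K(X_1, X_2)} \cdot \dfrac{X_2}{X_1} + \cdots + \dfrac{1}{K(X_1, X_n)} \cdot \dfrac{X_n}{X_1}} \end{aligned} \qquad (37)$$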
Although either the value obtained by the expression (33) or the value obtained by the expression (34) may be used for K(X1, X2) in the expression (37), use of the former achieves a code amount distribution better suited to the image.
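For reference, substituting the expression (33) into each term of the expression (37) gives a closed form. The following is a derivation sketch only; the abbreviation e = 1/(1+m) and the summation index k are notations introduced here.

```latex
\frac{1}{K(X_1, X_k)} \cdot \frac{X_k}{X_1}
  = \left( \frac{X_k}{X_1} \right)^{e} \cdot \frac{X_k}{X_1}
  = \left( \frac{X_k}{X_1} \right)^{1+e},
\qquad
R_1 = \frac{R}{\sum_{k=1}^{n} \left( \dfrac{X_k}{X_1} \right)^{1+e}}
    = R \cdot \frac{X_1^{\,1+e}}{\sum_{k=1}^{n} X_k^{\,1+e}}
```

In other words, under the expression (33) the remaining code amount is shared among the VOPs in proportion to the (1+e)-th power of their complexities.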
Thereupon, if the value of 1/(1+m) is set to 1.0, then no exponential operation is needed, and consequently, high speed execution can be achieved. Further, even where the value of 1/(1+m) is set to a value other than 1.0, high speed execution can be achieved if a table is prepared in advance and consulted to perform the exponential operation.
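One possible form of such a table is sketched below. The text does not specify the table layout, so the uniform grid, the class name `PowerTable` and the parameters `max_ratio` and `steps_per_unit` are all assumptions of this sketch; the exponent is assumed to be positive.

```python
class PowerTable:
    """Precomputed ratio**exponent on a uniform grid (hypothetical layout)."""

    def __init__(self, exponent: float, max_ratio: float = 8.0,
                 steps_per_unit: int = 16) -> None:
        self.exponent = exponent
        self.steps_per_unit = steps_per_unit
        self.max_index = int(max_ratio * steps_per_unit)
        # Entry i holds (i / steps_per_unit) ** exponent.
        self.values = [(i / steps_per_unit) ** exponent
                       for i in range(self.max_index + 1)]

    def lookup(self, ratio: float) -> float:
        """Approximate ratio**exponent by the nearest table entry."""
        if self.exponent == 1.0:
            return ratio  # no exponential operation is needed
        index = round(ratio * self.steps_per_unit)
        # Clamp to the table range; index 0 is avoided so K never becomes 0.
        index = min(max(index, 1), self.max_index)
        return self.values[index]
```

The grid step trades accuracy against table size; ratios outside the grid are clamped to its nearest end rather than extrapolated.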
The complexity Xk of each VOP in the expression (37) is properly a value obtained by MPEG4 image coding. However, if it is assumed that the complexity of each frame by MPEG2 image coding and the complexity of the corresponding frame by MPEG4 image coding are equal to each other, then the complexity Xk stored in the complexity calculation section 120 can be used to calculate a target code amount for the VOP in accordance with the expression (37).
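The target code amount calculation of the expression (37), using the adaptive K of the expression (33), can be sketched as follows. The function names and the parameter `exponent`, which stands for 1/(1+m), are hypothetical. The helper `allocate_gov` additionally assumes, for illustration only, that each VOP consumes exactly its target code amount.

```python
from typing import List


def k_adaptive(x1: float, x2: float, exponent: float = 1.0) -> float:
    """K(X1, X2) per expression (33); `exponent` stands for 1/(1+m)."""
    return (x1 / x2) ** exponent


def target_bits(remaining_bits: float, complexities: List[float],
                exponent: float = 1.0) -> float:
    """Target code amount R1 for the first VOP not yet coded, per expression (37).

    `complexities` lists X1..Xn of the VOPs still to be coded, X1 first.
    """
    x1 = complexities[0]
    # The k = 1 term evaluates to 1, matching the leading 1 in expression (37).
    denominator = sum((xk / x1) / k_adaptive(x1, xk, exponent)
                      for xk in complexities)
    return remaining_bits / denominator


def allocate_gov(total_bits: float, complexities: List[float],
                 exponent: float = 1.0) -> List[float]:
    """Apply expression (37) VOP by VOP over one GOV.

    Assumption of this sketch: each VOP uses exactly its target, so the
    remaining budget shrinks by the target itself.
    """
    targets = []
    remaining = total_bits
    for index in range(len(complexities)):
        target = target_bits(remaining, complexities[index:], exponent)
        targets.append(target)
        remaining -= target
    return targets
```

In an actual coder the remaining budget would be reduced by the code amount really generated for each VOP, so later targets absorb any deviation of earlier VOPs from their targets.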
FIG. 8 illustrates a process when the image information conversion apparatus 102 calculates a target code amount.
Referring to FIG. 8, first in step S111, the MPEG2 image information decoding section 112 extracts the average quantization scale Q and the allocated code amount (bit number) B of each frame in a GOP.
In step S112, the complexity calculation section 120 calculates the complexity X by operation of the product of the average quantization scale Q and the allocated code amount (bit number) B of each frame in the GOP.
Then in step S113, the MPEG4 image information coding section 121 calculates a target code amount (target bit rate) based on the complexity X.
The image information conversion apparatus 102 produces an MPEG4 bit stream of progressive scan images having a size of ½×½ of that of the inputted MPEG2 bit stream. In particular, if the input MPEG2 bit stream complies with the NTSC standards, then the MPEG4 bit stream outputted has the SIF size (352×240 pixels). The image information conversion apparatus 102 can also change the operation of the reduction section 113 to convert the input MPEG2 bit stream into images of any other image size, for example, in the example described above, into images of the QSIF size (176×112 pixels), which is an image size of approximately ¼×¼.
Further, the image information conversion apparatus 102 performs, as processing by the MPEG2 image information decoding section 112, a decoding process using all of eighth-order discrete cosine transform coefficients in the inputted MPEG2 bit stream in both of the horizontal and vertical directions and a decoding process using only low frequency components of eighth-order discrete cosine transform coefficients only in the horizontal direction or in both of the horizontal and vertical directions thereby to reduce the arithmetic operation amount and the video memory capacity involved in decoding processing while suppressing the picture quality deterioration.
If the image information conversion apparatus 102 shown in FIG. 7 is used for conversion of an MPEG2 bit stream having a GOP structure of, for example, n=15 and m=3, then an MPEG4 bit stream having a GOV structure of n=5 and m=1 is obtained as an output. Since the MPEG4 bit stream obtained in this manner has a great number of I-VOPs, the coding efficiency is low and a good picture quality is not obtained in some cases. This problem, however, can be solved by converting an image of an I picture in the input MPEG2 bit stream into a P-VOP of the MPEG4 bit stream to develop GOVs.
The image information conversion apparatus 102 performs motion detection within a fixed search range for an image which originally is an I picture and therefore includes no motion vector, based on the motion vectors used for the last P picture immediately preceding the I picture, to calculate motion vectors with a high degree of accuracy for the corresponding VOP and thereby prevent image quality deterioration.
Further, if an I picture is converted into a P-VOP, then since the original complexity relates to the I picture, it has an inappropriate value as the complexity after the conversion. The image information conversion apparatus 102, however, solves the problem just described by using the complexity for the immediately preceding P picture to eliminate image quality deterioration.
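The substitution of the complexity described above can be sketched as follows. The function name `vop_complexities` and the index set `convert_to_p` are hypothetical, and the handling of an I picture with no preceding P picture is an assumption of this sketch, since the text does not cover that case.

```python
from typing import List, Optional, Set, Tuple


def vop_complexities(frames: List[Tuple[str, float]],
                     convert_to_p: Set[int]) -> List[Tuple[str, float]]:
    """Map MPEG2 frames to (VOP type, complexity) pairs.

    `frames` holds (picture type, complexity X) in coding order.
    `convert_to_p` holds the indices of I pictures to be coded as P-VOPs.
    A converted I picture takes the complexity of the nearest preceding
    P picture; if there is none, its own complexity is kept (assumption).
    """
    result = []
    last_p_complexity: Optional[float] = None
    for index, (picture_type, complexity) in enumerate(frames):
        if picture_type == "I" and index in convert_to_p:
            if last_p_complexity is not None:
                complexity = last_p_complexity
            result.append(("P", complexity))
        else:
            result.append((picture_type, complexity))
        if picture_type == "P":
            # Only genuine P pictures update the reference complexity.
            last_p_complexity = complexity
    return result
```

The resulting list can be handed to the code amount control in place of the raw per-frame complexities.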
However, the MPEG2 Test Mode 15 assumes that the complexities Xi, Xp and Xb, which are variables representative of the degree of complexity of the I, P and B pictures in a GOP, are fixed. If the MPEG4 image information coding section 115 actually uses the technique defined in the MPEG2 Test Mode 15 to perform code amount control, then this assumption is not satisfied where the GOP includes a scene change or the background exhibits a remarkable variation within the GOP, which disturbs stabilized code amount control and causes picture quality deterioration.
Conversion of an I picture of an inputted MPEG2 bit stream into a P-VOP of an MPEG4 bit stream is considered here.
FIG. 9 diagrammatically illustrates a manner wherein an I picture of an inputted MPEG2 bit stream is converted into and outputted as a P-VOP of an MPEG4 bit stream. Referring to FIG. 9, conversion of the second I picture I1 into a P-VOP is taken as an example. In this instance, as the complexity serving as a parameter for code amount control for the I picture I1, the complexity XP3 of the P picture P3 immediately preceding the I picture I1 is applied.
If the I picture I1 is an image including a scene change, then a comparatively great code amount must be allocated to the I picture I1. However, since the complexity XP3 of the P picture P3 of the immediately preceding frame is used as the complexity for the I picture I1 as described above, a sufficient code amount is not allocated to the I picture I1, resulting in deterioration of the picture quality.