1. Field of the Invention
The present invention relates to a video coding circuit system and its algorithm and, more particularly, to versatile and scalable video coding techniques and their circuit system.
2. Description of Related Art
Common multimedia video CODECs, including those based on compression standards such as JPEG, MPEG, and H.264, follow similar coding procedures. After image slicing, color mode transformation, and sampling, the original spatial-domain data is converted to frequency-domain data through transform coding. Quantization and variable-length coding (VLC) are then performed before storage. Transform coding is one of the key modules of a multimedia CODEC in meeting real-time video coding requirements. Consequently, designing a high-performance, low-power, and low-cost hardware implementation of transform coding has long been an important research topic in this field.
Transform coding includes the discrete cosine transform (DCT) adopted by JPEG/MPEG systems, and the integer transform and the Hadamard transform adopted by H.264 systems. The conventional transform coding circuit design is described below. Taking as an example the DCT, which has the most complicated computation and is the most widely used, the 2D N×N DCT is expressed as follows:
$$X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j)\, \cos\left(\frac{(2i+1)u\pi}{2N}\right) \cos\left(\frac{(2j+1)v\pi}{2N}\right) \qquad (1)$$
where x(i, j) is the input data, X(u, v) is the output result, and C(m) is $1/\sqrt{2}$ when m=0 and 1 otherwise. From Eq. (1), the computational complexity of the 2D N×N DCT is $N^4$ multiplication/addition operations. For an embedded system of a portable electronic product, this computational complexity greatly exceeds its capability. In order to solve this problem, hardware accelerators based on fast algorithms are usually adopted. For instance, in “A Simple Processor Core Design for DCT/ICT” disclosed in IEEE Trans. CSVT., vol. 10, no. 3, pp. 439-447, April 2000, the technique of adder-based distributed arithmetic is used to decompose the inner product operations of the DCT into a series of bit-level add and shift operations. Because the operations for bits of value “0” can be omitted, the speed of the DCT can be enhanced. The principle can be seen in Eq. (2):
$$Y_n = \sum_{i=0}^{N-1} C_i \cdot X_i = \sum_{i=0}^{N-1} \left( \sum_{k=0}^{W_c-1} C_{i,k} \cdot 2^{-k} \right) \cdot X_i = \sum_{k=0}^{W_c-1} \left( \sum_{i=0}^{N-1} C_{i,k} \cdot X_i \right) \cdot 2^{-k} \qquad (2)$$
where N is the order of the inner product operation, $X_i$ is the input data, $C_i$ is the constant coefficient, $W_c$ is the word length of $C_i$, $C_{i,k}$ is the k-th bit value of $C_i$, which is either “0” or “1”, and $Y_n$ is the output result. With Eq. (2) expressed in the canonical signed digit (CSD) representation, a 1D 8-point DCT can be represented by the table shown in FIG. 1. In the table, common terms, shown with gray shading, are calculated beforehand and stored in registers for subsequent use; non-common terms are shown with slash lines. FIG. 2 shows the circuit that realizes the add and shift operations. Input data and common terms are stored in a register 10, and the output of an adder/subtractor 20 is the primary output of the circuit system. That is, the value of a first register 14, after being right-shifted by a shifter 18, is combined with the value of a second register 16 by addition or subtraction. The input source of each of the registers 14 and 16 can be selected between the adder/subtractor 20 and the register 10 by a multiplexer 12.
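By way of illustration only, the bit-level decomposition of Eq. (2) can be sketched in software as follows. The sketch assumes non-negative coefficients given as plain unsigned binary fractions (bit k carrying weight 2^-k) rather than the CSD representation of FIG. 1, and it uses exact rational arithmetic instead of a fixed-width shifter; the function names `coefficient_bits` and `inner_product_da` are hypothetical and do not describe the circuit of FIG. 2.

```python
from fractions import Fraction


def coefficient_bits(c, wc):
    """Return the wc bits [C_0, ..., C_{wc-1}] of c, where
    c = sum over k of C_k * 2**-k (bit k has weight 2**-k)."""
    scaled = Fraction(c) * 2 ** (wc - 1)
    if scaled.denominator != 1 or not 0 <= scaled < 2 ** wc:
        raise ValueError("coefficient is not representable in wc bits")
    value = int(scaled)
    # Bit k of the coefficient sits at position (wc - 1 - k) of the scaled integer.
    return [(value >> (wc - 1 - k)) & 1 for k in range(wc)]


def inner_product_da(coeffs, xs, wc):
    """Evaluate sum_i C_i * X_i one bit plane at a time, as in Eq. (2).

    Returns the exact result and the number of additions performed;
    inputs whose coefficient bit is 0 are skipped entirely.
    """
    bits = [coefficient_bits(c, wc) for c in coeffs]
    result = Fraction(0)
    additions = 0
    for k in range(wc):
        plane_sum = 0
        for i, x in enumerate(xs):
            if bits[i][k]:  # bits of value 0 contribute nothing
                plane_sum += x
                additions += 1
        # Weighting by 2**-k corresponds to a right shift by k in hardware.
        result += Fraction(plane_sum, 2 ** k)
    return result, additions


if __name__ == "__main__":
    coeffs = [Fraction(3, 4), Fraction(5, 4), Fraction(1, 2)]
    xs = [4, 8, 12]
    direct = sum(c * x for c, x in zip(coeffs, xs))
    da_result, additions = inner_product_da(coeffs, xs, wc=3)
    print(direct, da_result, additions)
```

In this sketch only the set coefficient bits cause an addition, so the number of additions depends on how many bits of value “1” the coefficients contain rather than on the full N × Wc bit count, which is the saving that Eq. (2) expresses.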
Although the conventional video coding method and circuit system shown in FIGS. 1 and 2 can achieve low cost and high hardware utilization, they have a significant drawback in power consumption. Because a large number of accesses to the register 10 are required to finish all the operations in FIG. 1, the data throughput rate is low, so a higher operating frequency is required to meet real-time requirements. A higher operating frequency leads to linear growth of the power consumption, so the design specification of low power consumption cannot be met. If several circuit systems are used for parallel processing to enhance the data throughput rate, the memory bandwidth requirement and the number of internal registers increase accordingly, making the whole circuit system more difficult to realize.
The present invention aims to provide high efficiency, low cost and low power video coding methods and their circuit designs to solve the above problems in the prior art.