The present invention relates to a circuit that performs the two-dimensional inverse discrete cosine transform (IDCT) used in time-varying image signal processing, as a four-way parallel operation in 16-bit integer arithmetic. The invention also relates to a processor that implements the two-dimensional IDCT circuit and to a method of implementing the two-dimensional IDCT.
In recent years, microprocessors have increasingly adopted arithmetic instructions of the split ALU scheme for high-speed image processing. Here, an arithmetic instruction of the split ALU scheme is an instruction that uses a 64-bit ALU as four independent 16-bit arithmetic units. The split ALU scheme allows signals and images whose data exhibit parallelism to be processed easily and at high speed.
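The idea of using one 64-bit word as four 16-bit arithmetic units can be illustrated in software. The following is only a minimal sketch, not the instruction set of any particular processor: the names `pack`, `unpack`, and `split_add` are hypothetical, and the carry-blocking trick shown is a common software emulation of what a split ALU does in hardware.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = 0xFFFF
MASK64 = (1 << 64) - 1
HIGH = 0x8000_8000_8000_8000  # top bit of each 16-bit lane
LOW = 0x7FFF_7FFF_7FFF_7FFF   # low 15 bits of each 16-bit lane


def pack(values):
    """Pack four signed 16-bit integers into one 64-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into four signed 16-bit integers."""
    out = []
    for i in range(LANES):
        v = (word >> (i * LANE_BITS)) & LANE_MASK
        out.append(v - 0x10000 if v & 0x8000 else v)
    return out


def split_add(a, b):
    """Add four 16-bit lanes at once; a carry never crosses a lane boundary."""
    # Add the low 15 bits of every lane (cannot overflow into the next lane),
    # then restore the top bit of each lane with XOR.
    low = (a & LOW) + (b & LOW)
    return (low ^ ((a ^ b) & HIGH)) & MASK64


print(unpack(split_add(pack([100, -200, 300, -400]), pack([1, 2, 3, 4]))))
```

Each lane wraps around modulo 2^16 on overflow, which is why the limited 16-bit range matters for the accuracy discussion that follows.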
However, when the two-dimensional inverse discrete cosine transform is performed in the four-way split mode, the arithmetic error of the 16-bit integer arithmetic operation becomes large. As a result, the arithmetic error cannot satisfy the error criterion defined by "IEEE Standard Specifications for the Implementations of 8×8 Inverse Discrete Cosine Transform", IEEE Std 1180-1990, Dec. 6, 1990.
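To make the notion of an error criterion concrete, the sketch below computes error statistics of the kind used to judge an IDCT implementation against a reference. It is an illustration under assumptions: the function names are hypothetical, the threshold values are those commonly quoted for IEEE Std 1180-1990 and should be verified against the standard itself, and the standard's test-data generation procedure is not reproduced here.

```python
# Thresholds as commonly quoted for IEEE Std 1180-1990
# (assumption: verify against the standard before relying on them).
LIMITS = {
    "peak": 1,             # largest absolute error at any pixel
    "pixel_mse": 0.06,     # worst mean square error among the 64 positions
    "overall_mse": 0.02,   # mean square error over all pixels
    "pixel_me": 0.015,     # worst absolute mean error among the 64 positions
    "overall_me": 0.0015,  # absolute mean error over all pixels
}


def idct_error_stats(ref_blocks, test_blocks):
    """Compare test IDCT outputs with reference outputs (lists of 8x8 blocks)."""
    n = len(ref_blocks)
    peak = 0
    sum_err = [[0.0] * 8 for _ in range(8)]
    sum_sq = [[0.0] * 8 for _ in range(8)]
    for ref, test in zip(ref_blocks, test_blocks):
        for i in range(8):
            for j in range(8):
                e = test[i][j] - ref[i][j]
                peak = max(peak, abs(e))
                sum_err[i][j] += e
                sum_sq[i][j] += e * e
    return {
        "peak": peak,
        "pixel_mse": max(s / n for row in sum_sq for s in row),
        "overall_mse": sum(s for row in sum_sq for s in row) / (64 * n),
        "pixel_me": max(abs(s) / n for row in sum_err for s in row),
        "overall_me": abs(sum(s for row in sum_err for s in row)) / (64 * n),
    }


def meets_criteria(stats):
    """True when every statistic is within its assumed limit."""
    return all(stats[name] <= limit for name, limit in LIMITS.items())
```

In use, `ref_blocks` would hold the rounded outputs of a double-precision reference IDCT and `test_blocks` the outputs of the 16-bit implementation for the same inputs.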
A scheme for decreasing arithmetic errors and satisfying the error criterion is disclosed in the paper "A study of IDCT Algorithms by 16-bit Integer Operation", 1996 Institute of Electronics, Information and Communication Engineers, System Society Convention, D-225 (hereinafter referred to as "conventional scheme 1").
FIG. 6 shows the configuration of the two-dimensional inverse discrete cosine transform according to conventional scheme 1. In conventional scheme 1, the 8×8 two-dimensional inverse discrete cosine transform is realized by separately executing an 8-point one-dimensional inverse discrete cosine transform in the row direction and an 8-point one-dimensional inverse discrete cosine transform in the column direction. The scheme proposed in the paper "A Fast DCT-SQ Scheme for Images", Y. Arai, T. Agui and M. Nakajima, Trans. IEICE, Vol. 71, No. 11, November 1988, pp. 1095-1097, is used for the 8-point one-dimensional inverse discrete cosine transform.
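The row-column decomposition described above can be sketched as follows. This is a floating-point illustration only: it evaluates the one-dimensional transform by the direct cosine sum with orthonormal scaling, not by the fast scheme of Arai et al., and it involves none of the 16-bit processing of conventional scheme 1. The function names are hypothetical.

```python
import math


def idct_1d(coeffs):
    """8-point 1-D IDCT by direct cosine sum (orthonormal scaling)."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = 0.0
        for u in range(n):
            c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            s += c * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        out.append(s)
    return out


def idct_2d(block):
    """8x8 2-D IDCT as a row pass followed by a column pass."""
    rows = [idct_1d(r) for r in block]               # row direction
    cols = [idct_1d(list(c)) for c in zip(*rows)]    # column direction
    return [list(r) for r in zip(*cols)]             # transpose back


# A block with only a DC coefficient of 8 decodes to a flat block of 1.0
# (up to floating-point rounding).
dc_only = [[0.0] * 8 for _ in range(8)]
dc_only[0][0] = 8.0
flat = idct_2d(dc_only)
```

Because the two passes are independent one-dimensional transforms, each pass can in principle process several rows or columns in parallel lanes.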
In conventional scheme 1, the maximum bit detecting section 61 first detects the maximum bit as pre-scaling. The first digit truncating section 62 then adaptively truncates digits for every row according to the detected result. Inputting the truncated values suppresses the average degradation in precision. The row arithmetic section 63 provides three special instructions: addition with history, conditional addition, and conditional product. The row arithmetic section 63 implements the 8-point one-dimensional inverse discrete cosine transform using these three special instructions to suppress arithmetic errors due to the 16-bit integer arithmetic operation. The second digit truncating section 64 truncates digits of the arithmetic result from the row arithmetic section 63 based on the digit truncation history of each column and the number of carry digits of the pre-scaling section 60. The column arithmetic section 65 realizes the 8-point one-dimensional and two-dimensional inverse discrete cosine transform using the special instructions and then carries digits in each column to clear the digit truncation history, whereby the arithmetic result of the 8×8 two-dimensional inverse discrete cosine transform is obtained.
The technique disclosed in the paper "Inverse DCT Calculation on VISP-LSI", 1996 Institute of Electronics, Information and Communication Engineers, Spring National Convention, A-192, is well known as another scheme (hereinafter referred to as "conventional scheme 2"). In this scheme, in order to express the product of 16 bits × 16 bits in 16-bit form, 1 is added to the 15th bit counted from the least significant bit and the lower 16 bits are then truncated. In addition, polarity symmetric rounding is executed to convert the final arithmetic result into an integer. The arithmetic errors associated with the 16-bit integer arithmetic operation are thereby suppressed.
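The two rounding steps of conventional scheme 2 can be illustrated by the following sketch. It rests on two labeled assumptions: that the "15th bit" is bit 15 with the least significant bit numbered 0 (so that adding 1 there and dropping the lower 16 bits rounds to nearest), and that "polarity symmetric rounding" means rounding halves away from zero so that positive and negative values are treated alike. The function names are hypothetical and are not taken from the cited paper.

```python
def mul16_round(a, b):
    """High 16 bits of a signed 16x16-bit product, rounded rather than truncated."""
    # Assumption: add 1 at bit 15 (0x8000), then drop the lower 16 bits.
    # Python's >> on a negative int is an arithmetic (floor) shift.
    return (a * b + 0x8000) >> 16


def round_symmetric(value, frac_bits):
    """Convert a fixed-point value to an integer, rounding halves away from zero."""
    half = 1 << (frac_bits - 1)
    if value >= 0:
        return (value + half) >> frac_bits
    return -((-value + half) >> frac_bits)


# 1000 * 23170 / 65536 is about 353.55: plain truncation gives 353,
# the rounded product gives 354.
truncated = (1000 * 23170) >> 16
rounded = mul16_round(1000, 23170)

# 2.5 and -2.5 with one fractional bit round to 3 and -3 respectively.
pos = round_symmetric(5, 1)
neg = round_symmetric(-5, 1)
```

Both steps cost extra additions and sign handling per operation, which is the added computation that the following paragraphs attribute to this scheme.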
However, conventional schemes 1 and 2 have the following disadvantages. In order to suppress arithmetic errors caused by the 16-bit integer arithmetic operation, conventional scheme 1 requires maximum bit detection by the maximum bit detecting section 61, digit truncation by the first and second digit truncating sections 62 and 64, and a digit carrying operation by the column arithmetic section 65. This increases the amount of arithmetic computation. Moreover, the need for three special instructions, namely addition with history, conditional addition, and conditional product, results in a large circuit.
In order to suppress arithmetic errors caused by the 16-bit integer arithmetic operation, conventional scheme 2 requires the rounding process for the 16-bit operation and the polarity symmetric rounding process for integer conversion, which likewise increases the amount of arithmetic computation.
Moreover, in order to realize a high-speed 8×8 two-dimensional inverse discrete cosine transform, the 16-bit integer operations must be performed in a parallel mode using split ALU operations.