The present invention relates to a processor for transforming time domain signals into frequency domain signals, or frequency domain signals into time domain signals, by means of an orthogonal transform such as a discrete Fourier transform (hereinafter referred to as DFT), a discrete cosine transform (hereinafter referred to as DCT) or the like.
Recently, a fast and small-sized circuit for achieving an orthogonal transform has been needed as an important part of a method of compressing and coding image information, audio information or the like with high efficiency. A forward orthogonal transform is required in an encoder, while an inverse orthogonal transform is required in a decoder. U.S. Pat. No. 4,791,598 discloses the inner arrangement of a one-dimensional DCT processor serving as an orthogonal transform processor. This one-dimensional DCT processor employs the technique of first-stage decimation-in-frequency and the technique of distributed arithmetic, which obtains vector inner products without the use of multipliers. Decimation-in-frequency is a known technique for reducing the number of multiplications required in a fast Fourier transform (hereinafter referred to as FFT), which is a fast algorithm for the DFT.
More specifically, the N×1 DCT processor in U.S. Pat. No. 4,791,598 has an input shift register and a holding register as set forth below. The input shift register comprises N input registers (each having an M-bit width) connected in cascade to one another so as to successively enter N word data which form an input vector comprising one row or column of one block of N×N word (M bits/word) data. The holding register comprises N bit shift registers (each having an M-bit width) having (i) inputs respectively connected to the corresponding input registers of the input shift register such that the inputs receive the N word data in parallel from the input shift register each time all the N input registers of the input shift register are filled with data, and (ii) outputs for shifting out one bit per cycle as part of an N-bit bit-slice word. Together, the input shift register and the holding register form a bit-string distribution circuit with a size of 2×N×M bits.
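The behavior of the bit-string distribution circuit described above can be illustrated with a minimal software sketch. It is a word-level model, not the circuit of U.S. Pat. No. 4,791,598 itself: the function names `bit_slices` and `distribution_circuit_bits` are hypothetical, the words are assumed to be unsigned, and the least significant bit is assumed to be shifted out first (the source does not state the bit order).

```python
def bit_slices(words, m_bits):
    """Yield one N-bit bit-slice word per cycle from N words of m_bits each.

    Assumptions (not stated in the source): words are unsigned and the
    least significant bit is shifted out first. Bit i of each slice comes
    from word i, mimicking N bit shift registers shifting in lockstep.
    """
    for cycle in range(m_bits):
        slice_word = 0
        for i, word in enumerate(words):
            bit = (word >> cycle) & 1      # bit leaving shift register i
            slice_word |= bit << i         # place it at position i of the slice
        yield slice_word


def distribution_circuit_bits(n, m_bits):
    """Storage of the input shift register plus the holding register: 2*N*M bits."""
    return 2 * n * m_bits


# Three 2-bit words produce two 3-bit bit-slice words.
slices = list(bit_slices([1, 2, 3], 2))
```

For N = 8 words of M = 16 bits, `distribution_circuit_bits(8, 16)` gives 256 bits of register storage for this stage alone.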
The N×1 DCT processor in U.S. Pat. No. 4,791,598 further comprises a butterfly unit and a ROM-and-accumulator circuit (hereinafter referred to as RAC circuit) as set forth below. In order to execute the first-stage decimation-in-frequency operation, the butterfly unit comprises N/2 serial adders and N/2 serial subtracters connected to the outputs of the holding register so as to produce a pair of N/2-bit words from the N-bit bit-slice word received from the holding register. For example, for a data string comprising eight data x1, x2, x3, x4, x5, x6, x7, x8, the butterfly operations x1+x8, x1-x8, x2+x7, x2-x7, x3+x6, x3-x6, x4+x5, x4-x5 are executed. The RAC circuit comprises N ROM-and-accumulator units (hereinafter referred to as RACs) connected to the output of the butterfly unit. Each of the N RACs comprises (i) at least one ROM which contains, in the form of a look-up table, the partial sums of vector inner products based on a discrete cosine matrix, and (ii) an accumulator for adding, with the digits aligned, the partial sums successively retrieved from the ROM with the bit-slice words serving as addresses. The RAC circuit thus forms a distributed arithmetic circuit for concurrently calculating N vector inner products without using multipliers.
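The butterfly operation and the ROM-and-accumulator principle described above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the butterfly is computed on whole words rather than bit-serially, the words are assumed to be two's-complement integers processed least significant bit first, and the names `butterfly`, `build_rom` and `rac_inner_product` as well as the coefficient values are hypothetical. In the actual processor the ROM contents would be derived from the discrete cosine matrix.

```python
def butterfly(x):
    """First-stage decimation-in-frequency: sums and differences of mirrored pairs."""
    n = len(x)
    sums = [x[i] + x[n - 1 - i] for i in range(n // 2)]
    diffs = [x[i] - x[n - 1 - i] for i in range(n // 2)]
    return sums, diffs


def build_rom(coeffs):
    """Look-up table of partial sums.

    Entry `addr` holds the sum of coeffs[k] over every k whose bit is set
    in `addr`, so one ROM read replaces len(coeffs) multiplications by 0 or 1.
    """
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def rac_inner_product(rom, words, m_bits):
    """Inner product of `words` with the ROM's coefficients, without multiplying words.

    Assumption: words are two's-complement integers that fit in m_bits.
    Each cycle one bit-slice addresses the ROM, and the partial sum is
    accumulated with weight 2**j; the sign-bit slice carries a negative weight.
    """
    acc = 0
    for j in range(m_bits):
        addr = 0
        for k, word in enumerate(words):
            addr |= ((word >> j) & 1) << k   # bit j of word k forms address bit k
        weight = -(1 << j) if j == m_bits - 1 else (1 << j)
        acc += weight * rom[addr]            # shift-and-add in hardware
    return acc


# Eight-point example matching x1..x8 in the text.
sums, diffs = butterfly([1, 2, 3, 4, 5, 6, 7, 8])

# Placeholder coefficients (hypothetical, not cosine values) for one inner product.
coeffs = [3, -1, 2, 5]
rom = build_rom(coeffs)
result = rac_inner_product(rom, diffs, 8)    # equals sum(c * d) over the pairs
```

With N/2 inputs per RAC after the butterfly stage, each look-up table needs only 2^(N/2) entries instead of 2^N, which is the benefit of performing the butterfly before the distributed arithmetic.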
The N×1 DCT processor in U.S. Pat. No. 4,791,598 further comprises an output shift register as set forth below. The output shift register comprises N output registers which are connected to the corresponding accumulators of the N RACs so as to receive, in parallel, N vector inner products from the RAC circuit, and which are connected in cascade to one another so as to successively supply the N vector inner products thus received.
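The parallel-load, serial-output behavior of such an output shift register can be modeled with a short sketch. The class name `OutputShiftRegister` and its methods are hypothetical, and the order in which the words leave the cascade (index 0 first) is an assumption not stated in the source.

```python
from collections import deque


class OutputShiftRegister:
    """N cascaded output registers: loaded in parallel, read out one word per cycle."""

    def __init__(self, n):
        self.regs = deque([0] * n)

    def load_parallel(self, values):
        """Latch N inner products from the N accumulators at once."""
        if len(values) != len(self.regs):
            raise ValueError("expected one value per output register")
        self.regs = deque(values)

    def shift_out(self):
        """Supply one word from the end of the cascade; the vacated register is zeroed."""
        word = self.regs.popleft()
        self.regs.append(0)
        return word


osr = OutputShiftRegister(4)
osr.load_parallel([10, 20, 30, 40])
stream = [osr.shift_out() for _ in range(4)]
```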
As thus discussed, the conventional DCT processor has a large-scale circuit arrangement having a large number of registers. This produces the problem that an integrated processor requires a large chip area.