It is a common practice to apply various orthogonal transforms to a signal such as an image signal or an audio signal to effect compression of information, removal of an interference pattern and so forth.
Orthogonal transforms that can be used to transform a signal from a spatial axis to a frequency axis include, for example, a Fourier transform (FT), a discrete cosine transform (DCT) and an Hadamard transform. In each of those transforms, calculation processing is performed based on an individually defined expression. Since executing the calculation of each transform takes a very long time when it is processed by software alone, hardware for exclusive use for each particular processing has been developed.
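As an illustration of why evaluating such a defining expression in software is slow, the following is a minimal sketch that computes a one-dimensional DCT directly from its defining sum. It assumes the common unnormalized DCT-II form; the function names `dct_ii` and `multiplications` are hypothetical and are not taken from the apparatus described here.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II evaluated directly from its defining sum.

    X[k] = sum over i of x[i] * cos(pi * (i + 1/2) * k / N)
    """
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def multiplications(n):
    """Multiplications used by the direct evaluation: one per (k, i) pair."""
    return n * n


if __name__ == "__main__":
    print(dct_ii([1.0, 1.0, 1.0, 1.0]))
    print(multiplications(16))
```

For a constant input, only the k = 0 coefficient is non-zero. The number of multiplications grows with the square of the transform length, which is the cost that butterfly algorithms and dedicated hardware aim to reduce.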
As hardware for exclusive use for an orthogonal transform, a discrete cosine transform apparatus disclosed, for example, in Japanese Patent Laid-Open Application No. Heisei 3-35353 is known. The discrete cosine transform apparatus executes calculation of a discrete cosine transform of degree 2^(n+1) and is constructed using a basic arithmetic unit A and another basic arithmetic unit B. The basic arithmetic unit A is a circuit which temporarily stores input data therein and adds or subtracts them. The basic arithmetic unit B is a circuit which temporarily stores input data therein and adds or subtracts results of a number of multiplications smaller than the number of the input data. The basic arithmetic unit A is connected at a first stage, and n circuits, in each of which the basic arithmetic unit B and the basic arithmetic unit A are connected in cascade connection, are connected in cascade connection to the basic arithmetic unit A at the first stage.
In the discrete cosine transform apparatus described above, however, a DCT (discrete cosine transform) is calculated in accordance with a signal flow diagram (butterfly calculation algorithm) according to the value of n shown in FIG. 18, and the inverse transform is calculated in accordance with a flow reverse to that of the transform in the same signal flow diagram. Consequently, when many parallel calculation processes are executed for a DCT, a great number of communications of processing data take place between them. Therefore, there is a problem in that a sufficient increase in speed of processing by parallel operation is not achieved, and the discrete cosine transform apparatus cannot always cope sufficiently with a DCT for information which is high in parallelism such as image information.
In particular, in the signal flow diagram of FIG. 18 applied to the apparatus described above, one calculation is performed for each of n ranges defined by broken lines. Accordingly, when a DCT is executed by the apparatus described above, at most n parallel calculation processes can be executed. In the case of the algorithm of FIG. 18, since n=4, at most four parallel processes can be executed. Besides, since the processing load amounts of the parallel processes differ greatly from one another, the effect of parallel processing is also low.
Further, if an attempt is made to increase the number of ranges defined by broken lines in order to increase the number of parallel processes under the algorithm described above, then it becomes necessary to communicate information for butterfly calculation between the chips which execute the calculations. Consequently, the overhead for communication of information becomes high and the advantage of parallel processing is reduced.
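The growth of inter-chip communication can be illustrated with a rough sketch. The signal flow diagram of FIG. 18 is not reproduced here, so the sketch substitutes the generic radix-2 butterfly pattern, in which each stage pairs data points separated by a fixed span, and assumes that consecutive data points are distributed in equal blocks over the chips. The function name `cross_chip_butterflies` and both assumptions are illustrative only and are not part of the apparatus described above.

```python
def cross_chip_butterflies(n_points, n_chips):
    """Count butterflies whose two inputs are held by different chips.

    Assumes a generic radix-2 butterfly pattern (span halves every stage)
    and a block distribution: chip c holds points
    c * per_chip .. (c + 1) * per_chip - 1.
    n_points and n_chips are assumed to be powers of two.
    """
    per_chip = n_points // n_chips
    crossing = 0
    span = n_points // 2
    while span >= 1:
        # Each group of 2 * span points pairs point i with point i + span.
        for start in range(0, n_points, 2 * span):
            for i in range(start, start + span):
                if i // per_chip != (i + span) // per_chip:
                    crossing += 1
        span //= 2
    return crossing


if __name__ == "__main__":
    for chips in (1, 2, 4, 8):
        print(chips, cross_chip_butterflies(16, chips))
```

Under these assumptions, every doubling of the number of chips turns one more full stage of butterflies into inter-chip transfers, so the communication overhead rises as the degree of parallelism is raised.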
Meanwhile, with regard to an FT, when image information which is constituted from, for example, 1,024 × 1,024 pixels (picture elements) is to be processed by FT processing, several tens of minutes are required with an EWS (engineering workstation) in which a general purpose processor is installed, and a calculation time of several minutes is required even with a processor for exclusive use (DSP: digital signal processor).
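To give a sense of the amount of calculation involved for an image of this size, the following sketch counts complex multiplications for a two-dimensional discrete Fourier transform of an n × n image under three standard evaluation strategies. Which strategy underlies the processing times quoted above is not stated, so the counts are illustrative only; the function name `complex_mults_2d_dft` is hypothetical.

```python
def complex_mults_2d_dft(n):
    """Complex multiplication counts for an n x n two-dimensional DFT.

    n is assumed to be a power of two.
    Returns (direct, row_column, row_column_fft).
    """
    log2_n = n.bit_length() - 1
    # Direct evaluation: each of n*n outputs sums over n*n inputs.
    direct = n ** 4
    # Row-column method with direct 1-D DFTs: 2n transforms of n*n each.
    row_column = 2 * n ** 3
    # Row-column method with radix-2 FFTs: 2n transforms of (n/2)*log2(n) each.
    row_column_fft = n * n * log2_n
    return direct, row_column, row_column_fft


if __name__ == "__main__":
    print(complex_mults_2d_dft(1024))
```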
Further, in transform processing including the DCT and the FT described above, since all of them obtain a result as described above, a long time is required for calculation processing. Consequently, there is a problem in that, if an attempt is made to increase the speed of processing, an algorithm for parallel calculation is required.