1. Field of the Invention
The present invention relates to an orthogonal transform apparatus for executing a fast cosine transform or fast inverse cosine transform, for use in preprocessing or postprocessing of a video signal in applications such as high efficiency coding of a video signal.
2. Description of the Related Art
In the prior art, fast algorithms have been used in various types of orthogonal transform apparatus, in order to reduce the scale of hardware of the apparatus. There is a conspicuous trend towards the use of such a fast algorithm in the case of a cosine transform or an inverse cosine transform in which large numbers of multiplication operations are necessary.
FIG. 7 is a signal flow chart to illustrate the 8-point cosine transform, which is one type of orthogonal transform. In FIG. 7, {y.sub.0, y.sub.1, y.sub.2, y.sub.3, y.sub.4, y.sub.5, y.sub.6, y.sub.7 } denote a set of 8 input signals, and {z.sub.0, z.sub.1, z.sub.2, z.sub.3, z.sub.4, z.sub.5, z.sub.6, z.sub.7 } denote a set of 8 output signals. In a practical apparatus for executing such a transform, the 8 input signals actually consist of a set of 8 data values (i.e. digital sample values) from a digital signal such as a digital video signal, with the time axis sequence of the input signal values having been changed by the operation of a reordering unit 1 as described hereinafter, from the original time axis sequence of {y.sub.0, y.sub.1, . . . y.sub.7 }. That is to say, successive sets of 8 sequential data values of the input digital signal are each processed in accordance with the algorithm of FIG. 7, and a corresponding set of orthogonally transformed output data values is produced for each of these sets of data values of the input digital signal.
FIG. 8 is a block diagram showing the basic components of a prior art orthogonal transform apparatus for executing the orthogonal transform of FIG. 7, and FIG. 15 is a timing diagram corresponding to the signal flow chart of FIG. 7, showing the time relationships between various stages of the processing that is executed in the apparatus of FIG. 8.
The relationship between the input and output signal values of the signal flow chart of FIG. 7 is expressed as follows:
______________________________________
##STR1##
z.sub.i = w.sub.i .multidot. z.sub.i '

i    w.sub.i
______________________________________
0    1/cos(.pi./4)
1    cos(.pi./4) .multidot. cos(3.pi./8)/cos(7.pi./16)
2    0.5/cos(3.pi./8)
3    cos(.pi./4)/cos(5.pi./16)
4    0.875/cos(.pi./4)
5    1/cos(3.pi./16)
6    1/cos(.pi./8)
7    1/cos(.pi./16)
______________________________________
In the above, .delta..sub.i is a function which takes the value 1 if i is positive, and takes the value cos(.pi./4) if i is zero. In FIG. 7, each of the arrows denotes an addition or subtraction operation, with the full lines denoting addition and the dotted lines denoting subtraction. The circle and square outlines each denote a multiplication operation in which an input signal is multiplied by a fixed coefficient, with the contents of each outline (i.e. 2C.sub.4, 7/8, etc.) indicating the respective coefficients. Each square outline denotes a multiplication which can be executed by a binary shift operation, while each circular outline denotes a multiplication which cannot be executed by such a binary shift operation alone. To distinguish between these two types of operation, the latter type of multiplication operation will be referred to in the following as an "actual multiplication".
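The distinction between the two kinds of multiplication can be illustrated with a minimal sketch. It assumes integer sample values, and it assumes that the coefficient 7/8 is realized as a shift followed by a subtraction (x minus x/8); the flow chart does not specify the arithmetic at this level. The function names are hypothetical, and C4 is taken as cos(4.pi./16) in accordance with the coefficient designation used for FIG. 7.

```python
import math

# Assumption: C4 = cos(4*pi/16), following the coefficient designation C_i
# used for the flow chart of FIG. 7.
C4 = math.cos(4 * math.pi / 16)


def times_two(x: int) -> int:
    """Multiplication by 2 as a single left shift."""
    return x << 1


def times_seven_eighths(x: int) -> int:
    """Multiplication by 7/8 as x - x/8, using a right shift by 3 bits.

    Assumption: truncation of the shifted value is acceptable. Note that
    Python's >> rounds toward negative infinity for negative integers.
    """
    return x - (x >> 3)


def actual_multiply(x: int, coefficient: float) -> float:
    """A multiplication that needs a real multiplier, e.g. by 2*C4."""
    return x * coefficient


print(times_two(5))            # 10
print(times_seven_eighths(64)) # 56
print(actual_multiply(8, 2 * C4))
```

The first two functions use only shifts and a subtraction, which is why such coefficients are comparatively cheap in hardware, whereas the last stands in for a full multiplier circuit.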
FIG. 8 is a block diagram of an orthogonal transform apparatus for executing the orthogonal transform algorithm that is shown in the signal flow chart of FIG. 7. In FIG. 8, numeral 1 denotes a reordering unit which receives successive data values (i.e. digital sample values) of an input digital signal in successive sample periods, and functions to reorder these data values. Specifically, the reordering unit 1 rearranges the sequence of values within each of successive sets of 8 sequential data values, and the reordered set of 8 values is then operated on by a pipeline processing flow, described in detail hereinafter. Numeral 2 denotes a butterfly unit for executing a form of computation referred to as a butterfly operation (as described hereinafter) on the output values produced from the reordering unit 1, 3 denotes a reordering unit for reordering the outputs produced from the butterfly unit 2, 4 denotes a multiplier for multiplication of predetermined ones of the outputs from the reordering unit 3 by a fixed coefficient (2C.sub.4), and 5 denotes an adder for addition of outputs produced from the reordering unit 3. Numeral 6 denotes a selector unit which functions, in each sample period of the input digital signal, to select one out of three outputs, specifically an output from the reordering unit 3, an output from the multiplier 4 or an output from the adder 5. Numeral 7 denotes a butterfly unit for executing a butterfly operation on outputs produced from the selector unit 6, 8 denotes a reordering unit for reordering the outputs produced from the butterfly unit 7, 9 denotes a multiplier for multiplying specific ones of the outputs produced from the reordering unit 8 by respective predetermined coefficients (i.e. the coefficients C.sub.4, 2C.sub.2, 2C.sub.6 shown in FIG. 7), and 10 denotes an adder for addition of outputs produced from the reordering unit 8.
Numeral 11 denotes a selector unit for selecting one out of three outputs during each sample period of the input digital signal, specifically, respective outputs from the multiplier 9, from the adder 10 and from the reordering unit 8. Numeral 12 denotes a butterfly unit for executing a butterfly operation on outputs produced from the selector unit 11. Numeral 13 denotes a multiplier for multiplication of outputs produced from the butterfly unit 12 by predetermined coefficients (i.e. by 7/8, C.sub.4, and C.sub.4 C.sub.6 shown in FIG. 7), and 14 denotes a reordering unit for reordering the outputs produced from the multiplier 13, to obtain orthogonally transformed signals. In FIG. 7 the reference numerals indicate the respective operations that are executed by units in FIG. 8.
With an orthogonal transform apparatus having the configuration of FIG. 8, the operation is as follows. A set of 8 successive input digital signal values {y.sub.0, . . . , y.sub.7 } is reordered by the reordering unit 1 to have the sequence {y.sub.0, . . . , y.sub.3, y.sub.7, . . . , y.sub.4 } as shown in FIG. 7. The successive outputs produced from the reordering unit 1 are subjected to butterfly operation by the butterfly unit 2. Here, the term "butterfly operation" signifies an operation of computing respective sums and differences between successive pairs of data values as illustrated in FIG. 7, with each sum or difference being derived within one sample period of the input digital signal. The butterfly unit 2 executes such calculation processing on data that are separated along the time axis by 4 sample periods. Part of the output values produced from the butterfly unit 2 are multiplied by 2C.sub.4 in the multiplier 4, and another part of the outputs from the butterfly unit 2 are added together in the adder 5. In the multiplication coefficients, the designation C.sub.i signifies cos(i.pi./16), where i takes the values 2, 4 and 6 as shown in FIG. 7. The reordering unit 3 executes reordering of data that are to be added together, data that are to be multiplied by the coefficient 2C.sub.4, and data that are to be transferred directly, to be operated on in the next butterfly operation. The selector unit 6 selects an output from the reordering unit 3, the multiplier 4 or the adder 5 to be inputted to the butterfly unit 7, in accordance with the time at which the selection operation is being executed. The butterfly unit 7 executes butterfly operation on data that are separated by two sample periods, and the outputs produced from the butterfly unit 7 are supplied to the reordering unit 8, to be reordered along the time axis as required for the succeeding processing.
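The first stage described above (reordering unit 1 followed by butterfly unit 2) can be sketched as follows. This is only an illustration under stated assumptions, not the circuit of FIG. 8: the function names are hypothetical, a whole set of 8 values is processed at once rather than one value per sample period, and the sign convention of each difference (earlier value minus later value) is assumed, since it depends on FIG. 7. The coefficients C.sub.i are computed as cos(i.pi./16) for i = 2, 4 and 6.

```python
import math

# C_i = cos(i*pi/16) for i = 2, 4, 6, as designated for FIG. 7.
C = {i: math.cos(i * math.pi / 16) for i in (2, 4, 6)}


def reorder_input(y):
    """Reorder {y0..y7} into {y0, y1, y2, y3, y7, y6, y5, y4}."""
    if len(y) != 8:
        raise ValueError("expected a set of 8 data values")
    return list(y[:4]) + list(y[:3:-1])


def butterfly(values, distance):
    """Sums, then differences, of values `distance` positions apart.

    Assumption: each difference is (earlier value - later value).
    """
    out = []
    for start in range(0, len(values), 2 * distance):
        first = values[start:start + distance]
        second = values[start + distance:start + 2 * distance]
        out.extend(a + b for a, b in zip(first, second))
        out.extend(a - b for a, b in zip(first, second))
    return out


y = [1, 2, 3, 4, 5, 6, 7, 8]
stage1 = butterfly(reorder_input(y), distance=4)
print(stage1)  # [9, 9, 9, 9, -7, -5, -3, -1]
```

With a distance of 4, the value y.sub.0 is paired with y.sub.7, y.sub.1 with y.sub.6, and so on, giving four sums followed by four differences. The same helper with a distance of 2 or 1 corresponds to the pairings described for the butterfly units 7 and 12, although the intermediate reordering, multiplication and selection steps are not modeled here.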
Part of the outputs from the reordering unit 8 are multiplied by C.sub.4, by 2C.sub.2, or by 2C.sub.6 in the multiplier 9, and part of the outputs from the reordering unit 8 are added together in the adder 10. The selector unit 11 selects outputs from the reordering unit 8, from the multiplier 9, or from the adder 10, in accordance with the time at which selection is executed, and inputs the selected outputs to the butterfly unit 12. The butterfly unit 12 executes butterfly operation on data that are separated by one sample period, and the results are multiplied by 1, by 7/8, by 2, by C.sub.4 C.sub.6, or by C.sub.4 in the multiplier 13. The results of the above processing are generated in the sequence {z.sub.0, z.sub.4, z.sub.2, z.sub.6, z.sub.1, z.sub.7, z.sub.3, z.sub.5 } as shown in FIG. 7, so that the reordering unit 14 rearranges the sequence to become {z.sub.0, z.sub.1, z.sub.2, z.sub.3, z.sub.4, z.sub.5, z.sub.6, z.sub.7 }, as a set of 8 orthogonally transformed output data values.
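The final rearrangement performed by the reordering unit 14 amounts to a fixed permutation, which can be sketched as follows. The names GENERATED_ORDER and reorder_output are hypothetical, and the sketch operates on a complete set of 8 values rather than on a stream of one value per sample period.

```python
# Order in which the transform results are generated, per the text:
# {z0, z4, z2, z6, z1, z7, z3, z5}.
GENERATED_ORDER = (0, 4, 2, 6, 1, 7, 3, 5)


def reorder_output(generated):
    """Place each generated value at its natural index, giving {z0..z7}."""
    if len(generated) != len(GENERATED_ORDER):
        raise ValueError("expected a set of 8 data values")
    natural = [None] * len(GENERATED_ORDER)
    for position, index in enumerate(GENERATED_ORDER):
        natural[index] = generated[position]
    return natural


generated = ["z0", "z4", "z2", "z6", "z1", "z7", "z3", "z5"]
print(reorder_output(generated))
# ['z0', 'z1', 'z2', 'z3', 'z4', 'z5', 'z6', 'z7']
```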
The operation of the orthogonal transform apparatus of FIG. 8 can be clearly understood from the timing diagram of FIG. 15, which shows an example of actual time relationships between the various operations executed by the apparatus. Pipeline processing is utilized, and successive sample periods of the input digital signal are designated along the vertical direction as t0, t1, t2, . . . respectively, with each processing operation being executed within an integral number of sample periods. As indicated, the reordering unit 1 applies respective different amounts of delay to the 8 successive input data values {y.sub.0, . . . , y.sub.7 } so that, for example, the signal value y.sub.0 is delayed from the sample period t0 to the sample period t7, in which it is added to the value y.sub.7 by the butterfly unit 2. It can further be understood that in 8 successive sample periods extending from t7, the butterfly unit 2 executes four successive addition operations followed by four successive subtraction operations. It can further be seen that the multiplier 4 executes two successive actual multiplications by the coefficient 2C.sub.4, on respective ones of two subtraction results produced from the butterfly unit 2, and thereafter the adder 5 executes two addition operations on two successive pairs of subtraction results produced from the butterfly unit 2. Similarly, the adder 10 executes three successive addition operations on sequentially produced pairs of outputs from the butterfly unit 7, and the multiplier 9 executes three successive actual multiplications on outputs produced from the butterfly unit 7.
It can be understood from FIG. 15 that various amounts of delay are applied to signals that are produced within the apparatus of FIG. 8, by delay means which are omitted from FIG. 8 for simplicity of description, i.e. delays which are necessary for implementing the pipeline processing flow.
In the timing chart of FIG. 15, the outputs produced from the butterfly unit 12 are shown as being alternately supplied to the reordering unit 14 directly and after being multiplied by a coefficient. Where the output from the butterfly unit 12 is shown as being transferred directly to the reordering unit 14, that output from the butterfly unit is actually multiplied in the multiplier 13 by a coefficient having the value 1.
FIG. 9 is a signal flow chart of an algorithm for an orthogonal transform which is the inverse transform to that of FIG. 7, i.e. which is an 8-point fast inverse cosine transform. FIG. 10 is a block diagram of an orthogonal transform apparatus corresponding to the signal flow chart of FIG. 9. In these figures, blocks having an identical operation to blocks in FIG. 8 are indicated by corresponding designation numerals. With this apparatus as shown in FIG. 10, it is necessary to use three multipliers 21, 4 and 9, and two subtractors 22, 23, in addition to three butterfly units 2, 7 and 12. The operation of the apparatus of FIG. 10 is based on a pipeline processing flow, basically of the form described hereinabove referring to FIG. 15 for the apparatus of FIG. 8, so that detailed description of the operation of the apparatus of FIG. 10 will be omitted.
FIG. 11 shows a signal flow chart for a 2-dimensional cosine transform which consists of a 2-point cosine transform and a 4-point cosine transform. FIG. 13 shows a signal flow chart for a 2-dimensional inverse cosine transform which consists of a 2-point inverse cosine transform and a 4-point inverse cosine transform. FIG. 12 is a block diagram of an orthogonal transform apparatus for realizing the orthogonal transform of the signal flow chart of FIG. 11, and FIG. 14 is a block diagram of an apparatus for realizing the transform of the signal flow chart of FIG. 13.
In FIG. 11, designating the input signals as {y.sub.0, . . . y.sub.7 } and the output signals as {u.sub.0,0, . . . , u.sub.3,0, u.sub.0,1, . . . u.sub.3,1 }, these output signals are expressed as follows: ##EQU1##
In FIGS. 12 and 14, blocks having identical operation to blocks in FIGS. 8 and 10 are indicated by corresponding numerals. In the orthogonal transform apparatus of FIG. 12, two multipliers 9 and 13, one adder 10, and three butterfly units 2, 7 and 12 are utilized. In the orthogonal transform apparatus of FIG. 14, two multipliers 21 and 4, one subtractor 22, and three butterfly units 2, 7 and 12 are utilized. As the respective operations of the apparatus of FIGS. 12 and 14 can be understood from the signal flow charts of FIGS. 11 and 13 and the description given hereinabove of the orthogonal transform apparatus of FIG. 8, detailed description will be omitted.
With each of the above types of prior art orthogonal transform apparatus, it is necessary to use a plurality of multipliers, so that the scale of the necessary hardware is large. Furthermore it is necessary to use dedicated hardware to execute each of the four different types of orthogonal transforms described above, since it will be clear from the above description that, for example, the numbers of multipliers and numbers of adders required will vary in accordance with the particular type of orthogonal transform. Thus if it is required to provide an orthogonal transform apparatus which can be easily adapted to implementing a number of different types of orthogonal transform, then the scale of the necessary hardware would be further increased.