1. Field of the Invention
The present invention relates to an orthogonal transform processor, and more particularly, to an orthogonal transform processor which employs a fast orthogonal transform algorithm to process a series of source data values.
2. Description of the Related Art
Digital signal processing applications often involve orthogonal transform algorithms such as the Fast Fourier Transform (FFT) and Fast Hadamard Transform (FHT). Particularly, FHT is frequently used in the technical fields of image processing and mobile communication because it can be implemented with simple hardware.
FIG. 12 shows how to generate Hadamard matrices. As seen from FIG. 12, Hadamard matrices are symmetric matrices consisting of ones and zeros. Each of their row vectors, referred to as a “Walsh code,” is orthogonal to every other row vector. The generation process shown in FIG. 12 may be repeated in the same manner to yield higher-order matrices, e.g., 8×8, 16×16, 32×32, 64×64, and so on.
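The generation process can be illustrated with a short sketch. FIG. 12 is not reproduced here, so the sketch assumes the commonly used doubling rule in which a matrix H is expanded to [[H, H], [H, NOT H]], starting from the 1×1 matrix [0]. The function names (hadamard_binary, to_bipolar, inner_product) are hypothetical and chosen for illustration only.

```python
def hadamard_binary(order):
    """Return the order x order Hadamard matrix in 0/1 form.

    Assumes the doubling rule [[H, H], [H, NOT H]]; order must be a power of two.
    """
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[0]]
    while len(matrix) < order:
        # Upper half repeats each row; lower half appends the inverted row.
        top = [row + row for row in matrix]
        bottom = [row + [1 - bit for bit in row] for row in matrix]
        matrix = top + bottom
    return matrix


def to_bipolar(bits):
    """Map bit 0 to +1 and bit 1 to -1."""
    return [1 - 2 * bit for bit in bits]


def inner_product(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# List the four Walsh codes of length 4 with their index numbers.
for index, code in enumerate(hadamard_binary(4)):
    print(index, "".join(str(bit) for bit in code))
```

Under this assumed rule the four rows come out as 0000, 0101, 0011, and 0110. Orthogonality is checked on the bipolar form of the rows, where the inner product of two different rows is zero.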
As stated above, Walsh codes are orthogonal to each other. A code sequence having such an orthogonal nature is useful in modulating, or encoding, transmission signals. This technique is known as “orthogonal modulation.” When the code sequence consists of M orthogonal codewords, the modulation is called “M-ary orthogonal modulation.”
FIG. 13 is a diagram which shows an example of an M-ary orthogonal modulator using the Walsh code set of M=4. It is a common convention to designate each individual Walsh code by a unique number that starts with zero, such as 0, 1, 2, and 3, denoting the zeroth, first, second, and third Walsh codes, respectively. In the example modulator configuration of FIG. 13, these four Walsh codes are supplied to a selector SW1, which is controlled by the source data to be modulated. The selector SW1 chooses a Walsh code corresponding to each symbol of the source data sequence and sends it out as the encoded data. A source data symbol “01,” for example, causes the selector SW1 to choose and output the Walsh code 1, namely “0101.”
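The selector behavior described above may be sketched as a table lookup. This is only an illustration: the text gives Walsh code 1 as 0101 and Walsh code 3 as 0110, while the values used here for codes 0 and 2 (0000 and 0011) are assumptions, as are the names WALSH_CODES_M4 and modulate.

```python
# Walsh code set for M=4. Codes 1 and 3 are quoted in the text;
# codes 0 and 2 are assumed values for this sketch.
WALSH_CODES_M4 = ["0000", "0101", "0011", "0110"]


def modulate(source_bits, codes=WALSH_CODES_M4):
    """Replace each source symbol by the Walsh code that it selects."""
    bits_per_symbol = (len(codes) - 1).bit_length()  # 2 bits when M=4
    if len(source_bits) % bits_per_symbol:
        raise ValueError("source length must be a multiple of the symbol size")
    encoded = []
    for start in range(0, len(source_bits), bits_per_symbol):
        symbol = source_bits[start:start + bits_per_symbol]
        # The symbol value acts as the selector position.
        encoded.append(codes[int(symbol, 2)])
    return "".join(encoded)


print(modulate("01"))    # 0101
print(modulate("0111"))  # 01010110
```

Each two-bit symbol thus expands to a four-bit codeword, which is the price paid for the orthogonality used at the receiving end.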
The inherent orthogonality of Walsh codes is also used to reconstruct the original data from a modulated data sequence that was produced as above. That is, the data is decoded by computing its correlation with each Walsh code. FIG. 14 is a diagram which shows an example of a Walsh decoder. As seen from FIG. 14, the decoder comprises four correlators 1-1 to 1-4 and a maximum value selector 2. The correlators 1-1 to 1-4 calculate correlation factors between the modulated source data signal and four different Walsh codes concurrently. The maximum value selector 2 selects one of the calculated correlation factors that exhibits the greatest value.
FIG. 15 provides a typical structure of the correlator 1-4 for Walsh code 3 (“0110”). This illustrated correlator 1-4 comprises flip-flops (FFs) 10-1 to 10-4, multipliers 11-1 to 11-4, and an adder 12. The flip-flops 10-1 to 10-4 function as delay elements, giving a one-clock delay to their respective input signals. The multipliers 11-1 to 11-4 calculate the product of each bit of the Walsh code 3 and their input data supplied from the corresponding flip-flops 10-1 to 10-4. In this multiplication processing, the bit values “0” and “1” are interpreted as bipolar levels “+1” and “−1,” respectively. The resultant products are then summed up by the adder 12. In the example of FIG. 15, the correlator outputs a maximum correlation value when the input data sequence is “0110” (i.e., “+1, −1, −1, +1”).
The above function of Walsh correlators explains the principle of the decoder of FIG. 14. That is, the decoder reproduces the original data by calculating the correlations between input data and different Walsh codes, finding which correlator indicates the highest correlation, and then outputting the corresponding symbol.
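The decoding principle can be sketched in software as four multiply-and-sum correlators followed by a maximum selection. This is a behavioral sketch only and does not model the flip-flop delays of FIG. 15. The code set below assumes 0000 and 0011 for Walsh codes 0 and 2, which the text does not state, and the function names are hypothetical.

```python
# Codes 1 and 3 are quoted in the text; codes 0 and 2 are assumed values.
WALSH_CODES_M4 = ["0000", "0101", "0011", "0110"]


def to_bipolar(code):
    """Interpret bit '0' as +1 and bit '1' as -1."""
    return [1 if bit == "0" else -1 for bit in code]


def correlate(received, code):
    """Multiply the received samples by the bipolar code and sum the products."""
    return sum(r * c for r, c in zip(received, to_bipolar(code)))


def decode_symbol(received, codes=WALSH_CODES_M4):
    """Return the index of the best-matching code and all correlation values."""
    correlations = [correlate(received, code) for code in codes]
    best = max(range(len(codes)), key=lambda i: correlations[i])
    return best, correlations


index, correlations = decode_symbol([+1, -1, -1, +1])
print(index, correlations)  # 3 [0, 0, 0, 4]
```

Because the correlation is a sum over all four samples, a moderately distorted input such as 0.9, −1.2, −0.8, 0.7 still yields its largest correlation for Walsh code 3.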
Referring again to FIG. 15, the illustrated correlator employs multipliers to determine whether the input data sequence coincides with a specific orthogonal codeword. Multipliers, however, generally need a complex circuit structure, which results in an increased scale of hardware. This problem in correlative operations can be avoided by using adders and subtractors in place of multipliers. FHT operators are known as an example of such correlators. In the FHT computation, correlation can be calculated with simple adders and subtractors, or butterfly operators, which are the fundamental components of the Fast Fourier Transform.
FIG. 16(A) is a signal flow diagram which shows the FHT computation based on the 2×2 Hadamard matrix (i.e., Walsh code length=2). This diagram represents the summation and subtraction of two input signals w0 and w1. The resultant sum and difference are referred to herein as Walsh0 and Walsh1, respectively.
w0+w1=Walsh0  (1)
w0−w1=Walsh1  (2)
Actually, the above 2×2 FHT operation is realized by a combination of an adder 20-1 and a subtractor 20-2, as shown in FIG. 16(B).
When the Walsh code 1 itself (w0=+1, w1=−1) is given as an input, the FHT operator of FIG. 16(B) will output the following results (see FIG. 17).
Walsh0=w0+w1=(+1)+(−1)=0  (3)
Walsh1=w0−w1=(+1)−(−1)=2  (4)
That is, the FHT operator outputs an auto-correlation value of “2” at its lower output terminal corresponding to the Walsh code 1. Similarly, the FHT operator will produce the following correlation values when the Walsh code 0 itself (w0=+1, w1=+1) is given.
Walsh0=w0+w1=(+1)+(+1)=2  (5)
Walsh1=w0−w1=(+1)−(+1)=0  (6)
In this second example, the illustrated FHT operator outputs an auto-correlation value of “2” at its upper output terminal corresponding to the Walsh code 0.
FIG. 18 is a signal flow diagram showing the FHT computation based on the 4×4 Hadamard matrix (i.e., Walsh code length=4). Consider, for example, that the Walsh code 3 itself (w0=+1, w1=−1, w2=−1, w3=+1) is given as an input. In this case, the result will be as follows:
Walsh0=w0+w1+w2+w3=(+1)+(−1)+(−1)+(+1)=0  (7)
Walsh1=w0−w1+w2−w3=(+1)−(−1)+(−1)−(+1)=0  (8)
Walsh2=w0+w1−w2−w3=(+1)+(−1)−(−1)−(+1)=0  (9)
Walsh3=w0−w1−w2+w3=(+1)−(−1)−(−1)+(+1)=4  (10)
That is, the illustrated operator outputs an auto-correlation value of “4” at its output terminal corresponding to the Walsh code 3.
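The 2-point and 4-point computations above can be reproduced with a short sketch built only from additions and subtractions. The in-place loop structure and the function name fht are illustrative assumptions. The sketch is not a transcription of the signal flow of FIG. 18, although it yields Walsh0 to Walsh3 in index order.

```python
def fht(values):
    """Fast Hadamard Transform using butterfly (add/subtract) operations only.

    The input length must be a power of two. Outputs are in Walsh index order.
    """
    data = list(values)
    size = len(data)
    if size < 1 or size & (size - 1):
        raise ValueError("input length must be a power of two")
    span = 1
    while span < size:
        for start in range(0, size, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                # One butterfly: sum to the upper slot, difference to the lower.
                data[i], data[i + span] = a + b, a - b
        span *= 2
    return data


print(fht([+1, -1]))          # [0, 2]
print(fht([+1, +1]))          # [2, 0]
print(fht([+1, -1, -1, +1]))  # [0, 0, 0, 4]
```

For a code length of N, this arrangement uses log2(N) stages of N/2 butterflies each, which is the source of the cascaded adder and subtractor stages in a direct hardware implementation.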
The above-described FHT computation may be implemented directly in hardware, using adders and subtractors. This simple approach, however, is not realistic particularly when the code length is long, because of the intolerable propagation delay times resulting from its cascaded stages of adders and subtractors. To solve this problem, most implementations use the techniques of pipelined processing.
FIG. 19 is a timing diagram of typical pipelined processing when the code length is four. This processing can be realized by a circuit shown in FIG. 20. The circuit comprises flip-flops (FFs) 50 to 56, butterfly operators 57 and 58, rearrangement switches 59 and 60, a selector 61, and an operation timing generator 62.
The flip-flops 50 to 56 delay their input data by a predetermined time. The butterfly operators 57 and 58 perform a butterfly operation with the supplied data. The rearrangement switches 59 and 60 change the order of the supplied data as required. The selector 61 selects either the output of the flip-flop 56 or the lower-terminal output of the rearrangement switch 60. Based on a framing pulse signal that indicates the boundaries of individual data blocks, the operation timing generator 62 controls the components in the processor so that they will be timed correctly.
The processor circuit of FIG. 20 will operate as follows. Referring to (A) of FIG. 19, every falling edge of the framing pulse signal initiates a new cycle of operation. Referring to (B) of FIG. 19, the source data is supplied to the flip-flop 50, and then it reaches the input of the flip-flop 51 after one clock interval (i.e., one data interval). Operating at half the clock rate, the flip-flops 51 and 52 take in w0 (delayed) and w1 and hold them for two clock cycles. As a result, the butterfly operator 57 is supplied with the source data values w0 and w1 at the same time, as shown in (C) of FIG. 19. The butterfly operator 57 calculates their sum w0′ and difference w1′ according to the following formulas.
w0′=w0+w1  (11)
w1′=w0−w1  (12)
The subsequent source data values w2 and w3 are processed in the same way, resulting in the following values w2′ and w3′.
w2′=w2+w3  (13)
w3′=w2−w3  (14)
The flip-flop 53 delays the output data of the butterfly operator 57 by two clock intervals. The processed data is fed to the rearrangement switch 59 in the following order. First, the rearrangement switch 59 accepts w0′ at its upper input terminal. Two clocks later, w2′ and w1′ arrive at the upper and lower input terminals, respectively. Lastly, the rearrangement switch 59 receives w3′ at its lower input terminal. While passing w0′ and w3′ straight to the next stage, the rearrangement switch 59 swaps w2′ and w1′ internally. The flip-flop 54 is located at the upper output terminal of the rearrangement switch 59, adding a two-clock delay to that output data. As a result, the second butterfly operator 58 first receives w0′ and w2′, and then w1′ and w3′. It calculates their sums and differences according to the following formulas, as shown in (D) of FIG. 19.
w0″=w0′+w2′  (15)
w2″=w0′−w2′  (16)
w1″=w1′+w3′  (17)
w3″=w1′−w3′  (18)
The resultant differences w2″ and w3″ appear at the lower output terminal of the butterfly operator 58. The flip-flop 55 feeds them to the rearrangement switch 60, with a delay of two clock intervals. Accordingly, the rearrangement switch 60 receives the processed data in the following order. First, the rearrangement switch 60 accepts w0″ at its upper input terminal. Then, w1″ and w2″ arrive at the upper and lower input terminals, respectively. Lastly, w3″ is fed to the lower input terminal. While passing w0″ and w3″ straight to the next stage, the rearrangement switch 60 swaps w1″ and w2″ internally at the above second step. The flip-flop 56 delays the data supplied from the rearrangement switch 60 by two clock intervals. As a result, the selector 61 first receives w0″ and w1″, and then w2″ and w3″ at its upper and lower terminals. Alternately changing its contact position at every clock transition, the selector 61 outputs the four values w0″, w1″, w2″, w3″ in this order, as shown in (E) of FIG. 19.
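The data flow through the two butterfly stages can be summarized in a behavioral sketch. It follows formulas (11) to (18) and the pairing produced by the rearrangement switches, but it is not cycle-accurate: the flip-flop delays and clock timing of FIG. 19 are deliberately left out, and the function names are hypothetical.

```python
def butterfly(a, b):
    """Return the sum and the difference of two inputs."""
    return a + b, a - b


def pipelined_fht4(w0, w1, w2, w3):
    """Model the value flow of the two-stage pipeline for code length 4."""
    # Stage 1 (butterfly operator 57): adjacent source values are paired.
    w0p, w1p = butterfly(w0, w1)  # formulas (11), (12)
    w2p, w3p = butterfly(w2, w3)  # formulas (13), (14)

    # Rearrangement switch 59 regroups the data so that stage 2 sees
    # (w0', w2') first and (w1', w3') next.
    # Stage 2 (butterfly operator 58).
    w0pp, w2pp = butterfly(w0p, w2p)  # formulas (15), (16)
    w1pp, w3pp = butterfly(w1p, w3p)  # formulas (17), (18)

    # Rearrangement switch 60 and selector 61 restore index order.
    return [w0pp, w1pp, w2pp, w3pp]


print(pipelined_fht4(+1, -1, -1, +1))  # [0, 0, 0, 4]
```

Substituting formulas (11) to (14) into (15) to (18) gives the same expressions as formulas (7) to (10), so the pipeline output for Walsh code 3 again peaks at the fourth position.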
As described above, the conventional architecture for FHT processing can be implemented with simple logic circuits as long as it is for short codewords. In real-life applications, however, the conventional FHT processors are not practical at all. More specifically, mobile communications systems and other signal processing applications use longer codewords, meaning that the circuit has to employ more pipeline stages. This leads to the use of many flip-flops to adjust the operation timings at each stage. As more pipeline stages are needed, the number of flip-flops increases exponentially, requiring complex large-scale hardware.
Taking the above into consideration, an object of the present invention is to provide an orthogonal transform processor which takes advantage of efficient pipelined processing without increasing the scale of its computation circuit.
To accomplish the above object, according to the present invention, there is provided an orthogonal transform processor which processes source data with an orthogonal transform algorithm. This processor comprises the following elements: a data reception unit which accepts a pair of source data values at intervals of T; an addition/subtraction unit, coupled to the data reception unit, which performs addition and subtraction of a given pair of data values at intervals of T/n, where n is an integer representing the order of the orthogonal transform algorithm being implemented; a storage unit which stores the resultant data values of the addition and subtraction at predetermined storage locations; a feedback unit which reads out the stored data values from the storage unit and feeds them back to the addition/subtraction unit; and a data output unit which reads out the data values stored as final result values in the storage unit, and sends out the final result values.