Many electronic devices process digital signals. Often, this is done by a microprocessor such as a digital signal processor (DSP), which is particularly suited for processing digital signals such as audio or video signals. In the processing of voice signals, for example, it is often necessary not only to handle single numbers but also to perform matrix calculations. The elements of a matrix are typically stored sequentially in a digital memory, that is, each element is stored in a single memory cell, one after the other and row by row. To perform the necessary calculations, the processor has to read the values from the memory cells. For this purpose, a DSP typically includes an address generation unit that generates the addresses for efficient access to the contents of the corresponding memory cells. For efficient access, these address generation units often include modulo arithmetic.
Some calculations require that the elements of a matrix are read linearly, that is, one after the other, row by row. FIG. 1 shows a part of a known address generation unit (AGU) 21 that efficiently generates the addresses for linearly addressing the elements of a matrix.
The AGU 21, designated hereinafter as the “standard modulo”, includes an adder 22 that adds the increment 31 inputted at a second input 22.2 of the adder 22 to the current address, designated hereinafter the input address 30, inputted at the first input 22.1 of the adder 22. At its output 22.3, the adder 22 produces the next address 32 that is inputted to a first input 23.1 of a subtractor 23. A modulo 33 is inputted to a second input 23.2 of the subtractor 23 which outputs a comparison address 34 at its output 23.3. The next address 32 is further inputted at the first input 24.1 of a multiplexer MUX 24 and the comparison address 34 is inputted at the second input 24.2 of the multiplexer MUX 24. Depending on the value of the comparison address 34, the MUX 24 generates the output address 36 at its output 24.3. That is, the MUX 24 provides the next address 32 as the output address 36 if the comparison address 34 is lower than zero and the MUX 24 provides the comparison address 34 as the output address 36 if the comparison address 34 is higher than or equal to zero. For deciding whether the comparison address 34 is lower than zero the AGU 21 includes a comparator 25 where the output 23.3 of the subtractor 23 is connected to the input 25.1 of the comparator 25. The comparator 25 generates a control signal 35 at its output 25.2 for controlling the MUX 24 via its control input 24.0. Starting with a given starting address, repeating the address generation with the AGU 21 several times and using the generated output address 36 as the input address 30 of the next address generation respectively, AGU 21 generates an address sequence for addressing the cells of a matrix.
It should be noted that this modulo arithmetic is a simple version. Correct results are obtained only if the input address is lower than the modulo and if the increment is positive and lower than the modulo.
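The data path described above can be mirrored in a few lines of software. The following is only a minimal sketch for illustration, not the circuit of FIG. 1 itself: the function names `agu_step` and `agu_sequence` are hypothetical, and the comments map each line to the reference numerals used in the text. It assumes the restrictions just stated (input address lower than the modulo, increment positive and lower than the modulo).

```python
def agu_step(input_address, increment, modulo):
    """One address generation of the standard modulo AGU (software model)."""
    next_address = input_address + increment      # adder 22 -> next address 32
    comparison_address = next_address - modulo    # subtractor 23 -> comparison address 34
    use_comparison = comparison_address >= 0      # comparator 25 -> control signal 35
    # MUX 24: comparison address if it is >= 0, otherwise the next address
    return comparison_address if use_comparison else next_address


def agu_sequence(start, increment, modulo, count):
    """Return `count` (>= 1) addresses, feeding each output back as the next input."""
    addresses = [start]
    while len(addresses) < count:
        addresses.append(agu_step(addresses[-1], increment, modulo))
    return addresses
```

For instance, `agu_sequence(0, 2, 5, 8)` gives `[0, 2, 4, 1, 3, 0, 2, 4]`: each time the sum reaches or exceeds the modulo, the subtracted value is selected and the sequence wraps around.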
Storing an n*m matrix B (n=4, m=4) with the sixteen elements B00, B01, B02, B03, B10, B11, B12, B13, B20, B21, B22, B23, B30, B31, B32 and B33, as shown in FIG. 13, requires for example sixteen memory cells 0-15 as shown in FIG. 2. Starting with the value 0 as the input address 30, 1 as the increment 31 and 16 as the modulo 33, AGU 21 produces the following address sequence:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1  (I)

which means that all elements of the matrix B are addressed linearly. This is shown in FIGS. 2 and 3 by the arrows, where each arrow starts at the input address and points to the corresponding output address generated by AGU 21. FIG. 3 shows how the elements of the matrix B are accessed in sequence. It can be seen that the addressing algorithm jumps from the last matrix element 15 back to the first element 0, as indicated by the arrow 38 (circular buffer).
However, certain matrix calculations require that the elements of the matrix B are accessed column by column. One example is the multiplication of a p*n matrix A (p=2, n=4) with the eight elements A00, A01, A02, A03, A10, A11, A12 and A13, as shown in FIG. 13, with the n*m matrix B (n=4, m=4). Multiplying A with B yields the p*m product matrix C=A×B with the elements C00, C01, C02, C03, C10, C11, C12 and C13. The following equation shows how the elements Cij are determined:
$$C_{ij} = \sum_{k=0}^{n-1} A_{ik} \cdot B_{kj} \qquad \text{(II)}$$
That is, the element Cij is simply the scalar product of the ith row vector Ai of A with the jth column vector Bj of B. The elements of the row vector Ai are addressed with a standard modulo AGU such as AGU 21. The addressing of the elements of the column vector Bj, however, is more complex and is not realisable with the AGU 21. Starting for example again with a value 0 for the input address 30, n (n=number of rows of B=4) as the increment 31 and n*m−1=15 (n=number of rows of B=4 and m=number of columns of B=4) as the modulo 33, AGU 21 produces the following address sequence:

0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 0 4 8  (III)

Again, the arrows in FIG. 4 show the sequence in which the memory cells are addressed, and FIG. 5 shows the corresponding sequence of the accessed matrix elements. It can be seen that with the modulo n*m−1 the AGU 21 does not efficiently access the 16th memory cell with the element B33, but jumps from the element B23 back to the element B00. Accordingly, some extra cycles are needed to access the last element B33 of B.
A further addressing possibility with the AGU 21 is to choose n*m=16 as the value of the modulo 33. In this case AGU 21 produces the address sequence:

0 4 8 12 0 4 8 12 0 4 8 12 0 4 8 12 0 4  (IV)
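The behaviour with the modulo n*m−1 can be checked with a small software model. This is a sketch under stated assumptions: the helper names are hypothetical, the increment is n as given in the text, and the values are those of the square 4*4 example (where n equals m).

```python
def agu_step(input_address, increment, modulo):
    """Standard modulo step: add, subtract the modulo, select by sign."""
    next_address = input_address + increment
    comparison_address = next_address - modulo
    return comparison_address if comparison_address >= 0 else next_address


def column_sequence(n, m, count):
    """Addresses produced with increment n and modulo n*m - 1, starting at 0."""
    modulo = n * m - 1
    address, addresses = 0, []
    for _ in range(count):
        addresses.append(address)
        address = agu_step(address, n, modulo)
    return addresses


def unreached_cells(n, m, count):
    """Memory cells of the n*m matrix that never appear in the sequence."""
    return sorted(set(range(n * m)) - set(column_sequence(n, m, count)))
```

With `n = 4`, `m = 4` and `count = 18`, `column_sequence` returns the eighteen addresses of sequence (III), and `unreached_cells` returns `[15]`, that is, the cell holding B33.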
In order to address the elements correctly, the address pointer has to be incremented by 1 after each column (after n addresses) which would yield the correct address sequence:
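$$0 \quad 4 \quad 8 \quad 12 \quad \underset{=\,0+1}{1} \quad 5 \quad 9 \quad 13 \quad \underset{=\,1+1}{2} \quad 6 \quad 10 \quad 14 \quad \underset{=\,2+1}{3} \quad 7 \quad 11 \quad 15 \quad \underset{=\,(15+1) \bmod 16}{0} \quad 4 \quad \ldots \qquad \text{(V)}$$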
However, adding 1 after each column also requires some extra computing cycles to jump to the next column.
The known address generation unit does not allow a matrix to be addressed efficiently by its columns. Additional effort is necessary to generate the address sequence for accessing the elements of a matrix column by column.
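The extra work can be made visible in a software model. The sketch below is one possible way to model the correction, not a description of any actual implementation: the names are hypothetical, and it assumes the square 4*4 example of the text (n equals m) with increment n and modulo n*m. Within a column it uses the standard modulo step; at the end of each column it performs a separate fix-up that sets the pointer to the first address of the next column, wrapping back to 0 after the last column as in sequence (V).

```python
def agu_step(input_address, increment, modulo):
    """Standard modulo step: add, subtract the modulo, select by sign."""
    next_address = input_address + increment
    comparison_address = next_address - modulo
    return comparison_address if comparison_address >= 0 else next_address


def column_addresses_with_fixup(n, m, count):
    """Return (addresses, fixups): column-wise addresses and the extra steps needed."""
    modulo = n * m
    column_start = 0
    address = column_start
    addresses, fixups = [], 0
    for i in range(count):
        addresses.append(address)
        if (i + 1) % n == 0:
            # A column of n addresses is complete: extra step outside the AGU.
            column_start = (column_start + 1) % m
            address = column_start
            fixups += 1
        else:
            address = agu_step(address, n, modulo)
    return addresses, fixups
```

For `n = 4`, `m = 4` and `count = 18` this yields the addresses of sequence (V) together with a fix-up count of 4, one per completed column, which corresponds to the extra cycles mentioned above.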
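To make the access pattern behind equation (II) concrete, the product can be written over flat, row-major storage. This is an illustrative sketch only: the function name `matmul_flat` and the use of Python lists as memory are assumptions, not part of the known circuits. It shows that A is read with a stride of 1 along a row, whereas B is read with a stride of m down a column.

```python
def matmul_flat(a, b, p, n, m):
    """C = A x B for row-major flat lists: A is p*n, B is n*m, C is p*m."""
    c = [0] * (p * m)
    for i in range(p):
        for j in range(m):
            acc = 0
            for k in range(n):
                # A: row i, consecutive cells; B: column j, every m-th cell
                acc += a[i * n + k] * b[k * m + j]
            c[i * m + j] = acc
    return c
```

With `a = [1, 2, 3, 4, 5, 6, 7, 8]` (p=2, n=4) and `b = list(range(16))` (n=4, m=4), the element C00 is computed from the cells 0, 4, 8 and 12 of B, which is the first column of the column-wise address sequence discussed above.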
Document U.S. Pat. No. 4,809,156 discloses an address generation unit for a computer system that includes a plurality of address generation files, each of which is designed for a specific address generation problem. One of these problems is to address a matrix column by column. This circuit allows the generation of the required addressing sequence (as described for example in col. 11, line 30 ff). However, the address generation unit is very complex and requires a lot of space on a chip or even a separate chip. Accordingly, complicated programming is necessary to generate the required address sequences.
Document U.S. Pat. No. 6,052,768 shows a further address generation circuit. Here, an incremented address is generated by adding an increment to a current address, and a revised address is generated by adding a data region size value to, or subtracting it from, the incremented address, depending on the sign of the increment. The output address is then generated by selecting either the incremented address or the revised address by means of an output selection circuit which includes two multiplexers, a comparator and an XOR gate. Again, a large substrate area is necessary to implement this circuit, and it is not possible to generate the addresses for accessing a matrix column by column.
Another addressing circuit that allows accessing the elements of a matrix column by column is known from document U.S. Pat. No. 6,647,484 B1. For generating the addresses, the circuit includes three adders and a subtractor and therefore also requires a large chip area.