In certain applications, such as in printers, it is necessary to write data in a particular order that does not coincide with the order in which the data is stored in a memory. For example, data written in rows of a memory buffer (or in columns) is to be sent to the printer in columns (or in rows).
This is generally done by the use of so-called swath buffers. Swath buffers function as interfaces that receive data to be printed from the memory where the data is stored, and send the data to a printer in the desired order. A swath buffer may be formed with a pair of memory buffers. Input data is written in rows in a first buffer, and a second buffer is used for outputting the data. The second buffer receives a copy of the data to be output, obtained by reading the first buffer in columns, and thus holds the data in the desired printing order.
This technique is burdensome both in terms of occupied silicon area, because it requires two buffers, and in terms of time, because the first buffer may be rewritten with new data only after all the data stored in it has been copied into the second buffer.
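The two-buffer scheme described above may be sketched as follows. This is only an illustrative model, not a description of any particular device: the function name double_buffer_swath and the 3-row by 2-column example data are assumptions introduced here for illustration.

```python
def double_buffer_swath(row_ordered_data, n_rows, n_cols):
    """Model of a two-buffer swath buffer (illustrative sketch only).

    The first buffer is filled in row order; the second buffer is filled
    by reading the first buffer in column order.
    """
    # First buffer: input data written in rows.
    first = list(row_ordered_data)
    assert len(first) == n_rows * n_cols

    # Second buffer: copy of the first buffer, read column by column.
    second = []
    for col in range(n_cols):
        for row in range(n_rows):
            second.append(first[row * n_cols + col])

    # The first buffer may be refilled only after this copy has completed.
    return second


swath = ["A1", "A2", "B1", "B2", "C1", "C2"]
print(double_buffer_swath(swath, n_rows=3, n_cols=2))
# prints ['A1', 'B1', 'C1', 'A2', 'B2', 'C2']
```

The sketch makes both costs visible: two lists of the full swath size are held at the same time, and the copy loop has to finish before new input can be accepted.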
A more convenient approach includes using only a single memory buffer and writing new data in the memory location of the data that has just been read. This technique halves the required memory, but the addresses at which data is to be read and written must be generated according to a certain sequence based on modular multiplications.
To illustrate how these addresses are generated, the following basic example will be considered. In a memory buffer of 3 rows and 2 columns, data A1, . . . , C2 intended for a printer swath process are initially written in a customary row order in the memory locations from 0 to 5:
Address  0   1   2   3   4   5
Data     A1  A2  B1  B2  C1  C2
A printer swath is obtained by reading data from the buffer by columns, and according to the cited technique, the just read data is overwritten with new data for a successive printer swath. The following table illustrates the read and write sequence:
Read     A1  B1  C1  A2  B2  C2
Address  0   2   4   1   3   5
Write    D1  D2  E1  E2  F1  F2
After having written the data of a second swath, the data is read in the appropriate sequence and the same memory locations are immediately rewritten with data for a third swath G1, . . . , I2:
Read     D1  E1  F1  D2  E2  F2
Address  0   4   3   2   1   5
Write    G1  G2  H1  H2  I1  I2
Proceeding in the same manner, the fourth swath (J1, . . . , L2) and the fifth swath (M1, . . . , O2) are written while the third and fourth swaths, respectively, are read, as shown in the following tables:
Read     G1  H1  I1  G2  H2  I2
Address  0   3   1   4   2   5
Write    J1  J2  K1  K2  L1  L2
and
Read     J1  K1  L1  J2  K2  L2
Address  0   1   2   3   4   5
Write    M1  M2  N1  N2  O1  O2
As may be noticed, the data for the fifth printer swath is written in the same order as the data of the first swath, so in this example the address sequences repeat every four swaths. Therefore, this technique may be implemented by generating, for each printer swath, an appropriate sequence of memory addresses. These addresses may be calculated by noting that the first location (0) and the last location (5) are always read first and last, respectively, while the address of every other location is obtained by multiplying the corresponding address of the preceding sequence by the number of columns (two) and reducing the result modulo five, which is the address of the last location. For example, the addresses 2 and 4 of the first read sequence become 2·2 mod 5 = 4 and 2·4 mod 5 = 3 in the second read sequence.
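The read and write sequences of the example may be checked with a small simulation. The sketch below is only an illustrative model under stated assumptions: the names make_swath and simulate_single_buffer are hypothetical, data items are assumed to carry unique labels, and each read address is found by searching the buffer for the wanted item rather than by any address formula.

```python
def make_swath(row_letters, n_cols):
    """Build a row-ordered swath, e.g. 'ABC' with 2 columns -> A1, A2, B1, ..."""
    return [f"{letter}{col + 1}" for letter in row_letters for col in range(n_cols)]


def simulate_single_buffer(swaths, n_rows, n_cols):
    """Read each swath by columns and overwrite every location just read
    with the next item (in row order) of the following swath.

    Returns the list of read-address sequences, one per swath read.
    """
    buffer = list(swaths[0])  # first swath stored at addresses 0..size-1
    sequences = []
    for current, incoming in zip(swaths, swaths[1:]):
        # Order in which the printer wants the current swath: by columns.
        column_order = [current[r * n_cols + c]
                        for c in range(n_cols) for r in range(n_rows)]
        addresses = []
        for k, wanted in enumerate(column_order):
            addr = buffer.index(wanted)   # where the wanted item is stored
            addresses.append(addr)
            buffer[addr] = incoming[k]    # overwrite the location just read
        sequences.append(addresses)
    return sequences


swaths = [make_swath(letters, 2) for letters in ("ABC", "DEF", "GHI", "JKL", "MNO")]
for seq in simulate_single_buffer(swaths, n_rows=3, n_cols=2):
    print(seq)
# prints:
# [0, 2, 4, 1, 3, 5]
# [0, 4, 3, 2, 1, 5]
# [0, 3, 1, 4, 2, 5]
# [0, 1, 2, 3, 4, 5]
```

The four printed sequences agree with the Address rows of the tables above, and the last one is again the natural order 0 to 5.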
In general, for a memory buffer of M rows and N columns, the recursive formula for calculating the address ζ(s+1) at step s+1 is
ζ(s+1) = (N·ζ(s)) mod (N·M − 1)  (1)
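Formula (1) may be exercised directly. The following sketch, in which the function names next_sequence and address_sequences are assumptions made for illustration, applies the formula to every address except the last one, which is kept fixed as stated above, and repeats until the natural order 0, 1, . . . , N·M−1 reappears.

```python
def next_sequence(addresses, n_rows, n_cols):
    """Apply formula (1) to each address; the last location stays last."""
    last = n_rows * n_cols - 1
    return [a if a == last else (n_cols * a) % last for a in addresses]


def address_sequences(n_rows, n_cols):
    """Return the successive read sequences, starting from the column-order
    read of a row-ordered buffer, up to and including the natural order."""
    natural = list(range(n_rows * n_cols))
    seq = next_sequence(natural, n_rows, n_cols)
    result = [seq]
    while seq != natural:
        seq = next_sequence(seq, n_rows, n_cols)
        result.append(seq)
    return result


for seq in address_sequences(n_rows=3, n_cols=2):
    print(seq)
# prints:
# [0, 2, 4, 1, 3, 5]
# [0, 4, 3, 2, 1, 5]
# [0, 3, 1, 4, 2, 5]
# [0, 1, 2, 3, 4, 5]
```

The loop terminates because N and N·M−1 have no common factor, so multiplication by N modulo N·M−1 is a permutation of the addresses and returns to the natural order after a finite number of steps.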
The system described in the European patent 497,493 has address generation based on the above algorithm. The above modular operation is performed in two separate steps: the multiplication first followed by the modular reduction.
This approach is burdensome from the point of view of the number of required computations. In fact, a multiplication circuit, if formed with combinational logic, requires without optimization a number of n-bit adders equal to n*m, wherein n and m are the number of bits of each factor, with n≥m. This multiplication may last several clock pulses if implemented in a sequential mode.
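The cost of the two-step computation may be illustrated with a software model of a sequential shift-and-add multiplier followed by a separate reduction. This is only a sketch under stated assumptions and does not reproduce the circuit of the cited European patent: the function names, the 8-bit default width, and the choice of one multiplier bit per step are all hypothetical.

```python
def shift_add_multiply(a, b, width):
    """Multiply a by b, examining one bit of b per step.

    Returns (product, steps); each step stands for one clock pulse of a
    hypothetical sequential multiplier.
    """
    product = 0
    steps = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit   # add the shifted multiplicand
        steps += 1
    return product, steps


def next_address_two_step(addr, n_rows, n_cols, width=8):
    """Formula (1) as two separate steps: multiply, then reduce.

    Intended for addresses below N*M - 1; the last address is kept fixed.
    """
    modulus = n_rows * n_cols - 1
    product, steps = shift_add_multiply(addr, n_cols, width)
    return product % modulus, steps   # the reduction would add further cycles


print(next_address_two_step(4, n_rows=3, n_cols=2))
# prints (3, 8)
```

In this model the multiplication alone takes as many steps as the multiplier has bits, before any cycles spent on the modular reduction are counted.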
Even if the modular reduction were performed with the Barrett algorithm, it would require divisions and multiplications lasting a relatively large number of clock pulses. Reference is directed to A. Menezes, P. van Oorschot and S. Vanstone, “Handbook of Applied Cryptography”, CRC Press, downloadable from the website http://www.cacr.math.uwaterloo.ca/hac, for additional information. Therefore, the system of the above-noted European patent is straightforward to implement but is not very efficient, because the time required for generating a buffer address is relatively long. There is thus a need for a relatively faster circuit for generating addresses for a swath buffer.