1. Field of the Invention
The present invention generally relates to an information processing apparatus and, more particularly, to a flexible information processing apparatus capable of efficiently processing parallel accumulations and to an information processing apparatus capable of processing parallel accumulations of a variety of types.
2. Description of the Related Art
FIG. 13 is a block diagram showing a construction of an information processing apparatus according to the related art capable of processing parallel accumulations. Referring to FIG. 13, the information processing apparatus according to the related art comprises a memory 201 for storing data, a register A 202 for storing the data read from the memory 201, an accumulator 203 for accumulating the data stored in the register A 202, a register B 204 for storing results of accumulation performed by the accumulator 203 and a memory controller 205 for controlling an operation of reading from the memory 201.
A description will now be given of the operation according to the related art.
FIG. 14 shows an example of how data is stored in the memory 201. Referring to FIG. 14, data D0 is stored at address 100h, data D1 at address 101h, data D2 at address 102h, data D3 at address 103h, data D4 at address 104h, data D5 at address 105h, data Y2 at address 200h, data Y5 at address 201h, and data Y8 at address 202h.
FIGS. 15A-15E are timing charts showing how the operation of the information processing apparatus according to the related art is timed. FIGS. 15A-15E show that each step of the operation occurs at a rising edge of a clock. From the memory 201, data D0 at address 100h is stored in the register A 202 at T1, data D1 at address 101h is stored at T2 and data D2 at address 102h is stored at T3. The register B 204 is initialized to 0 at T1. At T2, data D0 in the register A 202 and the data in the register B 204 are accumulated by the accumulator 203 so that a result of accumulation D0+0 is stored in the register B 204.
Accumulation and storage in the register B 204 are repeated two additional times (see FIGS. 15C and 15D) so that data Y2, a final result of accumulation stored in the register B 204, is written at T5 to the memory 201 at address 200h shown in FIG. 14. At T10, data Y5 stored in the register B 204, a result of accumulation resulting from a subsequent cycle of accumulation involving three steps, is written to the memory 201 at address 201h shown in FIG. 14.
According to the related-art information processing apparatus as described above, a predetermined number of steps of reading of data from the memory 201 and a predetermined number of steps of accumulation in the accumulator 203 proceed in parallel. Thereby, the processing time is reduced. The initialization of the accumulator 203 and the writing of the result of accumulation to the memory 201, however, are processed separately. As a result, when an accumulation of three data items is repeated twice, for example, a total of 10 cycles T1 through T10 are required.
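The cycle count above can be checked with a minimal sketch of the first related-art apparatus. The function name, its parameters and the data values are hypothetical and chosen only for illustration; the model assumes, as in FIGS. 15A-15E, that reads and accumulations overlap with a one-cycle offset while each write-back occupies a cycle of its own.

```python
def simulate_first_related_art(memory, read_base, write_base, n_items, n_repeats):
    """Hypothetical cycle-level model of the FIG. 13 apparatus.

    Returns the number of clock cycles consumed.
    """
    cycle = 0
    for rep in range(n_repeats):
        reg_a = None  # register A 202
        reg_b = 0     # register B 204
        base = read_base + rep * n_items
        # Reads and accumulations overlap, offset by one cycle, so
        # n_items reads plus n_items additions take n_items + 1 cycles.
        for step in range(n_items + 1):
            cycle += 1
            if step == 0:
                reg_b = 0              # initialization of register B
            else:
                reg_b = reg_b + reg_a  # add the value read one cycle earlier
            if step < n_items:
                reg_a = memory[base + step]
        # The write-back is not overlapped: it takes a cycle of its own.
        cycle += 1
        memory[write_base + rep] = reg_b
    return cycle


# Placeholder values standing in for D0..D5 at addresses 100h..105h.
memory = {0x100 + i: value for i, value in enumerate([1, 2, 3, 4, 5, 6])}
cycles = simulate_first_related_art(memory, 0x100, 0x200, n_items=3, n_repeats=2)
print(cycles, memory[0x200], memory[0x201])
```

With three data items per accumulation and two repetitions, the model consumes 10 cycles, matching T1 through T10, and leaves the two sums at addresses 200h and 201h.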
FIG. 16 is a block diagram showing a construction of another related-art information processing apparatus with the parallel accumulation capability disclosed in Japanese Laid-Open Patent Application No. 10-214261. Referring to FIG. 16, the information processing apparatus comprises a source data memory 501, an automatic consecutive address generator 502 and a register A 505 for storing the source data. The automatic consecutive address generator 502 is used to store the source data from the source data memory 501 in the register A 505 using consecutive cycles. The apparatus further comprises a coefficient data memory 511, an automatic consecutive address generator 512 and a register C 506 for storing the coefficient data. The automatic consecutive address generator 512 is used to store the coefficient data from the coefficient data memory 511 in the register C 506 using consecutive cycles.
Referring also to FIG. 16, the apparatus further comprises a pipeline operation unit 507 producing a product of the source data stored in the register A 505 and the coefficient data stored in the register C 506. A register D 513 stores a result of operation performed by the pipeline operation unit 507. An accumulator 508 accumulates results of operation stored in the register D 513. An initializer 510 initializes a result of accumulation in the accumulator 508. A register B 509 stores the result of accumulation from the accumulator 508. The apparatus also includes a destination data memory 504 and an automatic consecutive address generator 503. The automatic consecutive address generator 503 is used to transfer the result of operation in the register B 509 to the destination data memory 504.
FIGS. 17A-17I are timing charts showing how the operation of the information processing apparatus according to the second related art described above is timed. FIGS. 17A-17I show that each step of the operation occurs at a rising edge of a clock. From the memory 501, data D0 is stored in the register A 505 at T1, data D1 is stored at T2 and data D2 is stored at T3. From the coefficient data memory 511, data C0 is stored in the register C 506 at T1, data C1 is stored at T2 and data C2 is stored at T3.
At T2, the pipeline operation unit 507 multiplies the data in the register A 505 by the data in the register C 506. A result of operation Z0, i.e. D0*C0, is stored in the register D 513. At T3, an initializing signal is at LOW so that the accumulator 508 produces an arithmetic sum of 0 and the data in the register D 513 so as to store a result of accumulation Y0, i.e. Z0+0, in the register B 509. Alternatively, when the initializing signal is at HIGH (at T4, for example) the accumulator 508 produces an arithmetic sum of the data in the register D 513 and the data in the register B 509 so as to store the result of accumulation Y1, i.e. Z1+Y0, in the register B 509. The step of accumulation is repeated three times. At T6, data Y2, a result of accumulation stored in the register B 509, is written to the destination data memory 504 at memory address 0h.
The process described above is repeated until, at T9, data Y3, a result of accumulation for a second cycle of accumulation, is written to the destination data memory 504 at memory address 1h. Thus, a repetition including two cycles of accumulation of three data items requires a total of 9 cycles T1 through T9. Excluding the pipeline operation, the first and second related-art apparatuses discussed are directed to a similar operation. A difference is that the second related-art apparatus improves the processing efficiency by requiring a total of only 9 cycles, compared with the 10 cycles required by the first.
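The overlapped timing of the second related-art apparatus can be sketched as a four-stage pipeline (read, multiply, accumulate, write). This is an illustrative model only: the function name, its parameters and the data values are assumptions, and the initializing signal is represented simply by restarting the sum on the first item of each group rather than by modeling its polarity.

```python
def simulate_mac_pipeline(src, coef, n_items, n_repeats):
    """Hypothetical cycle-level model of the FIG. 16 apparatus.

    Returns (cycles consumed, list of results written to the destination).
    """
    total = n_items * n_repeats
    reg_a = reg_c = None  # register A 505 and register C 506
    reg_d = None          # register D 513 (product)
    reg_b = None          # register B 509 (running sum)
    dest = []             # destination data memory 504, addresses 0h, 1h, ...
    cycle = 0
    while len(dest) < n_repeats:
        cycle += 1
        # Index of the item handled by each stage in this cycle.
        load_i, mul_i, acc_i, wr_i = cycle - 1, cycle - 2, cycle - 3, cycle - 4
        # Stages are evaluated last-to-first so that each one sees the
        # register contents latched in the previous cycle.
        if 0 <= wr_i < total and wr_i % n_items == n_items - 1:
            dest.append(reg_b)             # write a completed sum
        if 0 <= acc_i < total:
            if acc_i % n_items == 0:
                reg_b = 0 + reg_d          # initialized: sum restarts from 0
            else:
                reg_b = reg_b + reg_d      # accumulate onto the running sum
        if 0 <= mul_i < total:
            reg_d = reg_a * reg_c          # pipeline operation unit 507
        if load_i < total:
            reg_a, reg_c = src[load_i], coef[load_i]
    return cycle, dest


# Placeholder source data D0..D5 and coefficients C0..C5.
cycles, results = simulate_mac_pipeline([1, 2, 3, 4, 5, 6], [1, 1, 1, 2, 2, 2], 3, 2)
print(cycles, results)
```

For two accumulations of three products each, the model writes the first sum in cycle 6 and the second in cycle 9, in line with the timing described for T6 and T9.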
To summarize, in the information processing apparatus according to the second related art discussed, the reading of the source data from the source data memory 501, the reading of the coefficient data from the coefficient data memory 511, the operation in the pipeline operation unit 507 and the accumulation in the accumulator 508 proceed in parallel such that a predetermined number of each of these steps occur simultaneously. Additionally, the initialization of the result of accumulation performed by the accumulator 508, the series of accumulations and the writing of the result of operation to the destination data memory 504 proceed in parallel such that a predetermined number of each of these steps occur simultaneously. Thereby, the processing time for successive accumulations is reduced.
A disadvantage with the information processing apparatus according to the first related art is that, for each cycle of accumulation, the initialization of the accumulator 203 and the transfer of the result of accumulation to the memory 201 are required. As a result, when successive accumulations occur, the overall processing time is relatively long and the processing efficiency is relatively low.
While the information processing apparatus according to the second related art is successful in resolving the problem with the first apparatus, the frequency of repetition of accumulation cannot be changed readily since the initializer 510 and the automatic consecutive address generators 502, 503 and 512 are constructed to be independent of each other. It is also difficult to modify the read address in the source data memory 501 or the write address in the destination data memory 504. Therefore, the second related art is not successful in realizing a flexible information processing apparatus.
Another disadvantage with the information processing apparatus according to the second related art is that, when an extra operation such as a shift operation or a round-off operation is required on the result of operation, the extra operation must be performed separately so that the overall processing time is extended.
Still another disadvantage with the apparatus according to the second related art is that the source data subject to accumulation must be stored in a contiguous area in the source data memory 501.
Yet another disadvantage with the apparatus according to the second related art is that, since the result of accumulation is written in a contiguous area in the destination data memory 504, it is imperative that a contiguous area be reserved for storage of the result of accumulation.
Accordingly, a general object of the present invention is to provide an information processing apparatus in which the aforementioned disadvantages are eliminated.
Another and more specific object of the present invention is to provide a flexible information processing apparatus in which the efficiency of parallel accumulations is improved by reducing the required processing time, and in which parallel accumulations of different types are performed.
The aforementioned objects can be achieved by an information processing apparatus comprising: a memory for storing data; a first memory controller for outputting a read address and controlling reading of the data stored in the memory; a first initial address register for storing an initial value of the read address output by the first memory controller; a first register controlled by the first memory controller to store the data read from the memory; an accumulator for accumulating the data stored in the first register; a second register for storing a result of accumulation by the accumulator; an initializer for initializing the accumulator and outputting the result of accumulation stored in the second register to the memory; a second memory controller for outputting a write address and writing the result of accumulation stored in the second register to the memory; a second initial address register for storing an initial value of the write address output by the second memory controller; an accumulator count register for storing a number of data items to be accumulated by the accumulator and a frequency of repetition of accumulation; and a controller for timing initialization of the accumulator by the initializer, based on the number of data items to be accumulated stored in the accumulator count register, for controlling timing of output of the initial read address from the first memory controller, and for controlling timing of output of the initial write address from the second memory controller, wherein reading, by the first memory controller, of the data from the memory into the first register, accumulation of the read data in the accumulator, and writing, by the second memory controller, of the result of accumulation to the memory proceed in parallel in each cycle of accumulation such that a predetermined number of each of these steps are performed simultaneously.
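The register-driven control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class and function names are hypothetical, addresses advance sequentially from the two initial addresses, and the three-stage overlap (read, accumulate, write) and the resulting cycle count are properties of this model only.

```python
from dataclasses import dataclass


@dataclass
class AccumulationSetup:
    """Hypothetical stand-in for the setting registers."""
    read_start: int     # first initial address register
    write_start: int    # second initial address register
    items_per_sum: int  # accumulator count register: items per accumulation
    repeat_count: int   # accumulator count register: number of repetitions


def run_accumulations(memory, setup):
    """Run the configured accumulations; return the cycles the model used."""
    n = setup.items_per_sum
    total = n * setup.repeat_count
    reg1 = None  # first register (data read from memory)
    reg2 = 0     # second register (result of accumulation)
    written = 0
    cycle = 0
    while written < setup.repeat_count:
        cycle += 1
        load_i, acc_i, wr_i = cycle - 1, cycle - 2, cycle - 3
        # Evaluate stages last-to-first so each sees last cycle's registers.
        if 0 <= wr_i < total and wr_i % n == n - 1:
            memory[setup.write_start + written] = reg2  # second memory controller
            written += 1
        if 0 <= acc_i < total:
            if acc_i % n == 0:
                reg2 = 0 + reg1     # initializer restarts the sum
            else:
                reg2 = reg2 + reg1  # accumulator adds to the running sum
        if load_i < total:
            reg1 = memory[setup.read_start + load_i]    # first memory controller
    return cycle


# Placeholder data at 100h..105h; results are written from 200h.
memory = {0x100 + i: value for i, value in enumerate([1, 2, 3, 4, 5, 6])}
setup = AccumulationSetup(read_start=0x100, write_start=0x200,
                          items_per_sum=3, repeat_count=2)
print(run_accumulations(memory, setup), memory[0x200], memory[0x201])
```

Changing the number of items, the repetition count or either initial address only requires a different `AccumulationSetup`, which is the flexibility the register arrangement is intended to provide.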
The information processing apparatus may further comprise: an operation unit for performing an operation on the data stored in the first register; and a third register for storing a result of operation by the operation unit, wherein the operation by the operation unit and accumulation of results of operation stored in the third register proceed in parallel in each cycle of accumulation.
The information processing apparatus may further comprise: a third register for storing first data stored in the first register; an operation unit for performing an operation on second data stored in the first register and the first data stored in the third register; and a fourth register for storing a result of operation by the operation unit, wherein the operation by the operation unit and accumulation of results of operation stored in the fourth register proceed in parallel in each cycle of accumulation.
The information processing apparatus may further comprise: an operation unit for performing an operation on the result of accumulation stored in the second register; and a third register for storing a result of operation by the operation unit, wherein the operation by the operation unit and writing, by the second memory controller, of a result of operation to the memory proceed in parallel in each cycle of accumulation.
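The optional operation units described above can be illustrated functionally. The sketch below does not model timing; it only shows where an operation applied before accumulation and an operation applied to the accumulated result would sit in the data flow. The function names, the squaring operation and the rounding shift are hypothetical examples, not operations specified by the source.

```python
def accumulate_blocks(data, items_per_sum, pre_op=None, post_op=None):
    """Sum consecutive groups of items_per_sum values.

    pre_op, if given, is applied to each value before it is accumulated.
    post_op, if given, is applied to each completed sum before it is stored.
    """
    results = []
    for start in range(0, len(data) - items_per_sum + 1, items_per_sum):
        acc = 0
        for value in data[start:start + items_per_sum]:
            acc += pre_op(value) if pre_op else value
        results.append(post_op(acc) if post_op else acc)
    return results


def round_shift_right(value, bits=2):
    """Right shift with rounding: add half of the divisor, then shift."""
    return (value + (1 << (bits - 1))) >> bits


data = [1, 2, 3, 4, 5, 6]  # placeholder values
print(accumulate_blocks(data, 3))                             # plain sums
print(accumulate_blocks(data, 3, pre_op=lambda v: v * v))     # operate, then accumulate
print(accumulate_blocks(data, 3, post_op=round_shift_right))  # accumulate, then operate
```

In the apparatus, these operations would proceed in parallel with accumulation or with the write-back rather than as separate passes, which is the point of placing the operation unit inside the pipeline.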
The first memory controller may output read addresses in a non-sequential manner.
The second memory controller may output write addresses in a non-sequential manner.
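Non-sequential addressing can be sketched by letting the caller supply the read and write address sequences. The source does not specify how the non-sequential addresses are generated, so the strided read pattern and the scattered write addresses below are assumptions, as are the function name and data values.

```python
def accumulate_with_addresses(memory, read_addresses, write_addresses, items_per_sum):
    """Accumulate items_per_sum values per result, using caller-supplied addresses.

    read_addresses and write_addresses are arbitrary iterables, so neither the
    source data nor the results need to occupy a contiguous area.
    """
    reads = iter(read_addresses)
    for write_address in write_addresses:
        acc = 0
        for _ in range(items_per_sum):
            acc += memory[next(reads)]
        memory[write_address] = acc


# Placeholder data: the value stored at address 100h + i is i.
memory = {0x100 + i: i for i in range(12)}
accumulate_with_addresses(
    memory,
    read_addresses=range(0x100, 0x10C, 2),  # every second address (assumed stride)
    write_addresses=[0x300, 0x200],         # results placed out of order
    items_per_sum=3,
)
print(memory[0x300], memory[0x200])
```

Because the address sequences are decoupled from the accumulation itself, neither the source data nor the results need a reserved contiguous area.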