1. Field of the Invention
The present invention relates to a device and method for performing arithmetic operations with cache memory; more particularly a device and method having a cache memory capable of partial output.
2. Discussion of Related Art
Computers perform arithmetic operations such as multiple, divide by manipulating data through repeated shifts, additions, and subtractions. Numerous accesses of data and operational cycles are needed to arrive at the result. For more complex operations such as xe2x80x98square rootxe2x80x99 and performing trigonometry functions, the traditional computing process is very time consuming and in some applications, it becomes not feasible to implement.
Some of the more complex arithmetic functions can be implemented by look-up tables, but the table must be large enough to accommodate many different operands. If there are differences in the operand to be operated upon, even a difference of a single bit, the look-up table would not work. The data from the look-up table would be erroneous. To obtain a correct result, the entire calculation must be performed for every digit of the operand.
Other implementations include use of dedicated circuitry. See, for example, U.S. Pat. No. 4,734,878 to Sutcliffe, xe2x80x9cCIRCUIT FOR PERFORMING SQUARE ROOT FUNCTIONSxe2x80x9d. Although less time consuming, dedicated circuitry require extra hardware and can be used only for the specified functions. That is, the Sutcliffe device can only perform square roots, a multiplier can only be used for multiplication, etc.
A cache memory is a high-speed random access memory (RAM) that a (CPU) can access more quickly than it can access regular memory. The cache memory is typically used to store data, which are frequently accessed by the CPU. The more operations a CPU can perform with data from cache memory, the speedier the operations can be performed as compared to operations with data accessed from regular memory.
The cache memory is directly accessed by the CPU (rather than via, for example, an external memory management unit) and is situated physically close to the CPU, some caches being on the CPU chip, to reduce the signal propagation time between the CPU and the cache memory. Descriptions of the operations of data cache such as in an Alpha AXP 21064 microprocessor can be found in xe2x80x9cComputer Architecture A Quantitative Approachxe2x80x9d, by David A. Patterson and John L. Hennessy, 1996, Morgan Kaufmann Publishers, Inc., pp. 380-383. When the CPU performs a read or a fetch operation, it will first check to see if the data or instructions it requests is in the cache. The data or instruction is accessed quickly if it is in the cache. Otherwise, the data or instructions must be fetched from regular memory (DRAM or ROM) and more time is consumed.
A conventional cache is illustrated in FIG. 1. Cache memory 10 includes a cache tag 11, which includes a plurality of buffers for storing data, addresses, or files frequently or recently accessed by the processor. Cache memory 10 includes circuitry for comparing newly input data against the data stored in the tag. As shown in FIG. 1, an exclusive OR gate is associated with each bit of each buffer in cache tag 11. When it is determined that the newly input data matches the data stored in one of the buffers in cache tag 11, a true comparison is output at gates 15, 16, . . . 19. Cache data 12 is a file which stores data corresponding to each buffer in cache tag 11. Upon receiving a true comparison signal from a corresponding gate 15, 16, . . . 19, the data in the corresponding location of cache data 12 is output as q4, q3, q2 . . . q0. The cache shown in FIG. 1 is 5 bits wide.
It can be seen that the characteristics of the cache memory may be useful as a data-storing device for an arithmetic device to perform high-speed calculation. For example, the cache memory such as one shown in FIG. 1, stores operands in cache tag 11 and corresponding calculating results in cache data 12. The arithmetic operation is bypassed if an operand on which the same arithmetic operation is to be performed is found in cache tag 11. In such case, the cache is used as a look-up table, with access time reduced as compared to a look-up table ROM. However, as in the case of a look-up ROM, usage of the cache in a look-up table mode will cause reduction in overall throughput if the approach only works when the operands match exactly with stored cache tag data.
Accordingly, a need exists for a device having a cache memory used in place of an arithmetic unit, which is capable of partial output of calculation results for partially matched operands.
A method of performing arithmetic operation using a cache structure having a TAG register and a cache data memory, the method comprising the steps of storing in the TAG register a plurality of known operands, each having a corresponding resultant derived from an arithmetic operation, the resultant being stored in a corresponding location in the cache data memory; inputting a new operand on which the arithmetic operation is to be performed to the TAG register; comparing the new operand to the plurality of the known operands; and determining if the new operand matches any of the known operands and if there is a match, outputting from the cache data memory the known resultant corresponding to the known operand that matched the new operand.
The method preferably further including the steps of: if there is no match in the determining step, determining if there are partial matches between the known operands and the new operand; selecting the known operand having the MSBs matching the MSBs of the new operand; accessing from the cache data memory the resultant corresponding to the known operand having the most matched MSBs; determining a first portion of the accessed resultant derived from the matched MSBs of the known operand; masking a second portion of the accessed resultant which is other than the first portion; and providing as a first partial resultant the masked resultant.
The method still further preferably including the steps of performing the arithmetic operation on bits of the new operand other than the matched MSBs to produce a second partial resultant; and outputting a complete resultant by combining the first partial resultant with the second partial resultant.
Advantageously, the illustrative methods according to the present invention are applicable to arithmetic operations such as square root, division, or multiplication.
A cache device is also provided for performing arithmetic operation comprising: a tag register for storing a plurality of known operands, each having a corresponding resultant derived from an arithmetic operation; A cache data memory for storing resultants corresponding to the known operands; Comparison logic in the tag register for comparing a new operand on which the arithmetic operation is to be performed against the plurality of the known operands, wherein if the new operand matches any of the known operands, the cache data memory outputs the known resultant corresponding to the known operand that matched the new operand.
The cache device preferably further including: a logic group detector for detecting a group of data having a common logic level, the logic group detector for detecting the known operand having the highest number of MSBs matching the MSBs of the new operand; masking logic for partially masking the resultant corresponding to the known operand having the most matched MSBs, the masking logic for passing a first portion of the corresponding resultant derived from the matched MSBs of the known operand and for masking a second portion of the corresponding resultant which is other than the first portion; and a multiplexer for multiplexing the first portion of the corresponding resultant into an arithmetic unit.
The cache device according to the present invention, further preferably including a second multiplexer, wherein the arithmetic unit performs the arithmetic operation on bits of the new operand other than the matched MSBs to produce a second portion resultant, the second multiplexer for multiplexing the second portion resultant into the arithmetic unit, wherein the second portion resultant is combined to the first portion resultant to form a complete resultant.