Several conventional architectures exist for calculating an output of multi-argument associative operations. An associative operation is an operation whose result does not depend on how its inputs are grouped, so that the inputs can be combined in any grouping. For example, mathematical operations that are associative operations include, but are not limited to, addition, multiplication and, when treated as the addition of a negated operand, subtraction. Associative properties are maintained when performing certain combinations of operations, such as addition and subtraction, but associative properties are not maintained when performing other combinations of operations, such as addition and multiplication. The conventional architectures include components for implementing operators that perform the associative operations. An operator performs a function on two or more operands or inputs and can, for example, perform the mathematical operation of addition.
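By way of illustration only, the grouping property described above can be checked with a short Python sketch. The operand values and variable names are hypothetical and are chosen solely for the example.

```python
from operator import add, mul

# Hypothetical operand values, chosen only for illustration.
a, b, c = 2, 3, 4

# A single associative operator: the grouping does not change the result.
same_add = add(add(a, b), c) == add(a, add(b, c))
same_mul = mul(mul(a, b), c) == mul(a, mul(b, c))

# A mix of addition and multiplication: the grouping changes the result.
left_grouped = mul(add(a, b), c)   # (2 + 3) * 4
right_grouped = add(a, mul(b, c))  # 2 + (3 * 4)
```

Here the two groupings of the mixed expression differ, which is why an architecture that regroups its inputs is limited to combinations of operations that preserve the associative property.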
One example of a conventional architecture is a serial architecture. Using the serial architecture, only one component is necessary regardless of the number of inputs (N). Assuming that it takes one clock cycle to perform the operation on two operands, the number of cycles necessary to calculate the output of the multi-argument associative operation is equal to the number of inputs minus one (i.e., the number of cycles = N−1). FIG. 1 illustrates an example of a serial architecture 100 that receives four inputs. To calculate the output 190, the serial architecture 100 calculates a first output (not shown) using a first input 102 and a second input 104 in a first cycle. The serial architecture 100 then calculates a second output (not shown) using the first output and a third input 106 in a second cycle. Using the second output and a fourth input 108, the serial architecture 100 calculates the output 190 in a third cycle. Thus, for a multi-argument associative operation that has four inputs, the serial architecture 100 requires three cycles to calculate the output. If the serial architecture had eight inputs, the serial architecture would require seven cycles to calculate the output. In this manner, a serial architecture calculates the output of a multi-argument associative operation in series.
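The serial calculation described above can be modeled, as a minimal sketch only, by a loop in which one component performs one two-operand operation per cycle. The function name serial_reduce and the sample input values are assumptions made for illustration and do not appear in the figures.

```python
from operator import add


def serial_reduce(op, inputs):
    """Model a single component folding the inputs one at a time.

    Returns (output, cycles), assuming one cycle per two-operand operation.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two inputs")
    acc = inputs[0]
    cycles = 0
    for x in inputs[1:]:
        acc = op(acc, x)  # one operation on two operands per cycle
        cycles += 1
    return acc, cycles


# Hypothetical sample inputs: four inputs as in FIG. 1, then eight inputs.
four = serial_reduce(add, [1, 2, 3, 4])
eight = serial_reduce(add, list(range(1, 9)))
```

Under these assumptions the four-input case takes three cycles and the eight-input case takes seven cycles, consistent with the N−1 relationship stated above.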
Another conventional architecture is a tree or parallel architecture. For a multi-argument associative operation that has a number of inputs that is greater than two, the tree architecture uses multiple components to calculate the output of the operation. The components in the tree architecture can operate in stages such that all components in the same stage operate during the same cycle.
For a multi-argument associative operation that has N inputs, the number of cycles required to calculate the output is equal to log2(N), rounded up to the next whole number when N is not a power of two, assuming it takes one clock cycle per operation. FIG. 2 illustrates a tree architecture 200 for a multi-argument associative operation that has eight inputs. The tree architecture 200 has a first stage 202, a second stage 204 and a third stage 206.
The first stage 202 includes a first component 210, a second component 220, a third component 230, and a fourth component 240. The first component 210 receives a first input 212 and a second input 214 and produces a first output 216. The second component 220 receives a third input 222 and a fourth input 224 and produces a second output 226. The third component 230 receives a fifth input 232 and a sixth input 234 and produces a third output 236. The fourth component 240 receives a seventh input 242 and an eighth input 244 and produces a fourth output 246. Each of the components 210, 220, 230 and 240 calculates each output 216, 226, 236 and 246, respectively, during a first cycle.
The second stage 204 includes a component 250 and a component 260. The component 250 receives the first output 216 and the second output 226 from components 210 and 220, respectively, and produces a fifth output 256. The component 260 receives the third output 236 and the fourth output 246 from components 230 and 240, respectively, and produces a sixth output 266. Each of the components 250 and 260 calculates each output 256 and 266, respectively, during a second cycle.
The third stage 206 includes a component 270 that receives the fifth output 256 and the sixth output 266 from the components 250 and 260, respectively, and produces an output 280 during a third cycle.
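The staged operation of the tree architecture can be sketched, for illustration only, as a pairwise reduction in which every pair in a stage is combined during the same cycle. The function name tree_reduce, the handling of an odd number of values in a stage, and the sample inputs are assumptions that are not taken from FIG. 2.

```python
from operator import add


def tree_reduce(op, inputs):
    """Model a tree of two-input components operating in stages.

    Returns (output, cycles, components), assuming one cycle per stage.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two inputs")
    level = list(inputs)
    cycles = 0
    components = 0
    while len(level) > 1:
        # Every adjacent pair is combined by its own component in this stage.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        components += len(nxt)
        if len(level) % 2:
            # Assumption: an unpaired value passes through to the next stage.
            nxt.append(level[-1])
        level = nxt
        cycles += 1
    return level[0], cycles, components


# Hypothetical sample inputs: eight inputs as in FIG. 2.
result = tree_reduce(add, list(range(1, 9)))
```

For eight inputs the sketch uses three stages and seven components in total (four, then two, then one), which corresponds to the components 210 through 270 described above.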
In yet another conventional architecture, the tree architecture is combined with the serial architecture to form a serial/tree architecture. The serial/tree architecture attempts to take advantage of the smaller number of cycles required by the tree architecture and the smaller number of components required by the serial architecture. FIG. 3 illustrates a serial/tree architecture 300 for a multi-argument associative operation that has eight inputs. The serial/tree architecture 300 has a first stage 302 and a second stage 304.
The first stage 302 includes a component 310 and a component 350. The component 310 receives a first input 312, a second input 314, a third input 316, and a fourth input 318 and produces a first output 320. The component 350 receives a fifth input 352, a sixth input 354, a seventh input 356, and an eighth input 358, and produces a second output 360. During a first cycle, the components 310 and 350 each calculate a first preliminary output (not shown) using the first and second inputs 312 and 314 and the fifth and sixth inputs 352 and 354, respectively. After calculating its first preliminary output, the component 310 calculates a second preliminary output (not shown) using its first preliminary output and the third input 316 during a second cycle. The component 350 also calculates a second preliminary output (not shown) using its first preliminary output and the seventh input 356 during the second cycle. On a third cycle, the components 310 and 350 calculate the outputs 320 and 360, respectively. The component 310 calculates the output 320 using its second preliminary output and the fourth input 318, and the component 350 calculates the output 360 using its second preliminary output and the eighth input 358. The outputs 320 and 360 are used by the second stage 304 of the serial/tree architecture 300 to calculate the output 380 of the multi-argument associative operation. To that end, the second stage 304 includes a third component 370 that receives the first output 320 and the second output 360 and produces the output 380 in a fourth cycle.
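The two-stage calculation described above can be sketched, as an illustration only, by two serial reductions that run side by side followed by a single combining operation. The function name serial_tree_reduce, the restriction to two equal groups of inputs, and the sample inputs are assumptions made for the example.

```python
from functools import reduce
from operator import add


def serial_tree_reduce(op, inputs):
    """Model two serial first-stage components feeding one second-stage component.

    Returns (output, cycles, components), assuming one cycle per operation.
    """
    n = len(inputs)
    if n < 4 or n % 2:
        raise ValueError("sketch assumes an even number of inputs, at least four")
    half = n // 2
    # First stage: each component folds its own half serially, in parallel
    # with the other component, taking half - 1 cycles.
    first_output = reduce(op, inputs[:half])
    second_output = reduce(op, inputs[half:])
    first_stage_cycles = half - 1
    # Second stage: one component combines the two first-stage outputs.
    output = op(first_output, second_output)
    cycles = first_stage_cycles + 1
    components = 3
    return output, cycles, components


# Hypothetical sample inputs: eight inputs as in FIG. 3.
result = serial_tree_reduce(add, list(range(1, 9)))
```

For eight inputs the sketch reports four cycles and three components, matching the three first-stage cycles and the single second-stage cycle described for FIG. 3.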
Comparing the serial/tree architecture to the tree architecture, it is observed that the serial/tree architecture provides a reduction in the number of components required to calculate an output of a multi-argument associative operation, but it requires more cycles.
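The trade-off between components and cycles can be tabulated with a short sketch. The function name costs is hypothetical, and the formulas assume one cycle per operation, a number of inputs that is a power of two, and a serial/tree architecture with two first-stage components as in FIG. 3; the count of N−1 components for the tree architecture is derived from the stage-by-stage halving shown in FIG. 2 rather than stated in the text.

```python
def costs(n):
    """Return {architecture: (components, cycles)} for n inputs.

    Assumes n is a power of two and at least four.
    """
    if n < 4 or n & (n - 1):
        raise ValueError("sketch assumes n is a power of two, at least four")
    log2_n = n.bit_length() - 1  # exact log2 for a power of two
    return {
        "serial": (1, n - 1),
        "tree": (n - 1, log2_n),
        "serial/tree": (3, n // 2),
    }


# Eight inputs, as in FIG. 2 and FIG. 3.
table = costs(8)
```

For eight inputs the tree architecture uses seven components and three cycles, while the serial/tree architecture uses three components and four cycles.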
The conventional architectures for calculating an output of multi-argument associative operations do not efficiently use the latency of the components while also minimizing the number of components required to calculate an output. For example, the tree architecture minimizes the number of cycles necessary to calculate the output, but uses a large number of components to do so. The serial architecture minimizes the number of components, but requires a number of cycles that is one less than the number of inputs. While the serial/tree architecture provides a compromise between the tree and serial architectures, the serial/tree architecture does not fully take advantage of the latency of the components. By minimizing the number of components that are used to implement multi-argument associative operations, it is possible to reduce the area required to implement the associative operations as well as the cost of implementing the multi-argument associative operations. Further, by minimizing the number of components that are used, a reduction in the complexity of implementing the associative operations can be realized. An algorithm and architecture are therefore desired that minimize the number of components required to calculate the output of multi-argument associative operations by using the latency of the components.