System performance is continually being improved in order to maximize the amount of information that can be processed in a given period of time. In computer systems, for example, it is desirable to make mathematical calculations as fast as possible since many problems require millions of such calculations. Likewise, in communication systems, if more data can be transferred between locations in a set time period, the communication can be performed more economically.
Several approaches have been taken to further improve system performance. These approaches include improving the semiconductor technology from which the systems are made, and increasing the amount of hardware used to perform the system functions. Very Large Scale Integrated (VLSI) circuits may be made to operate faster by shrinking transistors to ever smaller feature sizes. Still more performance enhancements may be realized by processing information in parallel, which typically requires duplicating the hardware in a system. Two data processors may be used, for example, where each processor performs a different instruction at the same time (parallel processing).
A system which uses two processors to process instructions in parallel is described in Grote et al. U.S. Pat. No. 4,728,930. In this system, data is latched for processing by a master processor, and other data is latched for processing by an independent slave processor. The slave processor processes its received data while the master processor is busy processing its received data. The slave processor does not completely depend upon the master processor to perform its functions. As a result, it is not necessary for the master processor to complete its processing before new data may be processed, and performance can therefore be enhanced.
Parallel processing of two instructions in a single processor may be accomplished by using dual execution units (typically the portion of the processor containing the Arithmetic Logic Unit (ALU) for performing logic and mathematical operations). Chuang U.S. Pat. No. 4,766,566 describes a processor having dual execution units on a single chip for executing ALU instructions in parallel wherein one execution unit includes an ALU and the other execution unit includes an ALU with additional related logic to allow processing of more complex calculations. The single processor as described may perform calculations in parallel thus increasing processing throughput.
In some processes it is necessary for one calculation to be processed before further processing can take place. Processing is thus required to be accomplished in a serial manner so that performance improvement may not be realized by simply duplicating hardware and performing parallel processing. A method well known to those skilled in the art for improving the throughput of serially executed instructions is pipelining. Basically, in a pipelined system, the result of one calculation or instruction execution is stored in a register for the next stage whose output is stored in a register for the next stage and so on. The registers are made up of latches which are under the control of a system clock so that on each clock cycle each stage is executing an instruction or performing a calculation to be provided to the next stage. Performance is improved since each stage can be executing a portion of the process instead of waiting for a result to propagate through the entire structure before another instruction can be started.
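The register-per-stage behavior described above can be illustrated with a minimal software sketch. The function name run_pipeline and the use of None to mark an empty register are assumptions made for illustration only; the sketch models clocked registers between stages and does not represent any particular hardware design.

```python
def run_pipeline(stages, inputs):
    """Simulate a clocked pipeline; return (results, clock_cycles).

    regs[i] models the register that stores the output of stage i.
    None marks an empty register, so None must not be used as data.
    """
    regs = [None] * len(stages)
    results = []
    feed = iter(inputs)
    cycles = 0
    while len(results) < len(inputs):
        # Update from the last stage back to the first so that each stage
        # reads the value its predecessor stored on the previous clock.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = None if regs[i - 1] is None else stages[i](regs[i - 1])
        new_item = next(feed, None)
        regs[0] = None if new_item is None else stages[0](new_item)
        cycles += 1
        if regs[-1] is not None:
            results.append(regs[-1])
    return results, cycles


# Three hypothetical stages; each does part of the work on every clock.
example_stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
example_results, example_cycles = run_pipeline(example_stages, [1, 2, 3, 4])
# Four inputs through three stages finish in 4 + 3 - 1 = 6 clock cycles,
# instead of the 4 * 3 = 12 needed if each input waited for the previous one.
```

Once the pipeline is full, one result emerges on every clock cycle, which is the source of the throughput improvement.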
A special type of pipelining system exists wherein a process requires the result of a calculation stored in a register to be fed back into the circuit which made the calculation before executing the next calculation. The throughput of the process is limited by the propagation of the process through the circuit and the register. This type of calculation is hereinafter referred to as a dependent identical process. A traditional pipelining architecture does not improve performance since the stage to receive the results of the previous clock cycle is the same stage that provided those results.
An example of a dependent identical process generator is a Cyclic Redundancy Checker (CRC) circuit. A typical application of CRC circuits is to provide a check for ensuring that data sent from one location is the same data that is received at another location. To accomplish the data integrity check, a CRC circuit generates a CRC code representing the sent data. When that data is received, an identical CRC circuit generates a second CRC code. If the data received matches the data sent, the two CRC codes will match. Each CRC code is calculated from a series of CRC sums which are further calculated from each new data word and the previous CRC sum. Calculating the CRC code requires that each CRC sum be temporarily stored in a register which is then fed back into the inputs of the CRC circuit before the next calculation or sum can be computed. Thus, each CRC sum is a function of a previous CRC sum and the sent or received data.
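The feedback of each CRC sum into the next calculation can be sketched as follows. The 8-bit width, the generator polynomial 0x07, the zero initial value, and the names crc_step and crc_code are assumptions chosen for illustration; the text above does not specify a particular CRC.

```python
def crc_step(prev_sum, data_word, poly=0x07):
    """Compute one CRC sum from the previous sum and one 8-bit data word."""
    reg = (prev_sum ^ data_word) & 0xFF
    for _ in range(8):
        if reg & 0x80:
            reg = ((reg << 1) ^ poly) & 0xFF
        else:
            reg = (reg << 1) & 0xFF
    return reg


def crc_code(words, init=0):
    """Fold a sequence of 8-bit words into a CRC code.

    The variable reg plays the role of the register: each new sum is
    stored in it and fed back as an input to the next calculation.
    """
    reg = init
    for word in words:
        reg = crc_step(reg, word)  # cannot start until the prior sum exists
    return reg


sent = [0x31, 0x32, 0x33, 0x34]
received_ok = list(sent)
received_bad = [0x31, 0x32, 0x37, 0x34]  # one bit flipped in the third word

codes_match = crc_code(sent) == crc_code(received_ok)
error_detected = crc_code(sent) != crc_code(received_bad)
```

Because crc_step takes the previous sum as an argument, the loop in crc_code cannot begin the calculation for one word until the calculation for the preceding word has completed.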
A quasi-parallel CRC circuit is described by Mead in U.S. Pat. No. 4,593,393. That patent describes a system in which a number of bits for a CRC sum are processed in parallel. The next set of bits cannot be processed until the CRC sum of the previous set of bits is calculated and fed back into the inputs of the CRC circuit. Processes such as CRC sums are therefore not conducive to parallel processing, since the present calculation is dependent upon the previous calculation. Furthermore, these calculations will not benefit from traditional pipelining methods, since the previous CRC sum is fed back into the same processing block.
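The distinction between processing several bits of one CRC sum at once and processing several CRC sums at once can be illustrated with a short sketch. It uses a common table-driven software technique as a stand-in for parallel bit logic and is not the circuit of the cited patent; the polynomial 0x07 and all function names are assumptions made for illustration.

```python
POLY = 0x07  # assumed 8-bit generator polynomial, for illustration only


def crc_one_bit(reg, bit):
    """Advance an 8-bit CRC register by a single data bit."""
    feedback = ((reg >> 7) & 1) ^ bit
    reg = (reg << 1) & 0xFF
    return reg ^ POLY if feedback else reg


def crc_bit_serial(data, reg=0):
    """Process the data one bit per step, most significant bit first."""
    for byte in data:
        for i in range(7, -1, -1):
            reg = crc_one_bit(reg, (byte >> i) & 1)
    return reg


def eight_zero_bits(reg):
    """Advance the register by eight zero bits (used to build the table)."""
    for _ in range(8):
        reg = crc_one_bit(reg, 0)
    return reg


# Precomputed effect of eight bit-steps, indexed by (register XOR data byte).
TABLE = [eight_zero_bits(value) for value in range(256)]


def crc_eight_bits_per_step(data, reg=0):
    """Process eight bits per step; each step still needs the prior sum."""
    for byte in data:
        reg = TABLE[reg ^ byte]  # index depends on the previous CRC sum
    return reg


message = bytes(range(50))
same_result = crc_bit_serial(message) == crc_eight_bits_per_step(message)
```

Both functions produce the same CRC, but the eight-bit version still looks up each entry using the previous sum, so consecutive steps remain strictly serial.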
The problem of increasing the performance of dependent identical processes may be further exacerbated by the need to ensure the integrity of the data that is stored in the registers. It is well known in the art to use Level Sensitive Scan Design (LSSD) techniques for improving the integrity of the stored data. An LSSD register includes two latches which are driven by non-overlapping clocks. In this scenario, the result of a CRC calculation would be loaded into the first latch on a first clock and stored in the second latch on a second clock. As a result, only one CRC sum can be calculated during the period of the non-overlapping clocks.
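The two-latch behavior can be modeled in software as a rough sketch. The class TwoLatchRegister, its method names, and the placeholder function stand_in_update are hypothetical; the placeholder is not a real CRC and merely stands in for any calculation that depends on its own previous result. Scan paths and other LSSD details are omitted.

```python
class TwoLatchRegister:
    """Simplified model of a register built from two latches."""

    def __init__(self, initial=0):
        self.latch1 = initial
        self.latch2 = initial

    def first_clock(self, data_in):
        # The first latch captures the new result on the first clock.
        self.latch1 = data_in

    def second_clock(self):
        # The second latch stores the first latch's value on the second clock.
        self.latch2 = self.latch1

    @property
    def output(self):
        # Only the second latch drives the feedback path.
        return self.latch2


def stand_in_update(prev_sum, word):
    """Placeholder for a dependent calculation (NOT a real CRC)."""
    return (prev_sum * 31 + word) & 0xFF


def run_feedback_loop(words):
    """Return (final_sum, clock_periods) for a dependent identical process."""
    register = TwoLatchRegister()
    periods = 0
    for word in words:
        register.first_clock(stand_in_update(register.output, word))
        register.second_clock()
        periods += 1  # one full pair of non-overlapping clocks per sum
    return register.output, periods


final_sum, clock_periods = run_feedback_loop([1, 2, 3])
```

In this model the number of clock periods always equals the number of sums calculated, because a new sum becomes visible at the register output only after both clocks have occurred.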
Thus, what is needed is an apparatus and method which provide higher throughput for calculations of dependent identical processes, wherein more than one calculation can be executed during the period of the non-overlapping clocks in an LSSD design environment.