1. Technical Field
The present invention relates generally to data processing systems and in particular to data manipulation within a data processing system. Still more particularly, the present invention relates to an improved apparatus and method of performing data operations within a data processing system that reduces the utilization of critical processor cycles.
2. Description of the Related Art
Improving the performance and robustness of processors and the speed of data processing within processors is an ongoing goal in processor development. One recent development in processor technology involves the introduction of power performance computing (PowerPC®) and its corresponding reduced instruction set architecture. While several new instructions have been provided to support this new processor system, several operations have been held over from previous implementations of processor architecture. These operations tend to be performed at relatively “slow” speeds on the PowerPC®, utilize critical processor cycles and bandwidth, and reduce the overall performance of the processes requiring the results of the operations.
One such operation, which incurs a measurable latency when performed by the PowerPC®, is that of population count. Population count (or popcount, as the process is conventionally referred to) involves a processor or other specialized circuit counting the number of 1 bits within a block of data (e.g., a 32-bit word) that has been stored to memory. The popcount is typically triggered by a special popcount instruction, which is received by the processor during processing of fetched instructions of an executing thread. The result of the popcount operation may be utilized for any series of more advanced data manipulations. Typically, popcounts are calculated at the time the popcount instruction is received, and the time for completion of the calculation may hamper the completion speed of the advanced processes. Unlike prior art implementations in which a specialized dedicated circuit performs the popcount operation, most conventional processing devices perform popcounts via the processor executing the popcount instruction and triggering one or more of the processor's execution units to perform the popcount operation on the selected data. U.S. Pat. No. 4,989,168, for example, provides a method by which the multiplying unit in a computer system is utilized to perform the population counting operation. Other mechanisms and methods for performing such popcounts are generally known and applied.
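By way of illustration only, the popcount of a 32-bit word as defined above may be sketched in C as follows. The function name popcount32 is hypothetical and is not taken from any referenced implementation; the bit-by-bit loop is simply the most direct software rendering of the definition and is not the method of U.S. Pat. No. 4,989,168.

```c
#include <stdint.h>

/* Illustrative sketch only: count the 1 bits in a 32-bit word by
 * examining one bit per loop iteration. The name popcount32 is an
 * assumption introduced for this example. */
unsigned popcount32(uint32_t word)
{
    unsigned count = 0;

    while (word != 0) {
        count += word & 1u; /* add the lowest-order bit to the running total */
        word >>= 1;         /* shift the next bit into the lowest position */
    }
    return count;
}
```

For example, a word holding the value 0xF0F0 contains eight 1 bits, so popcount32(0xF0F0u) returns 8.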
One of the inherent issues with conventional popcount operations being completed by the execution units is the increased latency seen by the processors as the size of the data increases with the increase in processing capabilities. Additionally, popcount operations today occur in real time on the processor, i.e., at the time the popcount instruction is retrieved by the instruction sequencer and placed in the execution units of the processor. The processor execution units then have to perform this tedious, sequential calculation (e.g., an iterative summation) on the sample data to generate the popcount. This process tends to utilize significant amounts of the processor's critical cycles and bandwidth. This real-time processing of the popcount operation tends to tie up processing bandwidth in the processor's Fixed Point Units (FXUs), leading to latency and/or delays in the other processing functions as the popcount operation is pipelined through the FXUs.
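The growth of this sequential work with data size may be illustrated, purely as a sketch, by an iterative summation over a block of 32-bit words. The function name popcount_block and the steps output parameter are assumptions introduced for this example; the step count is a rough stand-in for the amount of sequential work and does not model actual FXU cycles.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only: iterative summation of the 1 bits in a
 * block of 32-bit words. The number of inner-loop steps is reported
 * through *steps (if non-NULL) to show that the work grows linearly
 * with the size of the data. All names here are hypothetical. */
size_t popcount_block(const uint32_t *words, size_t nwords, size_t *steps)
{
    size_t total = 0;
    size_t iterations = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint32_t w = words[i];

        for (int bit = 0; bit < 32; bit++) {
            total += (w >> bit) & 1u; /* add this bit to the running sum */
            iterations++;             /* one sequential step per bit */
        }
    }
    if (steps != NULL) {
        *steps = iterations;
    }
    return total;
}
```

In this sketch, a block of four words requires 128 sequential steps and a block of eight words requires 256, regardless of how many 1 bits the data actually contains.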