1. Field of the Invention
The present invention relates generally to the field of computing, and more specifically to testing or comparing bit fields in high speed computer environments.
2. Description of the Related Art
Certain newer computing devices employ high speed architectures having highly efficient execution and fast throughput. One such high speed computing architecture is the Itanium architecture, a joint development between Intel Corporation of Santa Clara, Calif. and Hewlett Packard Corporation of Palo Alto, Calif., the assignee of the present invention. The Itanium architecture employs EPIC (Explicitly Parallel Instruction Computing), a technology enabling enhanced performance over previously known RISC architectures. Features and a general discussion of the Itanium 2 processor can be found at:
http://h21007.www2.hp.com/dspp/files/unprotected/litanium2.pdf
The Itanium architecture conforms to various Itanium architecture developer's guides, user manuals, reference guides, and related publications, including but not limited to Intel Order Numbers 245317-004, 245318-004, 245319-004, 245320-003, 249634-002, 250945-001, 249720-007, 251141-004, 248701-002, 251109-001, 245473-003, and 251110-001.
A conceptual arrangement of a system employing Itanium architecture is illustrated in FIG. 1. As used herein, the Itanium architecture may be embodied in different implementations, including but not limited to the Itanium processor and Itanium 2 processor. From FIG. 1, processor 102 resides in computing apparatus 101. Processor 102 employs a series of registers within certain register files, such as general register file 110. The system further includes a compiler 114 that compiles code and facilitates the execution of compiled computer code to interact with and between the aforementioned registers and register files.
Code employed in high speed architectures performs various computing tasks, such as comparisons. Comparisons typically involve comparing two register values to see if some relationship is true, such as one value being equal to the other, one being less than the other, and so forth.
Frequently, data values being compared are not as large as the registers holding them. Two specific situations exist when comparing values that do not occupy the entire register. First, the register may hold only the small value, typically in the lower order bits of the register. Bits above the value or in the higher order portion of the register may be either all zeros, all ones, all a copy of the most significant bit of the value (sign extended), or all irrelevant or “garbage” values. In this scenario, comparisons are typically performed by using special compare instructions that compare only a subset of the total number of bits, and such comparisons are typically restricted to certain bit field sizes. Examples of size restricted comparison instructions include compare-byte and compare-halfword instructions.
Alternately, comparisons may be performed by executing an instruction to convert the smaller value to an equivalent larger sized value and then using a full register size comparison. The system may convert to a larger sized value by sign extending signed values, as with a sign-extend-byte instruction, or by zero extending unsigned values, as with a zero-extend-byte instruction. In these situations, a conversion or possibly a shift instruction may be required prior to performing a comparison. In cases where the upper bits are known to already hold the desired values (all zeros, or copies of the sign bit, depending on whether the value compared is unsigned or signed), the conversion instruction may be omitted.
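The convert-then-compare sequence described above can be sketched in C as follows. This is an illustrative model only, not code from any particular instruction set: the function names are hypothetical, a 64-bit register is assumed, and the value of interest is assumed to occupy the low order byte with garbage in the upper bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of a zero-extend-byte step: keep the low 8 bits, force the
 * upper 56 bits to zero. (Hypothetical helper name.) */
static inline uint64_t zero_extend_byte(uint64_t reg) {
    return reg & 0xFFu;
}

/* Model of a sign-extend-byte step: the upper 56 bits become copies of
 * bit 7. The XOR/subtract form avoids implementation-defined casts. */
static inline int64_t sign_extend_byte(uint64_t reg) {
    int64_t low = (int64_t)(reg & 0xFFu);   /* 0..255 */
    return (low ^ 0x80) - 0x80;             /* -128..127 */
}

/* Full register unsigned compare, performed after the conversion. */
static inline bool low_byte_less_unsigned(uint64_t a, uint64_t b) {
    return zero_extend_byte(a) < zero_extend_byte(b);
}

/* Full register signed compare, performed after the conversion. */
static inline bool low_byte_less_signed(uint64_t a, uint64_t b) {
    return sign_extend_byte(a) < sign_extend_byte(b);
}
```

In this sketch each comparison depends on the result of a preceding conversion, so the two steps cannot be issued in parallel. The same low byte, 0xFF, is 255 when treated as unsigned and -1 when treated as signed, so the two comparisons against 1 give different answers.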
Secondly, the situation may arise where the register holds multiple values packed together and the value to be compared lies at an arbitrary point within the register. This scenario can occur where the original source program orders certain data to be packed into bit fields. While data packing tends to decrease the efficiency of operating on the data within a processor, it also tends to decrease the total amount of memory needed. The result is an increase in the efficiency of memory usage, cache usage, and so forth. In the presence of a significant amount of such data, increased memory and cache efficiency can be dominating factors, making packing data into bit fields particularly advantageous.
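As a hedged illustration of such packing, the following C sketch places three values into a single 64-bit word. The layout (a 16-bit id, a 12-bit count, and a 4-bit flags field) and all names are hypothetical, chosen only to show how several small values can share one register-sized word.

```c
#include <stdint.h>

/* Hypothetical layout within one 64-bit word (bit 0 is least significant):
 *   bits  0..15  id
 *   bits 16..27  count
 *   bits 28..31  flags
 */
enum {
    ID_POS = 0,     ID_LEN = 16,
    COUNT_POS = 16, COUNT_LEN = 12,
    FLAGS_POS = 28, FLAGS_LEN = 4
};

/* Mask with the low `len` bits set; handles len == 64 without an
 * out-of-range shift. */
static inline uint64_t field_mask(unsigned len) {
    return (len >= 64) ? ~0ULL : ((1ULL << len) - 1);
}

/* Store `value` into the field [pos, pos+len) of `word`, leaving all
 * other bits unchanged. Bits of `value` above `len` are discarded. */
static inline uint64_t pack_field(uint64_t word, uint64_t value,
                                  unsigned pos, unsigned len) {
    uint64_t mask = field_mask(len) << pos;
    return (word & ~mask) | ((value << pos) & mask);
}

/* Pack the three illustrative fields into one word. */
static inline uint64_t pack_record(uint64_t id, uint64_t count, uint64_t flags) {
    uint64_t w = 0;
    w = pack_field(w, id, ID_POS, ID_LEN);
    w = pack_field(w, count, COUNT_POS, COUNT_LEN);
    w = pack_field(w, flags, FLAGS_POS, FLAGS_LEN);
    return w;
}
```

Here three values that would otherwise occupy three separate registers or memory words fit in the low 32 bits of one word, at the cost of extra work whenever an individual field must be read or compared.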
The comparing of values packed into bit fields typically operates as follows. The system extracts the value and places the extracted value in the low order bits of the register. The system compares the extracted value using normal full register compare instructions. Extraction is typically performed by shifting the value to the least-significant end of the register and masking the upper bits, such as by forcing them to zero if the value being extracted is unsigned. Alternately, the system may shift the value to the most significant end of the register, and subsequently shift the value down to the low end of the register. This shift-up, shift-down approach uses either a logical shift right to extract an unsigned field or an arithmetic shift right for extraction of a signed field. The arithmetic shift causes the system to shift a copy of the sign bit to the upper bit(s).
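The two extraction approaches just described can be modeled in C roughly as follows. The function names and the (pos, len) parameterization are assumptions made for illustration, with pos being the bit index of the field's least significant bit and 1 <= len, pos + len <= 64. The signed variant also assumes that the compiler implements right shift of a negative signed value as an arithmetic shift and converts out-of-range unsigned values to signed by wrapping, which is the documented behavior of common compilers such as GCC and Clang but is implementation-defined in older C standards.

```c
#include <stdint.h>

/* Shift-and-mask: shift the field down to bit 0, then force the upper
 * bits to zero. Suitable for unsigned fields. */
static inline uint64_t extract_unsigned_mask(uint64_t reg, unsigned pos, unsigned len) {
    uint64_t mask = (len == 64) ? ~0ULL : ((1ULL << len) - 1);
    return (reg >> pos) & mask;
}

/* Shift-up, shift-down with a logical right shift: the left shift
 * discards the bits above the field, the right shift fills with zeros. */
static inline uint64_t extract_unsigned_shifts(uint64_t reg, unsigned pos, unsigned len) {
    uint64_t up = reg << (64 - pos - len);   /* field now at the top */
    return up >> (64 - len);                 /* logical shift right  */
}

/* Shift-up, shift-down with an arithmetic right shift: the upper bits
 * are filled with copies of the field's sign bit.
 * ASSUMPTION: signed >> is arithmetic and the cast wraps (true on
 * mainstream compilers; implementation-defined before C23). */
static inline int64_t extract_signed_shifts(uint64_t reg, unsigned pos, unsigned len) {
    uint64_t up = reg << (64 - pos - len);
    return (int64_t)up >> (64 - len);
}
```

For example, with the 8-bit field at bits 4 through 11 of 0x123456789ABCDF35 holding 0xF3, both unsigned variants yield 243, while the signed variant yields -13. In every case a full register comparison still has to follow the extraction, and each step depends on the one before it.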
Another previous method employed to compare contents of a bit field containing less than a full complement of data is an extract instruction. An extract instruction performs a right shift together with a masking in a single instruction. This method fills upper bits of the register with either zeros, for an unsigned extract, or copies of the sign bit for a signed extract.
A further method addresses the situation where the bit field compared is a single bit, typically using a special instruction to test the bit against a one or a zero. One such implementation is included in the Itanium architecture as “tbit” or test bit. Generally, such an instruction is designed to use the same shifter or functional unit as employed by shift left and shift right instructions. In the single bit situation, the system shifts the value to the right so the bit to be tested is at the bottom of the result. The system then compares the shifted result against zero or one.
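The single-bit case can be modeled with a short C sketch. This is not the definition of the Itanium tbit instruction; it is only an illustration of the shift-then-compare behavior described above, with a hypothetical function name and a 64-bit register assumed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of a test-bit operation: shift the register right so that the
 * bit of interest lands in bit 0, then compare that bit against one.
 * `pos` must be in the range 0..63. */
static inline bool test_bit(uint64_t reg, unsigned pos) {
    uint64_t shifted = reg >> pos;   /* uses the shifter */
    return (shifted & 1u) != 0;      /* compare against one */
}
```

Testing against zero is simply the logical negation of the result, for instance `!test_bit(reg, pos)`. Note that even this one-bit test relies on a shift, which is why such an instruction is typically tied to the shifter functional unit.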
These approaches have been sub-optimal for various reasons. First, certain of these implementations require several instructions to extract, shift and/or convert the value into a full register value before performing a comparison. These instructions tend to be serially dependent, so little parallelism can be exploited by multiple functional units. Second, these comparisons typically require at least one instruction that must be performed by a shifter functional unit. Most modern processors tend to implement more arithmetic/logical functional units (ALUs) and fewer shifter units. The necessary extract, shift and/or convert operations must typically be performed serially before the associated comparison, so again the opportunity to employ parallelism and perform multiple instructions or functions at one time is reduced. Finally, shifter designs spend a significant portion of their execution time on propagation of signals through wires rather than on hardware switching, such as switching transistors. Because propagation time through wires does not scale as well as transistor speeds in the presence of ever-smaller integrated circuit geometries, shifter functional units tend not to scale nearly as well as ALUs. Poor scaling when propagating through wires tends to cause shift operations to need more processor clock cycles than ALU operations in newer and currently contemplated future designs. Requiring shift operations to perform bit field comparisons therefore increases the latency of such operations and further decreases the amount of parallelism available in such sequences.
Further, as the size of the data working set for programs increases at a faster rate than the rate of increase of associated cache sizes, the incentive to increase the memory efficiency of the data working set, such as by packing data into data fields, increases as well. It would therefore be beneficial to perform bit field comparisons more efficiently, using fewer instructions and allowing greater parallelism. It would also be beneficial to offer comparisons with less dependency on shift operations. In sum, it would be advantageous to more efficiently perform bit field comparisons, including comparisons of atypically sized values, in high speed processor architecture environments, such as the Itanium architecture, and to minimize those drawbacks associated with previous bit field comparisons.