1. Technical Field
The present invention relates generally to processing multiple data elements in a computer register, and more specifically relates to a system and method for utilizing a mask register and hardware assist instructions to simultaneously process multiple data elements of arbitrary size.
2. Related Art
Computer processors function by processing data elements through various registers in accordance with instructions provided by a computer program. The registers generally have a capacity that is a power of two. For instance, a register might have a capacity of 8 bits, and it would be able to process, in a single processing cycle, a data element having up to eight bits. As an example, an 8-bit register can process a 4-bit data element in a single cycle. Of course, registers typically have sizes larger than 8 bits, e.g., registers can have capacities of 16 bits, 32 bits, 64 bits, and so on. Non-limiting illustrative examples of the types of operations undertaken by registers include multiplication by a constant, addition, subtraction, shift left-logical, shift right-logical, AND, and OR operations.
After the data elements have been processed, they can be sent to another register for further processing, or they can be stored or output. To illustrate, in the printer field, a server microprocessor processes an input data stream through its various registers in accordance with a computer program, and it might output a data stream of compressed image data in so-called JPEG format to a printer processor, which then operates on the data as appropriate to instruct a printer apparatus how to print the image.
The processor itself executes instructions in the form of machine language, which are the low-level instructions relating to what data elements are processed through which registers. Most software, however, is written in higher-level programming code such as C, which has the advantages of being human readable and of embodying relatively complex processing operations using comparatively short, quickly written commands. A compiler receives the high-level programming code, decides the best way among many choices to map it into assembly language, and passes the mapping to an assembler, which then maps the assembly language into so-called machine language that is readable by a processor. From time to time, a programmer may elect to write the parts of the program that are executed more frequently than other parts directly in a lower-level language. While more cumbersome to write, these so-called “hand-crafted” portions of code do not have to be translated by a high-level language compiler and thus can be written in a more optimized fashion to facilitate faster processing at run time.
Regardless of whether the processor receives the machine code from a compiler or directly from a hand-crafted program, the present invention makes the critical observation that it is often the case that register space is wasted. More particularly, as intimated above, a register might not be used to its full capacity in every processing cycle. For instance, when a 16-bit capacity register is used to process 4-bit data elements, 12 bits of the register per cycle are wasted. This slows processing time, creates additional data caching requirements (and attendant cache miss problems), and in general fails to fully exploit processor capacity. Accordingly, the present invention recognizes the potential improvement in processor performance that would inure were multiple data elements to be processed in a register in a single cycle.
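The wasted-capacity observation can be illustrated with a minimal sketch in C. It assumes unsigned 4-bit elements whose per-element sums never exceed 15, so that no carry crosses an element boundary. The helper names pack4x4, lane4, and add_packed are hypothetical and are used only for illustration; they are not taken from the invention or from the incorporated application.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place four unsigned 4-bit elements in one 16-bit
   word. Element e[0] occupies the least significant 4 bits. */
static uint16_t pack4x4(const uint8_t e[4])
{
    uint16_t r = 0;
    for (int i = 0; i < 4; i++) {
        assert(e[i] <= 0xF);              /* each element must fit in 4 bits */
        r |= (uint16_t)(e[i] << (4 * i));
    }
    return r;
}

/* Hypothetical helper: read back the 4-bit element at position i. */
static uint8_t lane4(uint16_t r, int i)
{
    assert(i >= 0 && i < 4);
    return (uint8_t)((r >> (4 * i)) & 0xF);
}

/* A single 16-bit addition adds all four element pairs at once, provided
   (by assumption) that no per-element sum exceeds 15. */
static uint16_t add_packed(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}
```

Under these assumptions, one addition does the work of four, and all 16 bits of the register carry useful data instead of only 4.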
The present invention further understands that implementing a solution for the above is not trivial, particularly if both positive and negative (that is, “signed”) values, and not just positive values, are to be processed, owing to the possibility of exceeding register capacity and/or corrupting data during processing. Stated differently, as used by the present invention, a “signed” data element is one that is not constrained to be non-negative, and it is desirable that multiple signed data elements be processed through a single register in a single processing cycle. Furthermore, the present invention understands that for robustness, it is desirable that a processor not be constrained by the manufacturer to accept multiple data elements per register of only predetermined bit sizes, but rather that a programmer have the flexibility to define arbitrary data element bit sizes that can be accepted by a register as the particular application might happen to dictate.
U.S. patent application Ser. No. 09/675779, filed on Sep. 29, 2000, entitled, SYSTEM AND METHOD FOR ENABLING MULTIPLE SIGNED INDEPENDENT DATA ELEMENTS PER REGISTER, which is hereby incorporated by reference, describes a software solution to address the above problems. This solution allows multiple signed independent data elements to be packed in a register. The register is operated on by standard operations, with some additional operations in certain cases. The data is then “unpacked” and returned to its 2's complement form. The term “pack” means that the data is possibly dependent on element(s) to the right of it. The described method has, however, various opportunities for enhancement.
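One plausible reading of such a software pack and unpack scheme is sketched below in C. This is an illustrative assumption and not the procedure of the incorporated application: signed elements are summed at 4-bit offsets, so a negative element borrows from the element to its left, and unpacking adds that borrow back. The names pack_signed4x4 and unpack_signed_lane are hypothetical, and the sketch assumes every element, and every per-element result, stays within -7 to 7.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four signed elements, each assumed to lie in
   [-7, 7], by summing them at 4-bit offsets. A negative element borrows
   from the element to its left, which is why a packed element depends on
   the element(s) to its right. */
static uint16_t pack_signed4x4(const int8_t e[4])
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        assert(e[i] >= -7 && e[i] <= 7);
        acc += (int32_t)e[i] * (1 << (4 * i));
    }
    return (uint16_t)acc;                 /* reduce modulo 2^16 */
}

/* Hypothetical helper: recover element i in 2's complement form. The raw
   4-bit field is sign-extended, then the borrow taken by the element to
   the right (visible as that element's top bit) is added back. */
static int8_t unpack_signed_lane(uint16_t r, int i)
{
    assert(i >= 0 && i < 4);
    int raw = (r >> (4 * i)) & 0xF;
    int value = (raw ^ 0x8) - 0x8;        /* sign-extend 4 bits */
    int borrow = (i > 0) ? ((r >> (4 * i - 1)) & 1) : 0;
    return (int8_t)(value + borrow);
}
```

In this sketch, adding two packed registers with an ordinary 16-bit addition yields the packed form of the element-wise sums, as long as each sum stays in the assumed range. The pack and unpack steps themselves are the overhead that motivates a hardware-assisted approach.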
First, the pack and unpack processes constitute overhead which, for smaller loop sizes, is in some cases unacceptable. Second, some instructions need additional operations to ensure that the data is not modified. Third, the processor has no mechanism by which multiple error flags or condition codes can be set. For instance, if elements “packed” in a register exceed the precision of the space that they have been allocated, there is no error flag or condition code set. The programmer is responsible for ensuring that the data will not overflow its precision by design.
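The missing error flag can be illustrated with a short C sketch. It assumes, for simplicity, unsigned 4-bit elements in a 16-bit register; the helper names lane4 and add_would_overflow are hypothetical. When an element exceeds its allocated space, the carry silently corrupts its neighbor, and the only protection is a check that the programmer writes in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: read the unsigned 4-bit element at position i. */
static uint8_t lane4(uint16_t r, int i)
{
    assert(i >= 0 && i < 4);
    return (uint8_t)((r >> (4 * i)) & 0xF);
}

/* Software-only guard: reports whether an ordinary 16-bit addition of a
   and b would overflow any 4-bit element. The processor sets no flag or
   condition code for this case, so the check costs extra instructions. */
static bool add_would_overflow(uint16_t a, uint16_t b)
{
    for (int i = 0; i < 4; i++) {
        if (lane4(a, i) + lane4(b, i) > 0xF) {
            return true;
        }
    }
    return false;
}
```

For example, adding 0x0001 to 0x000F gives 0x0010: element 0 wraps to 0 and element 1 changes from 0 to 1, yet the addition itself completes without any indication of error.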
Accordingly, a need exists for a more robust solution to the problems mentioned above.