1. Field of the Invention
The present invention relates to compilers for generating lower-level code from higher-level code to process data in parallel in computer registers.
2. Description of the Related Art
As set forth in the parent application, computer processors function by processing data elements through various registers in accordance with instructions provided by a computer program. The registers have a capacity that is a power of two. For instance, a register might have a capacity of 8 bits, and it would be able to process, in a single processing cycle, a data element having up to 8 bits. As an example, an 8-bit register can process a 4-bit data element in a single cycle. Of course, registers typically have sizes larger than 8 bits, e.g., registers can have capacities of 16 bits, 32 bits, 64 bits, and so on. Non-limiting illustrative examples of the types of operations undertaken by registers include multiplication by a constant, addition, subtraction, shift-left-logical, shift-right-logical, AND, and OR operations.
After the data elements have been processed, they can be sent to another register for further processing, or they can be stored or output. To illustrate, in the printer field a server microprocessor processes an input data stream through its various registers in accordance with a computer program, and it might output a data stream of compressed image data in so-called JPEG format to a printer processor, which then operates on the data as appropriate to instruct a printer apparatus how to print the image.
The processor itself executes instructions in the form of machine language, which are the low-level instructions relating to what data elements are processed through which registers. Most software, however, is written in higher-level programming code such as C, which has the advantages of being human readable and of embodying relatively complex processing operations using comparatively short, quickly-written commands. A compiler receives the high-level programming code, decides the best way among many choices to map it into a lower-level language, and passes the mapping to an assembler or subsequent compiler, which then maps the lower-level language into machine language that is readable by a processor. The higher-level language may be, e.g., the C or C++ programming language with extensions or macros, and the lower-level language may be C with some of the extensions or macros interpreted and removed. Alternatively, the lower-level language may be machine language or assembly language.
From time to time, a programmer may elect to write the more frequently executed parts of a program directly in a lower-level language. While more cumbersome to write, these so-called “hand-crafted” portions of code do not have to be translated by a higher-level language compiler and, thus, may facilitate faster processing at run time.
Regardless of whether the processor receives the machine code from a compiler or directly from a hand-crafted program, however, the parent application makes the critical observation that it is often the case that register space is wasted. More particularly, as intimated above, a register might not be used to its full capacity in every processing cycle. For instance, when a 16-bit capacity register is used to process 4-bit data elements one at a time, 12 bits of the register per cycle are wasted. This increases processing time, creates additional data caching requirements (and attendant cache miss problems), and in general fails to fully exploit processor capacity. Accordingly, the parent application recognizes the potential improvement in processor performance that would inure were multiple data elements to be processed in a register in a single cycle.
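By way of non-limiting illustration only, the following C sketch shows how four 4-bit data elements might share one 16-bit register and be added in a single word-wide operation. It is not taken from the parent application; it uses a generally known masking technique to keep a carry in one element from spilling into its neighbor, and the names pack4, unpack4, and add4x4 are hypothetical and chosen solely for this example.

```c
#include <stdint.h>

/* Hypothetical helper: place four 4-bit elements into one 16-bit word,
 * element i occupying bits 4*i .. 4*i+3. */
static uint16_t pack4(const uint8_t e[4]) {
    uint16_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint16_t)((e[i] & 0xFu) << (4 * i));
    return w;
}

/* Hypothetical helper: recover the four 4-bit elements from a word. */
static void unpack4(uint16_t w, uint8_t e[4]) {
    for (int i = 0; i < 4; i++)
        e[i] = (uint8_t)((w >> (4 * i)) & 0xFu);
}

/* Add all four elements at once, each modulo 16.
 * The low three bits of every element are added with the top bit masked
 * off, so a carry can reach only the top bit of its own element; the top
 * bits of the operands are then folded in with XOR, which cannot carry. */
static uint16_t add4x4(uint16_t a, uint16_t b) {
    uint16_t low = (uint16_t)((a & 0x7777u) + (b & 0x7777u));
    return (uint16_t)(low ^ ((a ^ b) & 0x8888u));
}
```

For instance, packing the elements 1, 2, 3, 15 and the elements 4, 5, 6, 1 and passing both words to add4x4 yields a word that unpacks to 5, 7, 9, 0, the last element having wrapped modulo 16 without disturbing the others. One word-wide addition thus does the work of four separate additions that would each have left 12 bits of the register idle.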
The present invention further understands that a compiler can be used to implement the above recognition. This disclosure focuses on such a compiler.