The present invention is generally directed to methods by which data is retrieved from tables in the operation of a computer, and more particularly to a vectorized table lookup method which is not restricted to tables of a relatively small size.
Lookup tables are employed in the field of computer programming as a convenient mechanism to handle various types of data. Color lookup tables are one good example of the use of this programming technique. For example, a graphics program may employ 8-bit data to represent colors. As a result, 256 different colors can be selected. Of course, the entire color spectrum comprises significantly more than 256 different colors, and shades of color. Accordingly, a lookup table can be used to associate a specific color, or shade, with each of the 256 different values that can be designated with an 8-bit word. Furthermore, multiple color tables can be set up with different sets of 256 colors, to thereby establish different color palettes that can be selected by the users.
In addition to color palettes, lookup tables are employed for a variety of different purposes, including sound processing, function approximation, and other types of digital signal processing. In many situations, entries are retrieved from lookup tables in a scalar fashion, i.e. one entry is retrieved with each lookup instruction. However, in a computer which has a vector-based processing architecture, it is possible to simultaneously perform a number of lookup operations with a single instruction. In one approach, a standard “permute” instruction is used for this purpose. The permute instruction functions to store values from two operands into a result vector in any desirable order. In its application to table lookups, the two operands comprise two vectors which constitute a table. In an architecture which employs 128-bit registers, for example, the permuted values from the table can be selectively loaded into a register of this size with one instruction, to store 16 bytes of data, which thereby permits 16 table lookup operations to be performed simultaneously.
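By way of illustration only, the behavior of such a permute-based lookup can be modeled in scalar Python. The sketch below is not an implementation for any particular instruction set; the function name `vperm`, the 16-byte register width, and the masking of each index to its five low-order bits are assumptions made for this example.

```python
LANES = 16  # assumed register width: 128 bits = 16 bytes


def vperm(a, b, idx):
    """Model a byte permute over two 16-byte operands.

    Each index byte selects one byte from the 32-byte concatenation a||b.
    Only the five low-order bits of each index are used (assumption).
    """
    assert len(a) == len(b) == len(idx) == LANES
    table = a + b
    return [table[i & 0x1F] for i in idx]


# A 32-byte table held in two "registers", and 16 lookups performed at once.
table = list(range(100, 132))
indices = [0, 31, 16, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
result = vperm(table[:16], table[16:], indices)
```

Each element of `result` equals `table[i]` for the corresponding index `i`, which corresponds to 16 scalar lookups being replaced by one permute operation.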
While the ability to simultaneously perform multiple table lookups with the permute instruction significantly increases processing efficiency, the use of this technique has been limited to tables which contain no more than two registers' worth of data. Thus, in the case where the data registers are 128 bits (16 bytes) in length, for example, the maximum table size is 32 one-byte entries. For larger tables, it is not possible to utilize the permute operation to perform vector execution, and therefore table lookup operations are carried out in the conventional scalar form.
The need to resort to a scalar lookup operation decreases processing efficiency, for a number of reasons. First, each entry to be retrieved from the table requires a separate instruction, and consequently a greater number of processing cycles are necessary to obtain the data. Secondly, scalar operations and vector operations are typically carried out in separate processing units. If it becomes necessary to halt vector processing to perform a scalar lookup operation, the vector processor must store the table index values in a shared memory location, from which they are retrieved by the scalar processor. Similarly, once the scalar processor has obtained the table entries, they must be placed in the memory in order to return them to the vector processor. The need to write data into and read data from a shared memory location consumes additional time that leads to further processing inefficiencies. Hence, once processing begins in the vector domain, it is desirable to remain in that domain for as long as possible, rather than alternate between vector and scalar operations.
Accordingly, it is desirable to provide a method for table lookups in a vectorized manner which is not so limited in the size of the table that can be addressed. Such a method can result in significantly increased processing speed when multiple table lookup operations are involved, thereby avoiding the need to switch to a scalar processor when larger tables are encountered.
In accordance with the invention, this objective is achieved by logically dividing a large table into a number of smaller tables that can be uniquely indexed with a permute instruction. For instance, a 256-byte table can be logically divided into eight 32-byte tables. Each smaller table consists of two data vectors, which constitute the operands for the permute instruction. Only a limited number of bits in the permute instruction vector are required to index into the table during execution, e.g. five bits in the case of a 32-byte table. The remaining bits of each index are used as masks into a series of select instructions. The mask is generated by shifting one of the higher order bits of the index to the most significant position, and then propagating that bit throughout a byte, for example by means of an arithmetic shift. This procedure is carried out for all of the index bytes in the permute instruction vector, to generate a select mask. The select mask is then used during a select operation, to choose between the results of permute instructions on different ones of the logically divided tables.
By means of this approach, otherwise unused bits of each index byte in the permute instruction vector are employed to expand the size of a table that can be addressed with multiple lookup operations simultaneously. For example, procedures which were previously limited to 32-byte tables can be employed in connection with lookup operations on 64-byte, 128-byte and 256-byte tables, through use of the three most significant bits of each index byte.
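The procedure summarized above, namely permuting each 32-byte sub-table, deriving masks from the higher-order index bits, and selecting among the permute results, can be sketched in scalar Python as follows. This is a minimal model under stated assumptions rather than a definitive implementation: the helper names (`vperm`, `bit_to_mask`, `vsel`, `lookup256`) are hypothetical, and the order in which bits 5, 6 and 7 are consumed by the select stages is one possible arrangement.

```python
LANES = 16  # assumed register width in bytes


def vperm(a, b, idx):
    """Byte permute over a||b using the five low-order bits of each index."""
    table = a + b
    return [table[i & 0x1F] for i in idx]


def bit_to_mask(idx, bit):
    """Build a select mask from one higher-order bit of every index byte.

    The chosen bit is shifted to the most significant position of the byte
    and then propagated through the byte with an arithmetic right shift.
    """
    mask = []
    for i in idx:
        shifted = (i << (7 - bit)) & 0xFF
        signed = shifted - 256 if shifted & 0x80 else shifted  # view as int8
        mask.append((signed >> 7) & 0xFF)  # 0x00 or 0xFF
    return mask


def vsel(a, b, mask):
    """Bitwise select: take bits of b where mask is 1, bits of a elsewhere."""
    return [(x & ~m & 0xFF) | (y & m) for x, y, m in zip(a, b, mask)]


def lookup256(table, idx):
    """Look up 16 one-byte entries in a 256-byte table."""
    assert len(table) == 256 and len(idx) == LANES
    # One permute per 32-byte sub-table (eight sub-tables in total).
    candidates = [
        vperm(table[base:base + 16], table[base + 16:base + 32], idx)
        for base in range(0, 256, 32)
    ]
    # Bits 5, 6 and 7 of each index halve the candidate set in turn.
    for bit in (5, 6, 7):
        mask = bit_to_mask(idx, bit)
        candidates = [
            vsel(candidates[j], candidates[j + 1], mask)
            for j in range(0, len(candidates), 2)
        ]
    return candidates[0]


table = [(i * 7 + 3) % 256 for i in range(256)]
indices = [0, 31, 32, 63, 64, 95, 96, 127,
           128, 159, 160, 191, 192, 223, 224, 255]
result = lookup256(table, indices)
```

In this model a 256-byte lookup costs eight permutes and seven selects for every 16 indices, and the result is intended to match the scalar expression `[table[i] for i in indices]`. A 64-byte or 128-byte table would use only the first one or two select stages, respectively.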
As a further feature of the invention, the bytes in an index vector are expanded to create multiple consecutive indices, to permit multi-byte entries to be retrieved from tables. By means of this feature, it becomes possible to use the permute instruction to retrieve table entries that have lengths of a full word (4 bytes) or a half-word (2 bytes).
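One way to picture the index expansion is sketched below in scalar Python. The summary does not state how the expansion is performed, so the scheme shown here (multiplying each entry index by the entry width and appending consecutive byte offsets) is an assumption made for illustration, as are the names `expand_indices` and `vperm`.

```python
LANES = 16  # assumed register width in bytes


def vperm(a, b, idx):
    """Byte permute over a||b using the five low-order bits of each index."""
    table = a + b
    return [table[i & 0x1F] for i in idx]


def expand_indices(entry_idx, width):
    """Expand entry indices into consecutive byte indices.

    Entry e of a table with `width`-byte entries occupies bytes
    e*width .. e*width + width - 1 (assumed layout).
    """
    assert len(entry_idx) * width == LANES
    return [e * width + k for e in entry_idx for k in range(width)]


# A 32-byte table viewed as eight 4-byte (full word) entries.
table = [0xA0 + i for i in range(32)]

word_indices = [7, 0, 3, 1]                      # four word lookups
byte_indices = expand_indices(word_indices, 4)   # sixteen byte indices
words = vperm(table[:16], table[16:], byte_indices)

# The same table viewed as sixteen 2-byte (half-word) entries.
half_indices = [15, 0, 8, 1, 2, 3, 4, 5]         # eight half-word lookups
halves = vperm(table[:16], table[16:], expand_indices(half_indices, 2))
```

Under this layout a single permute retrieves four full-word entries or eight half-word entries, rather than sixteen single-byte entries.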
These and other features of the invention, as well as the advantages offered thereby, are explained in detail hereinafter, with reference to exemplary embodiments illustrated in the accompanying drawings.