The present invention relates generally to improvements in array and indirect very long instruction word (iVLIW) processing, and more particularly to an advantageous data address generation architecture for a programmable VLIW processor having separate compute and address register files, an architecture that makes efficient variable length, run-length, and zigzag decoding possible.
A typical register-based processor architecture utilizes a general purpose register file (GPRF) to contain all the arithmetic operands used in performing computations, all computed results, and the various components, such as base, index, modulo values, and the like, used in resolving effective data or instruction addresses. More complex processors, VLIW processors in particular, may contain multiple arithmetic functional units as well as separate load and store units, thus increasing the number of ports required on the GPRF to provide simultaneous access to all the necessary operands. The GPRF grows increasingly difficult and expensive to implement as the number of ports rises, so it may be advantageous to split the GPRF into two or more separate register files and designate that the separate files serve specific purposes such as a compute register file and an address register file.
This approach, however, complicates high-performance, data-dependent memory addressing operations. The problem is that the data-dependent values used for certain types of addressing are produced in the compute register file, which is separate from the address register file and the address generation functions. For example, a look-up table (LUT) operation uses a data value as an offset into a table of values stored in memory, transforming the data value into the looked-up value. An efficient table look-up operation would therefore seem to require an additional read port on the compute register file. Because efficient handling of LUTs is crucial for many applications, an efficient solution to the look-up table problem is needed in processors whose compute and address registers reside in separate files. A related problem is how to efficiently perform sequential variable length code (VLC) decoding and other sequential front-end processing of compressed video on an iVLIW processor. The present invention, when operating on an iVLIW processor, advantageously provides a solution to these and other problems.
Table look-up and store operations are used in many digital signal processor (DSP) applications. They typically require an addressing mode in which a "base" register points to the beginning of a table in memory and a data element stored in a separate register provides the offset into the table. The data type to be accessed (byte, half-word, word, double-word, etc.) determines both the scaling of the offset and the size of the transfer. A data element may then be loaded from or stored to the table in memory. These operations may be represented generally as follows:
Rt ← Memory[Ab + Ri];  For table load
Rs → Memory[Ab + Ri];  For table store
Where Rt is a target compute register, Rs is a source compute register, Ab is a base (address) register, and Ri is a compute register containing a computed value that is used as an offset. For a load operation, Memory[address] represents the value stored in memory at the address within the brackets; for a store operation, it represents the location in memory at which the data in Rs is to be stored.
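To make the scaled-offset addressing concrete, the following Python sketch models a byte-addressed local memory and performs the table load and table store described above. It is an illustration under stated assumptions, not the ManArray implementation: the helper names `effective_address`, `table_load`, and `table_store`, the element-size table, and little-endian byte order are all hypothetical choices made for this example.

```python
import struct

# Hypothetical data types and their sizes in bytes (assumed, for illustration).
ELEMENT_SIZE = {"byte": 1, "half": 2, "word": 4, "double": 8}
# Little-endian unsigned struct formats for each size (byte order is an assumption).
STRUCT_FMT = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def effective_address(ab: int, ri: int, dtype: str) -> int:
    """Form Ab + Ri, scaling the offset Ri by the size of the accessed data type."""
    return ab + ri * ELEMENT_SIZE[dtype]


def table_load(memory: bytearray, ab: int, ri: int, dtype: str) -> int:
    """Rt <- Memory[Ab + Ri]: return the table element selected by Ri."""
    size = ELEMENT_SIZE[dtype]
    return struct.unpack_from(STRUCT_FMT[size], memory, effective_address(ab, ri, dtype))[0]


def table_store(memory: bytearray, ab: int, ri: int, dtype: str, rs: int) -> None:
    """Rs -> Memory[Ab + Ri]: write Rs, truncated to the element size, into the table."""
    size = ELEMENT_SIZE[dtype]
    value = rs & ((1 << (8 * size)) - 1)  # keep only the low-order bytes that fit
    struct.pack_into(STRUCT_FMT[size], memory, effective_address(ab, ri, dtype), value)


if __name__ == "__main__":
    mem = bytearray(64)                          # toy local memory
    table_store(mem, 16, 3, "half", 0xBEEF)      # half-word entry 3 of a table at byte 16
    print(hex(table_load(mem, 16, 3, "half")))   # 0xbeef
```

In this sketch the entry lands at byte 16 + 3 × 2 = 22, so the same index value selects the correct element whether the table holds bytes, half-words, or words.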
In the ManArray iVLIW architecture, the address and compute registers, Ab and Ri respectively, are in separate register files. Further, the array processor executes instructions in pipelined fashion, with at least fetch, decode, and execute stages. An important question, then, is how to perform an efficient table look-up or table store operation that uses registers from both files without increasing the number of read/write ports on the compute register file. With minimal programming conventions or restrictions, the store unit's read port on the compute register file can be shared during the decode pipeline stage, allowing a data-dependent address calculation to occur. The resulting address can then be used during the execute stage to load from or store to a table in the processor's local memory. A ManArray compute register file built from two smaller register files, for example two 16×32-bit files, provides a cycle-by-cycle reconfigurable register file capable of dual independent table look-ups and table stores.
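The decode-stage address calculation and the dual look-up capability can be modeled at a high level as follows. This Python sketch is hypothetical throughout: the function names, the word-addressed memory, and in particular the assignment of even-numbered compute registers to one 16×32-bit file and odd-numbered registers to the other are assumptions made for illustration, not a description of the actual ManArray register file mapping.

```python
from typing import List, Tuple

Lookup = Tuple[int, int, int]  # (Ab index, Ri index, Rt index)


def register_file_of(reg: int) -> int:
    """Hypothetical mapping: even registers in file 0, odd registers in file 1."""
    return reg & 1


def decode_stage(areg: List[int], creg: List[int], ab: int, ri: int) -> int:
    """Decode: read Ri through the store unit's read port and form Ab + Ri."""
    return areg[ab] + creg[ri]


def execute_stage(memory: List[int], creg: List[int], ea: int, rt: int) -> None:
    """Execute: use the address formed during decode to load the table entry into Rt."""
    creg[rt] = memory[ea]


def dual_table_load(memory: List[int], areg: List[int], creg: List[int],
                    first: Lookup, second: Lookup) -> None:
    """Two independent table loads in one instruction, one per register file."""
    if register_file_of(first[1]) == register_file_of(second[1]):
        # In this model each file contributes a single read port, so the two
        # index registers must come from different files.
        raise ValueError("index registers must be in different register files")
    # Decode stage: both effective addresses are formed before any write-back.
    ea0 = decode_stage(areg, creg, first[0], first[1])
    ea1 = decode_stage(areg, creg, second[0], second[1])
    # Execute stage: both table loads complete.
    execute_stage(memory, creg, ea0, first[2])
    execute_stage(memory, creg, ea1, second[2])


if __name__ == "__main__":
    memory = list(range(100, 200))   # toy word-addressed local memory
    areg = [0, 50]                   # A0, A1: base addresses of two tables
    creg = [0] * 32                  # R0..R31
    creg[2], creg[3] = 5, 7          # data-dependent offsets computed earlier
    dual_table_load(memory, areg, creg, (0, 2, 4), (1, 3, 5))
    print(creg[4], creg[5])          # 105 157
```

Reading Ri in decode rather than in execute is what lets an existing port serve the look-up in this model; the check that the two index registers come from different files is one plausible form of the "minimal programming conventions" the architecture relies on.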
The ability to efficiently process compressed video data is an important capability that future digital signal processors need to provide. For example, the Moving Picture Experts Group (MPEG) MPEG-1 and MPEG-2 standards specify video compression processes that encode a video image into a compressed serial bitstream for efficient storage and transmission. Rather than relying on special purpose hardware logic, which adds complexity to a design and cannot be used for any other purpose, the ManArray processor provides general instruction capabilities that process these sequential codes efficiently. The architectural features used include bit operations, table look-up, table store, conditional execution, and iVLIWs. When these sequential routines are translated into assembler code for a typical general purpose processor or DSP, the routine for decoding the non-zero frequency values, or AC coefficients, becomes branch intensive and therefore time consuming for the application. Because of this time consuming sequential processing, typical prior art systems have used hardware assist approaches to implement the VLC decode function. In one aspect of the present invention, the instruction set capabilities of the ManArray processor, including iVLIWs, are used to provide efficient processing of sequential MPEG variable length codes, as discussed in greater detail below.
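The following Python sketch illustrates why table look-up helps with sequential code decoding: instead of branching on each bit, the decoder peeks at a fixed number of bits and uses them as an index into a table holding the decoded symbol and its code length. It then expands (run, level) pairs into an 8×8 block using the conventional zigzag scan. The four-symbol code is a toy example, not an MPEG table, and all names are hypothetical; MPEG-2 also defines an alternate scan, which this sketch ignores.

```python
from typing import Dict, List, Tuple

# Toy prefix code (hypothetical, NOT taken from MPEG): symbol -> codeword bits.
TOY_CODE: Dict[str, str] = {"a": "1", "b": "01", "c": "001", "d": "000"}
PEEK_BITS = 3  # length of the longest codeword


def build_lut(code: Dict[str, str], peek: int) -> List[Tuple[str, int]]:
    """Map every peek-bit pattern to (symbol, codeword length)."""
    lut: List[Tuple[str, int]] = [("", 0)] * (1 << peek)
    for symbol, bits in code.items():
        spare = peek - len(bits)
        first = int(bits, 2) << spare
        # Every pattern that begins with this codeword decodes to the same entry.
        for tail in range(1 << spare):
            lut[first | tail] = (symbol, len(bits))
    return lut


def vlc_decode(bits: str, lut: List[Tuple[str, int]], peek: int) -> List[str]:
    """Decode a bit string with one table look-up per codeword instead of per bit."""
    symbols: List[str] = []
    pos = 0
    while pos < len(bits):
        window = bits[pos:pos + peek].ljust(peek, "0")  # pad at end of stream
        symbol, length = lut[int(window, 2)]
        symbols.append(symbol)
        pos += length
    return symbols


def zigzag_order(n: int = 8) -> List[Tuple[int, int]]:
    """(row, col) positions in conventional zigzag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals; odd diagonals run down-left, even ones run up-right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def expand_run_level(pairs: List[Tuple[int, int]], start: int = 0) -> List[List[int]]:
    """Place each level after `run` zeros, in zigzag order, into an 8x8 block."""
    block = [[0] * 8 for _ in range(8)]
    scan = zigzag_order(8)
    index = start
    for run, level in pairs:
        index += run
        row, col = scan[index]
        block[row][col] = level
        index += 1
    return block


if __name__ == "__main__":
    lut = build_lut(TOY_CODE, PEEK_BITS)
    print(vlc_decode("1010010001", lut, PEEK_BITS))  # ['a', 'b', 'c', 'd', 'a']
    block = expand_run_level([(0, 5), (2, -3)])
    print(block[0][0], block[2][0])                  # 5 -3
```

In this sketch, `start=1` would skip scan position 0, for example when a DC coefficient is coded separately from the AC coefficients.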
These and other features, aspects and advantages of the invention will be apparent to those skilled in the art from the following detailed description taken together with the accompanying drawings.