The present application relates to programmable circuits, and more particularly to I/O circuitry with selectable data reordering for graphics.
A vector processor or array processor is a CPU design that is able to run mathematical operations on multiple data elements simultaneously. A serial vector is a sequence of data items held in separate registers, all of which are processed by the same instruction. For example, a single instruction may cause four registers to be added to another four and the result written to a further four. A parallel vector holds several data items within the same register, each of which has the same instruction applied to it. Vector processing improves code density and allows optimizations that improve performance.
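The distinction between serial and parallel vectors can be illustrated with a minimal software sketch. The function names, the 32-bit register width, and the 8-bit lane width are assumptions chosen for illustration and are not taken from the present application.

```python
def serial_vector_add(regs, dst, a, b, n=4):
    """Serial vector: one instruction applied across n consecutive registers.

    regs[dst+i] = regs[a+i] + regs[b+i], wrapped to an assumed 32-bit width.
    """
    for i in range(n):
        regs[dst + i] = (regs[a + i] + regs[b + i]) & 0xFFFFFFFF


def parallel_vector_add(x, y, lanes=4, lane_bits=8):
    """Parallel vector: several items packed in one register, added lane by lane.

    Each lane wraps independently; no carry propagates into the next lane.
    """
    mask = (1 << lane_bits) - 1
    out = 0
    for lane in range(lanes):
        shift = lane * lane_bits
        lane_sum = ((x >> shift) & mask) + ((y >> shift) & mask)
        out |= (lane_sum & mask) << shift
    return out


regs = list(range(12))            # registers r0..r11 holding 0..11
serial_vector_add(regs, 8, 0, 4)  # r8..r11 = r0..r3 + r4..r7
packed = parallel_vector_add(0x01020304, 0x10203040)
```

In the serial case the four additions touch twelve distinct registers, whereas in the parallel case all four additions take place within a single pair of source registers.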
A common problem suffered by vector processors is the need to organize data within the register file such that the same instruction may be applied to a series of registers. Register files generally only allow simultaneous access to a set of values aligned along a particular direction, i.e., along a row of the array. Accordingly, a single instruction can access multiple values for a horizontal operation, but a vertical operation requires either transposing the array being operated on or performing a separate access operation for each value in a different row. It is common to spend several instructions rearranging data to make it suitable for vector processing, and this overhead may negate the benefits of vector processing.
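The cost of a vertical operation can be made concrete with a small model of a register file that permits only row-wise access. The class and function names below are hypothetical and serve only to count accesses; they do not describe any particular hardware.

```python
class RowAccessRegisterFile:
    """Toy register file that returns one full row per access (an assumption)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]
        self.accesses = 0  # number of row accesses performed so far

    def read_row(self, r):
        self.accesses += 1
        return list(self.rows[r])


def read_column_by_rows(rf, c):
    """Gather a column with one separate row access per element."""
    return [rf.read_row(r)[c] for r in range(len(rf.rows))]


def transpose(rows):
    """Alternative: physically rearrange the data so columns become rows."""
    return [list(col) for col in zip(*rows)]


rf = RowAccessRegisterFile([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = rf.read_row(1)                 # one access yields a whole row
column = read_column_by_rows(rf, 0)  # three further accesses for one column
```

Reading a row costs a single access, while reading a column of the same length costs one access per row unless the data is first transposed, which itself consumes instructions.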
In view of these limitations, more efficient architectures and methods for performing transpose and other array manipulations are desired.
Yet another problem arises when a program instruction indirectly accesses a register. Microprocessors control programs' access to register files. Because of pipelining, some instructions must be stalled until the register from which they will read has been written to by another instruction. Scoreboarding stalls these instructions, so the program need not manage stalling. The stall condition is usually applied early in the execution pipeline. However, if a register is to be accessed indirectly by a program instruction, the identity of the register may not be known until it is too late, that is, until after the stall condition would normally have been applied. Without knowing the register at that earlier time, it is difficult to apply stall conditions for instructions that use indirect access.
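The difficulty with indirect access can be sketched with a simplified scoreboard model. The class, the operand encoding, and the stage name are assumptions made for illustration; the sketch shows the problem only and is not the mechanism disclosed in the present application.

```python
class Scoreboard:
    """Tracks which registers have a write still in flight (simplified)."""

    def __init__(self, num_regs):
        self.pending = [False] * num_regs

    def issue_write(self, reg):
        self.pending[reg] = True

    def complete_write(self, reg):
        self.pending[reg] = False

    def must_stall(self, reg):
        return self.pending[reg]


def decode_stage_check(sb, operand):
    """Early-pipeline stall check.

    operand is ("direct", reg) or ("indirect", index_reg). For an indirect
    operand the target register is not yet known at this stage, so no stall
    decision can be made; None is returned to signal that.
    """
    kind, value = operand
    if kind == "direct":
        return sb.must_stall(value)
    return None


sb = Scoreboard(8)
sb.issue_write(5)                       # a write to r5 is in flight
regs = [0, 5, 0, 0, 0, 0, 0, 0]         # r1 holds the index of the target, 5

direct = decode_stage_check(sb, ("direct", 5))      # hazard is visible
indirect = decode_stage_check(sb, ("indirect", 1))  # hazard is not visible
target = regs[1]                        # resolved only later in the pipeline
late_hazard = sb.must_stall(target)     # true, but found after decode
```

In this model the direct read of r5 is stalled at decode, whereas the indirect read through r1 passes decode unchecked even though it targets the same pending register.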
The inventions disclosed in the present application provide mechanisms to handle indirect register access without additional scoreboarding hardware, and can be further used to build a flexible FIFO access mechanism.
Flexible Register File I/O Architecture
The present application discloses a register file input/output configuration in which a variety of data transpositions are available at minimum power. Power is conserved by avoiding register-to-register data transfers; instead, the sequencer provides executable microinstructions which imply a variety of apparent data formats (as seen by the data channel), without unnecessary physical transfers of data.
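As a loose software analogy only, the idea of presenting different apparent data formats without physically moving data can be sketched as an address mapping over fixed storage. The class name, the flat row-major storage layout, and the `transposed` flag are assumptions for illustration and do not represent the disclosed sequencer or microinstructions.

```python
class RegisterFileView:
    """Presents fixed storage in either its stored or a transposed format.

    The storage list is never rearranged; only the index mapping changes.
    """

    def __init__(self, storage, rows, cols, transposed=False):
        self.storage = storage      # flat, row-major, shared between views
        self.rows = rows
        self.cols = cols
        self.transposed = transposed

    def read_vector(self, i):
        """Return vector i of the apparent format."""
        if self.transposed:
            # Apparent row i is stored column i: step through storage by cols.
            return [self.storage[r * self.cols + i] for r in range(self.rows)]
        return self.storage[i * self.cols:(i + 1) * self.cols]


storage = [0, 1, 2, 3, 4, 5]        # a 2 x 3 array held in place
normal = RegisterFileView(storage, 2, 3)
flipped = RegisterFileView(storage, 2, 3, transposed=True)

stored_row = normal.read_vector(1)      # second stored row
apparent_row = flipped.read_vector(1)   # second stored column, seen as a row
```

Both views read the same underlying storage, so the transposed format is obtained by changing how elements are addressed rather than by copying them between registers.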
Various disclosed embodiments provide new ways for microprocessor register files to be accessed in multiple formats, in order to reduce the number of program instructions required during byte, word, and long word data reformatting. The disclosed innovations, in various embodiments, provide one or more of at least the following advantages:
    Variety of data rearrangements;
    Minimal power consumption;
    Easy accommodation of special data reordering for digital signal processing operations;
    Suitability for customized access to data with two-dimensional structure;
    Suitability for customized access to data with multidimensional structure.