1. Field of the Invention
The present invention generally relates to data format conversion and more particularly to a system and method for data reordering in vector processing in order to support the conversion of sequential (vertical) vector component flow into parallel (full vector or horizontal) vector component flow.
2. Description of the Related Art
Graphics data can be represented in vector format with components of geometry information (i.e., X, Y, Z, and W) or pixel value information (i.e., R, G, B, A). A geometry engine processes the components of the vector. FIG. 1 illustrates how a typical graphics engine processes graphics vectors. A graphics vector 10 is input into an input buffer 12, which stores the graphics vectors in regular memory. The graphics vector has components Xi, Yi, Zi, and Wi. The input buffer 12 outputs the graphics vector to a vector arithmetic logic unit (ALU) 14, which performs functions on the graphics vector 10. The vector ALU 14 outputs a processed graphics vector 18, which is in the same format as the input graphics vector 10. Specifically, the processed graphics vector 18 contains the Xout, Yout, Zout, and Wout components. In this regard, the vector ALU 14 processes the vector components in a time-parallel (full vector or horizontal) vector component flow. Each of the components X, Y, Z, and W is processed at the same time by the vector ALU 14 such that the output contains each component Xout, Yout, Zout, and Wout in a common format.
Recently, scalar graphics processors have been developed which process the graphics vector in a vertical vector component flow. FIG. 2 shows a SIMD (Single Instruction, Multiple Data) processing unit using scalar ALUs for processing graphics vectors. The graphics vector 10 is input into an input buffer 20, which is a 4-bank orthogonal access memory, as is commonly known in the art. The input buffer 20 is operable to rearrange the graphics vectors 10 into groups of common components. Specifically, the output of the input buffer 20 is a vector containing the values of common components in a vertical vector format. Referring to FIG. 2, the input buffer 20 outputs a component vector 22 which contains common or like components. For instance, the component vector 22 may contain the values of only the X component or only the Y component.
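By way of illustration only, the regrouping performed by the input buffer 20 can be modeled in software as a transposition of four graphics vectors into four like-component vectors. The following is a minimal sketch under stated assumptions: the function name `to_component_vectors`, the fixed batch of four vectors, and the sample values are hypothetical, and the sketch models only the data layout, not the banked hardware of an orthogonal access memory.

```python
from typing import List, Sequence

Vector = Sequence[float]  # one graphics vector: (X, Y, Z, W)


def to_component_vectors(vectors: Sequence[Vector]) -> List[List[float]]:
    """Regroup four (X, Y, Z, W) vectors into four like-component vectors.

    Hypothetical software model of the layout change only: row c of the
    result holds component c (0 = X, 1 = Y, 2 = Z, 3 = W) of every input.
    """
    if len(vectors) != 4 or any(len(v) != 4 for v in vectors):
        raise ValueError("expected four 4-component vectors")
    return [[v[c] for v in vectors] for c in range(4)]


# Sample values are arbitrary and chosen only to make the layout visible.
graphics_vectors = [
    (1.0, 2.0, 3.0, 4.0),
    (5.0, 6.0, 7.0, 8.0),
    (9.0, 10.0, 11.0, 12.0),
    (13.0, 14.0, 15.0, 16.0),
]
component_vectors = to_component_vectors(graphics_vectors)
```

In this model, `component_vectors[0]` holds only X values and `component_vectors[1]` holds only Y values, corresponding to the component vector 22 issued at successive points in time.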
The input buffer 20 outputs the component vector 22 in a time-sequential (vertical) vector component flow to a scalar processor 24, which operates on each of the components of the component vector 22 individually. The scalar processor 24 contains four scalar ALUs 26a-26d and is described in greater detail in applicant's co-pending U.S. Patent Application “SIMD PROCESSOR WITH SCALAR ALUS CAPABLE OF PROCESSING GRAPHICS VECTOR DATA”, Ser. No. 10/354,795, filed Jan. 29, 2003, the contents of which are incorporated by reference herein.
The scalar processor 24 outputs a scalar results vector 30 that contains the results of the computed vector components. However, the scalar results vector 30 is not in the same format as graphics vector 10. Specifically, the scalar results vector 30 is in a vertical (time-serial) format because the scalar processor 24 operates in a sequential (vertical) vector component flow. Therefore, the scalar results vector 30 needs to be converted into a time-parallel (full vector or horizontal) format.
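The format mismatch described above can be illustrated with a short software model. The sketch below assumes that the scalar results arrive as four successive like-component groups in X, Y, Z, W order, one group per time step; the function name `to_full_vectors` and the sample values are hypothetical. It shows only the required change of layout and is not a description of any particular conversion hardware.

```python
from typing import Iterable, List, Sequence, Tuple


def to_full_vectors(
    component_stream: Iterable[Sequence[float]],
) -> List[Tuple[float, ...]]:
    """Regroup time-sequential component groups into full (X, Y, Z, W) vectors.

    Assumption: the stream yields exactly four groups (X, then Y, Z, W),
    each holding that component's result for four different vectors.
    """
    rows = [list(group) for group in component_stream]
    if len(rows) != 4 or any(len(r) != 4 for r in rows):
        raise ValueError("expected four groups of four results")
    # Column i of the collected rows is the complete result vector i.
    return [tuple(column) for column in zip(*rows)]


# Arbitrary sample results, one like-component group per time step.
scalar_results = [
    [1.0, 5.0, 9.0, 13.0],    # time step 0: X results
    [2.0, 6.0, 10.0, 14.0],   # time step 1: Y results
    [3.0, 7.0, 11.0, 15.0],   # time step 2: Z results
    [4.0, 8.0, 12.0, 16.0],   # time step 3: W results
]
full_vectors = to_full_vectors(scalar_results)
```

Note that in this model no full vector can be emitted until all four time steps have been collected, which is why the vertical results must be buffered and reordered before they match the format of the graphics vector 10.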