Media applications have been driving microprocessor development for more than a decade. In fact, most computing upgrades in recent years have been driven by media applications, predominantly within consumer segments, but also in enterprise segments for entertainment, enhanced education and communication purposes. Nevertheless, future media applications will require even higher computational requirements. As a result, tomorrow's personal computing (PC) experiences will be even richer in audio/visual effects, as well as being easier to use and more importantly, computing will merge with communications.
Accordingly, the display of images, as well as playback of audio and video, have become increasingly popular for current computing devices. Unfortunately, the quantity of data required for these types of applications tends to be very large. As a result, increases in computational power, memory and disk storage, as well as network bandwidth have facilitated the creation and use of larger and higher quality images. Unfortunately, the use of larger and higher quality images often results in a bottleneck between the processor and memory, as well as requiring intensive computational requirements.
One such media application, which is driving microprocessor development, is three-dimensional (3D) graphics. Specifically, 3D graphics applications provide user users of such systems with enhanced displays, which come close to imitating the clarity provided by real life objects. Unfortunately, 3D graphic systems require intensive computational requirements required for translating objects and coordinates between the various coordinate systems. In fact, transforming a point from one coordinate system to another is one of the most common operations in 3D graphics.
To accomplish transformation of a point from one coordinate system to another in one operation, a 3D point is treated as a four-dimensional (4D) vector [x, y, z, w]. Accordingly, the 3D point may be represented as a 4D vector such that the 3D point is now represented by a homogenous coordinate [x/w, y/w, z/w]. Utilizing such representation, transforming or transferring a point from one coordinate system to another is often accomplished by multiplying the 4D vector by a 4×4 matrix. As a result, the 4×4 matrix represents the transformations, such as scaling, rotation and translation between the two coordinate systems.
Accordingly, a typical 3D pipeline transforms an object from the coordinate system it was created in (objects space) to the world coordinate system (world space) and then to the viewer coordinate system (view space). However, it is quite common that a value defined in the world or view space may require conversion back to its originally created object space. As an example, lights are defined in the world space and are often transformed back to the object space in order to perform light intensity calculations. Generally, this conversion back to the object space is performed by the operation of 4×4 matrix inversion.
Unfortunately, the calculation of a matrix inverse is one of the heaviest operations on matrices. The standard way to calculate an inverse of a matrix is by using a method called “Gaussian Elimination”. However, for small matrices, it is usually more efficient to calculate the inverse by scaling the adjoint matrix by the matrix's determinant residue. Accordingly, scaling the adjoint matrix is the most commonly used implementations by conventional 3D graphic systems.
One of the modern techniques to accelerate numerical calculations is to use Single Instruction Multiple Data (SIMD) algorithms, where each operation is taken over a vector of a few data elements. Unfortunately, the calculation of the adjoint matrix is not easily converted into a SIMD algorithm, as each element in the adjoint matrix is a function of nine of the elements of the source matrix (actually, the determinant of a 3×3 sub-matrix). Furthermore, the calculation over those elements is not easily vectorized. Even when the calculation is vectorized, usually there are not enough registers within the architecture to contain all of the intermediate results.
Therefore, there remains a need to overcome one or more of the limitations in the above-described existing.