1. Field of the Invention
The present invention relates to a method, an apparatus, and a computer program for generating an SIMD instruction sequence for use with a computer having an extended instruction set, called single instruction multiple data (SIMD) instructions, for high-speed multimedia processing.
2. Description of the Related Art
In the processing of multimedia data such as images, the same type of operation is typically repeated on data in a fixed format. Some currently available computers provide instructions called SIMD instructions, which perform the same type of operation on many pieces of data in response to a single instruction and thereby enhance data processing performance.
The data formats and instruction sets handled by SIMD differ from one computer architecture to another. Generally, computers process 64 bits or 128 bits of data in bulk in response to a single instruction. A plurality of pieces of data, such as 8-bit, 16-bit, or 32-bit integer data, or 32-bit or 64-bit floating-point data, are packed into this data width and are then processed concurrently in response to a single instruction. For example, since 16 pieces of 8-bit integer data can be packed into 128 bits, an image processing software program processes data of 16 pixels at a time. The use of SIMD instructions is therefore particularly effective for high-speed processing of images. Operations provided by SIMD instructions include addition, subtraction, multiplication, and division; logical operations such as AND and OR; mask operations; saturating arithmetic; multiply-accumulate; inner product; maximum/minimum value calculation; absolute value calculation; and mean value calculation.
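The packing described above can be illustrated with a minimal sketch in portable C. It emulates, one lane at a time, what a single packed saturating addition would do to 16 pieces of 8-bit unsigned integer data held in a 128-bit width. The function name add_sat_u8x16 and the constant LANES_U8 are hypothetical names chosen for this illustration; they are not taken from any particular SIMD instruction set, and real hardware would perform all 16 lane additions in response to one instruction.

```c
#include <stdint.h>

/* Hypothetical constant: 128 bits / 8 bits per element = 16 lanes. */
enum { LANES_U8 = 16 };

/*
 * Scalar emulation of one packed saturating add on 16 unsigned 8-bit lanes.
 * Each lane is clamped to 255 instead of wrapping around, which is the
 * behavior usually wanted when brightening pixel values.
 */
void add_sat_u8x16(const uint8_t a[LANES_U8],
                   const uint8_t b[LANES_U8],
                   uint8_t out[LANES_U8])
{
    for (int lane = 0; lane < LANES_U8; lane++) {
        /* Widen to unsigned int so the sum cannot overflow before the clamp. */
        unsigned int sum = (unsigned int)a[lane] + (unsigned int)b[lane];
        out[lane] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Called with 16 pixel values and a constant offset in every lane, the function returns 16 results, each clamped to the 8-bit maximum, which corresponds to processing data of 16 pixels at a time.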
Typical SIMD instruction sets include MMX technology and the Streaming SIMD Extensions (SSE) in the Pentium® architecture, 3DNow! in the AMD K6/K7, AltiVec in the PowerPC architecture, MDMX in the MIPS processor, and VIS in the SPARC architecture.
To efficiently develop a program intended for use in a computer having an SIMD instruction set, a processor instruction sequence containing SIMD instructions must be generated. Japanese Unexamined Patent Publication (JP-A) No. 10-228382 discloses a compiler (for SIMD) for generating a target program. An SIMD type loop structure is extracted from a source program in which processes on a plurality of data elements are described successively. The SIMD type loop structure is then converted to one using SIMD instructions, and the data elements are processed using the SIMD instruction set. The target program is thus produced.
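The idea of an SIMD type loop structure can be sketched in C as follows. This is only an illustration under stated assumptions and is not the method of JP-A No. 10-228382: the function names add_sequential and add_blocked and the constant LANES_U32 are hypothetical. The first function is a sequential loop in which every iteration applies the same operation to independent data elements. The second restructures it into blocks of four 32-bit elements (assuming a 128-bit width), where each block corresponds to what one packed add instruction would compute, followed by a remainder loop for leftover elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 128 bits / 32 bits per element = 4 lanes. */
enum { LANES_U32 = 4 };

/* Sequential form: one element per iteration, no dependence between iterations. */
void add_sequential(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Blocked form: the inner loop stands in for one packed add over 4 lanes. */
void add_blocked(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n)
{
    size_t i = 0;

    /* Main loop: process full blocks of LANES_U32 elements. */
    for (; i + LANES_U32 <= n; i += LANES_U32) {
        for (size_t lane = 0; lane < LANES_U32; lane++) {
            c[i + lane] = a[i + lane] + b[i + lane];
        }
    }

    /* Remainder loop: handle the elements that do not fill a block. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both functions produce the same results for any element count; unsigned arithmetic is used so that overflow wraps in a well-defined way, as packed integer addition without saturation typically does.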
The compiler disclosed in JP-A No. 10-228382 automatically generates a target program, intended for use in a computer having an SIMD instruction set, from a sequential processing program written in a high-level language such as C. However, the technique for analyzing a sequential processing program to automatically extract the SIMD eligible portions is still immature and is subject to limitations. A sequential processing program containing a complicated process, such as an image filtering process, is not sufficiently SIMD enabled even when it is passed through an SIMD enabling compiler, so a target program that efficiently utilizes the SIMD instruction set is not generated. The generation of a target program using the SIMD instruction set therefore depends on manual coding in assembly language.
Manual coding in assembly language requires a high degree of skill and is time consuming, and errors are likely to be introduced into the instruction sequence. Moreover, when assembly code is written manually, processors of different types (for example, the Pentium® II, III, and 4 of Intel and the K6 and K7 of AMD) use different SIMD instruction sets (for example, MMX, SSE, and 3DNow!). Each time the target processor is changed, a separate assembly program must be prepared.