1. Field of the Invention
The present invention relates to an information processor used to process data, such as a general-purpose processor, central processing unit (CPU), media processor, digital signal processor (DSP) or the like.
2. Description of the Related Art
With the spread of multimedia applications, processors that handle digital data, such as CPUs and DSPs, are required to perform digital-filter operations more and more frequently. Since a digital-filter operation is an inner-product operation, it is computed with the following arithmetic expression:
$$\sum_{i=0}^{n} C_i \times X_i \qquad (1)$$
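Expression (1) can be sketched in Python as below. The sketch assumes that the coefficients C_i and the source samples X_i are plain integers in equal-length sequences. The name `inner_product` is illustrative only. Each loop iteration is one multiply-and-accumulate step, which is the operation a MAC unit performs in hardware.

```python
from typing import Sequence


def inner_product(c: Sequence[int], x: Sequence[int]) -> int:
    """Return sum_{i=0}^{n} C_i * X_i, the inner product of expression (1)."""
    if len(c) != len(x):
        raise ValueError("coefficient and source sequences must have equal length")
    acc = 0
    for ci, xi in zip(c, x):
        acc += ci * xi  # one multiply-and-accumulate step per filter tap
    return acc
```

For example, `inner_product([1, 2, 3, 4], [5, 6, 7, 8])` evaluates 1·5 + 2·6 + 3·7 + 4·8 = 70.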
For efficient inner-product operations, recent CPUs, DSPs, etc. incorporate a multiply and accumulate (MAC) unit. The construction of a CPU incorporating a MAC unit is shown in FIG. 1.
As shown in FIG. 1, the CPU is generally indicated with a reference 100. The CPU 100 includes a register file 101 to store a plurality of data, a MAC unit 102 to perform an inner-product operation on the data, a shift (SHIFT) unit 103 to shift the data to the right and left, and an arithmetic logic (ALU) unit 104 to perform arithmetic and logical operations on the data. For an inner-product operation by the CPU 100, the data stored in the register file 101 are multiplied and accumulated by the MAC unit 102, and the result of the multiplication and accumulation is stored back into the register file 101. The data stored in the register file 101 are then repeatedly multiplied and accumulated by the MAC unit 102 to produce the result of the inner-product operation.
Recent processors used in workstations, personal computers and the like are designed to perform single-instruction multiple-data (SIMD) operations in units of a sub-word to speed up image processing and sound processing. In a SIMD operation, word-long data (one word is 32 or 64 bits long) stored in the register file is divided into a plurality of data items, each of a predetermined number of bits, and these items are operated on in parallel. Each of the data items resulting from the division of word-long data is called a "sub-word".
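The division of a word into sub-words can be sketched as follows. The sketch assumes a 64-bit word, 16-bit signed (two's-complement) sub-words, and lane 0 at the least significant end. `split_word` and `pack_word` are hypothetical helpers and are not instructions of any particular processor.

```python
SUB_BITS = 16                   # assumed sub-word width
WORD_BITS = 64                  # assumed word width
LANES = WORD_BITS // SUB_BITS   # four sub-words per word
SUB_MASK = (1 << SUB_BITS) - 1


def split_word(word: int) -> list[int]:
    """Split an unsigned 64-bit word into signed 16-bit sub-words, LSB lane first."""
    subs = []
    for lane in range(LANES):
        raw = (word >> (lane * SUB_BITS)) & SUB_MASK
        # Interpret the 16-bit field as a two's-complement value.
        subs.append(raw - (1 << SUB_BITS) if raw & (1 << (SUB_BITS - 1)) else raw)
    return subs


def pack_word(subs: list[int]) -> int:
    """Pack signed sub-words (LSB lane first) back into one unsigned 64-bit word."""
    word = 0
    for lane, value in enumerate(subs):
        word |= (value & SUB_MASK) << (lane * SUB_BITS)
    return word
```

For example, `pack_word([1, -1, 2, -2])` gives `0xFFFE0002FFFF0001`, and `split_word` recovers the original four values.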
The digital-filter operation, that is, the inner-product operation, can be performed faster by combining the division of data into sub-words with an inner-product operational unit that performs SIMD operations. The digital-filter operation is used in image processing and sound processing, among other applications, and in many cases it is performed continuously on a series of data. To perform a digital-filter operation with SIMD operations, the source data to be processed and the coefficient data by which the source data is multiplied are therefore stored, in units of a sub-word, into the input registers of the inner-product operational unit.
A typical SIMD inner-product operation will be explained below with reference to FIG. 2. The input registers of the inner-product operational unit are supplied with 64-bit source data and 64-bit coefficient data, for example, each in units of a 16-bit sub-word. The source data, consisting of four 16-bit sub-words X0, X1, X2 and X3 counted from the least significant bit (LSB), is stored into a first input register 111. The coefficient data, consisting of four 16-bit sub-words C0, C1, C2 and C3 counted from the most significant bit (MSB), is stored into a second input register 112. On a multiply and accumulate (MAC) instruction (pmaddwd), the inner-product operational unit multiplies the four source sub-words by the corresponding four coefficient sub-words, adds adjacent products, and stores the results (product-sums) into a first intermediate register 113. As shown in FIG. 2, X2×C2+X3×C3 is stored at the higher 32 bits (two sub-words) of the first intermediate register 113, while X0×C0+X1×C1 is stored at the lower 32 bits (two sub-words). Next, on a data-transfer instruction (movq), the inner-product operational unit copies the content of the first intermediate register 113 to a second intermediate register 114. Then, on a shift instruction (psrlq), the inner-product operational unit shifts the data in the first intermediate register 113 to the right by 32 bits. This is one 32-bit product-sum, or two 16-bit sub-words, so the higher product-sum moves into the lower place.
Further, on an add instruction (paddd), the inner-product operational unit adds the contents of the first and second intermediate registers 113 and 114 separately in their higher 32 bits and lower 32 bits. It stores the two sums at the higher 32 bits and lower 32 bits, respectively, of an output register 115.
As a result, X0×C0+X1×C1+X2×C2+X3×C3, the result of the SIMD inner-product operation, is stored at the lower 32 bits of the output register 115. Note that the data stored at the higher 32 bits of the output register 115 is not part of the inner-product result.
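The four-instruction sequence of FIG. 2 can be emulated in Python. This checks that the lower 32 bits of the output register 115 hold the full inner product. The sketch makes three assumptions:

* sub-words are 16-bit signed values;
* lane i of the source register is paired with lane i of the coefficient register, as the product-sums X0×C0+X1×C1 in the text imply;
* 32-bit lanes wrap on overflow.

The Python functions only mimic the named instructions. The register reference numbers appear as comments.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def pack16(values: list[int]) -> int:
    """Pack four signed 16-bit values (LSB lane first) into a 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK16) << (16 * i)
    return word


def lanes16(word: int) -> list[int]:
    """Unpack a 64-bit word into four signed 16-bit lanes, LSB lane first."""
    out = []
    for i in range(4):
        v = (word >> (16 * i)) & MASK16
        out.append(v - 0x10000 if v & 0x8000 else v)
    return out


def pmaddwd(a: int, b: int) -> int:
    """Multiply 16-bit lanes pairwise; add adjacent products into two 32-bit lanes."""
    x, c = lanes16(a), lanes16(b)
    lo = (x[0] * c[0] + x[1] * c[1]) & MASK32
    hi = (x[2] * c[2] + x[3] * c[3]) & MASK32
    return (hi << 32) | lo


def psrlq(word: int, count: int) -> int:
    """Logical right shift of the whole 64-bit word."""
    return (word & MASK64) >> count


def paddd(a: int, b: int) -> int:
    """Add the two 32-bit lanes independently, wrapping modulo 2**32."""
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = (((a >> 32) & MASK32) + ((b >> 32) & MASK32)) & MASK32
    return (hi << 32) | lo


def to_signed32(v: int) -> int:
    return v - (1 << 32) if v & 0x8000_0000 else v


def simd_inner_product(src_word: int, coef_word: int) -> int:
    r113 = pmaddwd(src_word, coef_word)  # first intermediate register 113
    r114 = r113                          # movq: copy into register 114
    r113 = psrlq(r113, 32)               # higher product-sum moves to the lower place
    r115 = paddd(r113, r114)             # output register 115
    return to_signed32(r115 & MASK32)    # lower 32 bits hold the inner product
```

Note that three extra instructions and two extra registers (114 and 115) are needed only to fold the two product-sums together.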
A processor used in a workstation, personal computer, etc. frequently has to perform a continuous digital-filter operation on source data such as a series of images or sounds. Two approaches are used for such continuous filtering.

In one approach, a source-data input register is provided together with a plurality of coefficient-data input registers that store coefficient data shifted by one sub-word relative to each other. Each time an inner-product operation instruction is issued, shifted coefficient data is read from one of the coefficient-data input registers. Source data whose bit positions are fixed is multiplied by this coefficient data, allowing the digital-filter operation to be performed at high speed.

In the other approach, a coefficient-data input register is provided together with a source-data input register constructed as a shift register capable of storing two words of data. Each time an inner-product operation instruction is issued, source data whose bit positions have been shifted by one sub-word is read. Coefficient data whose bit positions are fixed is multiplied by this shifted source data, again allowing the digital-filter operation to be performed at high speed.
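The second approach can be modelled at the sub-word level as below. In that approach, source data slides through a two-word shift register while the coefficients stay fixed. This is a behavioural sketch only. It assumes one filter output per inner-product instruction and signed integer samples. `fir_shifted_source` and `fir_direct` are hypothetical names, and the second is a plain windowed sum for comparison.

```python
def fir_shifted_source(samples: list[int], coeffs: list[int]) -> list[int]:
    """Slide source sub-words past fixed coefficients, one sub-word per step."""
    taps = len(coeffs)
    reg = list(samples[:2 * taps])  # two-word source shift register
    next_in = len(reg)              # index of the next sample to load
    outputs = []
    while len(reg) >= taps:
        window = reg[:taps]         # source sub-words at the current shift
        outputs.append(sum(c * x for c, x in zip(coeffs, window)))
        reg.pop(0)                  # shift the register by one sub-word
        if next_in < len(samples):  # refill the vacated position
            reg.append(samples[next_in])
            next_in += 1
    return outputs


def fir_direct(samples: list[int], coeffs: list[int]) -> list[int]:
    """Reference: direct windowed inner products."""
    taps = len(coeffs)
    return [sum(c * x for c, x in zip(coeffs, samples[k:k + taps]))
            for k in range(len(samples) - taps + 1)]
```

For example, `fir_shifted_source([1, 2, 3, 4, 5, 6], [1, 1])` returns `[3, 5, 7, 9, 11]`. This is the same as the direct windowed sum.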
The inner-product operation has been described above. SIMD operations can likewise be applied to the arithmetic and logical operations performed by an ordinary ALU, such as addition and subtraction.
However, the arithmetic operation described above has the following disadvantages:
For example, when a series of arithmetic operations is performed, the results are stored in a plurality of intermediate registers and an output register. That is, many registers are required to hold the data.
Also, even when an operation is performed as a SIMD operation, its result is produced in units of a word, not in the sub-word units in which the data was stored into the input register. When SIMD operations are performed continuously, the word-long result, which serves as source data for the next operation, therefore has to be re-formed into sub-words by shifting its bit positions and packing it. This increases the number of execution cycles. In addition, the program code becomes longer, which requires a larger program memory.
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art by providing an information processor in which the result of an arithmetic operation can be provided as sub-words each having an arbitrary data length and thus the operation can be completed with a reduced number of execution cycles.
According to the present invention, there is provided an information processor including:
an arithmetic circuit to provide a result of arithmetic operation in units of a word length;
an intermediate register to store the result of arithmetic operation supplied from the arithmetic circuit;
a shifting circuit to shift the data stored in the intermediate register by an arbitrary number of bits;
a clipping circuit to clip the data shifted by the shifting circuit to an arbitrary bit length; and
an output register to store as a sub-word the data clipped by the clipping circuit and sequentially shift the existing data therein by one sub-word from the higher to lower bits each time a data is entered for storage as a sub-word.
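The data path just listed can be sketched in Python. It consists of the intermediate register, the shifting circuit, the clipping circuit, and the output register. The claims leave the shift direction and the clipping rule open. This sketch therefore assumes the following:

* an arithmetic right shift;
* signed saturation to `clip_bits` bits;
* a 64-bit output register holding four 16-bit sub-words.

All class and function names are hypothetical.

```python
class OutputRegister:
    """Hypothetical 64-bit output register holding four 16-bit sub-words.

    Each new sub-word is written into the highest sub-word position, and the
    existing contents move down (toward the LSB) by one sub-word.
    """

    def __init__(self, word_bits: int = 64, sub_bits: int = 16) -> None:
        self.word_bits = word_bits
        self.sub_bits = sub_bits
        self.value = 0

    def push(self, sub: int) -> None:
        mask = (1 << self.sub_bits) - 1
        top = self.word_bits - self.sub_bits
        self.value = (self.value >> self.sub_bits) | ((sub & mask) << top)

    def lanes(self) -> list[int]:
        """Return the stored sub-words as signed values, lowest position first."""
        mask = (1 << self.sub_bits) - 1
        sign = 1 << (self.sub_bits - 1)
        out = []
        for i in range(self.word_bits // self.sub_bits):
            raw = (self.value >> (i * self.sub_bits)) & mask
            out.append(raw - (1 << self.sub_bits) if raw & sign else raw)
        return out


def shift_bits(value: int, amount: int) -> int:
    """Shifting circuit (assumed): arithmetic right shift; Python >> rounds toward -inf."""
    return value >> amount


def clip(value: int, bits: int) -> int:
    """Clipping circuit (assumed): saturate to the signed range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def store_results(results: list[int], shift: int, clip_bits: int) -> OutputRegister:
    reg = OutputRegister()
    for word_result in results:  # each value stands in for the intermediate register
        reg.push(clip(shift_bits(word_result, shift), clip_bits))
    return reg
```

With `store_results([70000, -5, 1 << 20, 12], shift=2, clip_bits=16)`, the lanes read `[17500, -2, 32767, 3]`. After four entries, the first result sits in the lowest sub-word. That is the same lane order used for the source sub-words X0 to X3, so the word could plausibly feed a following SIMD operation without a separate pack step. This lane ordering is an assumption of the sketch.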
In the above information processor, the result of the arithmetic operation, provided in units of the word length from the arithmetic circuit, is shifted in bit position and then clipped. The shifted and clipped result is stored into the output register as a sub-word. Namely, the result of the arithmetic operation performed in the arithmetic circuit is not first written to any external buffer such as a register file, but is directly shifted in bit position and then clipped.
According to the present invention, there is also provided an information processor including:
an input register to store a source data divided in sub-words;
a coefficient register to store a coefficient data divided in sub-words;
an inner-product operational unit to effect an inner-product operation of the source data stored in the input register and coefficient data stored in the coefficient register, in units of a sub-word, and provide the result of the operation in units of a word length;
an intermediate register to store the result of the operation effected in the inner-product operational unit;
a shifting circuit to shift the data stored in the intermediate register by an arbitrary number of bits;
a clipping circuit to clip the data of which the bit positions have been shifted by the shifting circuit to an arbitrary bit length; and
an output register to store as a sub-word the data clipped by the clipping circuit, and sequentially shift the existing data therein by one sub-word from the higher to lower bits each time a data is entered for storage as a sub-word.
In the above information processor, the result of the inner-product operation, provided in units of the word length from the inner-product operational unit, is shifted in bit position and then clipped. The shifted and clipped result is stored into the output register as a sub-word. Namely, the result of the operation performed in the inner-product operational unit is not first written to any external buffer such as a register file, but is directly shifted in bit position and clipped.
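The sketch below puts the pieces of this arrangement together in Python:

1. A sliding inner product produces one word-length result per output sample.
2. Each result passes through an assumed arithmetic right shift and signed saturation.
3. The result is shifted into a four-lane output register.

Every fourth result completes a packed 64-bit word. The function name and parameters are illustrative assumptions and are not details from the claims. So is the choice to drop a trailing partial word.

```python
def filter_to_packed_words(samples: list[int], coeffs: list[int], shift: int,
                           clip_bits: int, sub_bits: int = 16,
                           lanes: int = 4) -> list[int]:
    """Filter, shift, clip and pack results into words of `lanes` sub-words."""
    mask = (1 << sub_bits) - 1
    top = (lanes - 1) * sub_bits              # bit position of the highest sub-word
    lo, hi = -(1 << (clip_bits - 1)), (1 << (clip_bits - 1)) - 1
    taps = len(coeffs)
    out_reg, filled, words = 0, 0, []
    for k in range(len(samples) - taps + 1):
        # Inner-product operational unit: one word-length result.
        acc = sum(c * x for c, x in zip(coeffs, samples[k:k + taps]))
        acc >>= shift                         # shifting circuit (arithmetic shift)
        acc = max(lo, min(hi, acc))           # clipping circuit (signed saturation)
        # Output register: older sub-words move down, the new one enters at the top.
        out_reg = (out_reg >> sub_bits) | ((acc & mask) << top)
        filled += 1
        if filled == lanes:                   # a full packed word is ready
            words.append(out_reg)
            out_reg, filled = 0, 0
    return words                              # a trailing partial word is dropped
```

For samples 1 to 7 with coefficients `[1, 1]`, the six outputs are 3, 5, 7, 9, 11 and 13. The first four form one word with 3 in the lowest sub-word and 9 in the highest.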
According to the present invention, there is also provided an information processor including:
an arithmetic circuit to provide the result of an arithmetic operation in units of a word length; and
an output register to store the result of the arithmetic operation effected in the arithmetic circuit as a sub-word and sequentially shift the existing data therein from the higher to lower bits in units of a sub-word each time a data is entered for storage as a sub-word.
According to the present invention, there is also provided an information processor including:
an input register to store a source data divided in sub-words;
a coefficient register to store a coefficient data divided in sub-words;
an inner-product operational unit to effect an SIMD type inner-product operation, in units of a sub-word, of the source data stored in the input register and coefficient data stored in the coefficient register, and provide the result of the SIMD type inner-product operation in units of a word length; and
an output register to store as a sub-word the result of the SIMD type inner-product operation effected in the inner-product operational unit and sequentially shift the existing data therein from the higher to lower bits in units of a sub-word each time a data is entered for storage as a sub-word.