1. Field of the Invention
The present invention relates to a processor that performs processing according to instruction sequences that are stored in a ROM or the like.
2. Background of the Invention
In recent years, there has been a visible increase in the use of application software that can interactively reproduce various kinds of data, such as video data, still image data, and audio data, that have been compressed according to techniques such as frame encoding, field encoding, or motion compensation. As such software has been developed, there has been increasing demand for multimedia-oriented processors that can efficiently execute the software. These multimedia-oriented processors are designed with a special architecture to facilitate the programming of processing such as the compression and decompression of video and audio data. The processing that must be performed at high speed when handling video data is the matrix multiplication of compressed data that has N*N matrix elements with coefficient data that also has N*N matrix elements. Representative examples of compressed data that has N*N matrix elements are the luminance block composed of 16*16 luminance elements, the blue color difference block (Cb block) composed of 8*8 color difference elements, and the red color difference block (Cr block) composed of 8*8 color difference elements used in MPEG (Moving Picture Experts Group) techniques. The matrix multiplication for compressed data referred to here is performed very frequently when executing the approximation calculations for an inverse DCT (Discrete Cosine Transform) in image compression methods such as MPEG and JPEG (Joint Photographic Experts Group).
The following is a description of conventional multimedia-oriented processors that can perform high-speed matrix multiplication. The basic architecture of conventional multimedia-oriented processors is provided with a sum-product result register (hereinafter simply referred to as an MCR register) as hardware, and is provided with an instruction set that includes a “MOV MCR,**” transfer instruction for transferring a sum-product value.
An example of the hardware construction of a conventional multimedia-oriented processor is shown in FIG. 1. As shown in FIG. 1, the arithmetic logic unit (hereinafter, “ALU”) 61 performs the multiplication of an element Fij that forms part of the compressed data and an element Gji that forms part of the coefficient matrix in accordance with a multiplication instruction. The ALU 61 also reads the sum-product value stored in the sum-product result register 62, adds the multiplication result of Gji*Fij to the read sum-product value, and has the result of this addition stored in the sum-product result register 62. By repeating the above calculation, a sum-product value is accumulated in the sum-product result register 62. Once the multiplication has been performed a predetermined number of times, the programmer issues a sum-product value transfer instruction. By issuing a transfer instruction, the accumulated value in the sum-product result register 62 is transferred to the general registers, and is used as the matrix multiplication result for one row and one column. By performing N*N iterations of the above processing, the matrix multiplication of N*N compressed data and an N*N coefficient matrix can be completed.
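The accumulation described above can be sketched as follows in Python. This is an illustrative model only: the function name sum_product and the use of unbounded Python integers in place of a 32-bit register are assumptions made for the sketch and are not part of the conventional processor.

```python
def sum_product(column, row):
    """Accumulate the products of paired elements, as the ALU 61 and the
    sum-product result register 62 are described as doing.

    `column` holds compressed data elements Fij and `row` holds the
    corresponding coefficient elements Gji (hypothetical argument names).
    """
    mcr = 0  # stands in for the sum-product result register (MCR)
    for f, g in zip(column, row):
        # One multiplication instruction: multiply, then add the product
        # to the value already held in the MCR.
        mcr += g * f
    # Returning the value stands in for the transfer instruction that
    # moves the accumulated value to a general register.
    return mcr
```

For example, sum_product([1, 2, 3], [4, 5, 6]) accumulates 4 + 10 + 18. One such call yields one element of the matrix multiplication result.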
When a conventional multimedia-oriented processor is used, however, the positive conversion processing and saturation calculation processing needed to amend the sum-product value pose many difficulties for programmers.
Positive conversion processing refers to the conversion of a sum-product value that is a negative value into either zero or a positive value. Normally, compressed data is expressed as a coded (signed) relative value that reflects the relation of the present value to the preceding and succeeding values. As a result, there are many cases when the sum of products for each element in the compressed data and the corresponding coefficients is a negative value. Most reproduction-related hardware, such as displays and speakers, however, is only able to process uncoded (unsigned) data, so that when the sum-product values are to be reproduced, it is first necessary to perform positive conversion processing.
Saturation calculation processing refers to processing that sets all values that exceed a given range (or, in other words, which are “saturated”) to a predetermined value. That is to say, when an element that includes an erroneous bit generated during transfer is used in a sum-product calculation as part of the sum-product processing for compressed data, there is an increase in the probability of the sum-product value exceeding a value that can be expressed by the stated number of bits. Since most reproduction-related hardware is only physically capable of reproducing uncoded data with a fixed valid number of bits, such as eight bits, saturation processing is required to convert the sum-product value into a value that can be expressed using the valid number of bits.
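The two amendments just described can be illustrated with a short Python sketch. The function names and the valid_bits parameter are hypothetical; the sketch assumes that negative values are converted to zero and that the predetermined value for saturated values is the largest value expressible in the valid number of bits.

```python
def positive_conversion(value):
    """Convert a negative sum-product value to zero; leave others unchanged."""
    return max(value, 0)


def saturate(value, valid_bits=8):
    """Set a value that exceeds the expressible range to the upper limit."""
    limit = (1 << valid_bits) - 1  # 255 when valid_bits is 8
    return min(value, limit)


def correct(value, valid_bits=8):
    """Apply positive conversion, then saturation, to one sum-product value."""
    return saturate(positive_conversion(value), valid_bits)
```

Under these assumptions, a negative input is amended to zero, an input within the range passes through unchanged, and an input above the range is amended to the upper limit.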
It has been conventional practice to perform this kind of positive conversion processing and saturation calculation processing by converting the sum-product value using a subroutine that corrects the sum-product value. An example of a subroutine that corrects the sum-product value is explained below. In this example, the register width and the calculation width of the calculation unit are 32 bits, with the width of the MCR being 32 bits, and the sum-product value being expressed as a coded 16-bit integer. The data that can be handled by the reproduction-related hardware needs to be expressed using uncoded 8-bit integers. This subroutine uses the data register D0 to store the calculation result. Each instruction is expressed using two operands, with the left and right operands being respectively called the first and the second operands. The second operand is used both to indicate the transfer address of a transfer instruction and the storage address of an arithmetical instruction.
        Instruction 1: MOV MCR,D0
        Instruction 2: CMP 0xFFFF_8000,D0
        Instruction 3: BCC CARRY
        Instruction 4: MOV 0x0000_0000,D0
        Instruction 5: BRA END
        CARRY:
        Instruction 6: CMP 0x0000_00FF,D0
        Instruction 7: BCS END
        Instruction 8: MOV 0x0000_00FF,D0
        END: (end of positive conversion saturation calculation processing)
Describing the above instructions in order, Instruction 1, “MOV MCR,D0”, transfers the stored value of the MCR register into the data register D0. Instruction 2, “CMP 0xFFFF_8000,D0”, compares the value in the data register D0 with the immediate “0xFFFF_8000”, where “0x” shows that the value is given in hexadecimal. This comparison is performed by subtracting the immediate “0xFFFF_8000” given in the first operand from the stored value of the data register D0 given in the second operand.
The sixteenth bit of the immediate “0xFFFF_8000” in Instruction 2 is the code bit used for a 16-bit coded integer, so that when the stored value of the data register D0 is greater than the immediate “0xFFFF_8000”, this shows that the value stored in the MCR is a negative number.
On the other hand, when the stored value of the D0 register is less than “0xFFFF_8000”, this shows that the value stored by the MCR is a positive number. If this number is a positive number, a carry is performed and the carry flag in the flag register is set.
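The comparison performed by Instruction 2 can be modeled as follows. The sketch assumes that the 16-bit coded integer is held sign-extended in the 32-bit register, and that the carry flag is set when the subtraction of the immediate from D0 requires a borrow, which is to say when D0 is smaller than the immediate as an unsigned 32-bit value. The names to_register and cmp_sets_carry are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # 32-bit register width


def to_register(value16):
    """Return the 32-bit register image of a signed 16-bit integer.

    Masking a Python integer with MASK32 yields the sign-extended
    two's-complement bit pattern (assumed representation).
    """
    return value16 & MASK32


def cmp_sets_carry(immediate, d0):
    """Model "CMP immediate,D0": subtract the immediate from D0.

    Returns True when the subtraction borrows, i.e. when D0 is smaller
    than the immediate as an unsigned value (assumed carry convention).
    """
    return d0 < immediate
```

With this model, the register image of 5 is below “0xFFFF_8000”, so the carry flag is set, whereas the register image of -1 is 0xFFFF_FFFF, which is not below the immediate, so the carry flag stays clear.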
The letter “B” in the “BCC” in Instruction 3 stands for “Branch”, while the letters “CC” stand for “Carry Clear”.
When the comparison in Instruction 2 finds that the stored value of the register D0 is less than the immediate “0xFFFF_8000”, a branch is performed to Instruction 6 which has the label “CARRY”. Conversely, when the comparison in Instruction 2 finds that the stored value of the register D0 is greater than the immediate “0xFFFF_8000”, Instruction 4, “MOV 0x0000_0000,D0” transfers the value zero into the register D0, amending the sum-product value to zero. After this amendment, the unconditional branch “BRA END” in Instruction 5 is performed to transfer the processing to the “END” label, thereby completing the positive conversion processing.
The processing described above is performed when the stored value of the register D0 is negative. The following is a description of the processing performed when the stored value of the register D0 is less than the immediate “0xFFFF_8000”, which is to say, when the stored value is not negative. In such a case, Instruction 6, “CMP 0x0000_00FF,D0” compares the stored value of the register D0 with the immediate “0x0000_00FF”. This comparison is performed by subtracting the immediate “0x0000_00FF” given in the first operand from the stored value of the data register D0 given in the second operand. When the stored value of the D0 register is smaller than the immediate “0x0000_00FF”, a carry is performed and the carry flag in the flag register is set.
The letters “CS” in Instruction 7, “BCS END”, stand for “Carry Set”, so that when the carry flag is set, a branch is performed to the label “END” from Instruction 7.
When the carry flag is not set, no branch is performed in Instruction 7 and processing advances to Instruction 8, “MOV 0x0000_00FF,D0”, where the immediate “0x0000_00FF” is transferred into the register D0 to amend the calculation result to “0x0000_00FF”, thereby completing the saturation calculation processing.
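The overall effect of Instructions 1 to 8, as described in the preceding paragraphs, can be summarized by the following Python sketch. It models the behavior stated in the description rather than the flag register itself. The function name amend_sum_product is hypothetical, and the MCR is assumed to hold a sign-extended 32-bit value.

```python
def amend_sum_product(mcr):
    """Return the value left in D0 at the END label for a given MCR value."""
    d0 = mcr & 0xFFFFFFFF      # Instruction 1: MOV MCR,D0 (32-bit image)
    if d0 >= 0xFFFF8000:       # Instructions 2-3: value found to be negative
        return 0x00000000      # Instructions 4-5: amend to zero, go to END
    if d0 < 0x000000FF:        # Instructions 6-7: within range, go to END
        return d0
    return 0x000000FF          # Instruction 8: amend to the upper limit
```

Every path through the sketch passes at least one conditional test, which corresponds to the branch instructions that the subroutine executes for every sum-product value.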
The problem with the sum-product value amendment process described above lies in the considerable increase in code size caused by the insertion of the above eight instructions for one amendment of a sum-product value. When the program is written into a ROM to embed the software into the information processing apparatus, the required amount of installed ROM will need to be increased by an amount equal to this increase in code size, leading to an increase in manufacturing cost. A large number of manufacturers of domestic appliances such as digital video players, electronic notebooks, and word processors seek to improve on their rivals' products by using their own decompression processing programs, although the installation of such decompression processing programs presently has the drawback of increasing costs by increasing the required amount of ROM, making such installation problematic.
There is also the problem that since eight instructions need to be executed to correct one sum-product value, there is a large increase in processing time. When, as shown in FIG. 2, an approximation calculation for an inverse DCT is performed by multiplying compressed data Fij (where i,j=1,2,3,4,5 . . . 8) composed of 8*8 elements with a coefficient matrix Gji (where i,j=1,2,3,4,5 . . . 8) also composed of 8*8 elements to produce the multiplication result matrix Hij (where i,j=1,2,3,4,5 . . . 8), the calculation of the matrix multiplication result element H11 requires the sum-product processing of the multiplication results of one column of compressed data elements F11, F21, F31, F41, F51, F61, F71, F81 by one row of coefficient data elements G11, G12, G13, G14, G15, G16, G17, G18. The result is then subjected to positive conversion saturation calculation processing. Following this, the calculation of the matrix multiplication result element H12 requires the sum-product processing of the multiplication results of the column of compressed data elements F12, F22, F32, F42, F52, F62, F72, F82 by one row of coefficient data elements G11, G12, G13, G14, G15, G16, G17, G18, with the sum-product result then being subjected to positive conversion saturation calculation processing.
The same sum-product processing and positive conversion saturation calculation processing is required to obtain the other matrix multiplication result elements H21, H31, H41, H51, H61, H71, H81, . . . , and since there are 64 elements in the coefficient matrix Gij (where i,j=1,2,3,4,5 . . . 8), the sum-product value amending subroutine for positive conversion saturation calculation processing needs to be performed 64 times. This sum-product value amending subroutine includes branch instructions (as Instructions 3, 5, and 7), so that when this sum-product value amending subroutine is executed, branches will occur regardless of whether negative values or saturation occur, so that the 64 iterations of the subroutine will not be performed smoothly. When attempts are made to improve the processing speed of the sum-product operation by introducing pipeline processing to the processor, the execution of the stated three branch instructions will result in a noticeable drop in processing efficiency.
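The repetition described above can be sketched as follows. The function name corrected_product is hypothetical, the matrices are represented as lists of rows with zero-based indices (so that the element H11 of the description is h[0][0]), and the amendment is modeled as a simple clamp to the range 0 to 255 rather than as the eight-instruction subroutine itself.

```python
def corrected_product(g, f, n=8):
    """Multiply coefficient matrix g by compressed data f, amending each result.

    Returns the amended result matrix and the number of amendments performed.
    """
    corrections = 0
    h = [[0] * n for _ in range(n)]
    for i in range(n):              # row of the coefficient matrix
        for j in range(n):          # column of the compressed data
            acc = 0                 # stands in for the MCR
            for k in range(n):
                acc += g[i][k] * f[k][j]
            # Stand-in for the sum-product value amending subroutine.
            h[i][j] = min(max(acc, 0), 0xFF)
            corrections += 1
    return h, corrections
```

For 8*8 inputs the counter reaches 64, one amendment per result element, which is the number of subroutine executions referred to above.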
In order to increase the speed of the matrix multiplication, it is possible to install a specialized circuit for performing matrix multiplication. However, if all of the matrix multiplications were performed by a specialized circuit, there would be a vast increase in hardware, and the processor characteristic known as versatility, whereby the processor executes a variety of processes in accordance with the program written by the programmer, would be lost. If the versatility of the processor is lost, there is the risk that the processor will not be able to respond to programmers' wishes, and so will not, for example, be able to execute an original decompression processing program.