The present invention relates to a processor suitable for multimedia processing such as digital animation and three-dimensional graphics and, more particularly, to a processor for implementing processing with a high degree of parallelism and a small code size.
Recently, personal computers and workstations in particular have increasingly been made multimedia compatible. Capabilities mainly required by multimedia include motion picture compression and expansion, voice compression and expansion, three-dimensional graphics processing, and a variety of recognition processing. For voice processing and the like, a DSP (Digital Signal Processor) having performance of several tens of MOPS is conventionally used. However, handling of motion pictures and graphics requires a processor of fairly high performance. For example, motion picture expansion requires performance of about 2 GOPS and its compression requires performance of about 50 GOPS. To satisfy these performance requirements, the performance of computing units must be enhanced. Computing unit performance can be enhanced by two approaches: increasing the operating frequency and parallel computing.
The former can be achieved comparatively simply but increases the difficulty of packaging design, resulting in increased cost. To implement the performance at a reasonable cost, the latter approach may also be necessary. However, the parallel computing approach raises two problems: whether applications are ready for parallelism, and the complexity of the control needed to use a plurality of computing units effectively. As for applications, a fairly high parallelism is found as far as multimedia is concerned. For example, 8 computational operations are concurrently executable in motion picture compression.
Approaches for making good use of a plurality of computing units include superscalar architecture and VLIW (Very Long Instruction Word). The former is mainly used by general-purpose processors, and the scheduling for concurrently executing a plurality of computational operations is performed by the processors themselves. This approach has the advantage of object code compatibility with an existing single-processing processor, but at the cost of extremely complicated hardware because the scheduling is performed dynamically by the processor. On the other hand, VLIW has a problem of compatibility with existing processors but has the advantage of simplified hardware because the scheduling is performed statically in advance, by a compiler, rather than by the hardware.
One of the points of the VLIW hardware simplification is its instruction format. This instruction format is composed of fields for directly controlling computing units, thereby extremely simplifying the control by hardware. A processor having such an instruction format is disclosed in Japanese Non-examined Patent Publication No. Sho 63-98733 “COMPUTER CIRCUIT CONTROL METHOD” for example. In this citation, an operation field indicating that a micro instruction for computation is an instruction for computation and a plurality of control bits for controlling a computing circuit are provided, directly controlling each part of the computing circuit by each of these control bits. Thus, VLIW can implement parallel processing by comparatively simple hardware.
As described, superscalar architecture and VLIW provide effective means for enhancing processing parallelism to draw out performance. In order to fully draw out parallelism, the help of a compiler is indispensable. To be specific, a technique such as loop expansion is known. In this technique, a loop body in a program is duplicated (expanded) a plurality of times and the codes in the expanded loop are scheduled. Namely, increasing the number of instructions to be executed between loop return branches increases the possibility of executing a plurality of instructions concurrently.
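The loop expansion technique described above can be illustrated by a minimal sketch. The function names and the expansion factor of four are assumptions chosen for illustration only; the sketch shows how duplicating the loop body exposes independent operations between loop return branches, at the cost of a longer loop body.

```python
def scale_rolled(src, k):
    """Original loop: one multiply per iteration, one loop return branch per element."""
    out = []
    for i in range(len(src)):
        out.append(src[i] * k)
    return out


def scale_unrolled4(src, k):
    """Same computation with the loop body duplicated four times.

    The factor of four is an assumption for illustration. The four multiplies
    in one iteration do not depend on each other, so a scheduler could issue
    them to four computing units concurrently. The loop body is roughly four
    times as long, which is the code size increase at issue.
    """
    n = len(src)
    out = [0] * n
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        out[i] = src[i] * k
        out[i + 1] = src[i + 1] * k
        out[i + 2] = src[i + 2] * k
        out[i + 3] = src[i + 3] * k
    # Remainder loop for lengths that are not a multiple of 4.
    for i in range(main, n):
        out[i] = src[i] * k
    return out
```

Both functions return the same result for any input list; only the number of instructions between loop return branches differs.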
The above-mentioned technique duplicates a loop, thereby imposing a problem of increasing code size. A larger code size requires a larger memory capacity in which a program is stored, resulting in increased system cost. In the processors sharing a cache memory, increased code size lowers hit rate, thereby lowering system performance.
Increasing processor parallelism increases the number of computing units. This results in increased circuit scale, thereby increasing the number of development steps. In the computer market mainly dominated by personal computers, well-timed introduction of new products on the market is important in terms of business. To satisfy this requirement, it is important to reduce the number of development steps.
It is therefore an object of the present invention to provide a processor having an architecture for minimizing the code size while enhancing the processing parallelism for enhanced performance.
Another object of the present invention is to provide a processor capable of executing many computational operations by a small number of instruction codes.
Still another object of the present invention is to provide a VLIW processor based on static scheduling.
Yet another object of the present invention is to provide a VLIW processor compatible with various applications and enhanced in the operating ratios of the computing units.
A further object of the present invention is to provide a processor suitable for multimedia processing that is effective in reducing the instruction code amount of a parallel processor that repeatedly executes computational operations of a same type, as in multimedia processing.
A still further object of the present invention is to provide a superscalar processor effective for reducing code size.
A yet further object of the present invention is to provide a processor architecture capable of enhancing processing parallelism while minimizing the number of development steps.
In order to solve the above-mentioned first problem, the present invention notes that, as far as multimedia processing is concerned, a plurality of computations of a same type are often executed concurrently, and provides, in the instruction format, mode information for controlling a plurality of computing devices with a single instruction.
For example, in order to execute a plurality of computations with a single instruction by a plurality of computing devices, in a VLIW processor in which one instruction is constituted by a plurality of fields for controlling the computing devices, mode information for controlling the plurality of computing devices is provided in one field. Further, an instruction expansion circuit for generating a plurality of fields from one field in one instruction is provided and the above-mentioned plurality of computing devices are constituted by arranging a plurality of computing devices having a same function.
In a superscalar processor, mode information for simultaneously controlling a plurality of computing devices is provided in one instruction. In addition, an instruction expansion circuit for generating a plurality of instructions from one instruction is provided and a plurality of computing devices having a same function are arranged such that the plurality of generated instructions can be executed concurrently.
In a processor having three or more computing devices, specification information for specifying the computing devices to be executed concurrently is provided and the above-mentioned instruction expansion circuit is provided with a function for generating the required number of instruction fields for the VLIW processor and generating an instruction for the superscalar processor according to the above-mentioned specification information.
In order to solve the above-mentioned second problem, the present invention provides a plurality of computing units constituted by a computing device for concurrently executing a plurality of computations of a same function, an integer computing device for mainly reading an operand to be supplied to this computing device from a memory, and a register file for storing an operand to be used by the above-mentioned two types of computing devices.
Namely, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units capable of performing computational operations in parallel according to said plurality of instruction codes held in said instruction code holding means, wherein specification information for instructing execution of computations in a plurality of computing units is provided in the instruction code stored in said memory and an analyzing means is provided for analyzing said specification information to determine a plurality of computing units specified by the instruction code and input said instruction code into the plurality of specified computing units, thereby controlling a plurality of computations in said plurality of computing units with a single instruction code.
Further, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units capable of performing computational operations in parallel according to said plurality of instruction codes held in said instruction code holding means, wherein specification information for instructing execution of computations in a plurality of computing units is provided in the instruction code stored in said memory and an analyzing means is provided for analyzing said specification information to determine a plurality of computing units specified by the instruction code and input said instruction code into the plurality of specified computing units, thereby executing, in said plurality of computing units, a computation equivalent to a plurality of instructions with a single instruction code.
Still further, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units capable of performing computational operations in parallel according to said plurality of instruction codes held in said instruction code holding means, wherein, in addition to an ope code for indicating a computation type and an operand, a field for specifying an execution mode as specification information is provided in the instruction code stored in said memory and an analyzing means is provided for analyzing said field and inputting at least the ope code and the operand of the instruction for which said execution mode is enabled into a plurality of computing units, thereby executing computations of a same type in said plurality of computing units.
Yet further, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units capable of performing computational operations in parallel according to said plurality of instruction codes held in said instruction code holding means, wherein, in addition to an ope code for indicating a computation type and an operand, a field for specifying an execution mode as specification information and a computing unit specification field for specifying the computing unit are provided in the instruction code stored in said memory and an analyzing means is provided for analyzing said fields and inputting at least the ope code and the operand of the instruction for which said execution mode is enabled into the computing unit specified in the computing unit specification field, thereby executing, in said plurality of computing units, the specified computations of a same type.
Moreover, the present invention is the above-mentioned processor, wherein each of said plurality of computing units has a unique register file. In addition, the present invention is the above-mentioned processor, wherein each of said plurality of computing units has a unique register file and the operand field performs register specification in the register file unique to each of said plurality of computing units, so that the computation data differs from one of said plurality of computing units to another. Besides, the present invention is the above-mentioned processor, wherein said plurality of computing units have a register file in common.
Further the present invention is the above-mentioned processor, wherein each of said plurality of computing units has a register file in common, has an operand field for specifying a register number from said register file, and adds an offset value unique to the computing unit to be specified to a value of said operand field, thereby making different registers available and enabling computation by different pieces of data.
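The offset scheme for a common register file can be sketched as follows. This is only an illustration under stated assumptions: the window size of eight registers per computing unit and the rule that the offset equals the unit number times the window size are hypothetical, as is the function name; the text above says only that an offset value unique to the computing unit is added to the value of the operand field.

```python
REGS_PER_UNIT = 8  # assumed number of registers reserved per computing unit


def physical_register(operand_reg, unit_id, regs_per_unit=REGS_PER_UNIT):
    """Map the register number in the operand field to a register in the common file.

    Assumption: the unit-specific offset is unit_id * regs_per_unit, so each
    computing unit sees its own window of the common register file.
    """
    if not 0 <= operand_reg < regs_per_unit:
        raise ValueError("operand register outside the per-unit window")
    if unit_id < 0:
        raise ValueError("unit_id must be non-negative")
    return operand_reg + unit_id * regs_per_unit


# The same operand field value (register 2) resolves to a different
# physical register in each of four computing units.
registers_for_r2 = [physical_register(2, unit) for unit in range(4)]
```

Under these assumptions, a single copied operand field lets each computing unit work on different data without the instruction code naming each register separately.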
Still further, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units, wherein said instruction code is constituted by a plurality of fields corresponding to said plurality of computing units, control information for controlling a plurality of computing units and field information for each field to specify the corresponding computing unit are provided in any one field in this instruction code, an analyzing means is provided for analyzing said field information and said control information to identify the computing unit to be controlled by said field and inputting said field into this identified computing unit, and one field in said instruction code controls a plurality of computing units, thereby allowing a short instruction code constituted by the number of fields smaller than the above-mentioned computations to execute a plurality of computations.
Yet further, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units, wherein said instruction code is constituted by a plurality of fields corresponding to said plurality of computing units, control information indicating that any one field in this instruction code controls a plurality of computing units and header information indicating the number of fields existing in said instruction code are stored in said memory beforehand, an analyzing means is provided for analyzing said header information and said control information to identify the computing unit to be controlled by said field and inputting said field into the identified computing unit, and one field in said instruction code controls a plurality of computing units, thereby allowing a short instruction code constituted by a small number of fields by use of said header information to execute a plurality of computations.
Moreover, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units constituted by at least one computing device controlled by information held in said instruction code holding means and a register file for storing operand information of said computing device, wherein said instruction code is constituted by a plurality of fields corresponding to the number of computing units, this one instruction code operates a plurality of computing units, and at least one computing device having a same function is provided in all of said computing units, thereby allowing each of the computing units to execute a same computation.
In addition, the present invention is a processor having a memory for storing an instruction code, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units constituted by at least one computing device controlled by information held in said instruction code holding means and a register file for storing operand information of said computing device, wherein said instruction code is constituted by a plurality of fields corresponding to the number of computing units, at least one computing device having a same function is provided in all of said computing units and a special register for holding a data type having a bit width too large to specify by a register in said register file is provided in each of said computing units, thereby allowing computational processing of both a data type having a bit width specifiable by a register in said register file and the data type stored in said special register.
Besides, the present invention is a processor comprising a memory for storing an instruction code having specification information for indicating execution of a plurality of computing units, an analyzing means for analyzing the specification information in the instruction code stored in said memory to determine a plurality of computing units specified by the instruction code, an instruction code holding means for holding an instruction code for specifying the plurality of computing units determined by said analyzing means, and a plurality of computing units for executing computations in parallel according to the instruction code stored in said instruction code holding means.
Further, the present invention is a processor comprising a memory for storing an instruction code having specification information for indicating execution of a plurality of computing units, an analyzing means for analyzing the specification information in the instruction code stored in said memory to determine a plurality of computing units specified by a single instruction code such that a computation equivalent to a plurality of instructions is executed by said single instruction code, an instruction code holding means for holding the single instruction code for specifying the plurality of computing units determined by said analyzing means, and a plurality of computing units for executing computations in parallel according to the single instruction code held in said instruction code holding means.
Still further, the present invention is the above-mentioned processor, wherein each of said plurality of computing units is constituted to execute computations of different types.
Yet further, the present invention is a processor comprising a memory for storing an ope code for indicating a computation type, an operand, and an instruction code having a field for specifying an execution mode as specification information, an analyzing means for analyzing the field in the instruction code read from said memory and inputting at least the ope code and the operand of an instruction for which said execution mode is enabled into a plurality of computing units, an instruction code holding means for holding at least the ope code and the operand of the instruction inputted by said analyzing means and for which the execution mode is enabled for the plurality of computing units, and a plurality of computing units for executing computations of a same type in parallel according to at least the ope code and the operand held in said instruction code holding means.
Moreover, the present invention is a processor comprising a memory for storing an ope code indicating a computation type, an operand, and an instruction code having a field for specifying an execution mode as specification information and a computing unit specification field for specifying a computing unit, an analyzing means for analyzing the fields read from said memory and inputting at least the ope code and the operand of the instruction for which said execution mode is enabled into the computing unit specified by said computing unit specification field, an instruction code holding means for holding at least the ope code and the operand of the instruction inputted by said analyzing means and for which the execution mode is enabled for the computing unit specified by said computing unit specification field, and a plurality of computing units for executing computations of a same type according to at least the ope code and the operand held in the instruction code holding means.
In addition, the present invention is the above-mentioned processor, wherein each of said plurality of computing units has a unique register file. Besides, the present invention is the above-mentioned processor, wherein each of said plurality of computing units has a unique register file and the operand field performs register specification in the register file unique to each of said plurality of computing units, so that the computation data differs from one of said plurality of computing units to another. Further, the present invention is the above-mentioned processor, wherein said plurality of computing units have a register file in common. Still further, the present invention is the above-mentioned processor, wherein said plurality of computing units have a register file in common, have an operand field for specifying a register number from said register file, and add an offset value unique to the computing unit to be specified to a value of said operand field, thereby making different registers available and enabling computation by different pieces of data.
Yet further, the present invention is a processor comprising a memory for storing an instruction code constituted by a plurality of fields corresponding to the number of computing units and, in any one field of said plurality of fields, having control information for controlling a plurality of computing units and field information by which each field specifies the corresponding computing unit, an analyzing means for analyzing the field information and said control information of the instruction code read from said memory to identify the computing unit to be controlled by said field and inputting said field into the identified computing unit, an instruction code holding means for holding said field inputted by said analyzing means, and a plurality of computing units for executing parallel computations according to the field held in said instruction code holding means, wherein one field in said instruction code controls said plurality of computing units, thereby allowing a short instruction code constituted by a number of fields smaller than the number of the above-mentioned computations to execute a plurality of computations.
Moreover, the present invention is a processor comprising a memory for storing an instruction code constituted by a plurality of fields corresponding to the number of computing units and having control information for indicating that any one field of said plurality of fields controls a plurality of computing units and header information for indicating the number of fields existing in said instruction code, an analyzing means for analyzing said header information and said control information read from said memory to identify the computing unit to be controlled by said field and inputting said field into the identified computing unit, an instruction code holding means for holding said field inputted by said analyzing means, and a plurality of computing units for executing parallel computations according to the field held in said instruction code holding means, wherein one field in said instruction code controls said plurality of computing units, thereby allowing a short instruction code constituted by a small number of fields by use of said header information to execute a plurality of computations.
Besides, the present invention is the above-mentioned processor, wherein said analyzing means has an instruction expansion means for reading a compressed instruction code from said memory and converting the compressed instruction code into a directly executable expanded instruction code.
Further, the present invention is the above-mentioned processor, wherein said analyzing means has an instruction expanding means for reading at least one field of one compressed instruction code from said memory and converting the field into an expanded instruction code composed of a plurality of directly executable fields. Still further, the present invention is the above-mentioned processor, wherein said analyzing means has an instruction buffer for latching a compressed instruction code from said memory, a field controller for analyzing the header information indicating the number of fields existing in said instruction code, and selectors, one corresponding to each field, which sort the fields, taking into account the presence or absence of each field, based on a select signal for the fields analyzed by said field controller and a signal indicating the presence or absence of each field, to form expanded fields. Yet further, the present invention is the above-mentioned processor according to claim 16 or 17 or 22 or 23, wherein said analyzing means has a SIMD controller for analyzing the execution mode (S mode) and SIMD of each field of said instruction code and selectively determining a copy source field for each field, and a selector for copying the copy source field selectively determined by said SIMD controller and inputting the copy into each computing unit.
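The expansion of a compressed instruction code by means of header information can be sketched as follows. The header encoding is not specified in the text above, so this sketch assumes, purely for illustration, that the header is a bit mask with one presence bit per computing unit (its number of set bits then gives the number of fields existing in the instruction code) and that an absent field is expanded to a no-operation. The names `expand_fields` and `NOP` are hypothetical.

```python
NOP = "nop"  # assumed filler for a computing unit whose field is absent


def expand_fields(header_mask, packed_fields, num_units=4):
    """Expand a compressed instruction code into one field per computing unit.

    Assumptions (not from the source text): bit u of header_mask is 1 when the
    field for computing unit u is present, and the present fields are stored
    in memory in ascending unit order.
    """
    if header_mask < 0 or header_mask >> num_units:
        raise ValueError("header mask has bits outside the available units")
    if bin(header_mask).count("1") != len(packed_fields):
        raise ValueError("header does not match the number of stored fields")
    stored = iter(packed_fields)
    expanded = []
    for unit in range(num_units):
        present = (header_mask >> unit) & 1  # presence signal for this field
        # Selector for this unit: take the next stored field, or fill with NOP.
        expanded.append(next(stored) if present else NOP)
    return expanded
```

For example, under these assumptions `expand_fields(0b0101, ["add", "mul"])` places `"add"` in unit 0 and `"mul"` in unit 2 and fills units 1 and 3 with no-operations, so two stored fields yield a four-field expanded instruction.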
Moreover, the present invention is a processor comprising a memory for storing an instruction code constituted by a plurality of fields corresponding to the number of computing units to operate a plurality of computing units, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units constituted by at least one computing device having a same function controlled by information held in said instruction code holding means and a register file for storing operand information of said computing device, wherein said plurality of computing units execute a same computation.
In addition, the present invention is a processor comprising a memory for storing an instruction code constituted by a plurality of fields corresponding to the number of computing units, an instruction code holding means for holding the instruction code read from said memory, and a plurality of computing units constituted by at least one computing device having a same function to be controlled by information held in said instruction code holding means, a register file for storing operand information of said computing device, and a special register for holding a data type having a bit width too large to specify by a register in said register file, wherein said plurality of computing units can execute computational processing of both a data type having a bit width specifiable by the register in said register file and the data type stored in said special register.
Besides, the present invention is a processor having a memory for storing an instruction code and data, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units operating in parallel according to the plurality of instruction codes held in said instruction code holding means, wherein each computing unit is constituted by a plurality of computing devices and a plurality of access port register files, each of said plurality of computing devices reads a content of each of said register files from a corresponding access port for computation, and said plurality of computing units have a same function.
Further, the present invention is a processor having a memory for storing an instruction code and data, an instruction code holding means for holding a plurality of instruction codes read from said memory, and a plurality of computing units operating in parallel according to the plurality of instruction codes held in said instruction code holding means, wherein each computing unit is constituted by a plurality of computing devices and a plurality of access port register files, each of said plurality of computing devices reads a content of each of said register files from a corresponding access port for computation, and said plurality of computing units have a subset of a same function.
Still further, the present invention is the above-mentioned processor, wherein at least one computing device in said computing unit can execute a data transfer instruction for transferring data between said memory and said register file.
According to the present invention, if a VLIW processor has eight computing devices, one instruction is constituted by eight fields. One field has operation information, operand information, and the above-mentioned mode information. If this mode information specifies concurrent computation mode for controlling the plurality of computing devices, the remaining seven fields do not exist in the memory when the instruction is read. Consequently, the instruction expansion circuit copies the operation information and the operand information specified in the above-mentioned one field to generate the remaining seven fields. Thus, one instruction equivalent to eight fields is generated with a code size for one field. Because all computing devices have the same function, a plurality of computation instructions become executable in parallel without problem, resulting in the code size being compressed to 1/8. In particular, if computing device specification information is set in the mode information, only the fields corresponding to this setting information are generated, so that, if the setting information is provided in three bits, the number of concurrent computations can be controlled in a range of two to eight.
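The field copying performed by the instruction expansion circuit in the eight-device VLIW case can be sketched as follows. The encoding of the three-bit setting information is an assumption made for illustration (a value n from 1 to 7 drives n + 1 computing devices, and 0 means no copying), as are the names `Field`, `expand_vliw`, and `code_size_ratio`.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Field:
    opcode: str          # operation information
    operands: tuple      # operand information
    simd_count: int = 0  # assumed 3-bit setting: 0 = no copy, n = drive n + 1 devices


def expand_vliw(field, num_devices=8):
    """Copy one stored field into the slots of the computing devices it drives.

    Slots of devices that are not driven are returned as None.
    """
    if not 0 <= field.simd_count <= 7:
        raise ValueError("setting information must fit in three bits")
    active = field.simd_count + 1
    if active > num_devices:
        raise ValueError("more devices requested than available")
    executable = (field.opcode, field.operands)
    return [executable if dev < active else None for dev in range(num_devices)]


def code_size_ratio(stored_fields, executed_fields):
    """Ratio of the code size held in memory to the code size after expansion."""
    return Fraction(stored_fields, executed_fields)


# One stored field with the maximum setting fills all eight slots,
# so the stored code is 1/8 of the expanded instruction.
slots = expand_vliw(Field("padd", ("r1", "r2", "r3"), simd_count=7))
ratio = code_size_ratio(1, len(slots))
```

With a smaller setting, for example `simd_count=2`, only the first three slots are generated under this assumed encoding and the remaining five stay empty.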
If the above-mentioned superscalar processor has four computing devices, one instruction has operation information, operand information, and the above-mentioned mode information. If this mode information specifies concurrent computation mode, the instruction expansion circuit copies the operation information and the operand information specified in the above-mentioned instruction to generate three further instructions. Because all computing devices have the same function, a plurality of computation instructions equivalent to four instructions with a code size for one instruction become executable in parallel, resulting in the code size being compressed to 1/4. In particular, if computing device specification information is set in the mode information, only the instructions corresponding to this setting information are generated, so that, if the setting information is provided in two bits, the number of concurrent computations can be controlled in a range of two to four.
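For the four-device superscalar case, the effect on a short instruction stream can be sketched as follows. The two-bit encoding (0 = ordinary single instruction, n = n + 1 concurrent copies) and the tuple layout `(opcode, operands, spec)` are assumptions for illustration, and `expand_stream` is a hypothetical name.

```python
def expand_stream(stream, num_devices=4):
    """Expand each stored instruction into the group of instructions issued together.

    Each stream entry is (opcode, operands, spec), where spec is the assumed
    2-bit setting information: 0 issues the instruction once, and n issues
    n + 1 identical instructions to n + 1 computing devices concurrently.
    """
    issue_groups = []
    for opcode, operands, spec in stream:
        if not 0 <= spec <= 3:
            raise ValueError("setting information must fit in two bits")
        copies = spec + 1
        if copies > num_devices:
            raise ValueError("more devices requested than available")
        issue_groups.append([(opcode, operands)] * copies)
    return issue_groups


# Three stored instructions expand to seven issued instructions.
stream = [
    ("load", ("r0",), 0),
    ("mul", ("r1", "r2"), 3),
    ("add", ("r3", "r4"), 1),
]
groups = expand_stream(stream)
stored = len(stream)
issued = sum(len(group) for group in groups)
```

In this example the stored code holds three instructions while seven are executed, and the `mul` entry alone accounts for the four-for-one compression described above.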
Thus, the present invention can enhance parallelism in concurrent computation processing while reducing the code size to a great extent.
Thus, if an architecture that increases or decreases the processing parallelism on a computing unit basis is employed and a circuit of one computing unit is developed in the development of a processor having two computing units, for example, the computing devices for the two computing units can be developed by copying the circuit of this one computing unit. Consequently, the number of development steps of the computing devices for the two computing units becomes generally the same as the number of development steps of the computing devices for one computing unit. If, as chip miniaturization technology advances in the future, a highly parallel processor using, for example, four or eight computing units is to be developed, the number of computing device development steps will not increase.
As described before, in multimedia processing, computations of a same type are repetitively executed a plurality of times, so that increasing the processing parallelism surely enhances the performance.
In addition, in one computing unit, the integer computing device can load data to be processed in the next cycle while the multimedia computing device is executing its processing. The loaded data is stored in the register file in the computing unit, so that the data can be used as an operand to be processed by the multimedia computing device.
Consequently, by employing a software structure in which processing is performed on a computing unit basis, the number of computing units can be adjusted in units of programs for the computing units. Therefore, if a developed program is migrated to a processor in which the number of computing units has been changed, the number of software development steps involved in the change of the number of computing units can also be decreased.
As described and according to the present invention, not only the number of hardware development steps but also the number of software development steps can be decreased while enhancing the parallelism of processor processing.