1. Field of the Invention
The invention concerns the management and control of the order of instruction codes for execution in a computing architecture comprised of a plurality of computing functional units, for completing a plurality of jobs in a job queue. Specifically, the invention concerns the efficient and dynamic generation and ordering of very long instruction words (VLIW) in a VLIW computing architecture so as to reduce the amount of idle time for each of the computing functional units.
2. Description of the Related Art
A VLIW computing architecture is typically comprised of multiple computing functional units, wherein each computing functional unit may be a separate arithmetic logic unit (ALU) or may be a portion of one CPU which has been divided into separate functional units. A VLIW system operates like a multi-processor system except that the functional units share a common program counter (PC). Such an architecture allows for the execution of multiple ALU-level instructions simultaneously, in parallel, during each machine cycle. In this manner, the overall processing speed of a VLIW computing architecture can be increased over that of a single, undivided CPU architecture. The multiple instructions are organized for distribution to the multiple computing functional units in a sequence of very long instruction words (VLIWs).
A VLIW system is commonly used in one of two ways: either a code compiler makes one composite program in which the program steps are executed in parallel across the multiple functional units of the VLIW system, or specific programs are written which are tailored to run in parallel. The foregoing methodologies are very effective when it is possible to predict in advance the job balances among the multiple functional units. In such a situation, the predicted job balances can be used to write program codes which are tailored to the predicted job balances. Also, in such a situation, the predicted job balances can be used to direct compilers to optimize the program codes to achieve the job balances. However, in some situations, job balances cannot be predicted in advance at the time the program codes are written. The invention of the subject application addresses such a problem. In particular, the invention improves upon these two methods by utilizing many predetermined job combinations of program codes, and then having the task manager, or the real-time operating system (RTOS), as the case may be, choose and assign the best job combination to achieve the desired execution of programs in parallel in the VLIW system.
For example, if the VLIW computing architecture includes four computing functional units, then each VLIW will include four separate sub-instructions, one for each computing functional unit. In this manner, each VLIW is formatted into a plurality of instruction fields (four in the above example), each of which contains a sub-instruction for execution by a respective computing functional unit. A sequence of VLIWs therefore can represent a plurality of computing jobs for execution by the computing functional units, wherein each particular instruction field of the VLIWs contains a sequential instruction of one of the computing jobs for execution by a respective computing functional unit. Accordingly, each of the computing functional units in the VLIW architecture sequentially executes a respective computing job, wherein each computing job is comprised of a sequence of instructions. Each computing job is typically a portion, or page, of a single computer program.
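By way of illustration only, the four-field layout described above may be sketched in Python as follows. The sketch assumes four computing functional units and four jobs of equal length; the names NUM_UNITS and build_vliw_sequence, and the instruction labels such as "a0", are hypothetical and are not part of any actual VLIW instruction set.

```python
NUM_UNITS = 4  # assumed number of computing functional units


def build_vliw_sequence(jobs):
    """Return a list of VLIWs, one per machine cycle.

    Field i of every VLIW holds the next sequential instruction of job i,
    so functional unit i executes job i from start to finish.
    This sketch assumes all jobs have the same number of instructions.
    """
    if len(jobs) != NUM_UNITS:
        raise ValueError("one job per functional unit is expected")
    if len({len(job) for job in jobs}) != 1:
        raise ValueError("this sketch assumes equal-length jobs")
    # zip(*jobs) groups the k-th instruction of every job into one word.
    return [tuple(fields) for fields in zip(*jobs)]


# Hypothetical jobs: each is a sequence of sub-instructions.
jobs = [
    ["a0", "a1", "a2"],
    ["b0", "b1", "b2"],
    ["c0", "c1", "c2"],
    ["d0", "d1", "d2"],
]
words = build_vliw_sequence(jobs)
for cycle, word in enumerate(words):
    print(cycle, word)
```

Each printed tuple corresponds to one VLIW, and each position within the tuple corresponds to one instruction field.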
Generally, operating systems (OS), real-time operating systems (RTOS), and task monitors include a task manager that generates a sequence of instructions which, when executed, carries out a series of computing jobs. The task manager obtains multiple computing jobs from a job queue and then creates a sequence of VLIW instructions wherein each instruction field of the VLIW instructions contains an instruction from one of the computing jobs. A program counter is used to keep track of the address location of the last computing job (program page) to be assigned to one of the computing functional units through incorporation into the sequence of VLIW instructions. In this manner, the task manager can check the address location stored in the program counter to determine which of the remaining computing jobs in the job queue should be incorporated into the next sequence of VLIW instructions, thereby allowing the computing jobs of the computer program to be executed in proper sequential order.
The above-described method for managing VLIW instructions for execution in a VLIW architecture can result in the inefficient use of the multiple computing functional units because VLIW systems share a common program counter. For instance, the multiple jobs represented in the sequence of VLIW instructions may be of different job code sizes, wherein one or more of the jobs will be completed before completion of the other jobs. Multiple jobs are of different sizes when they contain different numbers of sequential instructions. In such a case, the computing functional unit to which the completed job was assigned will be unused while the remaining jobs are completed. This is because, in each of the remaining VLIWs, the instruction field corresponding to the unused computing functional unit contains a no-operation instruction, while the other instruction fields corresponding to the other computing functional units contain operation instructions corresponding to the remainder of the uncompleted jobs. This results in the inefficient use of one or more of the computing functional units.
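The idle time described above may be illustrated with a minimal Python sketch, in which jobs of different sizes are packed into one VLIW sequence and the empty instruction fields are filled with no-operation instructions. The placeholder NOP, the function names, and the job contents are assumptions made for this example only.

```python
from itertools import zip_longest

NOP = "nop"  # hypothetical placeholder for a no-operation instruction


def pad_jobs_into_vliws(jobs):
    """Pack jobs into VLIWs, filling the fields of finished jobs with NOP."""
    return [tuple(word) for word in zip_longest(*jobs, fillvalue=NOP)]


def idle_cycles_per_unit(words):
    """Count, for each functional unit, the cycles in which its field is NOP."""
    num_units = len(words[0])
    return [sum(1 for word in words if word[unit] == NOP)
            for unit in range(num_units)]


# Four hypothetical jobs containing different numbers of instructions.
jobs = [
    ["a0", "a1", "a2", "a3"],
    ["b0", "b1"],
    ["c0", "c1", "c2"],
    ["d0"],
]
words = pad_jobs_into_vliws(jobs)
idle = idle_cycles_per_unit(words)
print(idle)  # idle cycles for units 0 through 3
```

In this example the sequence is as long as the longest job (four VLIWs), and the units assigned the shorter jobs sit idle for the remaining cycles.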
When one of the computing functional units is in a free, unused state, it is desirable to assign a new job to the unused computing functional unit in order to more efficiently use the resources of the VLIW architecture. However, when only one program counter is used to keep track of the last assigned instruction, it is not feasible to assign a new job to the unused computing functional unit before the other assigned jobs are completed, because all of the functional units (ALUs) share the common program counter. Assigning a new job early would cause the task manager to load the next computing job out of sequence, instead of based on the next proper address location succeeding the last assigned job. Accordingly, the task manager must wait until all assigned jobs are completed by the execution of the current VLIW sequence before a new job can be assigned to the unused computing functional unit. The machine cycles during which one or more computing functional units go unused result in overall inefficiency in the VLIW computing architecture.
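The cost of waiting for all assigned jobs to complete may be estimated with the following Python sketch. It is a simplified model under stated assumptions: jobs are taken from the queue in order in groups of one job per functional unit, each instruction takes one machine cycle, and a new group starts only when the longest job of the current group has finished. The function name batch_schedule_cycles and the job lengths are hypothetical.

```python
def batch_schedule_cycles(job_lengths, num_units):
    """Return (total_cycles, idle_slots) for wait-for-all batch scheduling.

    job_lengths: number of instructions in each queued job, in queue order.
    num_units:   number of computing functional units.
    """
    total_cycles = 0
    idle_slots = 0
    for start in range(0, len(job_lengths), num_units):
        batch = job_lengths[start:start + num_units]
        # The VLIW sequence for this batch is as long as its longest job.
        batch_cycles = max(batch)
        total_cycles += batch_cycles
        # Every instruction field not holding a real instruction is idle,
        # including the fields of units that received no job at all.
        idle_slots += batch_cycles * num_units - sum(batch)
    return total_cycles, idle_slots


# Eight hypothetical jobs on four functional units.
total_cycles, idle_slots = batch_schedule_cycles([4, 2, 3, 1, 5, 5, 1, 1], 4)
print(total_cycles, idle_slots)
```

For these assumed job lengths the two batches take 4 and 5 cycles respectively, so 14 of the 36 available instruction fields carry no useful work.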