One aspect relates to a computer system for electronic data processing having a first data processing unit and a second data processing unit.
In addition to a central data processing unit in the form of a microprocessor, modern computer systems frequently have a further data processing unit, which is usually referred to as a coprocessor.
In contrast to the central data processing unit, which will be referred to below as a standard processor, a coprocessor is typically specialized for specific computational tasks.
Owing to its specialization for specific tasks, the coprocessor is typically able to execute certain computer program instructions faster than the standard processor.
What is understood here by faster execution of an instruction is that the instruction is executed by the coprocessor within fewer standard processor clock cycles than are required for its execution if the standard processor executes the instruction itself.
For example, many modern personal computers have a graphics card with a separate graphics coprocessor. Owing to its specialization for graphics calculations, said graphics coprocessor is able to execute computationally intensive graphics calculations, for example calculations of light effects in a 3D landscape, much faster than the standard processor of the personal computer.
A coprocessor specialized for specific tasks can thus be suitable for relieving the load on the standard processor where applications have tasks for which the coprocessor is specialized.
The specialization of coprocessors results in less flexibility in comparison with the standard processor. Coprocessors are typically not capable of autonomously executing complete computer programs, but are supplied with instructions and with data required for executing said instructions by a standard processor. This means that the standard processor typically transfers to the coprocessor a data record containing a specification of the instruction to be executed itself plus a specification of the data required for executing the instruction.
For example, a standard processor transfers to an associated coprocessor a data record containing a bit code that specifies the instruction “Add two data elements”, which data record additionally contains two memory addresses that address two memory cells of a computer data memory in which the data elements to be added are stored.
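The data record described above can be illustrated with a minimal sketch. The opcode value, the class name InstructionRecord and the dictionary standing in for the computer data memory are assumptions made for illustration only; the text does not specify a concrete encoding.

```python
from dataclasses import dataclass

# Hypothetical bit code for "Add two data elements"; the actual code is not given in the text.
OP_ADD = 0b0001

@dataclass(frozen=True)
class InstructionRecord:
    opcode: int           # bit code that specifies the instruction
    operand_addrs: tuple  # memory addresses of the data elements to be processed

def execute(record, memory):
    """Coprocessor side: decode the record and operate on the addressed memory cells."""
    if record.opcode == OP_ADD:
        addr_a, addr_b = record.operand_addrs
        return memory[addr_a] + memory[addr_b]
    raise ValueError("unknown opcode")

# A dict stands in for the computer data memory (address -> stored value).
memory = {0x10: 3, 0x14: 4}
record = InstructionRecord(OP_ADD, (0x10, 0x14))
result = execute(record, memory)  # 7
```

The record carries only the bit code and two addresses, not the operands themselves, which matches the distinction drawn above between specifying the instruction and specifying its data.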
Said specification of the instruction and the specification of the data required for executing the instruction are referred to below as the instruction parameters of the instruction or as the (instruction) parameters required for an instruction.
The number of parameters required for different instructions varies. Likewise, the memory requirement for the parameters required for different instructions varies.
Many computer systems have for example a floating-point processor, that is to say a coprocessor which is specialized for performing floating-point operations. Such a coprocessor of a computer system is supplied by the standard processor of the computer system with instructions for which few instruction parameters are required.
The reason for this is that the data processed during the operations associated with said instructions contains only individual floating-point values or small vectors with few (for example, four) floating-point components. Consequently only a small amount of data is processed in the case of such instructions, which is why only a few parameters are required to specify this data. Owing to the low number of parameters required, the memory requirement for the parameters is also low.
As floating-point instructions can typically be specified using a few bits, likewise only a few parameters with a low memory requirement are required to specify an instruction itself with which the floating-point processor is supplied.
This case, in which the parameters for the instructions that a standard processor supplies to a coprocessor have a low memory requirement, for example because only a few parameters are needed, will be referred to below as a tight coupling of the standard processor and the coprocessor.
With such a tight coupling, the coprocessor typically requires only a few standard processor clock cycles for executing an instruction supplied to it by the standard processor.
In the case of a so-called loose coupling of a standard processor and a coprocessor, in each case a larger number of parameters with a higher memory requirement are required for the instructions that the standard processor supplies to the coprocessor than in the case of tight coupling of the two processors.
Loosely coupled processors as defined here process more complex tasks than tightly coupled processors, and typically require a large number of standard processor clock cycles to process these tasks.
For example, graphics coprocessors execute instructions for which a large number of parameters are required. Up to 30 parameters may be required to specify the corners of a 3D object to be represented, the texture or the lighting of the 3D object for example. A long period of time, that is to say many standard processor clock cycles, is required for executing complex graphics instructions. During the period of time in which the coprocessor is executing a graphics instruction, the standard processor can execute other instructions.
In order for the coprocessor to be able to execute an instruction, the standard processor must transfer to the coprocessor the parameters required for the instruction which specify the instruction itself plus the data required for executing the instruction.
In the case of a loose coupling of standard processor and coprocessor, owing to the high memory requirement for the parameters to be transferred, this communication can entail a lengthy time requirement, that is to say a large number of standard processor clock cycles in which the standard processor is occupied with communication.
The data processing of a standard processor and of an associated coprocessor, that is to say of a coprocessor which the standard processor supplies with instructions, is typically asynchronous. This means that the coprocessor does not necessarily commence executing an instruction transferred by the standard processor as soon as the data required has been transmitted. For example, the standard processor can transfer the instruction parameters required for an instruction to the coprocessor even while the latter is still executing another instruction.
This has the advantage, for example, that the standard processor does not have to wait until the coprocessor is ready for the transmission, but can transmit the parameters required for an instruction and subsequently immediately execute further instructions.
Owing to the asynchronous cooperation of standard processor and coprocessor, memories are required for the data transmission between standard processor and coprocessor, that is to say for the transfer of parameters required for instructions, since said data must be stored if it is not immediately processed by the coprocessor.
For the data transmission between a standard processor and a coprocessor, it is known to use a memory in which the standard processor can store data and from which the coprocessor can read data.
With this arrangement, the standard processor stores the parameters required for an instruction to be executed in the memory. The specification of the instruction itself and the specification of the data required for executing the instruction can be stored separately here. For example, the coprocessor may have a special register: the standard processor stores the parameters that specify the data required for executing the instruction in the memory, and then requests the coprocessor to execute the instruction by storing the parameters that specify the instruction in the special register of the coprocessor.
Alternatively, the standard processor may store all instruction parameters required in the memory.
The coprocessor executes the instruction by accessing the parameters stored in the memory or additionally in the special register.
With this arrangement, the standard processor must delay storing the instruction parameters until the coprocessor no longer requires the instruction parameters previously stored in the memory. Otherwise the standard processor overwrites instruction parameters that are still required, which can lead to incorrect execution of one of the instructions executed by the coprocessor.
Since the coprocessor typically ceases to require the instruction parameters only once it has executed the respective instruction, the standard processor must delay the transmission of instruction parameters required for an instruction until the coprocessor is not currently executing an instruction, that is to say in particular until the coprocessor has completed execution of the instruction preceding the instruction for which instruction parameters are to be transmitted.
Since in this case the standard processor cannot transfer any data to the coprocessor while the latter is executing an instruction, the time needed to process an instruction is the sum of the time required for transferring the instruction parameters and the time required for the actual execution of the instruction by the coprocessor. The coprocessor must first wait for the transfer of the instruction parameters required for an instruction, during which time it cannot execute any other instruction, and only then can it execute the instruction.
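The resulting cost with a single shared memory can be illustrated with a small timing model. The cycle counts below are invented for illustration; the function name serial_total_cycles is likewise an assumption.

```python
def serial_total_cycles(instructions):
    """Total cycles when transfer and execution cannot overlap.

    instructions: list of (transfer_cycles, execute_cycles) pairs.
    """
    total = 0
    for transfer, execute in instructions:
        total += transfer  # coprocessor waits while the parameters are written
        total += execute   # standard processor cannot write the next set yet
    return total

# Three instructions, each needing 30 cycles of transfer and 100 cycles of execution
# (illustrative numbers only).
example = [(30, 100), (30, 100), (30, 100)]
total = serial_total_cycles(example)  # 390
```

In this model every instruction contributes both its transfer time and its execution time to the total, since neither processor can work ahead of the other.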
In the prior art this disadvantage, as a result of which a significant advantage of the cooperation of standard processor and coprocessor is lost, is countered by the use of an alternate buffer or a first-in-first-out (FIFO) memory.
An alternate buffer has two memory regions. The standard processor of a computer system writes instruction parameters for example into the first memory region of an alternate buffer. Once storing the instruction parameters has been completed, the coprocessor of the computer system can read out the instruction parameters from the first memory region and execute the respective instruction.
The standard processor does not need to wait until the coprocessor has completed execution of said first instruction, but can meanwhile store the instruction parameters required for a second instruction in the second memory region. Once the standard processor has completed storing the instruction parameters required for the second instruction and the coprocessor has completed executing the first instruction, by accessing the second memory region of the alternate buffer, the coprocessor can execute the second instruction while the standard processor writes the parameters required for a third instruction into the first memory region, and so forth.
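The alternating use of the two memory regions can be sketched as follows. This is a simplified software model under stated assumptions, not a description of any particular hardware: the class and method names are hypothetical, and a region is treated as free again once the coprocessor has finished the instruction whose parameters it holds.

```python
class AlternateBuffer:
    """Simplified model of a buffer with two memory regions used in alternation."""

    def __init__(self):
        self.regions = [None, None]  # None marks a region as free
        self.write_index = 0         # region the standard processor fills next
        self.read_index = 0          # region the coprocessor works from next

    def write(self, params):
        """Standard processor side: store a full parameter set, or report that it must wait."""
        if self.regions[self.write_index] is not None:
            return False             # region still needed by the coprocessor
        self.regions[self.write_index] = list(params)
        self.write_index ^= 1        # switch to the other region
        return True

    def finish_instruction(self):
        """Coprocessor side: return the parameters of the executed instruction and free the region."""
        params = self.regions[self.read_index]
        if params is None:
            return None              # nothing has been stored for the coprocessor yet
        self.regions[self.read_index] = None
        self.read_index ^= 1
        return params

buf = AlternateBuffer()
buf.write(["params of instruction 1"])          # goes into the first region
buf.write(["params of instruction 2"])          # goes into the second region
blocked = not buf.write(["params of instruction 3"])  # both regions occupied
buf.finish_instruction()                        # first region becomes free
buf.write(["params of instruction 3"])          # now succeeds
```

Note that every call to write stores a complete parameter set, because consecutive instructions land in different regions.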
The use of a FIFO memory follows a similar principle. At one end of the FIFO memory the standard processor of a computer system stores the instruction parameters required for executing an instruction, while at the other end of the FIFO memory the coprocessor of the computer system reads out the instruction parameters and executes the respective instructions.
As a consequence, as with the use of an alternate buffer, it is possible for the data transfer from the standard processor to the coprocessor and the execution of instructions by the coprocessor to overlap.
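A FIFO memory of this kind can be modeled in a few lines. The sketch below assumes a fixed capacity measured in parameter sets and uses hypothetical names (ParameterFifo, push, pop); real FIFO memories are typically organized in words rather than whole parameter sets.

```python
from collections import deque

class ParameterFifo:
    """Bounded first-in-first-out store for instruction parameter sets."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def push(self, params):
        """Standard processor side: append a parameter set, or report that the FIFO is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(tuple(params))
        return True

    def pop(self):
        """Coprocessor side: remove and return the oldest parameter set, or None if empty."""
        return self.queue.popleft() if self.queue else None

fifo = ParameterFifo(capacity=2)
fifo.push(["params of instruction 1"])
fifo.push(["params of instruction 2"])  # written while instruction 1 may still be executing
first = fifo.pop()                      # coprocessor receives instruction 1 first
```

The two ends are accessed independently, which is what allows the writing of later parameter sets to overlap with the execution of earlier instructions.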
Compared with the use of a simple memory, this enables a faster processing speed of the instructions to be processed to be achieved.
However, the use of an alternate buffer or of a FIFO memory has the disadvantage that the instruction parameters of two successive instructions are written into two different memory regions. As a result, the standard processor must always write the entire instruction parameter set into the respective memory region, even if the instruction parameters of two successive instructions differ only very slightly.
In the field of software, and especially for the communication of program parts, it is customary to pass only changing parameters from one program part to another and not to pass parameters that remain constant a second time.
For example, the OpenGL graphics library operates as a “state machine”. If, for example as a result of an OpenGL function call, the color is set to a specific value, for instance by the command
glColor3f(1.0, 1.0, 1.0);
by means of which the color in which the objects are drawn is set to white, then all objects that are drawn by function calls following this command are drawn in white until the color in which objects are drawn is changed by a further glColor command.
If a program that uses the OpenGL graphics library is executed on a typical conventional computer system having a graphics coprocessor, for example an IBM-compatible personal computer (PC) with a graphics card having a graphics processor, then all instruction parameters required for executing an instruction are always transferred. If the graphics coprocessor is to represent, for example, two white triangles on a screen, then the standard processor transfers to the graphics processor two corresponding instructions with the respective instruction parameters, with the instruction parameters transferred for each of the two instructions containing the specification of the color as “white”.
On the software level on the other hand it is sufficient to specify the color as “white” only once using a suitable function call.
In a computer system in which an alternate buffer or a FIFO memory according to the prior art is used for the data transmission from a standard processor to a coprocessor, the standard processor must always transfer all instruction parameters, even the ones that have not changed. The transmission of data between the standard processor and the coprocessor can therefore require a considerable amount of time.
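The difference between the two conventions can be made concrete by counting transferred parameters for the example of two white triangles. The parameter names (color, v0, v1, v2) and the coordinates are invented for illustration, and changed_only_count merely models the software-level convention of passing only changed parameters; it does not describe the behavior of an alternate buffer or FIFO memory.

```python
_MISSING = object()  # sentinel: this parameter has never been transferred

def full_transfer_count(instructions):
    """Parameters transferred when every instruction carries its full parameter set."""
    return sum(len(params) for params in instructions)

def changed_only_count(instructions):
    """Parameters transferred when only values that differ from the last sent value are passed."""
    last_sent = {}
    count = 0
    for params in instructions:
        for name, value in params.items():
            if last_sent.get(name, _MISSING) != value:
                count += 1
                last_sent[name] = value
    return count

# Two white triangles; each instruction has one color and three corner parameters.
triangles = [
    {"color": "white", "v0": (0, 0), "v1": (1, 0), "v2": (0, 1)},
    {"color": "white", "v0": (2, 0), "v1": (3, 0), "v2": (2, 1)},
]
full = full_transfer_count(triangles)    # 8: the color is sent twice
changed = changed_only_count(triangles)  # 7: the color is sent once
```

The saving is small in this two-instruction example, but it grows with the number of consecutive instructions that share parameter values.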
Especially in the case of the loose coupling of processors, as described above, the transmission of a large number of instruction parameters with a high memory requirement is required for executing an instruction. Owing to the large volume of data, a high communications outlay is also necessary for transmission of this data. The standard processor stores the instruction parameters in a FIFO memory for example. If the volume of data is very high, the standard processor requires many clock cycles for the transmission.
This can have a considerable adverse effect on the processing performance of the computer system. For example, if the standard processor requires more time for transmitting the instruction parameters required for an instruction than the coprocessor requires for the execution of the instruction preceding said instruction, then the coprocessor is inactive until the transmission is completed. The processing performance of the computer system is thus less than it theoretically could be, that is, if both processors were continuously executing instructions.
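The effect of slow transmission on coprocessor utilization can be estimated with a simple schedule model. The sketch assumes that the standard processor transfers parameter sets back to back and that the buffer between the processors never fills up; the function name and the cycle counts are illustrative assumptions.

```python
def overlapped_schedule(instructions):
    """Return (total_cycles, coprocessor_idle_cycles) when transfer and execution may overlap.

    instructions: list of (transfer_cycles, execute_cycles) pairs.
    The idle count includes the initial wait for the first parameter set.
    """
    transfer_done = 0  # time at which the current parameter set is fully transferred
    exec_done = 0      # time at which the coprocessor finished the previous instruction
    idle = 0
    for transfer, execute in instructions:
        transfer_done += transfer
        start = max(transfer_done, exec_done)  # needs both the parameters and a free coprocessor
        idle += start - exec_done
        exec_done = start + execute
    return exec_done, idle

# Transfer faster than execution: the coprocessor only waits for the first parameter set.
fast_transfer = overlapped_schedule([(30, 100)] * 3)  # (330, 30)
# Transfer slower than execution: the coprocessor is idle most of the time.
slow_transfer = overlapped_schedule([(100, 30)] * 3)  # (330, 240)
```

In the second case the coprocessor spends 240 of 330 cycles waiting, which corresponds to the situation described above in which transmission, not execution, limits the processing performance.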
While the standard processor is transmitting data to the coprocessor, the processing performance of the standard processor available for executing other instructions is limited. In particular when a large volume of data is to be transmitted, it is consequently of great importance for the processing performance of the computer system that the data is transmitted efficiently.
In extreme cases, the standard processor requires even longer for the transmission of the parameters required for an instruction than it would require for executing the instruction itself. In this case it is actually more efficient, that is to say less time is required for processing the instruction, if the standard processor does not pass the instruction on to the coprocessor but executes it itself.
U.S. Pat. No. 6,411,301 B1 discloses an architecture for a computer system having a main processor and a graphics processor. In this arrangement the main processor can store graphics commands in a main memory. The graphics processor can read said commands out of the main memory, wherein the graphics commands can be buffered by means of a FIFO buffer arranged between the main memory and the graphics processor.
In US 2003/0222877 A1 a processor is disclosed which has an intermediate memory (cache). The intermediate memory is connected to a coprocessor and the coprocessor can store results in the intermediate memory.
U.S. Pat. No. 6,501,480 B1 discloses a graphics accelerator having a local memory, a coprocessor and a DMA (Direct Memory Access) unit which is used for data transmission between the local memory and an external memory.