1. Field of the Invention
The present invention generally relates to an information-processing device, and more particularly relates to an information-processing device that executes both general-purpose processing and transaction processing of data.
2. Description of the Related Art
Current high-performance general-purpose processors are usually designed to rapidly execute instructions suitable for a wide variety of applications such as commercial applications, scientific applications and multimedia processing. For instance, a current high-performance general-purpose processor includes a floating-point unit so as to rapidly execute scientific applications that require execution of a large amount of floating-point arithmetic operations. In general, high-performance general-purpose processors are designed to execute a single program consisting of a series of instructions. In order to execute the series of instructions rapidly, each high-performance general-purpose processor includes a function to predict a branch outcome for executing a branch instruction before a branch direction is known, a reservation station for efficient out-of-order execution of instructions, and a reorder buffer for keeping track of results from the instructions executed out of order.
However, recently, a demand for computers has been increasing especially in areas of systems that execute transaction processing and web server processing. Operations executed in the transaction processing and the web server processing are mainly logical operations and integer operations so that a frequency of executing the floating-point arithmetic operations is low in the transaction processing and the web server processing. In addition, in the transaction processing and the web server processing, a performance to concurrently execute a large number of small-sized processes takes priority over a performance to execute a single large-sized program rapidly.
As described above, the current high-performance general-purpose processors are designed to achieve high performance over a wide computing area, and thus include a large amount of hardware that is inefficiently utilized when executing the transaction processing and the web server processing. Additionally, the large amount of hardware required for rapid execution of a single large-sized sequential program is excessive for the concurrent execution of large quantities of simple processes required in the transaction processing and the web server processing, which is better performed without carrying overhead hardware for rarely required functions. Therefore, there has been a demand for an information-processing device that can rapidly execute the transaction processing and the web server processing.
A description will now be given of a conventional high-performance general-purpose processor with reference to FIG. 1. A high-performance general-purpose processor 1 shown in FIG. 1 includes an instruction cache and instruction control unit 2, a branch-prediction unit 3, a program counter 4, a checkpoint unit 5, a fixed-point register file 6, a floating-point register file 7, a fixed-point reorder buffer 8, a floating-point reorder buffer 9, a fixed-point reservation station 10, a floating-point reservation station 11, a load/store reservation station 12, a fixed-point unit 13, a floating-point unit 14, a load/store unit 15 and a data cache 16.
The instruction cache and instruction control unit 2 stores instructions, as well as fetches and distributes instructions to be executed. Each unit in the high-performance general-purpose processor 1 is controlled by a result of decoding fetched instructions. The branch-prediction unit 3 detects a conditional branch instruction from decoded instructions, and predicts a direction of each branch before a branching condition is known. The program counter 4 points to an address of an instruction being executed in order. The checkpoint unit 5 stores a processor status including register values and the like when a branch instruction is executed based on a branch prediction. If the branch prediction has been proved wrong, the high-performance general-purpose processor 1 can recover the processor status including the register values before branching by reading information stored in the checkpoint unit 5, thereby recovering from a branch prediction error.
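The recovery role of the checkpoint unit 5 can be illustrated by the following minimal sketch. It is not taken from the processor described here: the class name, the two-register state and the method names are assumptions made purely for illustration.

```python
import copy


class CheckpointedProcessor:
    """Toy model (hypothetical): state is checkpointed at each predicted branch."""

    def __init__(self):
        self.registers = {"r1": 0, "r2": 0}
        self.pc = 0
        self.checkpoint = None  # (pc of the path not taken, saved registers)

    def predict_branch(self, taken_target, fallthrough, predict_taken):
        # Save the register values and the address of the path NOT followed,
        # then continue speculatively along the predicted path.
        other_path = fallthrough if predict_taken else taken_target
        self.checkpoint = (other_path, copy.deepcopy(self.registers))
        self.pc = taken_target if predict_taken else fallthrough

    def resolve_branch(self, prediction_correct):
        # On a misprediction, restore the saved status and resume on the
        # other path; on a correct prediction, simply drop the checkpoint.
        if not prediction_correct:
            self.pc, self.registers = self.checkpoint
        self.checkpoint = None
```

In this sketch, any register writes made after `predict_branch` are discarded when `resolve_branch(False)` is called, which mirrors the recovery from a branch prediction error described above.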
The fixed-point register file 6 stores operands for fixed-point operations, and supplies the operands necessary to execute the operation to the fixed-point reservation station 10. Additionally, the fixed-point register file 6 stores operands for load/store instructions, and supplies the operands necessary to execute the instruction to the load/store reservation station 12. The fixed-point reorder buffer 8 stores results of calculations executed by the fixed-point unit 13 and results of load instructions executed by the load/store unit 15. Additionally, the fixed-point reorder buffer 8 has capability of rectifying an order of generated results as if they were generated in order, and supplies correct operands to corresponding reservation stations.
The floating-point register file 7 stores operands for floating-point operations, and supplies the operands necessary to execute the operation to the floating-point reservation station 11. Additionally, the floating-point register file 7 stores operands for load/store instructions, and supplies the operands necessary to execute the instruction to the load/store reservation station 12. The floating-point reorder buffer 9 stores results of calculations executed by the floating-point unit 14 and results of a load instruction executed by the load/store unit 15. Additionally, the floating-point reorder buffer 9 has capability of rectifying an order of generated results as if they were generated in order, and supplies correct operands to corresponding reservation stations.
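The in-order behavior of the reorder buffers 8 and 9 can be sketched as follows. This is a simplified illustration under assumed names (`ReorderBuffer`, `allocate`, `complete`, `retire`), not the actual structure of either buffer: results may arrive in any order, but they are released only from the head of the buffer, in program order.

```python
from collections import deque


class ReorderBuffer:
    """Toy model (hypothetical): results complete out of order, retire in order."""

    def __init__(self):
        # Entries are kept in program order: [tag, done flag, result value].
        self.entries = deque()

    def allocate(self, tag):
        # Reserve a slot when the instruction is issued.
        self.entries.append([tag, False, None])

    def complete(self, tag, value):
        # Record a result produced by an execution unit, in any order.
        for entry in self.entries:
            if entry[0] == tag:
                entry[1], entry[2] = True, value
                return
        raise KeyError(tag)

    def retire(self):
        # Release results only while the oldest entry is finished.
        retired = []
        while self.entries and self.entries[0][1]:
            tag, _, value = self.entries.popleft()
            retired.append((tag, value))
        return retired
```

A result that completes early therefore waits in the buffer until every older instruction has also completed.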
The fixed-point reservation station 10 stores fixed-point instructions used for executing fixed-point operations. The fixed-point reservation station 10 waits for operands necessary to execute fixed-point instructions to be supplied from the fixed-point register file 6, from the fixed-point reorder buffer 8, or directly from the fixed-point unit 13 or the load/store unit 15. The fixed-point instructions stored in the fixed-point reservation station 10 become executable after receiving all the operands necessary to execute the instructions. In such a case, the fixed-point reservation station 10 selects executable instructions, and supplies the executable instructions to the fixed-point unit 13. The maximum number of fixed-point instructions the fixed-point reservation station 10 can supply simultaneously to the fixed-point unit 13 is equal to the number of fixed-point operation modules provided in the fixed-point unit 13. It should be noted that the fixed-point unit 13 includes a plurality of the fixed-point operation modules, each fixed-point operation module executing the instructions supplied from the fixed-point reservation station 10.
The floating-point reservation station 11 stores floating-point instructions used for executing floating-point operations. The floating-point reservation station 11 waits for operands necessary to execute floating-point instructions to be supplied from the floating-point register file 7, from the floating-point reorder buffer 9, or directly from the floating-point unit 14 or the load/store unit 15. The floating-point instructions stored in the floating-point reservation station 11 become executable after receiving all the operands necessary to execute the instructions. In such a case, the floating-point reservation station 11 selects executable instructions, and supplies the executable instructions to the floating-point unit 14. The maximum number of floating-point instructions the floating-point reservation station 11 can supply simultaneously to the floating-point unit 14 is equal to the number of floating-point operation modules provided in the floating-point unit 14. It should be noted that the floating-point unit 14 includes a plurality of the floating-point operation modules, each floating-point operation module executing the instructions supplied from the floating-point reservation station 11.
The load/store reservation station 12 stores load/store instructions. The load/store reservation station 12 waits for operands necessary to execute load/store instructions to be supplied from the fixed-point register file 6, the floating-point register file 7, the fixed-point reorder buffer 8, or the floating-point reorder buffer 9. The load/store instructions become executable after receiving all the operands necessary to execute the instructions. In such a case, the load/store reservation station 12 selects executable instructions, and supplies the executable instructions to the load/store unit 15. The maximum number of load/store instructions the load/store reservation station 12 can supply to the load/store unit 15 is equal to the number of load/store operation modules provided in the load/store unit 15. It should be noted that the load/store unit 15 includes a plurality of the load/store operation modules, each load/store operation module executing the instructions supplied from the load/store reservation station 12.
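The behavior common to the reservation stations 10, 11 and 12 can be sketched as follows. The class and method names are assumptions for illustration only, and the selection policy (oldest ready instruction first) is likewise an assumption, since the description above does not specify one. An instruction waits until all of its operands have been supplied, and the number of instructions issued at once is capped by the number of operation modules.

```python
class ReservationStation:
    """Toy model (hypothetical): issue instructions once all operands arrive."""

    def __init__(self, num_modules):
        self.num_modules = num_modules  # operation modules in the execution unit
        self.waiting = []               # entries: {"op": str, "operands": dict}

    def add(self, op, operand_names):
        # Operands start as None, meaning "not yet supplied".
        self.waiting.append({"op": op, "operands": dict.fromkeys(operand_names)})

    def forward(self, name, value):
        # Supply an operand from a register file, reorder buffer or unit.
        for entry in self.waiting:
            if name in entry["operands"] and entry["operands"][name] is None:
                entry["operands"][name] = value

    def issue(self):
        # Select executable instructions, at most one per operation module.
        ready = [e for e in self.waiting
                 if all(v is not None for v in e["operands"].values())]
        issued = ready[:self.num_modules]
        self.waiting = [e for e in self.waiting
                        if not any(e is i for i in issued)]
        return [e["op"] for e in issued]
```

With two operation modules and three ready instructions, a first call to `issue` returns two instructions and the third is issued on the following call.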
The fixed-point unit 13 executes the fixed-point instructions supplied from the fixed-point reservation station 10, and sends results of executed fixed-point operations to the fixed-point reorder buffer 8, the fixed-point reservation station 10 and the load/store reservation station 12. Similarly, the floating-point unit 14 executes the floating-point instructions supplied from the floating-point reservation station 11, and sends results of executed floating-point operations to the floating-point reorder buffer 9, the floating-point reservation station 11 and the load/store reservation station 12. Additionally, the load/store unit 15 executes the load/store instructions supplied from the load/store reservation station 12, and sends results of executed load/store operations to the fixed-point reorder buffer 8 and the fixed-point reservation station 10, or to the floating-point reorder buffer 9 and the floating-point reservation station 11. The load/store unit 15 writes data in the data cache 16 when executing a store instruction. The load/store unit 15 reads data from the data cache 16 when executing a load instruction.
The high-performance general-purpose processor 1 can execute both fixed-point arithmetic/logical operations and floating-point arithmetic operations by including the fixed-point unit 13 and the floating-point unit 14. Additionally, the high-performance general-purpose processor 1 includes the branch-prediction unit 3, the fixed-point reorder buffer 8, the floating-point reorder buffer 9, the fixed-point reservation station 10, the floating-point reservation station 11 and the load/store reservation station 12 so that the high-performance general-purpose processor 1 can predict a branch direction for executing a branch instruction before a branch direction is known, and can produce results as if instructions are executed in order when real execution takes advantage of out-of-order execution.
As described above, the high-performance general-purpose processor 1 is designed to rapidly execute various instructions over the wide computing area, and to execute sequential programs rapidly by using branch prediction and out-of-order execution. The high-performance general-purpose processor 1 also has multiple operation modules in the fixed-point unit 13, in the floating-point unit 14, and in the load/store unit 15. With those operation modules, the high-performance general-purpose processor 1 can execute six to eight instructions simultaneously.
FIG. 2 is a block diagram showing a multi-thread transaction processing system. A multi-thread transaction processing system 17 using a multi-thread method includes transaction processors 18-1 through 18-n, a memory 19, an input/output interface 20 and a system bus 21.
Each of the transaction processors 18-1 through 18-n includes a plurality of program counters for multi-thread processing, thereby executing transaction processes efficiently. The memory 19 is connected through the system bus 21 to the transaction processors 18-1 through 18-n, and is shared among the transaction processors 18-1 through 18-n. The input/output interface 20 provides an interface connecting the system bus 21 and peripheral devices located outside the multi-thread transaction processing system 17.
FIG. 3 is a block diagram showing a transaction processor. The transaction processor 18-1 shown in FIG. 3 includes an instruction cache and instruction control unit 22, program counters 23, register files 24, a fixed-point unit 25, a load/store unit 26 and a data cache 27. It should be noted that the instruction cache and instruction control unit 22, the program counters 23, the register files 24, the fixed-point unit 25, the load/store unit 26 and the data cache 27 correspond respectively to the instruction cache and instruction control unit 2, the program counter 4, the fixed-point register file 6, the fixed-point unit 13, the load/store unit 15 and the data cache 16 shown in FIG. 1. In addition, each of the transaction processors 18-1 through 18-n has the structure shown in FIG. 3. In this embodiment, a floating-point unit is not implemented since the cost of the hardware is too high compared to the performance gain in transaction processing, which seldom requires floating-point calculations. Furthermore, the transaction processors 18-1 through 18-n are designed mainly to simultaneously execute a large number of programs, for instance, the number of processors multiplied by the number of program counters in each processor, rather than to execute a single program rapidly, and thus do not include branch-prediction functions, reorder buffers or reservation stations. Instead, each of the transaction processors 18-1 through 18-n includes a plurality of the program counters 23 and the register files 24 for multi-threading. Taking ATM transaction processing as an example, a single transaction processor can handle a withdrawal for a user A by using one program counter and, at the same time, a withdrawal for a user B by using another program counter. Consequently, each of the transaction processors 18-1 through 18-n can execute a plurality of ATM transaction processes simultaneously.
Recently, processor speed has been increasing substantially, whereas memory speed has not increased at the same pace. As a result, long memory access times and the resulting processor stalls have become major performance impediments. In a case that the transaction processor 18-1 must wait for a memory access while executing a series of instructions for a first process, the transaction processor 18-1 can execute another series of instructions for a second process by using another of the program counters 23 shown in FIG. 3, thereby improving overall efficiency of utilizing hardware resources. As described above, the transaction processor 18-1 shown in FIG. 3 can execute transaction processes efficiently.
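The benefit of switching program counters during a memory wait can be illustrated by the following minimal sketch. The function name, the two instruction kinds ("op" and "mem"), the fixed three-cycle memory latency and the lowest-numbered-thread-first policy are all assumptions chosen for illustration, not details of the transaction processor 18-1.

```python
def run_multithreaded(threads, mem_latency=3, max_cycles=100):
    """Toy model (hypothetical): one instruction per cycle, one PC per thread.

    threads: list of instruction lists, each instruction being "op" or "mem".
    Returns a per-cycle trace of the thread that executed (None = idle cycle).
    """
    pcs = [0] * len(threads)            # one program counter per thread
    stalled_until = [0] * len(threads)  # first cycle at which a thread may run
    trace = []
    for cycle in range(max_cycles):
        if all(pc >= len(t) for pc, t in zip(pcs, threads)):
            break  # every thread has finished
        runnable = [i for i, t in enumerate(threads)
                    if pcs[i] < len(t) and stalled_until[i] <= cycle]
        if not runnable:
            trace.append(None)  # all remaining threads wait on memory
            continue
        tid = runnable[0]
        instr = threads[tid][pcs[tid]]
        pcs[tid] += 1
        if instr == "mem":
            # The thread waits for memory; other threads may use the hardware.
            stalled_until[tid] = cycle + 1 + mem_latency
        trace.append(tid)
    return trace
```

Under these assumptions, a single thread `["mem", "op"]` leaves the hardware idle for three cycles, whereas running it alongside a second thread `["op", "op", "op"]` fills those cycles with useful work from the second thread.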
The multi-thread transaction processing system 17 shown in FIG. 2 can execute a large number of processes simultaneously, and can execute transaction processes efficiently by including the transaction processors 18-1 through 18-n. In addition, the size of each transaction processor shown in FIG. 3 is smaller than that of the high-performance general-purpose processor 1, so that multiple transaction processors can be fabricated on a single chip.
The high-performance general-purpose processor 1 shown in FIG. 1 includes many functions that are not cost-effective for executing simple transaction processes and web server processes, as the high-performance general-purpose processor 1 is designed to execute a wide variety of applications efficiently. Additionally, the high-performance general-purpose processor 1 is less efficient for execution of large quantities of simple processes necessary in the transaction processes and the web server processes, since the high-performance general-purpose processor 1 is designed to execute a single program at a time. Even if an attempt is made to provide a plurality of the high-performance general-purpose processors 1 in a system, the number of concurrently executed processes remains small, since the number of the high-performance general-purpose processors 1 that can be fabricated on a single chip is limited by their large circuit size.
On the other hand, in a case that a system includes only the multi-thread transaction processing system 17 shown in FIG. 2, the system can execute transaction processes and/or web server processes more efficiently, but cannot rapidly execute complicated processes such as a scientific arithmetic operation that needs floating-point arithmetic operations, or a large process that blocks execution of other small processes. Further, the system including only the multi-thread transaction processing system 17 does not have various functions to execute a single program rapidly. Accordingly, in a case that the system is asked to execute a single large-sized process or a complicated process, the efficiency with which the system executes such a process decreases remarkably.