The present invention relates to a data processor and a data processing system, and more particularly to a data processing system that provides an improved facility for prefetching instructions from an external memory, and that thereby efficiently executes subroutine programs involving few branches (including jump instructions), a branch being an alteration of the execution sequence of successive instruction addresses.
Instruction cache memories and instruction prefetch buffers are known as conventional techniques for accelerating consecutive instruction execution in a data processor or the like. Such acceleration techniques take advantage of the temporal and spatial locality of the data to be referenced.
As an example, Japanese Published Unexamined Patent Application No. Hei 6 (1994)-243036 (U.S. Pat. No. 5,511,178) discloses a technique concerning a loop lock, which relies on the locality of fetched instructions. According to the disclosed technique, a sequence of instructions in the loop is retained in a cache memory until the program control exits from the loop.
Japanese Published Unexamined Patent Application No. Hei 4 (1992)-62637 discloses a microprocessor including instruction queues (instruction prefetch buffers) that may store fetched loop instructions in a FIFO (first-in first-out) buffer in order to improve the execution speed.
Techniques such as those mentioned above may be effective in processes using many loop instructions, since the loop instructions are held in a cache memory or an instruction prefetch buffer to accelerate the execution of sequential instructions. In processes that include few loop instructions and consist mainly of sequential execution at nearly linear, consecutive addresses, however, the conventional techniques may not be as effective, even with measures taken to prevent loop instructions from being purged from the buffer or memory. In such cases, the inventors of the present invention have found that even the use of a regular instruction cache memory would provide practically no benefit.
More specifically, the inventors have studied the execution of subroutine programs such as those for protocol handling or system control processing in a cellular phone system. Because the protocol processing or system control processing performed by such subroutine programs can be complicated and the programs may become large, storing every necessary processing program in an internal ROM of a data processor may not be a practical solution. On the other hand, access to an external memory is slower than data processing in a data processor. This speed discrepancy may be compensated for by use of an instruction cache memory in the data processor. However, the above protocol handling or system control processing frequently involves sequential execution of instructions having their addresses arranged in a substantially linear and consecutive manner, with few loop instructions included. As a result, not much advantageous effect may be expected from the introduction of a cache memory arrangement.
Under such circumstances, the inventors have decided to exclude the cache memory and utilize instead an instruction prefetch arrangement that is relatively simple in structure. In that case, there is no need for a structure to prohibit the overtaking of any loop instruction given the fact that the processing of interest mostly involves sequential execution of instructions having their addresses arranged in substantially linear and consecutive manner with few loop instructions included. From the standpoint of cost-performance, it was found that the correspondence between prefetched instructions and their addresses needed to be ensured in an appreciably simpler manner when compared with a cache memory address tag control feature or a counter-based read/write pointer control feature.
The inventors have further studied instruction prefetching and discovered that, when a fixed-length burst transfer feature is used, invalid instructions are also prefetched if a branch caused by a branch instruction is encountered, which results in overhead.
The inventors have also found that, when instruction prefetching is triggered by a branch to be executed, or by a conventional instruction fetch request in combination with the lower plural bits, and all prefetched instructions have been executed, the execution of the program is suspended until the instruction fetch from the external memory in the following instruction prefetch is completed.
On closer study of access to the external memory, instruction prefetch is effective for capturing instruction codes (instruction fetch). However, the external memory is also accessed when capturing data described as operands (data fetch), and the inventors have found that the execution of the program is suspended until all such data have been fetched from the external memory.
The inventors have studied countermeasures that refine the instruction prefetch scheme with regard to the problems discovered when prefetching instructions. In those circumstances, from the standpoint of cost-performance, the correspondence between a prefetched instruction and the address of that instruction should be maintained more simply than with the control feature using address tags of a cache memory or the control mechanism for read/write pointers.
It is therefore an object of the present invention to provide a data processor of a relatively simple structure, capable of prefetching instructions from the outside in order to improve the efficiency of instruction execution.
It is another object of the present invention to provide a data processing system having an instruction prefetch facility of a relatively simple structure in a data processor, so as to accelerate processing in which instructions at linear, consecutive addresses, with few loop instructions included, are fetched from an external memory and executed sequentially.
It is a further object of the present invention to provide a data processing system that executes subroutine programs including few branch processes requiring modification of the order in the execution sequence of successive instruction addresses, thereby offering efficient data processing at relatively lower costs.
Major features of the present invention disclosed herein are outlined below.
A data processor in accordance with the present invention comprises an instruction executing means which may fetch instructions, decode the fetched instructions, and execute the decoded instructions; and a bus controller which may control access to an external bus in accordance with commands from the instruction executing means. The bus controller may include a plurality of instruction buffers, a flag intrinsic to each of the instruction buffers, and a buffer control circuit. The buffer control circuit may allocate to each of the instruction buffers one of the intrinsic values that a plurality of lower bits in an instruction address may take; may prefetch, into the corresponding instruction buffers, the instructions whose addresses, as expressed by the lower plural bits, follow in order the address of a fetched instruction of interest; and may validate the flag of any instruction buffer when an instruction is prefetched into that buffer, while invalidating the flag of any instruction buffer in response to output of a prefetched instruction from that buffer.
With the above structure, prefetching instructions into the instruction buffers need only be done when the value expressed by the lower plural bits in an instruction address has reached a predetermined value. Illustratively, in order to simplify instruction prefetch control, when an instruction having a starting offset address expressed by the lower plural bits is fetched, instructions may be prefetched into the instruction buffers corresponding to the addresses in the range from the one next to the starting offset address to the final instruction address expressed by the lower plural bits. Assuming that a branch instruction can change the order of instruction addresses in the sequence, if a branch destination instruction is fetched as a result of a branch instruction, then instructions may be prefetched into the instruction buffers corresponding to the addresses in the range from the one next to the address of the fetched branch destination instruction to the final instruction address expressed by the lower plural bits.
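By way of illustration only, the prefetch range described above may be sketched in Python as follows. The sketch assumes word-granular instruction addresses (one address per instruction fetch) and two lower bits; the function name `prefetch_range` and the parameter `low_bits` are hypothetical and do not appear in the disclosure.

```python
def prefetch_range(fetch_addr, low_bits=2):
    """Return the addresses to prefetch after fetching `fetch_addr`.

    Assumption: addresses count instruction-fetch units, so the lower
    `low_bits` bits select one of 2**low_bits buffer positions.
    """
    block = 1 << low_bits          # number of values the lower bits can take
    mask = block - 1
    offset = fetch_addr & mask     # value expressed by the lower plural bits
    base = fetch_addr & ~mask      # upper bits, shared by the whole range
    # From the address next to the fetched one up to the final offset.
    return [base + o for o in range(offset + 1, block)]


# A fetch at the starting offset prefetches the rest of the range.
print([hex(a) for a in prefetch_range(0x100)])
# A branch destination in mid-range prefetches only what remains.
print([hex(a) for a in prefetch_range(0x102)])
```

Under these assumptions, a fetch at the final offset (for example 0x103) yields an empty range, so no prefetch is issued until the next fetch misses.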
A data processing system utilizing the inventive data processor above may have a memory separate from the data processor. The memory may store operation programs executed by the data processor and is a target of external bus access by the data processor.
The memory above may contain a program using a number of processes that require sequential execution of instructions with their addresses arranged in a linear, consecutive manner, with few loop instructions included. Not much advantageous effect on the performance can be thus expected from the use of a cache memory in the data processor that executes such programs.
By using the data processor with the means described above in accordance with the invention, the value expressed by the lower plural bits in the address of an instruction prefetched from an external source may uniquely determine the instruction buffer to which the instruction is directed. This simplifies prefetch control. An architecture for implementing such instruction prefetch may be much simpler than the control feature using address tags for a cache memory or the control feature of read/write pointers using a counter for a FIFO buffer.
In addition to the above, the flag is set valid when an instruction is prefetched from an allocated address into the instruction buffer associated therewith, and is set invalid in response to the output of a prefetched instruction from that instruction buffer. A valid flag thus indicates that the buffer entry is valid and can be fetched from the corresponding buffer, while an invalid flag indicates that the buffer entry is invalid, allowing a newly prefetched instruction to be loaded into the buffer in question.
With the above measures, when a valid flag is detected for the instruction buffer to which is allocated the value expressed by the lower plural bits in the address of the instruction to be fetched by the instruction executing means, the buffer control circuit may output the instruction from the corresponding instruction buffer to the instruction executing means. If an invalid flag is detected instead, the buffer control circuit may permit the prefetch of an instruction into the instruction buffer corresponding to that flag.
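The flag handling described above may be modeled, purely as a sketch, by the following Python class. The class name `PrefetchBuffers` and the methods `fill` and `take` are hypothetical; the sketch assumes two lower bits and represents each instruction as a string.

```python
class PrefetchBuffers:
    """Hypothetical model: one buffer and one flag per lower-bit value."""

    def __init__(self, low_bits=2):
        self.mask = (1 << low_bits) - 1
        self.data = [None] * (self.mask + 1)
        self.valid = [False] * (self.mask + 1)   # all flags start invalid

    def fill(self, addr, instr):
        """Prefetch path: permitted only while the buffer's flag is invalid."""
        slot = addr & self.mask                  # buffer chosen by lower bits
        if self.valid[slot]:
            return False                         # entry still pending, keep it
        self.data[slot] = instr
        self.valid[slot] = True                  # validate on prefetch
        return True

    def take(self, addr):
        """Fetch path: output the entry if valid, then invalidate its flag."""
        slot = addr & self.mask
        if not self.valid[slot]:
            return None                          # caller must go to memory
        self.valid[slot] = False                 # invalidate on output
        return self.data[slot]


buffers = PrefetchBuffers()
buffers.fill(0x101, "ADD")
print(buffers.take(0x101))   # entry is output once
print(buffers.take(0x101))   # flag is now invalid
```

No address tag is stored in this model; the buffer is selected solely by the lower bits, which is the simplification the text attributes to the arrangement.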
To account for processes such as a branch that change the execution sequence of instructions at consecutive addresses, the buffer control circuit may initialize all flags to the invalid state if the instruction executing means indicates a change in the order of execution of instructions at consecutive instruction addresses.
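The reason for clearing every flag on such a change can be seen in the following minimal Python sketch, which assumes two lower bits and buffers selected only by those bits. All names and addresses are hypothetical and serve only as an illustration.

```python
LOW_BITS = 2                      # assumed number of lower bits
MASK = (1 << LOW_BITS) - 1


def slot_of(addr):
    """Buffer position selected by the lower plural bits of the address."""
    return addr & MASK


def invalidate_all(valid):
    """Initialize every flag to the invalid state (branch indicated)."""
    for i in range(len(valid)):
        valid[i] = False


# State after fetching 0x100: positions 1..3 hold prefetched instructions.
slots = [None, "instr@0x101", "instr@0x102", "instr@0x103"]
valid = [False, True, True, True]

branch_target = 0x202             # shares lower bits with 0x102
# Without clearing, the flag check alone would report a hit on stale data.
stale_hit = valid[slot_of(branch_target)]

invalidate_all(valid)
hit_after_clear = valid[slot_of(branch_target)]

print(stale_hit, hit_after_clear)
```

Because the buffers carry no address tags in this model, an entry prefetched for 0x102 is indistinguishable from one for 0x202, so the flags have to be cleared whenever the sequence changes.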
If each of the instruction buffers is arranged to have a number of bits equal to the number of bits fetched in an instruction fetch performed by the instruction executing means, it becomes easier to control the instruction fetches from the instruction buffers by the instruction executing means.
When prefetching instructions into the instruction buffers, instead of prefetching up to the final offset address expressed by the plurality of lower bits in the instruction address, the offset address at which prefetching ends may be determined on the basis of information stored in a register or the like, or on the basis of the frequency of occurrence of branches or the like. The number of invalid instructions prefetched after a branch instruction can thereby be controlled.
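As a hedged sketch of this option, the Python function below takes the offset at which prefetching ends as a parameter, standing in for a value held in a register. The name `prefetch_range_limited` and the parameter `end_offset` are hypothetical, and word-granular instruction addresses are assumed.

```python
def prefetch_range_limited(fetch_addr, low_bits=3, end_offset=None):
    """Addresses to prefetch, stopping at a configurable final offset.

    Assumption: `end_offset` models a register value; None means
    "prefetch up to the last offset expressed by the lower bits".
    """
    mask = (1 << low_bits) - 1
    last = mask if end_offset is None else min(end_offset, mask)
    offset = fetch_addr & mask
    base = fetch_addr & ~mask
    return [base + o for o in range(offset + 1, last + 1)]


# Default: prefetch to the end of the range covered by the lower bits.
print(len(prefetch_range_limited(0x100)))
# Shorter limit: fewer instructions are wasted if a branch follows soon.
print(len(prefetch_range_limited(0x100, end_offset=3)))
```

A smaller limit trades fewer wasted prefetches after a branch against more frequent prefetch requests in straight-line code.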
Instruction prefetch may also be stopped upon the occurrence of an interrupt, in addition to a branch. This is because, when an interrupt occurs, the execution of the program is halted as needed in order to execute the interrupt program, and the instructions already prefetched are therefore discarded.
At least two units of instruction buffers are provided, where one unit includes the plurality of instruction buffers. While the instruction executing means executes the instructions prefetched into the instruction buffers of the first unit (the instruction buffers in the first buffer table), the buffer control circuit may prefetch the instruction at the instruction address next to the last instruction address in the instruction buffers of the first unit into the instruction buffers of the second unit (the instruction buffers in the second buffer table). The instruction executing means executes the instructions prefetched into the instruction buffers of the second unit after the execution of the instructions prefetched into the instruction buffers of the first unit is completed, so that instructions continue to be executed, without interruption of the program, during instruction prefetch from an external memory into an instruction buffer.
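The two-unit arrangement may be sketched in Python as follows. This is an illustration under stated assumptions only: two lower bits, a second-unit prefetch that covers one full range of lower-bit values, and unit selection by the address bit immediately above the lower bits. None of these choices, nor the names `unit_of` and `second_unit_prefetch`, is taken from the disclosure.

```python
LOW_BITS = 2                  # assumed number of lower bits
BLOCK = 1 << LOW_BITS         # buffers per unit


def unit_of(addr):
    """Assumed unit selection: consecutive ranges alternate between units."""
    return (addr >> LOW_BITS) & 1


def second_unit_prefetch(last_addr_in_first_unit):
    """Addresses prefetched into the other unit while the first executes."""
    start = last_addr_in_first_unit + 1   # address next to the last one
    return [start + i for i in range(BLOCK)]


# While 0x100..0x103 execute from one unit, the following range is prefetched.
nxt = second_unit_prefetch(0x103)
print([hex(a) for a in nxt], unit_of(0x103), unit_of(nxt[0]))
```

With this selection rule the range being executed and the range being prefetched never share a unit, so execution can proceed while the external memory is accessed.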
It may be preferable to incorporate an instruction decoding facility into either the instruction buffer or the buffer control circuit to decode the instructions prefetched into the instruction buffer. It can thereby be determined whether or not an instruction prefetched into an instruction buffer is a branch instruction. When the prefetched instruction is a branch, the number of instructions that would be prefetched in vain can be reduced by suspending the prefetching of the following instructions.
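A minimal Python sketch of suspending prefetch at a decoded branch is given below. The mnemonics in `BRANCH_MNEMONICS`, the string encoding of instructions, and the function names are hypothetical stand-ins for a real decoder.

```python
BRANCH_MNEMONICS = {"BRA", "JMP"}   # hypothetical branch opcodes


def is_branch(instr):
    """Stand-in for the decoding facility: inspect the mnemonic only."""
    return instr.split()[0] in BRANCH_MNEMONICS


def prefetch_until_branch(memory, start_addr, end_addr):
    """Prefetch start_addr..end_addr, stopping after the first branch."""
    fetched = []
    for addr in range(start_addr, end_addr + 1):
        instr = memory[addr]
        fetched.append((addr, instr))
        if is_branch(instr):
            break                   # suspend prefetch of following addresses
    return fetched


memory = {0x101: "ADD r1, r2", 0x102: "BRA loop_exit", 0x103: "SUB r3, r4"}
print(prefetch_until_branch(memory, 0x101, 0x103))
```

In this sketch the branch itself is still prefetched, but the instruction after it is not, which saves one external access if the branch is taken.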
Furthermore, it may be desirable to incorporate an address calculation facility into either the instruction buffer or the buffer control circuit. If the destination address of a branch instruction can be determined by address calculation, prefetching the instruction at the branch destination allows instructions to be executed, without interrupting the execution of the program, while new instructions are prefetched from the external memory into the instruction buffer. In addition, it may be more desirable to incorporate at least two units of instruction buffers so as to prefetch both the instruction at the address consecutive to the instruction address of a branch instruction and the instruction at the branch destination. Whether or not the execution sequence branches at the branch instruction, instructions may then be executed without interrupting the program while a new instruction is prefetched from the external memory into the instruction buffer.
It may be more preferable to incorporate an instruction decoding facility as well as an operand buffer into either the instruction buffer or the buffer control circuit, so that an appropriate operand can be prefetched when an instruction requiring operands is prefetched. If the operand is immediate data modified by an address, the external memory must be accessed to fetch the immediate data. Prefetching the immediate data at the time of instruction prefetching may therefore allow instructions to be executed without interrupting the program.
It may be further desirable to incorporate a cache memory into the data processor so that part or all of the programs stored in the cache memory can be reused when a branch to an address already executed, loop processing already performed, or the entire protocol handling is carried out, resulting in a significant decrease in the interruptions of program execution caused by access to the external memory.
The present invention will be described below in greater detail as applied to a cellular phone system by way of example. A cellular phone in accordance with the invention may comprise a data processor, a memory, and a bus connected to the data processor and the memory. The memory may store programs for at least either protocol handling or system management. The data processor may include an instruction executing unit for fetching instructions, decoding the fetched instructions, and executing the decoded instructions, and a bus controller that includes a plurality of instruction buffers each having a number of bits equal to the number of bits fetched in an instruction fetch performed by the instruction executing unit, flags each corresponding to a respective instruction buffer, and a buffer control circuit. The bus controller further controls access to the memory through the bus based on signals originating from the instruction executing unit. The buffer control circuit allocates an inherent value to each of the instruction buffers, which value may be expressed by a plurality of lower bits in each instruction address. If an instruction is fetched which has an address corresponding to the smallest possible value expressed by the plural lower bits, then the buffer control circuit loads the instructions in the range from the one having an address next to that of the already fetched instruction to the last instruction having an address expressed by the lower plural bits into the instruction buffers corresponding to the addresses of the loaded instructions, and sets the flag associated with each of those instruction buffers to a first state.
Given an instruction fetch request from the instruction executing unit, if the flag associated with the instruction buffer corresponding to the value expressed by the lower plural bits in the address of the instruction to be fetched by the instruction executing unit is detected in the first state, then the buffer control circuit outputs the instruction in that instruction buffer to the instruction executing unit, and sets the flag of the corresponding instruction buffer to a second state.
If the flag associated with the instruction buffer corresponding to the value expressed by the lower plural bits in the address of the instruction to be fetched, the address being output by the instruction executing unit, is detected in the second state, then the instructions in the range from the one having an address next to that of the instruction to be fetched to the one having the last instruction address expressed by the lower plural bits may be loaded into the instruction buffers corresponding to the addresses of those instructions, and the flag of each of the corresponding instruction buffers is set to the first state.
Among the instructions to be fetched by the instruction executing unit, whose instruction addresses are output by the instruction executing unit, either an instruction having an address corresponding to the smallest possible value expressed by the lower plural bits in the instruction address, or an instruction for which the flag associated with the value expressed by the lower plural bits in its address is in the second state, may be read from the memory and fed directly to the instruction executing unit rather than stored in the instruction buffer.
The instruction executing unit may output predetermined signals depending on the types of fetched instructions. The buffer control circuit may set all of the flags associated with the instruction buffers to the second state in response to a first signal output by the instruction executing unit. An instruction that causes the instruction executing unit to output the first signal may be, for instance, a branch instruction.
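The fetch behavior described for the cellular phone example may be drawn together in the following Python sketch. It is a simplified software model and not the disclosed circuit: the class `BusController`, its methods, the counter `external_reads`, and the use of a Python list as the memory are all assumptions made for illustration, with two lower bits and `True` standing for the first (valid) flag state.

```python
class BusController:
    """Hypothetical model of the buffers, flags, and buffer control."""

    def __init__(self, memory, low_bits=2):
        self.memory = memory                   # stands in for external memory
        self.block = 1 << low_bits
        self.mask = self.block - 1
        self.buffers = [None] * self.block
        self.flags = [False] * self.block      # False = second state
        self.external_reads = 0                # counts external accesses

    def _read_external(self, addr):
        self.external_reads += 1
        return self.memory[addr]

    def fetch(self, addr):
        """Serve one instruction fetch request from the executing unit."""
        slot = addr & self.mask
        if self.flags[slot]:                   # first state: output from buffer
            self.flags[slot] = False           # then move to second state
            return self.buffers[slot]
        # Second state: read the requested instruction and feed it directly,
        # then load the remaining addresses of the range into their buffers.
        instr = self._read_external(addr)
        base = addr & ~self.mask
        for off in range(slot + 1, self.block):
            self.buffers[off] = self._read_external(base + off)
            self.flags[off] = True
        return instr

    def branch_signal(self):
        """First signal from the executing unit: all flags to second state."""
        self.flags = [False] * self.block


memory = ["I%d" % a for a in range(16)]
bc = BusController(memory)
trace = [bc.fetch(a) for a in (0, 1, 2)]       # sequential execution
bc.branch_signal()                             # a branch is executed
trace.append(bc.fetch(9))                      # branch destination
print(trace, bc.external_reads)
```

In this model the buffer for the smallest lower-bit value is never filled, which mirrors the statement that such an instruction is fed directly to the instruction executing unit rather than stored.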
The above and other objects and novel features of the present invention will become apparent from the following description of preferred embodiments when read in conjunction with the accompanying drawings.