(1) Field of the Invention
The present invention relates to a processor that is used in a large scale integration (LSI) such as a microcomputer or a digital signal processor (DSP), and more specifically to a processor containing address decoders that are suited to improvements in clock speed.
(2) Description of the Related Art
Operating speed and performance of a processor that is used in an LSI, such as a microcomputer and a DSP, continue to be improved. In particular, the increase in the operating speed (i.e., operating frequency) of processors has greatly exceeded the increases in operating speeds of semiconductor circuit elements such as logic gates and memory.
Super-pipelining is one method of raising the operating speed of a processor by shortening its processing cycle. This technique increases the number of pipeline stages in the processor and so reduces the processing time required per cycle.
However, an increase in the number of pipeline stages used to process instructions leads to a higher occurrence of hazards due to data dependencies between the instructions being executed. This means that it is not enough to simply increase the number of pipeline stages. In particular, the occurrence of hazards is closely linked to the number of pipeline stages used for instruction fetch operations and memory access operations (memory read and write for operand data). This means that it is preferable to increase operating speed without increasing the number of pipeline stages.
The following describes a memory access by a conventional processor to read or write operand data. This processor is assumed to operate in a five-stage pipeline consisting of: an instruction fetch (hereafter, IF) stage, an instruction decode (ID) stage, an execution (EX) stage, a memory access stage (MEM), and a write back (WB) stage. When executing a memory access instruction, this processor performs the following steps in the EX stage: step 1 for calculating a memory address using operands designated in an instruction; step 2 for decoding the calculated address to judge which memory region should be accessed; and step 3 for setting an access mode in preparation for the following MEM stage, where the processor accesses the memory according to the set access mode.
FIG. 1 shows an example memory map in an address space that is accessed by the above processor according to a 32-bit address. As shown in the memory map, the following three types of regions (hereafter memory-mapped regions) are mapped into the address space, with two separate regions existing for each region type: RAM (random access memory) regions; ROM (read only memory) regions; and I/O (input/output) interface regions. In this way, the I/O interface regions are mapped into the same address space as the memory (i.e., memory-mapped I/O is used).
FIG. 2 shows the operation contents for the EX stage, where the conventional processor processes a memory access instruction by performing the following steps: step 1 for calculating a 32-bit address using the operand data of the memory access instruction; step 2 for decoding the calculated 32-bit address to judge which of the six memory-mapped regions (i.e., the two RAM regions, the two ROM regions, and the two I/O interface regions) is specified by the 32-bit address; and step 3 for setting the access mode based on the result of the judgement in step 2 and the result of the decoding of the memory access instruction in the previous ID stage. This access mode setting in step 3 is performed by determining the contents of the access mode based on the decoding result in the ID stage, and by initializing control signals (i.e., preparing to assert certain control signals) based on the access mode that has been determined. Here, the control signals include a write enable (WE) signal and a chip select (CS), and the access mode shows information such as whether the memory access is for a read or a write, and the size of data (hereafter called an access data size) to be transferred through the access. Note that the decoding of the highest-order eighteen bits is sufficient in step 2 to judge one of the six memory-mapped regions in the memory map of FIG. 1.
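The three sequential steps of the conventional EX stage can be sketched as follows. This is only an illustrative model: the memory map of FIG. 1 is not reproduced here, so the region table, the region names, and the function name `ex_stage_sequential` are hypothetical. The one property taken from the description is that the highest-order eighteen bits of the 32-bit address identify the region.

```python
MASK32 = 0xFFFFFFFF
DECODE_BITS = 18  # highest-order bits decoded in step 2 (from the description)

# Hypothetical map from the top 18 address bits to a memory-mapped region.
# The real boundaries of FIG. 1 are not known here.
REGION_BY_TOP18 = {
    0x00000: "RAM0",
    0x00001: "RAM1",
    0x10000: "ROM0",
    0x10001: "ROM1",
    0x20000: "IO0",
    0x20001: "IO1",
}

def ex_stage_sequential(base, offset, is_write, size):
    """Model steps 1 to 3, each of which waits for the previous one."""
    # Step 1: calculate the 32-bit address from the operands.
    addr = (base + offset) & MASK32
    # Step 2: decode the highest-order eighteen bits to judge the region.
    region = REGION_BY_TOP18.get(addr >> (32 - DECODE_BITS))
    # Step 3: set the access mode from the region and the ID-stage decoding.
    mode = {"region": region, "write_enable": is_write, "size": size}
    return addr, mode
```

Because each step consumes the result of the one before it, the modeled EX stage cannot finish sooner than the sum of the three delays.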
With this conventional processor, which sequentially performs the above steps 1 to 3 in the EX stage, however, it is difficult to increase the operating frequency because the time taken by the EX stage cannot be shortened to less than the total operating time taken by steps 1 to 3.
This operating time taken by steps 1 to 3 involves the following delays. In step 1, an adder that adds a base address and an offset address causes a delay. In step 2, an address decoder that decodes the highest-order eighteen bits out of a 32-bit address causes another delay. In step 3, another delay is caused between the access control circuit (i.e., a memory controller) receiving the results of the instruction decoding and the space judgement, and the memory controller initializing control signals in accordance with the access mode that has been set.
Of these delays, the delay in step 2 gets longer as the number of bits to be decoded by the address decoder increases. This is because the address decoder requires a plurality of circuit elements which involve a higher number of stages to decode an address of a higher number of bits. As a result, the time required for step 2 of the address space judgement gets longer.
For the example memory map shown in FIG. 1, the highest eighteen bits of the 32-bit address need to be compared with the highest eighteen bits of an address of each memory-mapped region (or a boundary between two memory-mapped regions) to detect whether they are the same, and therefore a circuit serving as an address decoder is necessary to perform this operation. In theory, it would be sufficient for this address decoder to have a construction with two decoding stages, containing at least eighteen AND circuits having two input terminals and an AND circuit having eighteen input terminals that receives the outputs of the eighteen AND circuits and performs a logical AND operation. In reality, however, the address decoder needs to have a circuit construction with more than two decoding stages because a logical circuit such as the address decoder in an LSI is usually built by combining circuits of the same type such as NAND circuits having two input terminals, or NOR circuits having two input terminals.
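As a rough illustration of why the stage count grows with the number of decoded bits, the following sketch counts the levels of a balanced tree that combines n signals using gates with a fixed number of input terminals. The function names are hypothetical and the model is idealized: it treats every gate as a plain AND and ignores the extra inversions needed when only NAND or NOR circuits are available, so an actual decoder may be deeper.

```python
def tree_depth(n_inputs, fan_in=2):
    """Levels needed to combine n_inputs signals into one with fan_in-input gates."""
    depth = 0
    while n_inputs > 1:
        # Each level groups the remaining signals fan_in at a time (ceiling division).
        n_inputs = -(-n_inputs // fan_in)
        depth += 1
    return depth

def decode_stages(n_bits, fan_in=2):
    """One stage of per-bit comparison, then a tree combining the n_bits results."""
    return 1 + tree_depth(n_bits, fan_in)

# Ideal case from the text: one 18-input AND after the per-bit stage.
ideal = decode_stages(18, fan_in=18)    # 2 stages
# Same function built only from two-input gates.
two_input = decode_stages(18, fan_in=2)  # 6 stages
```

Under these assumptions, decoding eighteen bits with two-input gates takes six stages instead of the theoretical two, and decoding fewer bits shortens the tree.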
The object of the present invention is to provide a processor having a high operating speed by reducing the time taken to execute a memory access instruction.
The above object can be achieved by a processor that accesses a plurality of regions allocated to memory. The processor includes: a judging unit for judging which region is accessed based on an access address; an assuming unit for assuming which region is accessed based on the access address, the assuming unit producing an assumption result faster than the judging unit produces a judgement result; an accessing unit for starting access based on the assumption result; a detecting unit for detecting a disagreement between the judgement result and the assumption result; and a control unit for stopping the access that has been started if the detecting unit has detected the disagreement, and controlling the accessing unit to perform another access based on the judgement result.
With this construction, the judgement by the judging unit is made in parallel with the assumption by the assuming unit. Without waiting for the judging unit to complete the judgement, the accessing unit starts access based on the result of the assumption by the assuming unit. When the judgement result and the assumption result match, the accessing unit continues the access. When the two results disagree, the accessing unit cancels the access, and starts another access based on the judgement result. Accordingly, the time required to execute a memory access instruction can be reduced when the judgement result and the assumption result match, so that the operating speed of the processor can be increased.
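The control flow described above can be sketched in software as follows. This is a behavioral model only: in hardware the two units operate in parallel, whereas this sketch evaluates them one after the other, and the function name `speculative_access`, the callback parameters, and the event log are all hypothetical.

```python
def speculative_access(address, assume, judge, log):
    """Start access on the fast assumption; redo it if the judgement disagrees."""
    assumed = assume(address)
    # The access starts before the slower judgement is available.
    log.append(("start", assumed))
    judged = judge(address)
    if judged != assumed:
        # Disagreement detected: stop the started access and access again
        # based on the judgement result.
        log.append(("cancel", assumed))
        log.append(("start", judged))
    return judged

# Hypothetical units: the assuming unit always guesses "RAM0",
# while the judging unit also inspects address bit 14.
assume_unit = lambda a: "RAM0"
judge_unit = lambda a: "RAM1" if a & 0x4000 else "RAM0"

hit_log, miss_log = [], []
speculative_access(0x0000, assume_unit, judge_unit, hit_log)
speculative_access(0x4000, assume_unit, judge_unit, miss_log)
```

In this model, a matching assumption produces a single started access, while a disagreement produces a cancelled access followed by a second one.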
Here, the access address may be in an address space that contains a first region and a second region, and the first region may contain a first subregion and a second subregion that are allocated respectively to a first memory element and a second memory element. By decoding M bits of the access address, the judging unit may judge which region, out of at least the first subregion, the second subregion, and the second region, is accessed. By decoding N bits, wherein N is smaller than M, of the access address, the assuming unit may judge which region, out of at least the first region and the second region, is accessed, and may assume that a region corresponding to the first memory element is accessed when judging that the first region is accessed.
For this construction, the assuming unit only needs to identify at least the first region and the second region without needing to identify the first subregion and the second subregion. This allows the assuming unit to only decode N bits, so that this decoding can be performed faster.
Here, when the assuming unit has judged that the first region is accessed and the judging unit has judged that a region which is not the first subregion is accessed, the detecting unit may detect the disagreement.
To detect the stated case as the disagreement, the detecting unit only needs to have a simple logic circuit, and so can quickly detect the disagreement.
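The relationship between the M-bit judgement, the N-bit assumption, and the disagreement check can be sketched as follows. The values M = 18 and N = 1, the address layout, and all names are assumptions chosen for illustration; the description does not fix them.

```python
MASK32 = 0xFFFFFFFF
M_BITS = 18  # hypothetical: bits decoded by the judging unit
N_BITS = 1   # hypothetical: bits decoded by the assuming unit (N < M)

def top_bits(addr, n):
    """Return the highest-order n bits of a 32-bit address."""
    return (addr & MASK32) >> (32 - n)

def judge(addr):
    """Judging unit: distinguish both subregions and the second region."""
    key = top_bits(addr, M_BITS)
    if key == 0x00000:
        return "SUB1"      # first subregion (first memory element)
    if key == 0x00001:
        return "SUB2"      # second subregion (second memory element)
    if top_bits(addr, 1) == 1:
        return "REGION2"   # second region
    return None            # unmapped part of the first region

def assume(addr):
    """Assuming unit: tell only the first region from the second region."""
    if top_bits(addr, N_BITS) == 0:
        return "SUB1"      # first region: assume the first memory element
    return "REGION2"

def disagreement(addr):
    """Detecting unit: first region assumed, but not the first subregion judged."""
    assumed_first_region = top_bits(addr, N_BITS) == 0
    return assumed_first_region and judge(addr) != "SUB1"
```

With this layout, an address in the second subregion such as 0x00004010 is assumed to belong to the first memory element, and the detecting unit flags the disagreement with a single comparison.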
Here, the above processor may further include an address calculating unit for calculating the access address according to operands of a memory access instruction, and the judging unit and the assuming unit may decode M bits and N bits, respectively, of the calculated access address, wherein N is smaller than M.
For this construction, the assuming unit can make the assumption faster than the judging unit makes the judgement although the assuming unit and the judging unit start the decoding simultaneously.
Here, the above processor may further include an address calculating unit for calculating the access address according to operands of a memory access instruction. By decoding the calculated access address, the judging unit may make a judgement. By decoding data shown as an operand of the memory access instruction, the assuming unit may make an assumption.
With this construction, the assuming unit decodes operand data on which the address calculating unit has not yet performed its calculation. Accordingly, the assumption by the assuming unit can be made in parallel with this address calculation, so that the assuming unit can output the result of the assumption earlier.
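A minimal sketch of this variant follows, in which the assumption is derived from an operand rather than from the calculated address. The choice of the base address as the decoded operand, the bit widths, the address layout, and the function names are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def assume_from_base(base):
    """Assuming unit: decode only the top bit of the base address operand."""
    # Hypothetical layout: top bit 0 is the first region, top bit 1 the second.
    return "SUB1" if (base & MASK32) >> 31 == 0 else "REGION2"

def judge(addr):
    """Judging unit: decode the top 18 bits of the calculated address."""
    key = (addr & MASK32) >> 14
    if key == 0:
        return "SUB1"
    if key == 1:
        return "SUB2"
    if (addr & MASK32) >> 31 == 1:
        return "REGION2"
    return None

def ex_stage(base, offset):
    """Return (address, assumption, judgement, disagreement flag)."""
    # The assumption needs only the operand, so it does not wait for the adder.
    assumed = assume_from_base(base)
    addr = (base + offset) & MASK32   # address calculating unit
    judged = judge(addr)              # needs the calculated address
    return addr, assumed, judged, assumed != judged
```

For example, `ex_stage(0x00003FF0, 0x20)` yields an assumption of the first subregion, but the addition carries the address to 0x00004010 in the second subregion, so the disagreement flag is set. Such a case is one way the assumption can differ from the judgement in this model.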
The above object can also be achieved by a processor that operates in a pipeline consisting of at least an execution stage where the processor calculates an access address designated by a memory access instruction and a memory access stage where the processor accesses the calculated access address, the memory access stage immediately following the execution stage. The processor includes: a judging unit for judging which region is accessed by decoding M bits of the access address in the execution stage; an assuming unit for assuming which region is accessed by decoding N bits, wherein N is smaller than M, of the access address in the execution stage, the assuming unit producing an assumption result faster than the judging unit produces a judgement result; a detecting unit for detecting, in the execution stage, a disagreement between the judgement result and the assumption result; an accessing unit for starting access in the memory access stage based on the assumption result when the detecting unit has detected no disagreement; and a pipeline control unit for extending the memory access stage when the detecting unit has detected the disagreement, wherein the accessing unit performs access based on the judgement result in the extended memory access stage.
With this construction, the judgement by the judging unit is made in parallel with the assumption by the assuming unit in the execution stage. In the next memory access stage, the accessing unit starts access based on the result of the assumption made by the assuming unit. If the judgement result by the judging unit and the assumption result match, the accessing unit continues the access. If the two results disagree, the accessing unit cancels the access, and starts another access based on the judgement result in the memory access stage that has been extended. Accordingly, the processing time required within the execution stage can be reduced, and so the operation clock frequency of the processor can be increased.
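The effect of the pipeline control unit on a single memory access instruction can be sketched as a sequence of stages. The five stage names are borrowed from the conventional example given earlier, and the extension length of one cycle is an assumption; the description states only that the memory access stage is extended.

```python
def stage_sequence(disagreement, extra_mem_cycles=1):
    """List the cycles one memory access instruction occupies in the pipeline."""
    stages = ["IF", "ID", "EX", "MEM"]
    if disagreement:
        # The memory access stage is extended so that the access can be
        # redone based on the judgement result (length is an assumption).
        stages += ["MEM"] * extra_mem_cycles
    stages.append("WB")
    return stages

matched = stage_sequence(False)     # IF, ID, EX, MEM, WB
mismatched = stage_sequence(True)   # IF, ID, EX, MEM, MEM, WB
```

In this model the extra cycle is paid only when the assumption is wrong, while every instruction benefits from the shorter execution stage.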
Here, the above processor may further include: two operand registers that store, in the execution stage, a base address and an offset address that are designated in the memory access instruction; an address calculating unit for calculating the access address by adding the base address and the offset address in the two operand registers; and an operand selecting unit for selecting the base address outputted from one of the two operand registers, wherein the judging unit decodes M bits of the calculated access address and wherein the assuming unit decodes N bits of the base address that has been selected by the operand selecting unit.
Here, the accessing unit may include: a result selecting unit for selecting the assumption result in the execution stage, and selecting the judgement result in the memory access stage only when the detecting unit has detected the disagreement; an access control unit for generating, in the execution stage, a plurality of first control signals used for a first memory access based on the selected assumption result, and generating, in the memory access stage, a plurality of second control signals used for a second memory access based on the selected judgement result when the detecting unit has detected the disagreement; and an access control register for storing either the plurality of the first control signals or the plurality of the second control signals, and outputting either the first control signals or the second control signals to the first memory element and the second memory element in the memory access stage, wherein when the detecting unit has detected the disagreement, the access control register is reset.