1. Field of the Invention
The present invention relates to the field of microprocessor architecture. More specifically, the present invention relates to a method and apparatus for the efficient calculation of effective memory addresses in accordance with a segmented memory protocol.
2. Art Background
As the computer revolution has progressed, the quest of microprocessor developers has been to develop chips exhibiting more power and faster performance. Initial efforts focused essentially on increasing transistor populations on single microprocessor integrated circuits. That effort continues, with today's microprocessors now housing literally millions of transistors on a single chip. Further integration has allowed processor clock speeds to be greatly increased along with the increased density of transistors.
In addition to squeezing performance by overcoming physical limitations, microprocessor design has developed into an art form. Microprocessors are divided into discrete functional blocks through which instructions are propagated one stage at a time. This allows for pipelining of instructions such that when one instruction has completed the first stage of processing and moves on to the second stage, a second instruction may begin the first stage. Thus, even where each instruction requires a number of clock cycles to complete all stages of processing, pipelining provides for the completion of instructions on every clock cycle. This single-cycle throughput of a pipelined microprocessor greatly increases the overall performance of computer systems.
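The throughput argument above can be illustrated with a small cycle-count calculation. This is only a sketch under idealized assumptions that are not part of the original text: every stage takes exactly one clock cycle and the pipeline never stalls. The function names are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline (assumed: one cycle per stage, no stalls).

    The first instruction needs num_stages cycles to fill the pipeline;
    after that, one instruction completes on every clock cycle.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


if __name__ == "__main__":
    # 100 instructions through a 5-stage pipeline
    print(unpipelined_cycles(100, 5))  # 500
    print(pipelined_cycles(100, 5))    # 104
```

Under these assumptions, the per-instruction latency is unchanged at five cycles, but the steady-state completion rate approaches one instruction per cycle.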
Other enhancements to microprocessor design include the development of superscalar microprocessors, which are capable of initiating more than one instruction at the initial stage of the pipeline per clock cycle. Likewise, in a superscalar microprocessor, frequently more than one instruction completes on each given clock cycle. Other development efforts have gone into the simplification of microprocessor instruction sets, developing reduced instruction set (RISC) microprocessors which exploit the fact that many simple instructions are more commonly executed than some complicated instructions. Eliminating the complicated instructions from the instruction set provides for a faster executing pipeline. Complicated instructions are carried out by combinations of the simpler instructions.
In order for pipelined microprocessors to operate efficiently, an instruction fetch unit at the head of the pipeline must continually provide the pipeline with a stream of instructions. However, conditional branch instructions within an instruction stream prevent an instruction fetch unit at the head of a pipeline from fetching the correct instruction until the condition is resolved. Since the condition will not be resolved until further down the pipeline, the instruction fetch unit may not be able to fetch proper instructions.
To overcome this problem, many pipelined microprocessors use branch prediction mechanisms that predict the outcome of branches and then fetch subsequent instructions according to the prediction. Branch prediction is achieved using a branch target buffer (BTB) to store the history of a branch instruction based upon the instruction pointer or address of that instruction. Every time a branch instruction is fetched, the branch target buffer predicts the target address of the branch using the branch history. Speculative execution refers to initiating and completing instructions before it is known whether they are the correct instructions. This usually relies on prediction with a BTB.
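The role of a branch target buffer can be sketched as a table keyed by the instruction pointer of the branch. The sketch below is illustrative only: the two-bit saturating counter used as the "history", the class name, and the method names are assumptions chosen for the example, not details taken from the text above.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch's instruction pointer to (history counter, target).

    Assumption: history is a 2-bit saturating counter (0..3); a value of 2 or 3
    means "predict taken". Real designs vary.
    """

    def __init__(self):
        self.entries = {}  # instruction pointer -> [counter, target address]

    def predict(self, ip):
        """Return the predicted target, or None if the branch is unknown
        or predicted not taken (fetch then continues sequentially)."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        counter, target = entry
        return target if counter >= 2 else None

    def update(self, ip, taken, target):
        """Record the resolved outcome of the branch at `ip`."""
        counter, old_target = self.entries.get(ip, [1, target])
        if taken:
            # Strengthen the "taken" history and remember the latest target.
            self.entries[ip] = [min(counter + 1, 3), target]
        else:
            # Weaken the history; keep the previously seen target.
            self.entries[ip] = [max(counter - 1, 0), old_target]


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.predict(0x400))          # None: branch not seen yet
    btb.update(0x400, taken=True, target=0x480)
    print(hex(btb.predict(0x400)))     # 0x480
```

When the prediction later proves wrong, the speculatively fetched instructions must be discarded; that recovery path is outside the scope of this sketch.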
In addition to speculative execution, substantial increases in instruction throughput are achievable by implementing out-of-order dispatch of instructions to the execution units. Many experiments have confirmed that typical von Neumann code provides substantial parallelism and hence a potential performance boost by use of out-of-order execution. Out-of-order execution is possible when a given instruction does not depend on previous instructions for a result before executing. With out-of-order execution, any number of instructions are allowed to be in execution in the execution units, up to the total number of pipeline stages for all the functional units.
In a processor using out-of-order execution, instruction dispatching is stalled when there is a conflict for a functional unit or when a dispatched instruction depends on the result of an instruction that is not yet computed. In order to prevent or mitigate stalls in decoding, previous texts have described the provision of a buffer known as a reservation station (RS) between the decode and execute stages. The processor decodes instructions and places them into the reservation station as long as there is room in the buffer, and at the same time examines instructions in the reservation station to find those that can be dispatched to the execution units (that is, instructions for which source operands and execution units are available). Data-ready instructions are dispatched from the reservation station with little regard for their original program order. For further background on the use of reservation stations and out-of-order execution, see Mike Johnson, Superscalar Microprocessor Design, Prentice-Hall, Inc., 1991, Chapters 3 and 7.
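The dispatch rule described above (an instruction leaves the reservation station once its source operands and an execution unit are available) can be sketched as follows. The data layout, the register names, and the unit labels "mem" and "alu" are hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str       # label for the instruction (illustrative)
    sources: tuple  # registers the instruction reads
    dest: str       # register the instruction writes
    unit: str       # hypothetical execution-unit label, e.g. "alu" or "mem"


def dispatch_ready(station, ready_regs, free_units):
    """Remove and return the entries that can be dispatched this cycle.

    An entry is dispatchable when all of its source registers are ready
    and its execution unit is still free in this cycle.
    """
    free = set(free_units)
    dispatched = []
    for entry in list(station):  # iterate over a copy while removing
        operands_ready = all(src in ready_regs for src in entry.sources)
        if operands_ready and entry.unit in free:
            free.discard(entry.unit)  # one instruction per unit per cycle
            station.remove(entry)
            dispatched.append(entry)
    return dispatched


if __name__ == "__main__":
    station = [
        Entry("load", ("r1",), "r2", "mem"),
        Entry("add", ("r2", "r3"), "r4", "alu"),  # waits on r2 from the load
        Entry("sub", ("r5", "r6"), "r7", "alu"),
    ]
    ready = {"r1", "r3", "r5", "r6"}
    out = dispatch_ready(station, ready, {"mem", "alu"})
    print([e.name for e in out])      # ['load', 'sub']
    print([e.name for e in station])  # ['add']
```

Here the `sub` is dispatched ahead of the earlier `add`, because the `add` is still waiting for the result of the `load`.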
Since the advent of the microprocessor industry, one architecture has emerged as the dominant standard in the marketplace. This is the Intel Architecture Microprocessor. The Intel Architecture Microprocessor was one of the earliest general purpose microprocessors that facilitated the proliferation of computers to the extent that they are in use today. The architecture has proceeded through many generations of new designs, each providing more power and greater speed than the previous.
In the Intel Architecture and other microprocessor architectures, it is frequently necessary to identify memory locations in different domains. Often, operating systems will be based on a virtual memory mapping in which a virtual memory address (also referred to as a linear address) must be converted by a translation lookaside buffer into a corresponding physical address. The Intel Architecture Microprocessor implements a segmented memory scheme in which an effective address for a memory location is determined by a base address to a memory block, indexed by an index value which may be scaled by a scaling factor, and then ultimately shifted by a displacement value. In the segmented memory model approach, memory is segmented into multiple, independent address spaces. The beginning of each segment is specified by a segment base address, and the location within the segment is indicated by an offset. An address specified in program code is denoted an "effective address", which is treated as the offset into the segment. Segmentation hardware translates this address into an address called the "linear address" by adding the segment base to the effective address.
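The address arithmetic just described can be written out as a short sketch: the effective address is the base plus the scaled index plus the displacement, and the linear address is the segment base plus the effective address. The function names, the 32-bit wraparound, and the restriction of the scale factor to 1, 2, 4, or 8 are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit address arithmetic with wraparound


def effective_address(base, index=0, scale=1, displacement=0):
    """Offset into a segment: base + index * scale + displacement."""
    if scale not in (1, 2, 4, 8):  # assumed set of legal scale factors
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & MASK32


def linear_address(segment_base, effective):
    """Linear address: segment base added to the effective address."""
    return (segment_base + effective) & MASK32


if __name__ == "__main__":
    ea = effective_address(base=0x1000, index=3, scale=4, displacement=0x10)
    la = linear_address(segment_base=0x00400000, effective=ea)
    print(hex(ea))  # 0x101c
    print(hex(la))  # 0x40101c
```

Note that neither function touches memory; each is pure arithmetic on its inputs. Translation of the linear address to a physical address by the translation lookaside buffer is a separate step not shown here.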
When memory operations are carried out by a processor, one component of the operation is generally to calculate the necessary linear or effective address for performing the memory operation. This step is usually incorporated into the memory subsystem components, which are provided with the necessary parameters for calculating a memory address for loads and writes to be propagated to the actual memory controllers. There are, at times, situations in which it is desirable to calculate a memory address when no actual memory operation is intended to be carried out. One particular example involves the allocation of memory space for various applications. In the past, when it has been necessary to calculate a memory address, it was necessary either to use those components of the memory subsystem for the calculation or to generate effective procedures to be carried out using the existing features of other execution units. One disadvantage of the prior approaches is that using the memory subsystem components introduces a protracted latency, because of the time generally required for memory operations and the speed of the memory functional units. It would be advantageous, and is therefore an object of the present invention, to provide a mechanism that is independent of the memory execution units and subsystem for calculating memory addresses, so as to increase the throughput of a microprocessor pipeline.