The ever-growing requirement for high-performance computers demands that computer hardware architectures maximize software performance. Conventional computer architectures are made up of three primary components: (1) a processor, (2) a system memory and (3) one or more input/output devices. The processor controls the system memory and the input/output ("I/O") devices. The system memory stores not only data, but also instructions that the processor is capable of retrieving and executing to cause the computer to perform one or more desired processes or functions. The I/O devices are operative to interact with a user through a graphical user interface ("GUI") (such as provided by Microsoft WINDOWS™ or IBM OS/2™), a network portal device, a printer, a mouse or other conventional device for facilitating interaction between the user and the computer.
Over the years, the quest for ever-increasing processing speeds has followed different directions. One approach to improve computer performance is to increase the rate of the clock that drives the processor. As the clock rate increases, however, the processor's power consumption and temperature also increase. Increased power consumption is expensive and high circuit temperatures may damage the processor. Further, the processor clock rate cannot increase beyond a threshold set by the physical speed at which signals traverse the processor. Simply stated, there is a practical maximum to the clock rate at which conventional processors can operate.
An alternate approach to improve computer performance is to increase the number of instructions executed per clock cycle by the processor ("processor throughput"). One technique for increasing processor throughput is pipelining, which calls for the processor to be divided into separate processing stages (collectively termed a "pipeline"). Instructions are processed in an "assembly line" fashion in the processing stages. Each processing stage is optimized to perform a particular processing function, thereby causing the processor as a whole to become faster.
"Superpipelining" extends the pipelining concept further by allowing the simultaneous processing of multiple instructions in the pipeline. Consider, as an example, a processor in which each instruction executes in six stages, each stage requiring a single clock cycle to perform its function. Six separate instructions can therefore be processed concurrently in the pipeline, the processing of one instruction completed during each clock cycle. The instruction throughput of an n-stage pipelined architecture is therefore, in theory, n times greater than the throughput of a non-pipelined architecture capable of completing only one instruction every n clock cycles.
Another technique for increasing overall processor speed is "superscalar" processing. Superscalar processing calls for multiple instructions to be processed per clock cycle. Assuming that instructions are independent of one another (the execution of each instruction does not depend upon the execution of any other instruction), processor throughput is increased in proportion to the number of instructions processed per clock cycle ("degree of scalability"). If, for example, a particular processor architecture is superscalar to degree three (i.e., three instructions are processed during each clock cycle), the instruction throughput of the processor is theoretically tripled.
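The theoretical throughput figures given above for pipelined and superscalar processing can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the ideal case described above, in which every stage takes one clock cycle, instructions are independent of one another and no stalls occur.

```python
import math


def cycles_non_pipelined(num_instructions: int, stages: int) -> int:
    """Cycles needed when each instruction must finish all stages
    before the next instruction may begin."""
    return num_instructions * stages


def cycles_pipelined(num_instructions: int, stages: int, degree: int = 1) -> int:
    """Ideal cycle count for a pipeline of `stages` stages that starts
    `degree` independent instructions per clock cycle (degree 1 is a
    plain scalar pipeline). Assumes no stalls of any kind."""
    if num_instructions == 0:
        return 0
    issue_groups = math.ceil(num_instructions / degree)
    # The first group needs `stages` cycles to drain; each later group
    # completes one cycle after the group before it.
    return stages + issue_groups - 1


def speedup(num_instructions: int, stages: int, degree: int = 1) -> float:
    """Ratio of non-pipelined cycles to ideal pipelined cycles."""
    return (cycles_non_pipelined(num_instructions, stages)
            / cycles_pipelined(num_instructions, stages, degree))
```

Under these assumptions, six instructions in a six-stage pipeline complete in 11 cycles rather than 36, and for long instruction streams the speedup approaches the number of stages multiplied by the superscalar degree. Real instruction streams fall short of these figures because of dependencies and stalls.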
These techniques are not mutually exclusive; processors may be both superpipelined and superscalar. However, operation of such processors in practice is often far from ideal, as instructions tend to depend upon one another and are also often not executed efficiently within the pipeline stages. In actual operation, instructions often require varying amounts of processor resources, creating interruptions ("bubbles" or "stalls") in the flow of instructions through the pipeline. Consequently, while superpipelining and superscalar techniques do increase throughput, the actual throughput of the processor ultimately depends upon the particular instructions processed during a given period of time and the particular implementation of the processor's architecture.
Memory management is one broad class of operations that typically consumes substantial processor resources. More particularly, memory management refers to any one of a number of methods for storing and tracking data and programs in memory, as well as reclaiming previously occupied memory spaces that are no longer needed. The efficiency of a given memory management process and, in particular, the efficiency of a processor in performing the same, is measured largely by processor utilization.
Of particular concern to the present invention is segmentation. "Segmentation" is a memory management process that divides memory into sections commonly referred to as "segments." x86-based processors support a number of different processing modes, among which are real and protected modes. Real mode is an operational state, available first in the 80286 processor and its successors, that enables the processor to function as an 8086/8088 processor. Real mode addressing is limited to one megabyte of memory. Protected mode, by comparison, is an operational state, available first in the 80286 processor and its successors, that allows the processor to address all available memory. Protected mode is also directed to preventing errant programs from accessing memory that belongs to other programs or to the operating system. Segmentation is available in both the real and protected modes. In the 80386 processor and its successors, protected mode also began to provide access to 32-bit instructions and sophisticated memory management modes, including paging.
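The one-megabyte limit on real mode addressing noted above follows from the way a real mode address is formed: a 16-bit segment value is shifted left by four bits and added to a 16-bit offset, yielding a 20-bit address. The following Python sketch illustrates the arithmetic. The function name is hypothetical, and the masking of the result to 20 bits is an assumption that models an 8086/8088 with twenty address lines; later processors may behave differently at the top of the address range.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Form a real mode physical address from a 16-bit segment value
    and a 16-bit offset (illustrative sketch, not a full model)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left four bits (multiply by 16), add the offset,
    # and keep 20 bits. The mask is an assumption modelling 8086/8088
    # wrap-around at the one-megabyte boundary.
    return ((segment << 4) + offset) & 0xFFFFF
```

Because the result is confined to 20 bits, no combination of segment and offset can address more than 2^20 bytes, i.e., one megabyte.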
In conventional x86-based protected mode, memory objects (i.e., collections of fields, records or the like of addressable information in memory) and descriptor tables (i.e., tables of eight-byte data blocks that describe various attributes of the segments) are stored within one or more of a plurality of segments. A two-step process is required to gain access to a particular memory object. First, the processor combines the base address of a particular descriptor table and a selector (i.e., an offset or index) to access a particular descriptor therein. Then, in a second, separate operation, the processor uses the accessed descriptor to construct a base address of a particular segment associated with the particular memory object, combining the same with an offset into the segment to access the particular memory object.
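The two-step access described above can be sketched in Python as follows. This is a simplified illustration rather than a model of any particular processor: the function names are hypothetical, memory is represented as a flat byte string, and limit checks, privilege checks and the table-indicator bit of the selector are deliberately ignored. The eight-byte descriptor layout assumed here is the standard x86 segment descriptor format, in which the segment base address is split across bytes 2 through 4 and byte 7.

```python
import struct

DESCRIPTOR_SIZE = 8  # each descriptor is an eight-byte data block


def pack_descriptor(base: int, limit: int, access: int = 0x92, flags: int = 0) -> bytes:
    """Build an eight-byte descriptor (helper for illustration only)."""
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                               # limit bits 15..0
        base & 0xFFFF,                                # base bits 15..0
        (base >> 16) & 0xFF,                          # base bits 23..16
        access & 0xFF,                                # access-rights byte
        ((flags & 0xF) << 4) | ((limit >> 16) & 0xF), # flags, limit 19..16
        (base >> 24) & 0xFF,                          # base bits 31..24
    )


def fetch_descriptor(memory: bytes, table_base: int, selector: int) -> bytes:
    """Step 1: combine the descriptor table base with the selector."""
    index = selector >> 3  # low three selector bits are not part of the index
    address = table_base + index * DESCRIPTOR_SIZE
    return memory[address:address + DESCRIPTOR_SIZE]


def segment_base(descriptor: bytes) -> int:
    """Reassemble the segment base address from its scattered fields."""
    _limit_lo, base_lo, base_mid, _access, _flags_limit, base_hi = struct.unpack(
        "<HHBBBB", descriptor
    )
    return base_lo | (base_mid << 16) | (base_hi << 24)


def translate(memory: bytes, table_base: int, selector: int, offset: int) -> int:
    """Step 2: add the offset to the segment base taken from the descriptor.
    Simplification: no limit or protection checks are performed."""
    descriptor = fetch_descriptor(memory, table_base, selector)
    return segment_base(descriptor) + offset
```

Note that `translate` cannot compute the final address until `fetch_descriptor` has read the descriptor from memory, which is the dependency between the two steps that the description above identifies.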
The above-described segmentation process, while advantageously increasing the size of addressable memory, can be very time-inefficient (e.g., requiring multiple memory accesses, multiple instructions to facilitate each memory access and processor downtime awaiting completion of the memory accesses). A conventional descriptor cache may be employed to store descriptors that have been retrieved from memory. Whenever a needed descriptor is not already held in the cache, however, retrieving it from memory introduces latencies that may decrease the performance of the processor.
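The trade-off described above can be illustrated with a toy model. The following Python sketch is a hypothetical, dictionary-based stand-in for a descriptor cache and is not a description of the hardware structure used by any actual processor. The class name, the `load_from_memory` callback and the `memory_reads` counter are assumptions introduced solely for illustration.

```python
class DescriptorCache:
    """Toy descriptor cache keyed by selector (illustrative only)."""

    def __init__(self, load_from_memory):
        # `load_from_memory` is a hypothetical callback that performs the
        # slow retrieval of a descriptor from a descriptor table in memory.
        self._load = load_from_memory
        self._entries = {}
        self.memory_reads = 0  # counts the slow retrievals actually made

    def lookup(self, selector: int) -> bytes:
        """Return the descriptor for `selector`, reading memory only on a miss."""
        if selector not in self._entries:
            self._entries[selector] = self._load(selector)
            self.memory_reads += 1
        return self._entries[selector]
```

In this sketch, repeated lookups of the same selector cost only one memory retrieval, but the first lookup of each selector still incurs the full retrieval latency.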
There accordingly exists a need in the art for systems and methods for improving memory management in x86-based processors and, more particularly, for reducing the inefficiencies associated with accessing segmented memory in x86-based protected mode.