1. Technical Field
The present invention relates in general to improved multiscalar data processing systems and in particular to improved methods and systems for instruction address translation in a multiscalar data processing system. Still more particularly, the present invention relates to methods and systems for distributed instruction address translation in a multiscalar data processing system.
2. Description of the Related Art
Designers of modern state-of-the-art data processing systems are continually attempting to enhance the performance aspects of such systems. One technique for enhancing data processing system efficiency is the achievement of short cycle times and a low Cycles-Per-Instruction (CPI) ratio. An excellent example of the application of these techniques to an enhanced data processing system is the International Business Machines Corporation RISC System/6000 (RS/6000) computer. The RS/6000 system is designed to perform well in numerically intensive engineering and scientific applications as well as in multi-user, commercial environments. The RS/6000 processor employs a multiscalar implementation, which means that multiple instructions are issued and executed simultaneously.
The simultaneous issuance and execution of multiple instructions requires independent functional units that can execute concurrently with a high instruction bandwidth. The RS/6000 system achieves this by utilizing separate branch, fixed point and floating point processing units which are pipelined in nature. In such systems a significant pipeline delay penalty may result from the execution of conditional branch instructions. Conditional branch instructions are instructions which cause a specified branch within an application to be taken in response to a selected outcome of the processing of one or more other instructions. Because that outcome is generally not known until the conditional branch instruction propagates through a pipeline queue to an execution position within the queue, instructions must be loaded into the queue behind the conditional branch instruction before the branch is resolved in order to avoid run-time delays.
Another source of delays within multiscalar processor systems is the fact that such systems typically execute multiple tasks simultaneously. Each of these multiple tasks typically has a virtual or effective address space which is utilized for execution of that task. Locations within such a virtual or effective address space include addresses which "map" to a real address within system memory. It is not uncommon for a single space within real memory to map to multiple effective or virtual memory addresses within a multiscalar processor system. The utilization of effective or virtual addresses by each of the multiple tasks creates additional delays within the multiscalar processor system, because each such address must be translated into a real address within system memory before the appropriate instruction or data can be retrieved from memory and placed within an instruction queue for dispatch to one of the multiple independent functional units which make up the multiscalar processor system.
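By way of illustration only, the mapping of effective addresses onto real addresses, including the case in which two tasks map different effective pages onto the same real page, may be sketched in Python as follows. The 4 KB page size, the dictionary-based per-task page tables, and the function name `translate` are hypothetical choices made for this sketch and are not taken from the description above.

```python
# Illustrative sketch only: the page size, table layout and function name are
# hypothetical and chosen purely to show effective-to-real address mapping.
PAGE_SIZE = 4096  # hypothetical 4 KB page size


def translate(effective_addr, page_table):
    """Map an effective address to a real address using one task's page table.

    page_table maps effective page numbers to real page numbers; a missing
    entry models a translation miss that would require further handling.
    """
    epn, offset = divmod(effective_addr, PAGE_SIZE)
    if epn not in page_table:
        raise LookupError(f"no translation for effective page {epn:#x}")
    # The page offset is carried over unchanged; only the page number is mapped.
    return page_table[epn] * PAGE_SIZE + offset


# Two tasks, each with its own effective address space. Different effective
# pages in the two tasks map onto the same real page 0x3.
task_a_table = {0x10: 0x3}
task_b_table = {0x20: 0x3}

real_a = translate(0x10123, task_a_table)
real_b = translate(0x20123, task_b_table)
```

Both effective addresses resolve to the same real address, which illustrates how a single space within real memory may correspond to several effective addresses, each of which must nonetheless be translated before use.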
The translation problem is further exacerbated by the fact that multiple translation mechanisms may be utilized. For example, page table entry (PTE) translation is utilized to map an effective or virtual page of memory to a real page of memory within a paged system memory, and operates on a translation object of fixed size. By contrast, block address translation (BAT) may be utilized to map a translation object which ranges in size from a one hundred twenty-eight kilobyte block to an eight megabyte block. Thus, the variation among translation algorithms, together with the necessity of translating each effective instruction address into a real instruction address during application execution, can result in substantial delays in a multiscalar processor system.
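By way of illustration only, the contrast between a fixed-size page translation and a variable-size block translation may be sketched in Python. Only the one hundred twenty-eight kilobyte to eight megabyte block-size range is taken from the description above; the 4 KB page size, the `BatEntry` structure, the `translate_with_bat` function, the power-of-two and alignment constraints, and the rule that a block translation hit takes precedence over a page translation are assumptions made for this sketch.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
PAGE_SIZE = 4 * KB                   # assumed fixed page size for PTE translation
BAT_MIN, BAT_MAX = 128 * KB, 8 * MB  # block-size range given in the text


@dataclass(frozen=True)
class BatEntry:
    """Hypothetical block translation entry (assumed layout, for illustration)."""
    effective_base: int  # effective start address, assumed aligned to size
    real_base: int       # real start address, assumed aligned to size
    size: int            # assumed power of two between BAT_MIN and BAT_MAX

    def __post_init__(self):
        if not (BAT_MIN <= self.size <= BAT_MAX) or self.size & (self.size - 1):
            raise ValueError("block size must be a power of two from 128 KB to 8 MB")
        if self.effective_base % self.size or self.real_base % self.size:
            raise ValueError("block bases must be aligned to the block size")

    def lookup(self, ea):
        """Return the real address if ea falls inside this block, else None."""
        if self.effective_base <= ea < self.effective_base + self.size:
            return self.real_base + (ea - self.effective_base)
        return None


def translate_with_bat(ea, bats, page_table):
    # Assumed ordering: a variable-size block translation hit is used first.
    for bat in bats:
        real = bat.lookup(ea)
        if real is not None:
            return real
    # Otherwise fall back to fixed-size page translation.
    epn, offset = divmod(ea, PAGE_SIZE)
    if epn not in page_table:
        raise LookupError(f"no translation for effective address {ea:#x}")
    return page_table[epn] * PAGE_SIZE + offset
```

For example, with a single eight megabyte block mapping effective addresses beginning at 8 MB onto real addresses beginning at 16 MB, `translate_with_bat(0x812345, [BatEntry(8 * MB, 16 * MB, 8 * MB)], {})` yields the real address `0x1012345`, while an address outside the block is resolved through the page table. The two paths differ in granularity and in the state they consult, which is the variation in translation algorithm referred to above.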
This translation problem is particularly burdensome during instruction execution. In many known multiscalar processor systems, instructions are retrieved utilizing a so-called "fetcher," which retrieves instructions and then dispatches those instructions to one of the independent processor units. Known multiscalar processor systems typically include instruction fetchers which are capable only of "in-page" fetching; that is, the fetcher is not capable of translation and can only prefetch instructions from a single specified page within memory. Alternatively, an "out-of-page" fetcher typically requires an instruction translation lookaside buffer (TLB), segment register access and all of the overhead associated with a separate translation unit. These capabilities are required in addition to the memory management units provided with such systems for address translation and memory management.
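By way of illustration only, the limitation of an "in-page" fetcher may be sketched in Python: because such a fetcher cannot translate, it can prefetch sequential instructions only until the end of the current page. The 4 KB page size, the fixed four-byte instruction width, and the function name `in_page_prefetch` are hypothetical assumptions made for this sketch.

```python
PAGE_SIZE = 4096  # hypothetical page size
INSTR_BYTES = 4   # hypothetical fixed-width 32-bit instructions


def in_page_prefetch(real_fetch_addr, count):
    """Return up to `count` sequential instruction addresses that an in-page
    fetcher could prefetch starting at real_fetch_addr.

    The fetcher cannot translate addresses, so it stops at the end of the
    current page; continuing past that boundary would require a new real
    address from a separate translation mechanism.
    """
    page_end = (real_fetch_addr // PAGE_SIZE + 1) * PAGE_SIZE
    addrs = []
    addr = real_fetch_addr
    while len(addrs) < count and addr < page_end:
        addrs.append(addr)
        addr += INSTR_BYTES
    return addrs
```

Starting eight bytes before a page boundary, a request for eight instructions returns only two addresses, since the remaining six lie in the next page and cannot be reached without translation.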
Upon reference to the foregoing, those skilled in the art will appreciate that it would be advantageous to provide an instruction fetcher which could rapidly and efficiently translate effective instruction addresses into real instruction addresses without requiring substantial hardware assets or incurring the delay inherent in accessing and utilizing the system memory management unit.