Many modern computer systems implement a variety of operating features, each designed to improve the overall speed of operation of the system. For example, the computer system may include multiple processors coupled to one another in a synchronous, master-slave relationship and utilize an instruction pipeline to increase the speed of execution of the various instructions required by an application program.
Moreover, the computer system may further include a virtual address space and a physical memory to facilitate the construction of multiple application programs for processing by the computer system, as is well known. One advantageous computer system configuration comprises a scalar central processing unit coupled in a synchronous, master-slave relationship to a vector processor.
A vector processor is designed to perform high speed execution of program loops on a vector, which may be defined as an array of data elements stored in the computer system's memory with a fixed increment or stride between successive data elements of the vector. A vector's stride is defined as the number of memory locations, typically in bytes, between the starting addresses of consecutive vector elements. For example, a contiguous vector that has longword elements (four contiguous bytes) has a stride of four, and a contiguous vector that has quadword elements (eight contiguous bytes) has a stride of eight. Vectors most often represent rows or columns of matrices or tables, for example, arrays of measured temperatures, pressures and other physical variables relating to the solution of a scientific problem. The vector processor operates through the execution of vector instructions constructed to directly perform computational tasks in vector or array form. Because a single vector instruction operates on many data elements, such vector instructions are far more efficient than the conventional loop operations of a scalar processor.
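By way of illustration only, the relationship between a vector's base address, its stride and the starting addresses of its elements can be sketched as follows. The function name, the base addresses and the 3-by-5 row-major matrix are hypothetical values chosen for the example and do not appear in the description above.

```python
LONGWORD = 4  # bytes in a longword element
QUADWORD = 8  # bytes in a quadword element


def element_addresses(base, stride, count):
    """Return the starting byte address of each element of a strided vector."""
    return [base + i * stride for i in range(count)]


# Contiguous vectors: the stride equals the element size.
contiguous_longwords = element_addresses(base=0x1000, stride=LONGWORD, count=4)
contiguous_quadwords = element_addresses(base=0x1000, stride=QUADWORD, count=4)

# Assumed example: one column of a row-major matrix with 3 rows and 5
# longword columns. Consecutive column elements are one full row apart,
# so the stride is 5 * 4 = 20 bytes.
matrix_column = element_addresses(base=0x2000, stride=5 * LONGWORD, count=3)
```

In this sketch a matrix row is a contiguous vector, whereas a matrix column is a non-contiguous vector whose stride is the width of one row.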
In the synchronous, master-slave coupling arrangement between the scalar central processing unit and the vector processor, the scalar central processing unit fetches all instructions and, when it recognizes a vector instruction, issues the vector instruction to the vector processor for execution. If a vector instruction that depends upon the result of a previous vector instruction is issued and the previous vector instruction caused a memory management fault in the vector processor, it would be very difficult to restore the instruction pipeline and restart the instructions. Accordingly, the scalar pipeline is typically stalled until the vector processor indicates that there was no fault during the load/store operation relating to the previous vector instruction.
It should be understood that, as is well known, the purpose of a pipelined operation is to increase the speed of operation of the computer system through overlapped execution of instructions. The overlapped operation advantage of a pipeline is defeated during the time the scalar pipeline is stalled while the scalar central processing unit is awaiting a no-fault indication from the vector processor. Thus, the time it takes the vector processor to assert a signal indicating that there will be no fault greatly affects the overall speed of operation of the master-slave scalar and vector processors. More specifically, the earlier the vector processor can determine that no fault will occur, the earlier the scalar central processing unit can resume vector instruction issue and refill the instruction execution pipeline.
For purposes of the present invention, the term fault is defined as a memory management fault. As discussed above, the present invention contemplates a computer system having a virtual address space and a physical memory. In such computer systems, a translation scheme is provided to translate a virtual address to a physical address such that data referenced by a virtual address and residing in the physical memory may be fetched by translating the virtual address into a corresponding physical address. All application programs to be processed by the computer system are constructed with reference to the virtual address space by use of virtual addresses which define the locations of instructions and data required by the program within the virtual address space. The computer system is provided with a mechanism to dynamically translate the virtual addresses generated by the program being executed into correct physical memory locations, each defined by a unique physical address.
During the execution of the program, the scalar processor continues to reference data and instructions by virtual addresses and issues information relating to the virtual addresses for the data elements of the vector to the vector processor. A translation mechanism must be provided in the vector processor to continuously translate the virtual addresses of the data elements into corresponding physical addresses where the data elements may be found in the main physical memory.
Typically, the virtual memory space is divided into memory units called pages. A page contains a predetermined number of basic addressable units. For example, the basic addressable unit may comprise an 8-bit byte and a page may contain 512 bytes. A virtual address uniquely identifies a basic addressable unit by two fields: the virtual page number of the page containing the addressable unit and the byte number of the addressable unit within that page. A page table is maintained in the physical memory to cross-reference virtual addresses to physical addresses. As the computer system dynamically transfers data to and from auxiliary memory devices, it generates page frame numbers which define the 512-byte pages of physical memory to be used on references to the virtual addresses. A page table entry is provided for each virtual page then residing in physical memory. The page frame number assigned to a particular virtual page at the time of a transfer of the related data from auxiliary memory to main memory is stored in the page table entry for that virtual page.
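As a minimal sketch, assuming the 512-byte page of the example above, a virtual address can be separated into its virtual page number and byte number with a shift and a mask. The function and constant names are hypothetical.

```python
PAGE_SIZE = 512   # bytes per page, as in the example above
OFFSET_BITS = 9   # 2**9 == 512, so the byte number occupies the low 9 bits


def split_virtual_address(virtual_address):
    """Return (virtual page number, byte number within the page)."""
    virtual_page_number = virtual_address >> OFFSET_BITS
    byte_number = virtual_address & (PAGE_SIZE - 1)
    return virtual_page_number, byte_number


# 0x1234 is byte 0x34 (decimal 52) of virtual page 9.
example_fields = split_virtual_address(0x1234)
```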
Accordingly, in concept, a physical address corresponding to a particular virtual address can be obtained by fetching the page table entry for the virtual page containing that virtual address from physical memory and merging the byte number of the addressable unit of data with the page frame number contained in the page table entry. However, in practice, the vector processor maintains a translation buffer that is a special purpose cache of recently used page table entries. Most often, the translation buffer already contains the page table entries for the virtual addresses being used by a program and the processor need not access physical memory to obtain them.
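The conceptual translation described above, in which the page table entry is fetched and the byte number is merged with the page frame number, might be sketched as follows. The page table is modeled here as a dictionary from virtual page number to page frame number, which is an assumption made for illustration; the table contents are hypothetical, and the 512-byte page size is that of the earlier example.

```python
PAGE_SIZE = 512
OFFSET_BITS = 9  # 2**9 == 512

# Hypothetical page table: virtual page number -> page frame number.
example_page_table = {9: 0x40, 10: 0x17}


def translate(virtual_address, page_table):
    """Translate a virtual address by a direct page table lookup."""
    virtual_page_number = virtual_address >> OFFSET_BITS
    byte_number = virtual_address & (PAGE_SIZE - 1)
    page_frame_number = page_table[virtual_page_number]  # fetch the entry
    # Merge: the page frame number supplies the high-order bits and the
    # byte number is carried over unchanged as the low-order bits.
    return (page_frame_number << OFFSET_BITS) | byte_number


physical_address = translate(0x1234, example_page_table)
```

Only the page number is replaced by the translation; the byte number within the page is the same in the virtual address and in the physical address.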
In known computer systems, the translation buffer is part of the translation mechanism, which is coupled on a timing-critical data path between the vector processor and the physical memory system. The translation mechanism looks up the virtual page number of the virtual address to be translated in the translation buffer to obtain the corresponding page frame number, and attaches the byte number of the virtual address to that page frame number to provide the physical address.
Memory management encompasses the operation of the translation mechanism and the loading of the translation buffer with page table entry information. A translation buffer hit is said to occur when the translation buffer contains the page table entry relating to a particular virtual address. A memory management fault is defined as a translation buffer hit wherein, e.g., a protection code does not permit the program being processed to access the physical page specified by the page table entry required to complete the translation. A translation buffer miss occurs when the translation buffer does not contain an entry for the virtual page number of the virtual address to be translated and, therefore, cannot provide the translation mechanism with the physical address information required to complete the virtual-address to physical-address translation. When a translation buffer miss occurs, the vector processor must service the miss by loading the required page table entry into the translation buffer. In the event of a memory management fault, the vector processor suspends operation and the computer system takes an exception. As discussed above, the scalar processor must not issue a subsequent vector instruction until a valid translation has been completed for each of the virtual addresses of the data elements of the vector for a previous vector instruction. A valid translation is defined as a translation buffer hit without a memory management fault.
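The hit, miss and fault cases defined above can be modeled, purely for illustration, as a small cache of page table entries. All class, field and function names below are hypothetical. The single `writable` flag is an assumed stand-in for the protection code, and treating a page that has no page table entry as a fault is likewise an assumption of this sketch rather than something stated above.

```python
from dataclasses import dataclass

PAGE_SIZE = 512
OFFSET_BITS = 9  # 2**9 == 512


@dataclass
class PageTableEntry:
    page_frame_number: int
    writable: bool  # assumed stand-in for the protection code


class MemoryManagementFault(Exception):
    pass


class TranslationBuffer:
    def __init__(self, page_table):
        self.page_table = page_table  # models the page table in physical memory
        self.entries = {}             # cached entries, keyed by virtual page number
        self.misses = 0

    def translate(self, virtual_address, write=False):
        vpn = virtual_address >> OFFSET_BITS
        byte_number = virtual_address & (PAGE_SIZE - 1)
        if vpn not in self.entries:
            # Miss: service it by loading the required page table entry.
            self.misses += 1
            if vpn not in self.page_table:
                # Assumption: a page with no entry is reported as a fault.
                raise MemoryManagementFault(f"no entry for page {vpn}")
            self.entries[vpn] = self.page_table[vpn]
        entry = self.entries[vpn]  # hit
        if write and not entry.writable:
            raise MemoryManagementFault(f"write not permitted on page {vpn}")
        return (entry.page_frame_number << OFFSET_BITS) | byte_number


def all_translations_valid(tb, base, stride, count, write=False):
    """True only if every element address of the vector translates without fault."""
    try:
        for i in range(count):
            tb.translate(base + i * stride, write)
    except MemoryManagementFault:
        return False
    return True


tb = TranslationBuffer({0: PageTableEntry(5, True), 1: PageTableEntry(6, False)})
first = tb.translate(0x10)             # miss on page 0, serviced from the page table
second = tb.translate(0x20)            # hit on page 0, no further miss
ok_store = all_translations_valid(tb, base=0, stride=8, count=4, write=True)
bad_store = all_translations_valid(tb, base=500, stride=8, count=4, write=True)
```

In this model, `all_translations_valid` plays the role of the check that must complete before a no-fault indication can be given: the store beginning at byte 500 crosses into page 1, which is not writable, so the check fails for that vector.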