Many scientific data processing tasks involve extensive arithmetic manipulation of ordered arrays of data. Commonly, this type of manipulation or "vector" processing involves performing the same operation repetitively on each successive element of a set of data. Most computers are organized with an arithmetic unit which can communicate with a memory and with input-output (I/O). To perform an arithmetic function, each operand (numbers to be added, subtracted or multiplied or otherwise operated upon) must be successively brought to the arithmetic unit from memory, the functions must be performed, and the result must be returned to the memory. Machines utilizing this type of organization, called scalar machines, have been found inefficient for practical use in large scale vector processing tasks.
In order to increase processing speed and hardware efficiency when dealing with ordered arrays of data, vector machines have been developed. A vector machine is one which deals with ordered arrays of data by virtue of its hardware organization, rather than by a software program and indexing, thus attaining higher speed of operation. One such vector machine is disclosed in U.S. Pat. No. 4,128,880, issued Dec. 5, 1978 and incorporated herein by reference. The vector processing machine of this patent employs one or more registers for receiving vector data sets from a central memory and supplying the data to segmented functional units, wherein arithmetic operations are performed. More particularly, eight vector registers, each adapted for holding up to sixty-four vector elements, are provided. Each of these registers may be selectively connected to any one of a plurality of functional units and one or more operands may be supplied thereto on each clock period. Similarly, each of the vector registers may be selectively connected for receiving results. In a typical operation, two vector registers are employed to provide operands to a functional unit and a third vector register is employed to receive the results from the functional unit.
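The contrast between the scalar and vector organizations described above can be illustrated with a minimal software sketch. It is only a model, not the hardware of the referenced patent: the function names, the use of a Python list as "memory", and the strip-by-strip loop are assumptions made for illustration. Only the register length of sixty-four elements is taken from the text.

```python
VECTOR_LENGTH = 64  # elements per vector register, as stated in the text


def scalar_add(memory, a, b, dst, n):
    """Scalar organization: one fetch-operate-store round trip per element."""
    for i in range(n):
        x = memory[a + i]          # bring first operand from memory
        y = memory[b + i]          # bring second operand from memory
        memory[dst + i] = x + y    # perform the function, return the result


def vector_add(memory, a, b, dst, n):
    """Vector organization (modelled): two registers supply operands,
    a third receives results, in strips of up to 64 elements."""
    for start in range(0, n, VECTOR_LENGTH):
        count = min(VECTOR_LENGTH, n - start)
        v0 = memory[a + start:a + start + count]      # load operand register
        v1 = memory[b + start:b + start + count]      # load operand register
        v2 = [x + y for x, y in zip(v0, v1)]          # functional unit
        memory[dst + start:dst + start + count] = v2  # store result register
```

Both functions produce the same values in memory. The difference modelled here is that the vector form names whole ordered arrays as operands, whereas the scalar form indexes each element by program.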
Further vector type machines are described in U.S. Pat. No. 4,661,900, issued Apr. 28, 1987 and incorporated herein by reference, wherein multiple processors are each connected to a central memory through a plurality of memory reference ports. The processors are further each connected to a plurality of shared registers which may be directly addressed by the processors at the faster access rates commensurate with intraprocessor operation. A vector register design provides each register with at least two independently addressable memories, to deliver data to or accept data from a functional unit.
An improved memory architecture is needed if vector computers are to perform well. Many modern operating systems and application programs assume and rely on virtual memory functions. In a virtual memory organization, programs running as jobs in a computer identify data by means of a virtual address. These addresses must be mapped or translated by the computer into real addresses to find the data identified by the virtual address. The real address corresponds to real storage, such as fast physical random access memory used by the computer. This mapping must be done quickly so the processors do not have to wait long for the data. Some computers use a small associative mapping in which a table of pairs of real and virtual addresses is accessed in one step. Because such a table is fairly expensive to implement in hardware, it is usually combined with a complete table of virtual and real addresses residing in main memory and managed by the operating system. Searching the complete table for the address corresponding to a given virtual or real address takes longer.
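The two-level lookup described above, a small associative table backed by a complete table in main memory, may be sketched as follows. This is an illustrative model under stated assumptions: the class name `Translator`, the block size, the table capacity, and the oldest-entry replacement rule are all hypothetical and are not drawn from any particular machine.

```python
PAGE_SIZE = 4096  # hypothetical block size, chosen for illustration


class Translator:
    """Small associative table in front of a complete mapping table."""

    def __init__(self, page_table, assoc_capacity=4):
        self.page_table = page_table      # complete table: virtual block -> real block
        self.assoc = {}                   # small associative table (one-step lookup)
        self.assoc_capacity = assoc_capacity
        self.slow_lookups = 0             # number of searches of the complete table

    def translate(self, vaddr):
        vblock, offset = divmod(vaddr, PAGE_SIZE)
        rblock = self.assoc.get(vblock)
        if rblock is None:
            # Miss: fall back to the slower, complete table in main memory.
            self.slow_lookups += 1
            rblock = self.page_table[vblock]  # raises KeyError if unmapped
            if len(self.assoc) >= self.assoc_capacity:
                # Assumed policy: discard the entry that was inserted first.
                self.assoc.pop(next(iter(self.assoc)))
            self.assoc[vblock] = rblock
        return rblock * PAGE_SIZE + offset
```

Repeated references to the same block are served by the associative table, so `slow_lookups` grows only when a block is referenced for the first time or after its entry has been discarded.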
In virtual memory organizations, data is organized into blocks, referred to as segments. Some systems use segments of one or more fixed lengths, while others use segments of varying lengths, which makes management of the memory system more complex. Since the real storage is usually much smaller than the virtual address range, blocks of data are transferred back and forth between real storage and a secondary storage such as disk drives, tape drives, optical storage and other slower, cheaper forms of long term storage. The blocks are transferred in accordance with whether the computer will likely need the data in the near future, or whether other data that is not yet in real storage will be needed sooner. The virtual address structure in such memory organizations comprises a segment portion, which identifies the segment, and an offset portion, which points directly to the address of the data within that segment.
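The segment-and-offset address structure described above may be sketched as follows. The field width, the function names, and the representation of a segment as a (base, length) pair are assumptions chosen for illustration; actual machines differ in all three.

```python
OFFSET_BITS = 16  # hypothetical width of the offset portion


def split_virtual_address(vaddr):
    """Return the (segment, offset) portions of a virtual address."""
    segment = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    return segment, offset


def to_real_address(vaddr, segment_table):
    """Map a virtual address using a table of segment -> (real base, length).

    Variable-length segments need the length check below, which is one
    reason they complicate management of the memory system.
    """
    segment, offset = split_virtual_address(vaddr)
    base, length = segment_table[segment]  # raises KeyError if not mapped
    if offset >= length:
        raise IndexError("offset lies outside the segment")
    return base + offset
```

For example, with segment 2 mapped to real base `0x100000` and length `0x800`, the virtual address `0x20010` splits into segment 2 and offset `0x10`, and maps to real address `0x100010`.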
A major criterion used for evaluating a memory addressing organization or architecture is performance. In many cases, the value of an architecture feature can be judged in terms of clock periods gained or lost when executing a sequence of instructions. However, features of the memory system organization often affect performance in ways that can only be measured on a larger scale. Large blocks of virtual memory address ranges compete for the same real memory resources. Contending requirements must be arbitrated and managed efficiently by the operating system, and such management profoundly affects system performance.
Virtual memory organizations provided by operating systems are well known in the art for their ability to optimize the use of memory in a hierarchical memory system. Virtual memory managers can produce system memory access speeds that approach the access speed of the fastest memory components in the system. They do this by keeping active blocks, such as pages, in real memory, which has the fastest access speed, and migrating blocks of data back to lower speed memory as they become inactive. When a job tries to access data and the corresponding virtual address does not have a real address assigned to it, a page fault is generated. On a page fault, the virtual memory manager transfers the data contained in that page to real memory. This transfer can take a significant number of clock periods.
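The page fault behavior described above may be modelled with a short sketch. The class name `DemandPager` and the least-recently-used rule for deciding which page has become inactive are assumptions; the text does not specify a replacement policy, and the cost of the transfer itself is not modelled.

```python
from collections import OrderedDict


class DemandPager:
    """Keeps active pages in a fixed number of real memory frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # page -> frame, ordered from least to most recently used
        self.resident = OrderedDict()
        self.page_faults = 0

    def access(self, page):
        """Return the frame holding `page`, faulting it in if necessary."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as recently active
            return self.resident[page]
        # Page fault: no real address is assigned to this page.
        self.page_faults += 1
        if len(self.resident) >= self.num_frames:
            # Assumed policy: migrate the least recently used page out
            # and reuse its frame.
            _, frame = self.resident.popitem(last=False)
        else:
            frame = len(self.resident)
        self.resident[page] = frame
        return frame
```

With two frames and the reference sequence 1, 2, 1, 3, 2, the model records four page faults: only the second reference to page 1 finds its page resident.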
Vector processing supercomputers have special memory needs that are not met by standard virtual memory systems. Vector machines employ pipelining techniques to hide memory latency through the prefetching of instructions and data.
The use of pipelining in vector machines places an additional burden on an operating system. System exceptions can incur substantial time penalties as the operating system software attempts to determine the state of the system at the time of the exception. Additional hardware may be required to track the state of the machine through all stages of pipelining. The difficulty of determining (and saving) the state of a pipelined vector machine has led designers of past systems to minimize hardware impact by designing many exceptions as non-recoverable. A non-recoverable exception results in aborting a job because correct execution cannot be resumed.
Demand-paged virtual memory systems are difficult to implement in a vector supercomputer. In conventional virtual memory systems, the virtual to real memory mapping tables reside in main or real memory. For reasonable performance, a recently-used subset of memory mapping information is cached in a translation lookaside buffer (TLB). This requires extra hardware to control the loading of the buffer from memory-resident page tables, or special traps and privileged instructions have to be provided to support a software-managed lookaside buffer effectively. A second drawback in virtual memory systems lies in the fact that, even with lookaside buffers, memory mapping hardware may require additional pipeline stages. It is necessary to translate virtual addresses to real addresses, and to check for translation faults caused by unmapped addresses. Finally, addressing exceptions (traps) occur at times that are difficult for the hardware to handle. Potentially any memory reference can result in a trap. This causes problems in a highly pipelined processor where a trap condition is not detected until several clock periods after an instruction issues. Particularly difficult cases occur with vectors, where a trap can occur in the middle of a vector load or store (or multiple traps can occur within the same load or store). While not impossible, handling addressing exceptions adds to hardware complexity, and can easily lead to reduced performance.
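The difficulty of a trap arising in the middle of a vector load may be illustrated with a simplified software model. The names `TranslationFault` and `vector_load`, the page size, and the representation of mapped pages as a set are assumptions for illustration only. Pipelining is not modelled, so the sketch shows only the partially completed state that must be dealt with.

```python
class TranslationFault(Exception):
    """Raised when an element address falls on an unmapped page."""

    def __init__(self, index, partial):
        super().__init__(f"unmapped address at element {index}")
        self.index = index      # element at which the trap occurred
        self.partial = partial  # elements already loaded before the trap


def vector_load(memory, mapped_pages, base, stride, length, page_size=1024):
    """Load `length` elements, translating each address as it is reached."""
    register = []
    for i in range(length):
        addr = base + i * stride
        if addr // page_size not in mapped_pages:
            # The fault is discovered only after earlier elements
            # have already been delivered to the register.
            raise TranslationFault(i, register)
        register.append(memory[addr])
    return register
```

With only page 0 mapped, a load of sixty-four elements starting at address 1000 with a stride of 8 traps at element 3 (address 1024), after three elements have already been loaded. Resuming correctly requires that this partial state be saved and later restored.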
It is evident that there is a need for a memory management system for a vector based computer system that provides some of the mapping capabilities of virtual memory management, but is tailored for a vector processing environment. There is a need for such a system to anticipate potential page faults near the beginning of execution of an operation on a vector. Such a system should be designed such that addressing errors are detected as soon as possible after instruction execution begins.
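One way such early detection might look in principle is sketched below. It is a hypothetical illustration and is not presented as the mechanism of any particular system. It assumes that the job's mapped region is a single contiguous range of addresses and that a vector reference is described by a base address, a stride, and a length. The function name and parameters are invented for the example.

```python
def check_vector_reference(base, stride, length, lower, upper):
    """Return True if every element address lies in [lower, upper).

    Because element addresses advance by a constant stride, the extreme
    addresses are those of the first and last elements, so the whole
    reference can be checked before any element is transferred.
    """
    if length <= 0:
        return True  # an empty reference touches no addresses
    first = base
    last = base + (length - 1) * stride
    return lower <= min(first, last) and max(first, last) < upper
```

Under these assumptions the check costs the same regardless of vector length, and a reference that would run past the mapped region is rejected before the load or store begins rather than part way through it.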