In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises one or more central processing units (CPUs) and supporting hardware necessary to store, retrieve and transfer information, such as communication buses and memory. It also includes hardware necessary to communicate with the outside world, such as input/output controllers or storage controllers, and devices attached thereto such as keyboards, monitors, tape drives, disk drives, communication lines coupled to a network, etc. The CPU or CPUs are the heart of the system. They execute the instructions which form a computer program and direct the operation of the other system components.
From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. CPUs (also called processors) are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Sophisticated software at multiple levels directs a computer to perform massive numbers of these simple operations, enabling the computer to perform complex tasks. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but using software with enhanced function, along with faster hardware.
Many diverse innovations have improved the throughput, or amount of useful work per unit of time, which can be performed by a computer system. Among these is a class of innovations relating to parallelism. Parallelism involves using multiple copies of hardware components, and in particular multiple CPUs, to increase the throughput of the computer. Reductions in the size and cost of integrated circuitry have made it practical to include multiple CPUs in a single computer system, even to the point of including multiple CPUs or processors on a single integrated circuit chip. It can be readily understood that two processors should be able to accomplish more than a single processor having the same characteristics, that three should be able to accomplish more than two, and so on. However, this relationship is usually not a direct proportion. Multiple processors interfere with one another in their need to use memory, communications buses, and other system components, increase the complexity of assignment of processes to execute on the processors, generate additional management overhead, and so forth. The proliferation of processors introduces a range of new problems relating to these management issues.
A related development involves the use of coprocessors. A coprocessor is a digital device which performs selective operations on behalf of one or more processors. Unlike the processor, the coprocessor does not independently execute a thread of instructions and maintain thread state for determining the instruction flow. It is invoked by a processor to perform an operation on data, performs the operation asynchronously with the processor (i.e., while the processor continues to execute instructions and perform other operations), and makes results available to the processor when finished. The coprocessor is often, although not necessarily, implemented as a multiple-stage pipeline. Thus, the coprocessor has the capability to off-load some of the work from one or more processors. In many environments, the use of one or more coprocessors provides better performance than devoting an equivalent amount of hardware circuitry to additional processors, which would only further aggravate the management issues of processor parallelism.
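The invocation pattern described above can be illustrated in software. The following is a minimal sketch only, not a description of any particular hardware: the `Coprocessor` class, its `invoke` method, and the `processor_thread` function are hypothetical names introduced for illustration, and a single worker thread stands in for the coprocessor. A checksum is used as an example of an off-loaded operation.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


class Coprocessor:
    """Hypothetical software stand-in for a coprocessor.

    It keeps no thread state of its own; it only runs operations
    handed to it by a caller (the "processor").
    """

    def __init__(self):
        # One worker models a single unit that processes one operation at a time.
        self._unit = ThreadPoolExecutor(max_workers=1)

    def invoke(self, operation, data):
        # Returns a handle immediately; the operation runs asynchronously.
        return self._unit.submit(operation, data)

    def shutdown(self):
        self._unit.shutdown(wait=True)


def processor_thread(coproc):
    """Models a thread that off-loads a checksum and keeps working."""
    payload = b"abc" * 1000
    handle = coproc.invoke(zlib.crc32, payload)  # off-load the operation
    other_work = sum(range(1000))                # processor continues executing
    checksum = handle.result()                   # collect the result when finished
    return other_work, checksum
```

In this sketch the input data is passed directly at invocation and the result is returned directly, which corresponds to the simplest form of processor-to-coprocessor communication.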
In the case of certain coprocessor operations, it is practical to communicate a relatively small amount of input data directly over a bus from the processor to the coprocessor as the coprocessor's operation is invoked, and receive results directly from the coprocessor, storing the data temporarily in one or more buffers as needed. However, this straightforward technique limits the range and complexity of operations that can be performed by the coprocessor. It may be desirable to support an arbitrarily large volume of input data to and/or output data from the coprocessor operation, or to support operations which require data which is not known until some part of the operation is complete.
In order to support a greater range and complexity of coprocessor operations, it would be desirable to support access by the coprocessor to data in memory (including cache memory). If the coprocessor needs input data, it would be able to fetch it from memory as required; if it produces output data, it would be able to store it in memory as required.
Unfortunately, there is a problem. Almost all modern general purpose computer systems support the use of different address spaces, and some form of address translation. Each thread executing on the processor executes in the context of a respective process. The executable instructions reference memory addresses within an address space corresponding to the process. With the exception of addresses generated by a relatively small set of privileged processes which administer the computer system, these memory addresses (which are herein referred to as “effective addresses” although they may alternatively be called “virtual addresses”, or by some other name) do not reference fixed physical storage locations in main memory. Some form of address translation mechanism translates the addresses used by the executing instructions to the fixed addresses of storage locations in memory (which are herein referred to as “real addresses”, although they may alternatively be called “physical addresses”, or by some other name).
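As an illustration of the translation described above, the following minimal sketch maps effective addresses to real addresses through a per-process table. It assumes fixed-size pages of 4096 bytes and a simple dictionary as the table; both are assumptions made for the example, as are the names `AddressSpace`, `TranslationFault`, `process_a` and `process_b`. Actual translation mechanisms vary by architecture.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class TranslationFault(Exception):
    """Raised when an effective address has no mapping in the address space."""


class AddressSpace:
    """Hypothetical per-process mapping of effective pages to real pages."""

    def __init__(self, page_table):
        # effective page number -> real page number
        self.page_table = dict(page_table)

    def translate(self, effective_address):
        page, offset = divmod(effective_address, PAGE_SIZE)
        if page not in self.page_table:
            raise TranslationFault(hex(effective_address))
        return self.page_table[page] * PAGE_SIZE + offset


# Two processes use the same effective address but reach different real storage.
process_a = AddressSpace({0: 7, 1: 3})
process_b = AddressSpace({0: 12})
```

Here the same effective address, for example `0x10`, translates to a different real address in each process, and an effective address outside a process's table raises a fault rather than reaching another process's storage.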
Address translation is used for several reasons. Because the effective address space(s) can be larger than the physical address space, processes are freed from the constraints of available physical addresses. The programmer and compiler need not know in advance the number of physical addresses available (which can vary from system to system, or within logical partitions of the same system), or the usage of physical addresses by other processes executing on the system. Finally, different address spaces are used to isolate processes from one another and thus provide a measure of security and data integrity.
A processor typically maintains thread state information and includes address translation mechanisms which enable it to translate these effective addresses to real addresses in order to access memory (including cache memory) in a secure, authorized manner. It is theoretically possible to include such mechanisms in a coprocessor, but this adds significant cost and complexity to the design. In addition to hardware logic, the address translation mechanisms usually include one or more tables, such as a translation lookaside buffer (TLB), which map effective addresses to real addresses. The information in these tables changes frequently, and duplication of the address translation mechanisms in the coprocessor requires that such information be kept current. The complexity is aggravated by the fact that the coprocessor may be shared by multiple processors, and must therefore be able to translate addresses in the context of any thread executing on any of the multiple processors, and to synchronize its address translation data with those of all the processors. This complexity is further aggravated by the fact that a coprocessor operation might be executing asynchronously relative to the process which invokes it, to the point where the invoking process may no longer be executing in the processor by the time the coprocessor completes executing on its behalf.
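The difficulty of keeping duplicated translation data current can be seen in a minimal sketch. It assumes a 4096-byte page size and models each TLB as a small cache in front of a shared table. The `Tlb` class and the variable names are hypothetical, and real hardware invalidation protocols are considerably more involved.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class Tlb:
    """Hypothetical cache of effective-page to real-page translations."""

    def __init__(self, page_table):
        self._page_table = page_table  # shared, authoritative mapping
        self._cache = {}

    def translate(self, effective_address):
        page, offset = divmod(effective_address, PAGE_SIZE)
        if page not in self._cache:
            # Miss: consult the authoritative table and remember the answer.
            self._cache[page] = self._page_table[page]
        return self._cache[page] * PAGE_SIZE + offset

    def invalidate(self, page):
        self._cache.pop(page, None)


page_table = {0: 7}
processor_tlb = Tlb(page_table)
coprocessor_tlb = Tlb(page_table)    # duplicated mechanism in the coprocessor

before = coprocessor_tlb.translate(0x20)   # both copies cache page 0 -> 7
processor_tlb.translate(0x20)

page_table[0] = 9                    # the mapping changes
processor_tlb.invalidate(0)          # only the processor's copy is invalidated

fresh = processor_tlb.translate(0x20)      # uses the new mapping
stale = coprocessor_tlb.translate(0x20)    # still uses the old mapping

coprocessor_tlb.invalidate(0)        # the extra synchronization step required
synced = coprocessor_tlb.translate(0x20)
```

Until the coprocessor's copy is invalidated, it continues to resolve the effective address to the old real address. With multiple processors sharing one coprocessor, every such change on any processor would need to reach the coprocessor's copy as well.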
In order to support continuing improvements to the throughput of computer systems, and in particular to support continuing improvements to the design and capabilities of coprocessors which off-load work from one or more processors, there is a need for improved data accessing mechanisms in a coprocessor.