The present invention relates to read/write interfaces between processors and memories. More generally, it relates to interfaces between clients of a memory mapped resource and that resource. In a particular embodiment, the invention provides a solution to the problem of efficiently using the interface while still ensuring that reads and writes are performed in proper sequence when a particular sequence is required.
In order to clearly explain the problems and solutions of memory interfaces, several definitions are here provided:
"Memory" refers to a memory system, which may include data paths, controller chips, buffers, queues, and memory chips. While this disclosure describes the problems and solutions in data storage memory, it should be understood that the problems and solutions can be generalized in many cases for memory-mapped circuits which perform more than just storage of data (e.g,, memory-mapped I/O, memory-mapped compute devices).
A "memory location" (or simply "a location") is an individually addressable unit of the memory that can be addressed and that holds data (or transports the data to and/or from an I/O device or a compute device).
A "client" is a central processing unit (CPU), processor, I/O controller or other device which uses the services provided by the memory system. In many instances herein, a statement refers to a processor by way of example; it should be understood that the processor is only one example of a client and the statement is equally applicable to other, nonprocessor clients.
A "request" is an action performed by a client in using the services of a memory system.
A "read request" (or simply "a read") is a request from a client to the memory requesting the contents of a memory location specified by an address of the memory location to be read; the read request is accompanied by the address of the read memory location.
A "write request" (or simply "a write") is a request from a client to the memory requesting that the memory place a write value into a write memory location; the write request is accompanied by the write value and the address of the write memory location.
An "acknowledgment" (or simply "an ack") is an indication returned by the memory to the client indicating that a request has been satisfied; an acknowledgment to a read request includes the data read from the specified memory location.
"Pending reads" is the set of read requests which are pending; a read request is "pending" from the time it is accepted by the memory until the memory issues an ack.
"Pending writes," analogous to pending reads, is the set of write requests which are pending; a write request is "pending" from the time it is accepted by the memory until the memory issues an acknowledgment.
When building memory systems for large computers, one feature which provides for high performance is concurrency, wherein more than one memory operation is in progress at the same time. One limitation on concurrency is that a CPU, or other client, requires memory consistency. A memory appears consistent when a "read" of a memory location returns the value most recently "written" to that location. In some systems with concurrency, reads and writes are reordered into an optimized execution order to achieve higher performance; however, this may lead to loss of consistency. For example, if a write to location A changes the value there from "X" to "Y", and a read follows the write, the result of the read should be the value "Y". However, if the read and write are reordered in an optimization step so that the read is performed first, the read will return the erroneous value "X". Therefore, any optimization process must ensure that the reads and writes are performed such that the read returns the correct value.
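The X/Y example above can be reproduced with a minimal Python sketch. The `run` function and the tuple encoding of requests are illustrative assumptions, not part of any described system; the sketch only models a memory that performs requests one at a time in whatever order it is handed them.

```python
def run(requests, memory):
    """Perform requests one at a time, in the order given; return the values read."""
    results = []
    for kind, address, value in requests:
        if kind == "write":
            memory[address] = value
        else:  # kind == "read"
            results.append(memory[address])
    return results


# Order issued by the client: write "Y" to location A, then read location A.
issued = [("write", "A", "Y"), ("read", "A", None)]

# Hypothetical "optimized" order in which the read was moved ahead of the write.
reordered = [issued[1], issued[0]]

in_order_result = run(issued, {"A": "X"})        # the read sees "Y"
reordered_result = run(reordered, {"A": "X"})    # the read sees the stale "X"
```

In both runs location A ends up holding "Y"; only the value returned to the client differs, which is why the error is easy to miss.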
Memory consistency is essential, as the purpose of a memory is to retain a data value associated with each memory location. A read request addressed to a particular location (address) will return the current value held in that location. A write request addressed to that location will change the current value (unless, by coincidence, the old value and new value are the same). Consistency is easy to implement if memory requests are always processed in exactly the same order as they are issued by the client. Preserving the order exactly, however, is often not possible in high-performance memory designs which may need to reorder requests to speed up processing. For example, the system requirements might be such that read requests must be completed faster than write requests because pending read requests hold up processing until the read data is returned.
In whatever manner requests are reordered, the reordering must not violate the consistency that is inherent in the one-request-at-a-time memory model described above. One set of reordering constraints is as follows:
Rule 1: A read of location X followed by a write of location X cannot be reordered with respect to each other.
Rule 2: A write of location X followed by a read of location X cannot be reordered with respect to each other.
Rule 3: A write of location X followed by another write of location X cannot be reordered with respect to each other.
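Rules 1 through 3 reduce to a single test: two requests may be exchanged unless they address the same location and at least one of them is a write. A minimal sketch follows, in which the `Request` type and the `may_reorder` name are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical request record: kind is "read" or "write".
Request = namedtuple("Request", ["kind", "address"])


def may_reorder(first, second):
    """Return True if `second` may be performed before `first` under Rules 1-3."""
    if first.address != second.address:
        return True  # the rules only constrain requests to the same location
    # Same location: only a read followed by a read is unconstrained.
    return first.kind == "read" and second.kind == "read"
```

For example, `may_reorder(Request("write", "X"), Request("read", "X"))` is False (Rule 2), while a write of X followed by a read of a different location Y may be exchanged freely.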
Where one of the above rules hinders performance of the system, schemes have been proposed to modify requests in some cases to maintain consistency. For example:
Rule A. If a write (W1) of location X is followed by a read (R2) of that same location X, read R2 can be acknowledged immediately, with the acknowledgment reporting the data value which was to be written as part of write W1.
Rule B. If a write (W1) of location X is followed by a write (W2) of that same location X, write W1 can be acknowledged immediately, without actually doing a write. Write W2 is allowed to proceed normally.
Rule A is often implemented by adding "store buffers" to the processor. Rule B is almost never implemented because its performance advantage is very slight. Nonetheless, Rules A and B give some insight into what can be done at the processor to increase concurrency and thus improve performance while still maintaining consistency.
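As a rough illustration of Rules A and B, the sketch below models a store buffer that sits between a client and a memory represented as a plain dictionary. The `StoreBuffer` class, its method names, and the `writes_elided` counter are assumptions made for this example; real store buffers differ in capacity, drain policy, and acknowledgment timing.

```python
class StoreBuffer:
    """Toy store buffer: holds writes that have not yet reached memory."""

    def __init__(self, memory):
        self.memory = memory
        self.buffered = {}       # address -> value of the newest buffered write
        self.writes_elided = 0   # writes that never had to reach memory

    def write(self, address, value):
        if address in self.buffered:
            # Rule B: the older write W1 is superseded by W2 and is never
            # performed; only the newer value will be sent to memory.
            self.writes_elided += 1
        self.buffered[address] = value

    def read(self, address):
        if address in self.buffered:
            # Rule A: answer the read from the value that is about to be
            # written, without waiting for the write to complete.
            return self.buffered[address]
        return self.memory[address]

    def drain(self):
        """Send every buffered write to memory and empty the buffer."""
        for address, value in self.buffered.items():
            self.memory[address] = value
        self.buffered.clear()
```

A read that hits the buffer returns the new value even though the memory itself still holds the old one, which is the behavior Rule A permits.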
A simple approach to improve performance is to defer write requests in order to satisfy a read request, so that the client can proceed with its computation using the read result sooner than if it had to wait until after the write request was complete. However, if the deferred write would have changed the result of the read, consistency is violated. Various schemes have been devised for maintaining consistency in cases such as this.
One such scheme for enforcing ordering requirements is described in Section 8.4.3 of the SUN SPARC-V9 manual. That manual was published by the Assignee of the present invention and is incorporated herein for all purposes. Section 8.4.3, beginning on page 124, describes the "MEMBAR" instruction. A MEMBAR instruction provides a way for a programmer to enforce an order of reads and writes issued by a client. MEMBAR instructions are interspersed in the instruction codes executed by a processor. When a processor is executing instructions and encounters a MEMBAR instruction, it holds up further read and write operations until the operations which preceded the MEMBAR instruction have completed.
U.S. Pat. No. 5,748,539 (Patent application No. 08/811,909 filed Mar. 5, 1997 and entitled "Recursive Multi-Channel Interface", hereinafter "Sproull-Sutherland") discloses a method of determining whether the read and write operations have been completed (that patent/application is commonly assigned to the assignee of the present application and is incorporated herein by reference for all purposes).
Referring again to the SUN SPARC-V9 manual, that reference explains how the ordering constraints are enforced by the processor. There, given a first operation and a second operation, if the second operation must not be performed before the first operation, the execution unit delays the submission of the second operation to the memory until the first operation is no longer pending. This is disadvantageous in systems where the processor-memory interface is bandwidth limiting, as the deferral of submission of the second operation by the processor to the memory may result in lower performance if the bandwidth of the interface is left idle when the processor could have used the idle time to send the request for the second operation.
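The cost of enforcing order at the processor can be illustrated with a small timing model. This is a deliberately simplified sketch and not the SPARC-V9 MEMBAR semantics: it assumes a single full barrier type, an interface that can send one request per cycle, and a fixed acknowledgment latency. The function name `issue_times` and its parameters are hypothetical.

```python
def issue_times(program, latency):
    """Return (operation, cycle sent) for each request in `program`.

    program: list of "read", "write", or "barrier".
    latency: assumed fixed number of cycles from sending a request to its ack.
    """
    cycle = 0      # next cycle at which the interface is free to send
    last_ack = 0   # cycle by which every request sent so far is acknowledged
    sent = []
    for op in program:
        if op == "barrier":
            # The processor stalls until nothing is pending; the interface
            # sits idle for the whole wait.
            cycle = max(cycle, last_ack)
            continue
        sent.append((op, cycle))
        last_ack = max(last_ack, cycle + latency)
        cycle += 1
    return sent


with_barrier = issue_times(["write", "write", "barrier", "read"], latency=10)
without_barrier = issue_times(["write", "write", "read"], latency=10)
```

Under these assumptions the read is sent at cycle 11 when it follows the barrier and at cycle 2 when it does not, so the interface carries nothing for nine cycles while the processor waits.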
If the memory system is designed in such a way that read requests and write requests are processed by separate paths, consistency checks cannot be easily performed and the client maintains consistency by allowing only a single pending request at once. To achieve more performance in the dual-path memory system, several requests must be pending at once. The consistency constraint must be relaxed to require only that a new request a client is about to send cannot violate consistency if reordered with respect to any pending request. This check can be accomplished if the client retains a record of pending reads and pending writes and checks each new request against the pending requests before the new request is issued. If the client has a "store buffer" as mentioned above, the store buffer may do some of this checking.
The client maintains its record of pending reads and writes by noting (a) when it issues each new request and (b) when each request is eventually acknowledged, signifying that the request is no longer pending. However, where the processor holds up an operation instead of using a bandwidth-limited interface whenever the interface is available, performance may be lost as extra time would be needed to send the held-up request and the critical path involving that request would be lengthened.
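The bookkeeping described above might look like the following sketch, in which the client refuses to issue a request that could violate consistency if reordered against a pending request. The `PendingRecord` class and its method names are illustrative assumptions; the conflict test is the same-location rule set given earlier (a read conflicts with a pending write, and a write conflicts with any pending request to the same location).

```python
class PendingRecord:
    """Client-side record of pending reads and writes, keyed by address."""

    def __init__(self):
        self.pending_reads = {}    # address -> number of pending reads
        self.pending_writes = {}   # address -> number of pending writes

    def conflicts(self, kind, address):
        """True if issuing this request now could violate consistency."""
        write_pending = self.pending_writes.get(address, 0) > 0
        if kind == "read":
            return write_pending
        read_pending = self.pending_reads.get(address, 0) > 0
        return write_pending or read_pending

    def issue(self, kind, address):
        """Record the request as pending; return False if it must be held."""
        if self.conflicts(kind, address):
            return False
        table = self.pending_reads if kind == "read" else self.pending_writes
        table[address] = table.get(address, 0) + 1
        return True

    def acknowledge(self, kind, address):
        """Note that the memory has acknowledged a previously issued request."""
        table = self.pending_reads if kind == "read" else self.pending_writes
        table[address] -= 1
        if table[address] == 0:
            del table[address]
```

A request for which `issue` returns False is exactly the held-up request discussed above: it stays at the client, and the interface goes unused, until the conflicting acknowledgment arrives.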
Much has been written on memory consistency and memory coherence models which use memory barrier, or "fence", instructions, but those are generally used in the context of multiprocessor systems where the goal is to properly order instructions among multiple processors, without fully addressing whether or not each individual processor sees a consistent order.
Memory consistency and memory coherence models generally exist in the context of multiprocessor systems, and therefore address how operations of separate processors are ordered among themselves rather than how an individual processor is assured a consistent order. Nonetheless, it is known how to address memory consistency in the context of single processors. For example, the following references describe commercial microprocessors and their corresponding multiprocessing systems using some of these techniques:
(1) K. Gharachorloo et al., "Two Techniques to Enhance the Performance of Memory Consistency Models," Proceedings of the 1991 Intern. Conf. on Parallel Processing, I:355-364, August, 1991;
(2) S. Adve et al., "Weak Ordering--a New Definition," Proc. 17th Intern. Symp. on Comp. Arch., June 1990, pp. 2-14;
(3) K. Gharachorloo et al., "Hiding Memory Latency using Dynamic Scheduling in Shared-Memory Multiprocessors," Proc. 19th Intern. Symp. on Comp. Arch., May 1992, pp. 22-33 (surveying the field in its "Background" section);
(4) C. Scheurich et al., "Correct Memory Operation of Cache-Based Multiprocessors," Proc. 14th Intern. Symp. on Comp. Arch., June 1987, pp. 234-243;
(5) K. Gharachorloo et al., "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," Proc. 17th Intern. Symp. on Comp. Arch., June 1990, pp. 15-26;
(6) D. Lenoski et al., "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor," Proc. 17th Intern. Symp. on Comp. Arch., June 1990, pp. 148-159; and
(7) D. Fenwick et al., "The AlphaServer 8000 Series: High-end Server Platform Development," Digital Technical Journal, vol. 7, no. 1, August 1995 (hereinafter "Fenwick"),
each of which is incorporated herein for all purposes. Fenwick appears to show how barrier instructions operate in the context of the Alpha 21164 microprocessor, built by Digital Equipment Corporation. There, a barrier instruction, MB or "memory barrier", is provided. The MB instruction is reported off-chip, and may be used at the interface between the microprocessor and the memory bus, but the MB instructions apparently do not pass over the memory bus. As Fenwick indicates, since the memory system processes requests in the order they are issued by the processor, the MB information is not needed beyond the bus. A similar instruction is used in the memory interface of most microprocessors (for example, waiting for all pending memory transactions to complete before allowing any new memory requests to be issued), but the interface circuitry is commonly provided on the microprocessor chip itself.
Therefore, what is needed is a processor-memory interface which allows the processor to enforce execution order of concurrently submitted operations, even when multiple operations required to be ordered are submitted to the memory which may reorder operations for its own purposes.