1. Field of the Invention
This invention relates generally to processor-based systems, and, more particularly, to providing a higher bandwidth, lower-latency implementation of a scaled shared memory (SSM) protocol.
2. Description of the Related Art
Businesses typically rely on network computing to maintain a competitive advantage over other businesses. As such, developers designing processor-based systems for use in network-centric environments may take several factors into consideration to meet customer expectations, including the functionality, reliability, scalability, and performance of such systems.
One example of a processor-based system used in a network-centric environment is a mid-range server system. A single mid-range server system may have a plurality of system boards that may, for example, be configured as one or more domains, where a domain, for example, may act as a separate machine by running its own instance of an operating system to perform one or more of the configured tasks.
A mid-range server, in one embodiment, may employ a distributed shared memory system, where processors from one system board can access memory contents from another system board. The union of all of the memories on the system boards of the mid-range server comprises a distributed shared memory (DSM).
One method of accessing data from other system boards within a system is to broadcast a memory request on a common bus. For example, if a requesting system board desires to access information stored in a memory line residing in a memory of another system board, the requesting system board typically broadcasts on the common bus its memory access request. All of the system boards in the system may receive the same request and the system board whose memory address ranges match the memory address provided in the memory access request may then respond.
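The broadcast approach described above can be illustrated with a minimal sketch. The names `SystemBoard`, `snoop`, and `broadcast_read`, and the per-board address layout, are assumptions made for illustration only and are not taken from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class SystemBoard:
    """Hypothetical system board that owns one contiguous address range."""
    board_id: int
    base: int
    size: int
    memory: dict = field(default_factory=dict)

    def owns(self, address):
        return self.base <= address < self.base + self.size

    def snoop(self, address):
        # Every board sees the request; only the owner of the range answers.
        if self.owns(address):
            return self.memory.get(address, 0)
        return None


def broadcast_read(boards, requester_id, address):
    """Send the request to every other board on the common bus.

    Returns the value from the responding board and the number of boards
    that had to examine the request.
    """
    examined = 0
    result = None
    for board in boards:
        if board.board_id == requester_id:
            continue
        examined += 1
        value = board.snoop(address)
        if value is not None:
            result = value
    return result, examined
```

In this sketch every non-requesting board examines each request, so the work per access grows with the number of boards even though only one board responds.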
The broadcast approach for accessing contents of memories in other system boards may work adequately when a relatively small number of system boards are present in a system. However, such an approach may be unsuitable as the number of system boards grows. As the number of system boards grows, so does the number of memory access requests; to handle this increased traffic, larger and faster buses may be needed to allow the memory accesses to complete in a timely manner. Operating a large bus at high speeds may be problematic because of electrical concerns arising, in part, from high capacitance, inductance, and the like. Furthermore, a larger number of boards within a system may require extra broadcasts, which could add further undesirable delays and may require additional processing power to handle the extra broadcasts.
Designers have proposed the use of directories in a distributed shared memory system to reduce the need for globally broadcasting memory requests. Typically, each system board serves as a home board for memory lines within a selected memory address range, and each system board is aware of the memory address ranges belonging to the other system boards within the system. Each home board generally maintains its own directory for memory lines that fall within its address range. Thus, when a requesting board desires to access memory contents from another board, the request is transmitted to the appropriate home board instead of being broadcast generally in the system. The home board may consult its directory to determine which system board is capable of responding to the memory request and to identify any system boards that need to be informed of the request.
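The directory lookup described above can be sketched as follows. This is an illustrative sketch only: the fixed range size `RANGE_SIZE`, the `HomeBoard` class, and the owner/sharer layout of a directory entry are assumptions, not details of any specific directory protocol.

```python
RANGE_SIZE = 0x1000  # hypothetical size of each home board's address range


def home_of(address, num_boards):
    """Map an address to the board whose range contains it (assumed layout)."""
    board_id = address // RANGE_SIZE
    if board_id >= num_boards:
        raise ValueError("address outside all home ranges")
    return board_id


class HomeBoard:
    def __init__(self, board_id):
        self.board_id = board_id
        # line address -> {"owner": board id, "sharers": set of board ids}
        self.directory = {}

    def handle_request(self, requester_id, address):
        """Return (responding board, boards that must be informed)."""
        entry = self.directory.setdefault(
            address, {"owner": self.board_id, "sharers": set()}
        )
        responder = entry["owner"]
        to_inform = sorted(entry["sharers"] - {requester_id})
        entry["sharers"].add(requester_id)  # requester now holds a copy
        return responder, to_inform
```

The requester contacts exactly one board, the home board, and only the boards named in the directory entry are involved afterward.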
Directories are generally effective in reducing the need for globally broadcasting memory requests during memory accesses. However, implementing a directory that is capable of mapping every memory location within a system board generally represents a significant memory overhead. As such, directory caches are often designed to hold only mappings for a subset of the total memory. The system typically must use some other method, such as broadcasting, to resolve requests for memory that are not currently mapped in the directory cache.
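A directory cache with a broadcast fallback might look like the sketch below. The least-recently-used eviction policy, the `DirectoryCache` class, and the `broadcast` callback are assumptions chosen for illustration; the text above does not specify a replacement policy.

```python
from collections import OrderedDict


class DirectoryCache:
    """Holds mappings for only a subset of memory lines (assumed LRU policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # line address -> owning board id

    def lookup(self, line):
        if line in self.entries:
            self.entries.move_to_end(line)  # mark as most recently used
            return self.entries[line]
        return None

    def insert(self, line, owner):
        self.entries[line] = owner
        self.entries.move_to_end(line)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def resolve(cache, line, broadcast):
    """Use the directory cache when possible; otherwise fall back to broadcast.

    `broadcast` is a hypothetical callable that finds the owner of a line by
    asking every board.
    """
    owner = cache.lookup(line)
    if owner is not None:
        return owner, "directory"
    owner = broadcast(line)
    cache.insert(line, owner)
    return owner, "broadcast"
```

A smaller cache lowers memory overhead but sends more requests down the broadcast path, which is the trade-off the paragraph above describes.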
Communication requests between the multiple boards described above (e.g., the requesting board and the home board) generally cause them to develop a client/server relationship. Communications between boards in such a client/server relationship may experience an inherent latency. Many times, several system clock cycles may pass during which no significant activity relating to transactions between the client and the server is accomplished. These idle cycles may adversely affect the operation of the server.
Often, latency in communications between the requesting board and the home board may cause several portions of a transaction request to be placed in a queue. While transaction requests are queued, several system clock cycles may elapse without useful work because of this latency. As an appreciable number of requests accumulates, a backlog may develop in the queue, which may slow the operation of the server.
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.