Historically, main memory was physically situated on a central bus. Within this type of system, memory requests consisting of full physical addresses were forwarded to the memory subsystem and the data was returned. In a distributed-memory system, main memory is physically distributed across many different cells. A cell may consist of a number of processors, one or more input/output (I/O) devices, a cell controller, and memory. Each cell holds a different portion of the main memory space. Each processor can access not only the local memory, but also the memories of other cells via cell communications link circuitry, such as one or more crossbar switches.
Caching can ameliorate the performance limitations associated with memory accesses. Caching involves storing a subset of the contents of main memory in a cache memory that is smaller and faster than main memory. Various strategies are used to increase the probability that cache contents anticipate requests for data. For example, since data near a requested word in memory address space is relatively likely to be requested near in time to the requested word, most caches fetch and store multi-word lines. The number of words stored in a single cache line defines the line size for a system. For example, a cache line might be eight words long.
Caches typically have far fewer line storage locations than main memory. A “tag” is typically stored at each cache location along with the data, to uniquely identify the main-memory line address from which the cached data was taken.
In both single-processor and multi-processor systems, there is a challenge of ensuring “coherency” between the cache and main memory. For example, if a processor modifies data stored in a cache, the modification should be reflected in main memory. Typically, there is some latency between the time the data is modified in the cache and the time the modification is reflected in main memory. During this latency, the yet-to-be-modified data in main memory is invalid. Steps should be taken to ensure that the main memory data is not read while it is invalid.
In the case of a distributed-memory multi-processor system in which each processor or input/output module has a cache memory, the situation is somewhat more complex than for single processor systems having a cache memory. In a multi-processor system, the current data corresponding to a particular main memory address may be stored in one or more cache memories, and/or in a main memory. The data in a cache memory may have been operated on by a processor, resulting in a value that is different from the value stored in main memory. Thus, a “cache coherency scheme” is implemented to ensure that the current data value for any address is provided independent of where that data value resides.
Typically, “permission” is required to modify cached data. That permission is typically granted only if the data is stored in exactly one cache. Data stored in multiple caches is often treated as read-only. Each cache line can include one or more state bits indicating whether permission is granted to modify data stored at that line. While the exact nature of the states is system dependent, there is typically a “privacy” state bit used to indicate permission to modify. If the privacy bit indicates “private,” only one cache holds the data and the associated processor has permission to modify the data. If the privacy bit indicates “public,” any number of caches can hold the data, but no processor can modify it.
In a multi-processor system, for a processor desiring to read or modify data, a determination is typically made regarding which caches, if any, have copies of the data and whether permission is given for modification of the data. “Snooping” involves examining the contents of multiple caches to make the determination. If the requested data is not found in the local cache, remote caches can be “snooped”. Recalls can be issued to request that private data be made public so that another processor can read it, or recalls can be issued to invalidate public data in some caches so that another cache can modify it.
For large numbers of processors and caches, exhaustive snooping can impair performance. For this reason, some distributed-memory multi-processor systems snoop within cells and rely on directory-based cache coherency for inter-cell coherency. A distributed-memory multi-processor system with directory-based cache coherency is described in U.S. Pat. No. 6,055,610, filed on Aug. 25, 1997, issued on Apr. 25, 2000, and entitled “DISTRIBUTED MEMORY MULTIPROCESSOR COMPUTER SYSTEM WITH DIRECTORY BASED CACHE COHERENCY WITH AMBIGUOUS MAPPING OF CACHED DATA TO MAIN-MEMORY LOCATIONS”, which is hereby incorporated herein by reference.
In a distributed-memory system employing directory-based cache coherency, the main memory of each cell typically associates a directory entry with each line of memory. Each directory entry typically identifies the cells caching the line and whether the line of data is public or private. The directory entries may also identify the particular cache(s) within a cell caching the data, and/or snooping may be used to determine which cache(s) within a cell has the data. Thus, each cell contains a directory indicating the location of cached copies of data stored in its main memory.
As an example, in an 8-cell system, each directory entry might be nine bits long. For each of the eight cells, a respective “site” bit indicates whether or not that cell contains a cached copy of the line. The ninth bit, the “privacy” bit, indicates whether the data is held privately or publicly.
It is occasionally desirable to move or migrate memory from one cell to another, or within a particular cell. For example, memory may be migrated from a defective memory device to a spare memory device. As another example, a board containing one or more memory devices may need to be removed from the system, perhaps because the board contains a defective component, the board is being replaced by a newer revision, or for some other reason. Before removing the board, it may be desirable to migrate the memory from the board to another location.
Memory migration typically occurs with operating system intervention, with the memory first being deallocated, and then later reallocated to the desired destination. With such prior art memory migration techniques, processes accessing the memory being migrated might stall, or the system may have to wait for the processes to terminate before the memory can be migrated. Thus, prior art memory migration techniques affect the operation of the software, and occasionally cause the system to be unavailable for use for a period of time. In addition, certain pages that the operating system and firmware need cannot be easily migrated using prior art migration techniques.
Memory may also be interleaved, causing additional difficulties for memory migration using conventional techniques. De-interleaving memory is not a simple task, and sometimes a de-interleaving solution does not exist.