1. Field of the Invention
The present invention generally relates to multi-processor or multi-node coherent systems and, more particularly, to systems and methods that allow for the dynamic configuration (e.g., deconfiguration, reconfiguration, and the like) of one or more nodes of a directory-based, glueless, symmetric multi-processor system without necessarily requiring a reboot of the system, a change to the software configuration handles that reference input/output (I/O) devices interconnected to the system, and/or the like.
2. Relevant Background
A cache is a component that transparently stores data so that future requests for that data can be served faster. More specifically, a cache is a smaller, faster memory that stores copies of the data from the most frequently used main memory locations. For instance, a central processing unit (CPU) cache is used by one or more CPUs of a computer or computing system to reduce the average time to access memory (e.g., to read executable instructions, read or write data, and the like). When the CPU needs to read from or write to a location in main memory, the CPU (e.g., the processing core) typically first checks whether a copy of the data is in the cache; if it is (i.e., in the event of a “cache hit”), the CPU reads from or writes to the cache. Otherwise (i.e., in the event of a “cache miss”), the CPU typically reads from or writes to the main memory (e.g., DRAM) and copies the data or instructions into the cache as a cache line (e.g., including the copied data/instructions and a tag identifying their memory location) for subsequent use.
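The hit/miss behavior described above may be illustrated with a minimal sketch of a direct-mapped, write-through, write-allocate cache in front of a flat main memory. The line size, the number of lines, and the names `DirectMappedCache` and `CacheLine` are hypothetical choices for illustration only; actual CPU caches are implemented in hardware and commonly use set-associative organizations and other write policies.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64   # hypothetical: bytes per cache line
NUM_LINES = 4    # hypothetical: lines in this toy direct-mapped cache


@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0  # identifies which memory block currently occupies this line
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))


class DirectMappedCache:
    """Toy write-through, write-allocate cache (hypothetical illustration)."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.lines = [CacheLine() for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    def _locate(self, addr):
        """Split an address into (line index, tag, byte offset)."""
        line_addr = addr // LINE_SIZE
        return line_addr % NUM_LINES, line_addr // NUM_LINES, addr % LINE_SIZE

    def _lookup(self, addr):
        """Return the cache line holding addr, filling it from memory on a miss."""
        index, tag, offset = self._locate(addr)
        line = self.lines[index]
        if line.valid and line.tag == tag:
            self.hits += 1  # cache hit: serve from the cache
        else:
            self.misses += 1  # cache miss: copy the whole line in from main memory
            base = (addr // LINE_SIZE) * LINE_SIZE
            line.data = bytearray(self.memory[base:base + LINE_SIZE])
            line.tag = tag
            line.valid = True
        return line, offset

    def read(self, addr):
        line, offset = self._lookup(addr)
        return line.data[offset]

    def write(self, addr, value):
        line, offset = self._lookup(addr)
        line.data[offset] = value
        self.memory[addr] = value  # write-through keeps main memory current
```

In this sketch, addresses 10 and 266 map to the same line index with different tags, so alternating between them causes repeated misses, whereas a second read within the same 64-byte line is a hit.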
Oftentimes, a CPU may include and/or have access to a number of independent caches, such as an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data. Also, one or more of the caches may be made up of a hierarchy of cache levels (e.g., L1, L2, L3, and the like), where each successive level has greater storage capacity but also higher latency. In this case, the CPU first checks the smallest (e.g., L1) cache for the required instructions/data. If a cache hit results, the CPU can proceed at high speed; otherwise, the next larger cache (e.g., L2) is checked, and so on, before external memory is checked.
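The level-by-level lookup may be sketched as follows, assuming an inclusive hierarchy with least-recently-used (LRU) replacement in which a line found at a slower level is copied into the faster levels that missed. The class `CacheLevel`, the function `access`, and all capacities and cycle latencies are hypothetical values chosen for illustration, not figures for any particular processor.

```python
from collections import OrderedDict


class CacheLevel:
    """One level of a toy inclusive cache hierarchy with LRU replacement (hypothetical)."""

    def __init__(self, name, capacity_lines, latency_cycles):
        self.name = name
        self.capacity = capacity_lines
        self.latency = latency_cycles
        self.lines = OrderedDict()  # line address -> True, kept in LRU order

    def contains(self, line_addr):
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
            return True
        return False

    def fill(self, line_addr):
        self.lines[line_addr] = True
        self.lines.move_to_end(line_addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def access(levels, line_addr, memory_latency):
    """Probe levels smallest-first; return (where the line was found, total cycles)."""
    cycles = 0
    for i, level in enumerate(levels):
        cycles += level.latency
        if level.contains(line_addr):
            for upper in levels[:i]:
                upper.fill(line_addr)  # copy into the faster levels that missed
            return level.name, cycles
    cycles += memory_latency  # every level missed: go to external memory
    for level in levels:
        level.fill(line_addr)
    return "memory", cycles
```

For example, with hypothetical L1, L2, and L3 latencies of 4, 12, and 40 cycles and a 200-cycle memory latency, the first access to a line is served from memory after 256 cycles, and an immediate repeat access hits in L1 after 4 cycles.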
To dynamically balance a workload among a number of processors (and, as a result, serve more users faster), symmetric multiprocessing may be implemented in a computer or system (e.g., as a symmetric multiprocessor (SMP)). Symmetric multiprocessing generally refers to the processing of programs by multiple processing units or nodes (e.g., each having one or more CPUs or cores and/or an integrated system on chip (SOC) that includes a hierarchy of cache levels, a memory interface, an I/O device interface, and/or a system directory (SD)) that share a number of components, such as an operating system and main memory, via a bus and/or data path. The main memory may be appropriately partitioned or divided into a number of sections (e.g., each associated with a particular address range) for use in storing shared data (e.g., accessible by any of the nodes), private data (e.g., accessible by only one or more particular nodes), and the like.
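The partitioning of main memory into address-range sections may be sketched as a sorted table searched with binary search. The `MemorySection` layout, the "shared"/"private" labels, and the per-section owner sets are hypothetical simplifications assumed for illustration; an actual system would typically enforce such partitioning in hardware and/or system software.

```python
import bisect
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MemorySection:
    base: int    # first physical address in the section
    limit: int   # one past the last physical address in the section
    kind: str    # hypothetical label: "shared" or "private"
    owners: FrozenSet[int] = frozenset()  # node IDs allowed to use a private section


class PartitionedMemory:
    """Hypothetical lookup table of non-overlapping main-memory sections."""

    def __init__(self, sections):
        self.sections = sorted(sections, key=lambda s: s.base)
        self._bases = [s.base for s in self.sections]

    def section_for(self, addr) -> Optional[MemorySection]:
        """Binary-search for the section whose address range contains addr."""
        i = bisect.bisect_right(self._bases, addr) - 1
        if i >= 0 and addr < self.sections[i].limit:
            return self.sections[i]
        return None  # address not mapped to any section

    def may_access(self, node_id, addr) -> bool:
        section = self.section_for(addr)
        if section is None:
            return False
        return section.kind == "shared" or node_id in section.owners
```

Under these assumptions, any node may access a shared section, while a private section is accessible only to the nodes listed in its owner set.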
Generally, an SMP allows any of its nodes (e.g., two, four, or more nodes) to work on any task regardless of where in memory the data for that task is located, provided that no task in the system is executing on two or more nodes at the same time. An SMP may have a “glueless” configuration in which the various nodes are connected to each other by coherence links; the term “glueless” generally refers to the ability of one processor to communicate with other processors without special “glue” (e.g., circuitry, logic, and so on) to tie them all together. Assuming proper operating system support, an SMP can easily move tasks between processors to balance workloads efficiently.
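The load balancing described above, under the constraint that no task executes on two or more nodes at once, may be sketched as a set of per-node run queues. The `SmpScheduler` class and its least-loaded placement and migration policy are hypothetical; the sketch illustrates the constraint rather than any particular operating system's scheduler.

```python
from collections import deque


class SmpScheduler:
    """Hypothetical run-queue balancer: any node may run any task, but each task
    sits in exactly one node's queue at a time."""

    def __init__(self, num_nodes):
        self.queues = {node: deque() for node in range(num_nodes)}
        self.location = {}  # task -> node currently holding it

    def submit(self, task):
        if task in self.location:
            raise ValueError(f"task {task!r} is already assigned to node {self.location[task]}")
        node = min(self.queues, key=lambda n: len(self.queues[n]))  # least-loaded node
        self.queues[node].append(task)
        self.location[task] = node
        return node

    def complete(self, task):
        node = self.location.pop(task)
        self.queues[node].remove(task)

    def rebalance(self):
        """Migrate tasks from the busiest to the idlest node until loads differ by at most one."""
        moved = 0
        while True:
            busiest = max(self.queues, key=lambda n: len(self.queues[n]))
            idlest = min(self.queues, key=lambda n: len(self.queues[n]))
            if len(self.queues[busiest]) - len(self.queues[idlest]) <= 1:
                return moved
            task = self.queues[busiest].pop()  # dequeue first, so the task is never on two nodes
            self.queues[idlest].append(task)
            self.location[task] = idlest
            moved += 1
```

For instance, if four tasks are spread over two nodes and both tasks on one node finish, a single migration restores a balanced load of one task per node.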
In the event that a particular node in an SMP experiences a cache miss in one or more of its caches, the node often checks the caches of the other nodes in the SMP before resorting to reading/writing the data/instructions from/to the main memory. To this end, one or more of the nodes may include an SD that is operable to determine (e.g., via logic) where in the SMP a particular cache line resides. For instance, the SD may include node IDs (e.g., Node0, Node1, and so forth), corresponding physical addresses (PA) in memory for each node, cache line tags, logic, and the like. Upon experiencing a cache miss in its local cache, a node may then send a node request (NR) to the SD, which executes logic to determine whether a copy of the cache line resides in the cache of another node and/or in the main memory.
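The role of the SD in servicing an NR may be sketched as a table mapping each cache-line address to the set of nodes holding a copy. This is a simplified, hypothetical model: the `SystemDirectory` class, the 64-byte line size, and the choice of the lowest-numbered holder as the supplier are assumptions, and a real directory would also track coherence states (e.g., modified, shared, invalid) and ownership, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

LINE_SIZE = 64  # hypothetical cache-line size in bytes


@dataclass
class SystemDirectory:
    """Hypothetical SD: records which nodes hold a copy of each cache line."""
    sharers: Dict[int, Set[int]] = field(default_factory=dict)  # line address -> node IDs

    @staticmethod
    def line_of(pa: int) -> int:
        return pa - (pa % LINE_SIZE)  # align a physical address down to its line

    def record_fill(self, node_id: int, pa: int) -> None:
        self.sharers.setdefault(self.line_of(pa), set()).add(node_id)

    def record_evict(self, node_id: int, pa: int) -> None:
        line = self.line_of(pa)
        holders = self.sharers.get(line, set())
        holders.discard(node_id)
        if not holders:
            self.sharers.pop(line, None)

    def handle_request(self, requester: int, pa: int) -> str:
        """Service a node request (NR) for the line containing physical address pa."""
        holders = self.sharers.get(self.line_of(pa), set()) - {requester}
        self.record_fill(requester, pa)  # the requester will now cache the line too
        if holders:
            return f"Node{min(holders)}"  # supply from another node's cache (lowest ID chosen arbitrarily)
        return "memory"  # no other node holds the line: read from main memory
```

In this sketch, a first request for a line is satisfied from main memory, and a later request for any address in the same line is directed to a node that already caches it.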
In SMPs such as directory-based, glueless SMPs, many processes, modules, and the like utilize contiguous, consecutive, or continuous node IDs (CNIDs) that generally represent the relative ordering of the configured nodes (e.g., CNID0, CNID1, CNID2, and so on) and whose number may differ from the total number of nodes. For instance, upon experiencing a cache miss, each node typically executes logic (e.g., a hashing algorithm) that allows the node to determine which node is the directory node (DN), i.e., the node in the SMP that maintains the SD for the cache miss address. The logic generates a CNID for the DN (the DNCNID) between CNID0 and CNIDn-1, where n is the total number of configured nodes, which may be equal to or less than the total node capacity of the SMP. Upon obtaining the DNCNID, the node can then attempt to locate the cache line in the cache of another node by accessing the DN. Furthermore, the SD is typically required to utilize CNIDs (e.g., as opposed to node IDs that are independent of the number of configured nodes of the SMP) to facilitate implementation of such logic.
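The CNID scheme may be sketched as follows. The hashing algorithm is not specified here, so the line-interleaved modulo hash in `directory_cnid`, the 64-byte line size, and the assignment of CNIDs in ascending node-ID order are assumptions made for illustration; all function names are hypothetical.

```python
LINE_SIZE = 64  # hypothetical cache-line size in bytes


def assign_cnids(configured_node_ids):
    """Map each configured node ID to a contiguous ID CNID0..CNIDn-1 (ascending order assumed)."""
    return {node_id: cnid for cnid, node_id in enumerate(sorted(configured_node_ids))}


def directory_cnid(pa, num_configured):
    """Hypothetical DN hash: interleave cache lines across the n configured nodes."""
    return (pa // LINE_SIZE) % num_configured


def directory_node(pa, configured_node_ids):
    """Return the node ID of the directory node (DN) for physical address pa."""
    cnid_of = assign_cnids(configured_node_ids)
    node_of = {cnid: node for node, cnid in cnid_of.items()}  # invert CNID -> node ID
    return node_of[directory_cnid(pa, len(cnid_of))]
```

For example, with nodes 0, 1, and 3 configured (node 2 deconfigured), node 3 receives CNID2, and the line at physical address 0x80 hashes to CNID2 and hence to node 3; with all four nodes configured, the same line also hashes to CNID2, which is then node 2. Because the hash in this sketch depends on n, the DN for a given address can change whenever the set of configured nodes changes.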