A multiprocessor system is a system that includes two or more processors that have similar functions. The processors can exchange data with each other, and all the processors share a memory, an input/output (I/O) device, a controller, and an external device. The entire hardware system is controlled by a unified operating system, which implements concurrency between processors and programs at all levels: among jobs, tasks, programs, and arrays and the elements therein.
Most commercial cache coherent non-uniform memory access (CC-NUMA) multiprocessor systems use a directory-based cache coherence protocol. Storages in a CC-NUMA multiprocessor system are physically distributed, and all local storages together form a shared global address space. The most obvious advantage of the CC-NUMA multiprocessor system is that a programmer does not need to explicitly allocate data to a node, because hardware and software in the system automatically allocate data to each node. During program running, cache coherence hardware automatically transfers data to where the data is required.
Referring to FIG. 1, FIG. 1 is a schematic architectural diagram of a CC-NUMA multiprocessor system. In the system, a plurality of nodes (for example, a Node 0, a Node 1, a Node 2, . . . , and a Node N) are interconnected using a network. Each node includes a node controller (for example, an NC 0 in the Node 0 and an NC 2 in the Node 2) and at least one processor (for example, a central processing unit (CPU) 0, . . . , and a CPU n). Each processor includes a memory and a level 3 cache (L3).
In the CC-NUMA multiprocessor system, the total L3 capacity across all the nodes is approximately hundreds of megabytes (MB), while the total memory capacity is usually approximately dozens of terabytes (TB). Generally, a majority of data cached by the processor is located in the memory, only a minority of data is located at L3, and even less data is cached at L3 of another node. To record data cached by another node, a directory (also referred to as DIR) is disposed on the node controller. The directory stores information about which data of the current node is cached by a processor in another node. For example, if data in the node 2 is cached by L3 of the CPU 0 in the node 0, the directory corresponding to the NC 2 records that the data is cached by the node 0, and records that the cache status of the data is shared or exclusive.
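The bookkeeping in the foregoing example may be sketched as follows. This is only an illustrative model under stated assumptions, not the directory structure of any particular system: the class name `HomeDirectory`, the method names, and the use of a dictionary keyed by address are all hypothetical, and no invalidation or write-back traffic is modeled.

```python
from enum import Enum


class CacheState(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


class HomeDirectory:
    """Hypothetical directory kept by one node controller (for example, NC 2)."""

    def __init__(self, home_node):
        self.home_node = home_node
        # address -> (cache state, set of remote node ids caching the data)
        self.entries = {}

    def record_remote_cache(self, address, remote_node, state):
        """Record that `remote_node` caches data of the home node at `address`."""
        previous = self.entries.get(address)
        if (state is CacheState.SHARED and previous is not None
                and previous[0] is CacheState.SHARED):
            # Another sharer joins the existing set of sharers.
            holders = previous[1] | {remote_node}
        else:
            # Exclusive access, or the first recorded holder: a single node.
            holders = {remote_node}
        self.entries[address] = (state, holders)

    def lookup(self, address):
        """Return (state, holders) for `address`, or None if not recorded."""
        return self.entries.get(address)


# Data in the node 2 is cached by L3 of the CPU 0 in the node 0.
dir_nc2 = HomeDirectory(home_node=2)
dir_nc2.record_remote_cache(0x1000, remote_node=0, state=CacheState.SHARED)
```

In this sketch, a later `dir_nc2.lookup(0x1000)` reports that the data is cached by the node 0 in the shared state, which mirrors what the directory corresponding to the NC 2 records.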
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a directory in a CC-NUMA multiprocessor system. The directory uses a cache structure, and the structure includes a tag array and a vector array. The tag array includes tag entries of j groups (for example, group0, group1, group2, group3, . . . , groupj−2, and groupj−1) and m ways (for example, way0, way1, . . . , waym−2, and waym−1). The vector array includes vector entries of j groups (for example, group0, group1, group2, group3, . . . , groupj−2, and groupj−1) and m ways (for example, way0, way1, . . . , waym−2, and waym−1). Both m and j are natural numbers, and the tag entries and the vector entries are in a one-to-one correspondence. Each tag entry includes a tag field and an occupation state field. The tag field holds the most significant bits of a memory address corresponding to data that the system is to access, and the occupation state field indicates an exclusive state or a shared state. Each vector entry includes a significant bit V field and a share vector field, and the share vector field is used to indicate a node that occupies the data exclusively or shares the data.
When the system accesses cross-node data, a node controller queries the directory according to a memory address corresponding to the data. If a node corresponding to the memory address is found, the data is extracted from the node. The memory address includes a tag field, an index field, and an offset field. Specifically, the node controller first determines a first group in the tag array according to the index field and determines a second group in the vector array according to the same index field. The node controller then sends the tag fields of all tag entries in the first group to a comparator such that the comparator compares each of them with the tag field of the memory address. If one of the tag fields matches the tag field of the memory address, the tag entry corresponding to the memory address is determined in the first group. Because the tag entries and the vector entries are in a one-to-one correspondence, the vector entry corresponding to the memory address may be determined in the second group according to the tag entry. Finally, the node controller extracts the data corresponding to the memory address from a node that is indicated by the vector entry and that occupies the data exclusively or shares the data.
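The foregoing query procedure may be sketched in software as follows. The sketch is a simplified model, not a hardware description: the field widths (`OFFSET_BITS`, `INDEX_BITS`), the number of ways, the `install` helper, the encoding of the share vector as one bit per node, and the treatment of the V field as an "entry in use" flag are all assumptions made for illustration. A hardware comparator would compare all ways of a group in parallel, whereas the loop below compares them one by one.

```python
from dataclasses import dataclass

OFFSET_BITS = 6                  # assumed 64-byte line
INDEX_BITS = 4                   # assumed j = 16 groups
NUM_GROUPS = 1 << INDEX_BITS
NUM_WAYS = 4                     # assumed m = 4 ways


@dataclass
class TagEntry:
    tag: int = 0
    state: str = "shared"        # "shared" or "exclusive"


@dataclass
class VectorEntry:
    valid: bool = False          # V field, assumed to mean "entry in use"
    share_vector: int = 0        # assumed encoding: bit n set => node n holds the data


def split_address(addr):
    """Split a memory address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & (NUM_GROUPS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class Directory:
    def __init__(self):
        self.tags = [[TagEntry() for _ in range(NUM_WAYS)]
                     for _ in range(NUM_GROUPS)]
        self.vectors = [[VectorEntry() for _ in range(NUM_WAYS)]
                        for _ in range(NUM_GROUPS)]

    def install(self, addr, way, state, nodes):
        """Hypothetical helper: fill one tag entry and its paired vector entry."""
        tag, index, _ = split_address(addr)
        self.tags[index][way] = TagEntry(tag, state)
        vector = 0
        for node in nodes:
            vector |= 1 << node
        self.vectors[index][way] = VectorEntry(True, vector)

    def lookup(self, addr):
        """Return (state, list of holder nodes), or None on a directory miss."""
        tag, index, _ = split_address(addr)
        tag_group = self.tags[index]          # the "first group"
        vector_group = self.vectors[index]    # the "second group"
        for way, entry in enumerate(tag_group):
            vector = vector_group[way]        # one-to-one with the tag entry
            if vector.valid and entry.tag == tag:
                bits = vector.share_vector
                nodes = [n for n in range(bits.bit_length()) if (bits >> n) & 1]
                return entry.state, nodes
        return None
```

For example, after `d = Directory()` and `d.install(0x14D0, way=1, state="shared", nodes=[0, 2])`, the call `d.lookup(0x14D0)` selects group 3 by the index field, matches tag 5 in way 1, and reports that the node 0 and the node 2 share the data.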
In the foregoing directory, the tag entries and the vector entries are in a one-to-one correspondence. Such a directory has the highest precision in terms of data searching, but its capacity is also the largest. A larger directory capacity leads to a larger area, higher power consumption, a longer query time, and lower query efficiency.
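How the directory capacity grows under the one-to-one correspondence may be estimated with a short calculation. The function below is a rough sketch: it assumes that the share vector uses one bit per node, which the foregoing description does not specify, and the parameter values in the example are purely illustrative rather than taken from any real system.

```python
def directory_bits(groups, ways, tag_bits, state_bits, num_nodes):
    """Estimate total directory storage in bits for a one-to-one layout."""
    tag_entry_bits = tag_bits + state_bits
    # V field (1 bit) plus an assumed full bit vector, one bit per node.
    vector_entry_bits = 1 + num_nodes
    # Every tag entry is paired with its own vector entry.
    return groups * ways * (tag_entry_bits + vector_entry_bits)


# Illustrative parameters only.
bits_16_nodes = directory_bits(groups=4096, ways=16, tag_bits=30,
                               state_bits=1, num_nodes=16)
bits_64_nodes = directory_bits(groups=4096, ways=16, tag_bits=30,
                               state_bits=1, num_nodes=64)
```

With these illustrative values, the estimate is 384 KiB for 16 nodes and 768 KiB for 64 nodes. Because each of the j × m tag entries carries its own vector entry, the vector storage scales with both the number of entries and the number of nodes.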