With the continuing advance of science and technology, digital information processing plays an increasingly important role in daily life and business activities. Consequently, the amount of data to be processed has become too large to be handled by a simple data processing device, such as a computer system with a single processor and a local memory. Multi-processor systems have been developed to deal efficiently with such large quantities of data.
So far, two types of parallel data-processing systems have been used. One is the tightly coupled parallel data-processing system, and the other is the loosely coupled parallel data-processing system.
The tightly coupled parallel data-processing system includes a plurality of central processing units (CPUs) and a memory accessible by all the CPUs. This architecture is extended from a single-CPU system and thus has a relatively simple design. Such a system, however, has an inherent limit. Since the plurality of CPUs access the memory via a single common bus, the overall scale of the system cannot be too large. Moreover, a large number of CPUs places a heavy load on the bus.
On the other hand, the loosely coupled parallel data-processing system consists of a plurality of computers interconnected via a high-speed network. With a carefully designed topology, the loosely coupled parallel data-processing system can be far more expandable than the tightly coupled parallel data-processing system. In other words, a large number of processors can be included in the system. Since all communication in the system is conducted via the network, however, achieving high performance requires an architecture much more complex than that of the tightly coupled parallel data-processing system.
In order to solve the problems of the above systems, a processing system involving a distributed shared memory (DSM) has been developed, which supports parallel data processing and rapid data sharing by allowing a remote node to access a local memory. The DSM system has the advantages of both the tightly and loosely coupled parallel data-processing systems. That is, the DSM system is simple and expandable. Since 1980, a plurality of DSM systems have been put into practice. One example is the cache-coherent non-uniform memory access (ccNUMA) architecture.
Please refer to FIG. 1, which is a block diagram illustrating a conventional ccNUMA-type DSM system. The DSM system 10 includes four nodes 11˜14 interconnected by a network 15. The nodes 11˜14, as shown, include respective processors 111, 112, 121, 122, 131, 132, 141, 142, memory control chips 113, 123, 133, 143 for I/O control and memory access, local memories 1131, 1231, 1331, 1431, DSM controllers 114, 124, 134, 144, external caches or L3 caches 1141, 1241, 1341, 1441, system buses 115, 125, 135, 145, and internal buses 116, 126, 136, 146. Each of the local memories 1131, 1231, 1331, 1431 is divided into a plurality of local memory lines for separately storing local primary data belonging to its own node. Likewise, each of the caches 1141, 1241, 1341, 1441 is divided into a plurality of cache lines for separately storing cache data, which are foreign data belonging to the local memories of other nodes. The caches serve to save time when accessing data from the local memories of other nodes.
Each of the DSM controllers 114, 124, 134, 144 maintains a memory coherency directory stored therein (not shown) in order to keep track of the states of all the local memory lines. When any of the nodes is going to read data from a specific local memory line, the reading operation is guided by the DSM controller according to the memory coherency directory. The DSM controller also maintains a cache coherency directory stored therein (not shown) in order to keep track of the states of all the cache lines. When any of the nodes is going to read data from a specific cache line, the reading operation is guided by the DSM controller according to the cache coherency directory.
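The directory-guided read described above may be pictured with the following minimal Python sketch. It is an illustration only and not the controller's actual logic: the class name DsmController, the method guide_read, the "UNTRACKED" fallback and the "SHARED" memory-line label are all hypothetical names introduced here, since the text does not enumerate the memory-line states or specify any interface.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DsmController:
    """Hypothetical model of one node's DSM controller and its two directories."""
    node_id: int
    # Memory coherency directory: local memory line index -> state label.
    memory_directory: Dict[int, str] = field(default_factory=dict)
    # Cache coherency directory: L3 cache line index -> state label.
    cache_directory: Dict[int, str] = field(default_factory=dict)

    def guide_read(self, kind: str, line: int) -> str:
        """Consult the matching directory before a read is carried out."""
        if kind == "memory":
            directory = self.memory_directory
        elif kind == "cache":
            directory = self.cache_directory
        else:
            raise ValueError(f"unknown line kind: {kind!r}")
        # "UNTRACKED" is a placeholder for lines that have no directory entry.
        return directory.get(line, "UNTRACKED")


controller = DsmController(node_id=11)
controller.cache_directory[0] = "DIRTY"
controller.memory_directory[3] = "SHARED"  # placeholder label, not from the text
print(controller.guide_read("cache", 0))   # state recorded for cache line 0
```

The sketch keeps the two directories separate, mirroring the description that the memory coherency directory covers local memory lines while the cache coherency directory covers cache lines.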
Since the DSM controllers of all nodes communicate with one another via the network 15, a network communication protocol such as TCP/IP would be used as the data transmission format for inter-communication.
The states of each of the L3 cache lines indicated by the cache coherency directory include CLEAN, FRESH, DIRTY, VOID, IDLE and STALE states. The meanings of these states are described as follows:
    CLEAN: The data in the cache line of the local node also exists in a remote node, referred to as the home node, and the data has not been modified by a certain processor of the local node even though that processor exclusively owns the data;
    FRESH: The data in the cache line has not been modified by any of the nodes, and is shared by all the nodes;
    DIRTY: The data in the cache line has been modified and is exclusively owned by a certain processor of the local node, and thus has become different from that existing in the home node;
    VOID: The data in the cache line has been invalidated, and new data is permitted to be written into the same position of the L3 cache;
    IDLE: The cache line is in a transition state waiting to receive new data; and
    STALE: The cache line is in a transition state waiting for the data stored therein to be deleted.
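For illustration, the six cache line states listed above can be written as an enumeration with a few helper predicates. This is a sketch only: the names CacheLineState, is_exclusively_owned, is_transitional and differs_from_home_copy are hypothetical, and the groupings are simply read off the state descriptions.

```python
from enum import Enum


class CacheLineState(Enum):
    """L3 cache line states named in the cache coherency directory."""
    CLEAN = "CLEAN"   # unmodified, exclusively owned by a local processor
    FRESH = "FRESH"   # unmodified, shared by all the nodes
    DIRTY = "DIRTY"   # modified, exclusively owned by a local processor
    VOID = "VOID"     # invalidated, position may be overwritten
    IDLE = "IDLE"     # transition state: waiting to receive new data
    STALE = "STALE"   # transition state: waiting for stored data to be deleted


def is_exclusively_owned(state: CacheLineState) -> bool:
    """CLEAN and DIRTY are the states described as exclusively owned."""
    return state in (CacheLineState.CLEAN, CacheLineState.DIRTY)


def is_transitional(state: CacheLineState) -> bool:
    """IDLE and STALE are the states described as transition states."""
    return state in (CacheLineState.IDLE, CacheLineState.STALE)


def differs_from_home_copy(state: CacheLineState) -> bool:
    """Only DIRTY is described as differing from the data in the home node."""
    return state is CacheLineState.DIRTY


for state in CacheLineState:
    print(state.name, is_exclusively_owned(state), is_transitional(state))
```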
Although the data maintenance of the L3 cache normally operates according to the above states, it can be inefficient, especially when a cache line is in the DIRTY state. As noted, in the DIRTY state the data in the cache line is exclusively owned by a certain processor of the local node. Therefore, when another processor of the local node is to access the data in the cache line, it has to assert another system bus transaction request to obtain the data from the owning processor of the same node. Consequently, the load on the system bus inside the node is increased and the system efficiency is adversely affected.
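The extra bus load in the DIRTY case can be illustrated with a simple counting model. The sketch below is an assumption-laden illustration, not a description of the actual hardware: the names CacheLine and bus_transactions_for_read are hypothetical, and the cost model (one transaction for an ordinary access, one additional transaction when a different local processor owns a DIRTY line) is a simplification of the behavior described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheLine:
    """Hypothetical record of one L3 cache line."""
    state: str                   # e.g. "DIRTY", "FRESH"
    owner: Optional[int] = None  # processor that exclusively owns the line, if any


def bus_transactions_for_read(line: CacheLine, requester: int) -> int:
    """Count system-bus transaction requests under the assumed cost model."""
    transactions = 1  # the requester's own access request
    if line.state == "DIRTY" and line.owner != requester:
        # Another processor of the same node owns the modified data, so a
        # further request is asserted to obtain the data from that owner.
        transactions += 1
    return transactions


dirty_line = CacheLine(state="DIRTY", owner=111)
print(bus_transactions_for_read(dirty_line, requester=111))  # owner reads its own line
print(bus_transactions_for_read(dirty_line, requester=112))  # second local processor reads
```

Under this model, every read of a DIRTY line by a non-owning processor of the same node adds a transaction to the system bus, which is the source of the increased load noted above.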
Further, the access deadlock problem that is likely to occur between any two nodes processing parallel data is generally handled by the operating system (OS). If the operating system cannot deal with the deadlock in time, due to an unstable state thereof or any other factor, the DSM system may halt, which adversely affects the reliability of the DSM system.