1. Field of the Invention
The present invention relates generally to data processing systems employing multiple instruction processors and more particularly relates to multiprocessor data processing systems employing multiple dayclocks in conjunction with multiple levels of cache memory.
2. Description of the Prior Art
It is known in the art that the use of multiple instruction processors operating out of common memory can produce problems associated with the processing of obsolete memory data by a first processor after that memory data has been updated by a second processor. The first attempts at solving this problem tended to use logic to lock processors out of memory spaces being updated. Though this is appropriate for rudimentary applications, as systems become more complex, the additional hardware and/or operating time required for the setting and releasing of locks can not be justified, except for security purposes. Furthermore, reliance on such locks directly prohibits certain types of applications such as parallel processing.
The use of hierarchical memory systems tends to further compound the problem of data obsolescence. U.S. Pat. No. 4,056,844 issued to Izumi shows a rather early approach to a solution. The system of Izumi utilizes a buffer memory dedicated to each of the processors in the system. Each processor accesses a buffer address array to determine if a particular data element is present in its buffer memory. An additional bit is added to the buffer address array to indicate invalidity of the corresponding data stored in the buffer memory. A set invalidity bit indicates that the main storage has been altered at that location since loading of the buffer memory. The validity bits are set in accordance with the memory store cycle of each processor.
U.S. Pat. No. 4,349,871 issued to Lary describes a bussed architecture having multiple processing elements, each having a dedicated cache memory. According to the Lary design, each processing unit manages its own cache by monitoring the memory bus. Any invalidation of locally stored data is tagged to prevent use of obsolete data. The overhead associated with this approach is partially mitigated by the use of special purpose hardware and through interleaving the validity determination with memory accesses within the pipeline. interleaving of invalidity determination is also employed in U.S. Pat. No. 4,525,777 issued to Webster et al.
Similar bussed approaches are shown in U.S. Pat. No. 4,843,542 issued to Dashiell et al, and in U.S. Pat. No. 4,755,930 issued to Wilson, Jr. et al. In employing each of these techniques, the individual processor has primary responsibility for monitoring the memory bus to maintain currency of its own cache data. U.S. Pat. No. 4,860,192 issued to Sachs et al, also employs a bussed architecture but partitions the local cache memory into instruction and operand modules.
U.S. Pat. No. 5,025,365 issued to Mathur et al, provides a much enhanced architecture for the basic bussed approach. In Mathur et al, as with the other bussed systems, each processing element has a dedicated cache resource. Similarly, the cache resource is responsible for monitoring the system bus for any collateral memory accesses which would invalidate local data. Mathur et al, provide a special snooping protocol which improves system throughput by updating local directories at times not necessarily coincident with cache accesses. Coherency is assured by the timing and protocol of the bus in conjunction with timing of the operation of the processing element.
An approach to the design of an integrated cache chip is shown in U.S. Pat. No. 5,025,366 issued to Baror. This device provides the cache memory and the control circuitry in a single package. The technique lends itself primarily to bussed architectures. U.S. Pat. No. 4,794,521 issued to Ziegler et al, shows a similar approach on a larger scale. The Ziegler et al, design permits an individual cache to interleave requests from multiple processors. This design resolves the data obsolescence issue by not dedicating cache memory to individual processors. Unfortunately, this provides a performance penalty in many applications because it tends to produce queuing of requests at a given cache module.
The use of a hierarchical memory system in a multiprocessor environment is also shown in U.S. Pat. No. 4,442,487 issued to Fletcher et al. In this approach, each processor has dedicated and shared caches at both the L1 or level closest to the processor and at the L2 or intermediate level. Memory is managed by permitting more than one processor to operate upon a single data block only when that data block is placed in shared cache. Data blocks in dedicated or private cache are essentially locked out until placed within a shared memory element. System level memory management is accomplished by a storage control element through which all requests to shared main memory (i.e. L3 level) are routed. An apparent improvement to this approach is shown in U.S. Pat. No. 4,807,110 issued to Pomerene et al. This improvement provides prefetching of data through the use of a shadow directory.
A further improvement to Fletcher et al, is seen in U.S. Pat. No. 5,023,776 issued to Gregor. In this system, performance can be enhanced through the use of store around L1 caches used along with special write buffers at the L2 intermediate level. This approach appears to require substantial additional hardware and entails yet more functions for the system storage controller.
Inherent in architectures which employ cache memory, is that the storage capacity is substantially less than the memory located at lower levels in the hierarchy. As a result, memory locations within the cache memory must often be cleared for use by other data quantities more recently needed by the instruction processor. For store-in cache memories, this means that those quantities modified by the instruction processor must first be rewritten to system memory before the corresponding location is available to store newly requested data. This xe2x80x9cflushingxe2x80x9d process tends to delay the availability of the newly requested data.
A problem very similar to the maintenance of cache memory coherency is the synchronization of dayclock information. In real time systems, an accurate dayclock is required to enable time tagging of data and events. Such time keeping is also necessary to provide certain priority determinations when these rest, at least in part, on relative timing. The use of a single clock by multiple processors tends to be slow and cumbersome. However, providing multiple dayclocks necessitates synchronization to ensure consistency to time tagging amongst processor.
The present invention overcomes the problems found in the prior art by providing a method of and apparatus for synchronizing multiple dayclocks within a multiple processor system having hierarchical memory. This synchronization process is basically provided by the cache memory coherency maintenance facilities.
The preferred mode of the present invention includes up to four main memory storage units. Each is coupled directly to each of up to four xe2x80x9cpodxe2x80x9ds. Each pod contains a level three cache memory coupled to each of the main memory storage units. Each pod may also accommodate up to two input/output modules.
Each pod may contain up to two sub-pods, wherein each sub-pod may contain up to two instruction processors. Each instruction processor has two separate level one cache memories (one for instructions and one for operands) coupled through a dedicated system controller, having a second level cache memory, to the level three cache memory of the pod.
Each instruction processor has a dedicated system controller associated therewith. A separate dayclock is located within each system controller.
Unlike many prior art systems, both level one and level two cache memories are dedicated to an instruction processor within the preferred mode of the present invention. The level one cache memories are of two types. Each instruction processor has an instruction cache memory and an operand cache memory. The instruction cache memory is a read-only cache memory primarily having sequential access. The level one operand cache memory has read/write capability. In the read mode, it functions much as the level one instruction cache memory. In the write mode, it is a semi-store-in cache memory, because the level two cache memory is also dedicated to the instruction processor.
In accordance with the present invention, the level two cache memory is of the store-in type. Therefore, the most current value of an operand which is modified by the corresponding instruction processor is first located within the level two cache memory. When the replacement algorithm for the level two cache memory determines that the location of that operand must be made available for newly requested data, that operand must be xe2x80x9cflushedxe2x80x9d into the lower level memory to avoid a loss of the most current value.
Waiting for flushing of the old data before requesting the new data induces unacceptable latency. Therefore, according to the present invention, a flush buffer is provided for temporary storage of the old data during the flushing process. Though this temporary storage appears at first to be a mere extension to the level two storage capacity, it greatly enhances efficiency because the flush process really does not need to utilize the level two cache memory.
The old data is moved from the level two cache memory to the flush buffer as soon as the replacement algorithm has determined which data to move, and the newly requested data is requested from the lower level memory. The flush process subsequently occurs from the flush buffer to the lower level of memory without further reference to the level two cache. Furthermore, locations within the level two cache memory are made available for the newly requested data well before that data has been made available from the lower level memory.
In accordance with the preferred mode of the present invention, each storage controller has a dedicated dayclock. With more than one storage controller in the system, the dayclocks run in a master/slave relationship. Each dayclock has a register within the corresponding system controller which is readily accessible by the associated instruction processor.
Each system controller increments its own dayclock register every microsecond. This incrementation is generated by the bus logic, thereby synchronizing it within the system controller. An individual dayclock may be have its contents invalidated and updated from the master in a manner similar to a dedicated cache memory location.