This is a parallel processing system that enables efficient distributed processing by providing each processor with its own cache memory, together with additional hardware in the cache that prevents multiple copies of the same data from being modified improperly by competing processors.
Cache memories are usually discussed in terms of a small high-speed "scratch-pad" memory maintained near the associated processor, or on the same chip, to speed up fetch and store times. Data is swapped between the cache and the main memory when needed, and with a single processor there is no consistency problem. The consistency problem appears when there are multiple processors, each with its own cache memory. Then copies of the same data held in different caches may become inconsistent with one another.
In a distributed processing system such as the electronic subsystem of a copier/printer, there may be one or more processors for each main unit such as the output scanner, input scanner and display, all processors communicating through a common data bus. Each processor may have its own local, or cache, memory, and operate independently of the others.
In some cases one processor is not enough to control a machine function. An example would be where there is a significant amount of image manipulation in the form of enhancement, compression, rotation and the like. These operations consume large amounts of computer power, and typically must be processed rapidly. In this case, either a larger processor, or several processors in parallel, must be used.
As a matter of economy, it is convenient to design one basic set of processor and memory chips, and use them for all applications. Therefore, it would be advantageous to have a system where each machine function uses one or several identical parallel, or concurrent, processors and cache memories, resulting in a significant number of parallel processors for the entire system.
In many of these cases there is information that must be shared by several processors, and this data must be stored in a central memory. Where common data is needed by several processors, there is a possibility that two processors may access and modify the same data packet, resulting in two versions. Some software or hardware system must be put in place to prevent this. One obvious way is to flag any data in main memory that has been accessed by one processor, and inhibit any other processor from using that data until the first processor returns the updated version to main memory. This shifts part of the problem to the software, and may slow the system, which, of course, defeats the purpose of having parallel processing in the first place.
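The flagging approach described above can be illustrated with a minimal sketch. The class name `FlaggedMemory` and its methods `check_out` and `check_in` are hypothetical names chosen for this illustration and do not appear in the source; the sketch only shows how a second processor is inhibited until the first returns its updated version.

```python
class FlaggedMemory:
    """Main memory that flags each item while one processor holds it.

    Hypothetical illustration; names are not taken from the source.
    """

    def __init__(self, data):
        self.data = dict(data)   # address -> value
        self.holder = {}         # address -> id of the processor holding the flag

    def check_out(self, addr, proc):
        """Return (True, value) and set the flag, or (False, None) if inhibited."""
        if addr in self.holder:
            return False, None
        self.holder[addr] = proc
        return True, self.data[addr]

    def check_in(self, addr, proc, value):
        """Store the updated value and clear the flag; only the holder may do so."""
        if self.holder.get(addr) != proc:
            raise PermissionError("processor does not hold this item")
        self.data[addr] = value
        del self.holder[addr]


memory = FlaggedMemory({0x10: 5})
ok_a, value_a = memory.check_out(0x10, "A")   # A obtains the item
ok_b, _ = memory.check_out(0x10, "B")          # B is inhibited while A holds it
memory.check_in(0x10, "A", value_a + 1)        # A returns the updated version
ok_b_retry, value_b = memory.check_out(0x10, "B")
```

In this sketch processor B must retry until A has finished, which is the source of the slowdown noted above.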
An alternative is for any cache memory that has updated a line of data to invalidate the same data wherever it is stored in any other cache, and at the same time, write the new data into main memory. Then, any other cache must reload that data from memory before using it. This system takes a certain amount of time, and degrades system performance.
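The invalidate-and-write-through alternative can be sketched as follows. This is a simplified model under stated assumptions: the class `Cache`, the shared list standing in for the common bus, and the `memory_reads` counter are all hypothetical, and a cache line is reduced to a single value per address.

```python
class Cache:
    """One processor's cache in a write-through, write-invalidate scheme (sketch)."""

    def __init__(self, memory, bus):
        self.memory = memory     # shared dict standing in for main memory
        self.bus = bus           # shared list of every cache on the bus
        self.lines = {}          # address -> value, valid lines only
        self.memory_reads = 0    # counts reloads from main memory
        bus.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss, or line was invalidated
            self.lines[addr] = self.memory[addr]
            self.memory_reads += 1
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value             # write the new data into main memory
        for peer in self.bus:
            if peer is not self:
                peer.lines.pop(addr, None)    # invalidate the copy in every other cache


main_memory = {0x10: 1}
bus = []
cache_a = Cache(main_memory, bus)
cache_b = Cache(main_memory, bus)
cache_a.read(0x10)
cache_b.read(0x10)
cache_a.write(0x10, 2)            # B's copy is now invalid
reloaded = cache_b.read(0x10)     # B must reload from main memory
```

The extra reload counted in `cache_b.memory_reads` corresponds to the performance cost mentioned above.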
In virtual memory systems, in addition to the problem of maintaining consistency, there is also the problem of translating virtual to real addresses. This occurs because the entire data base is typically stored on a disk as virtual memory, the pages currently being accessed are stored in main memory as real memory, and the processors ask for data by using the virtual address. When a processor issues a virtual address, a look-up of some kind is required to translate that into a real address to determine if it is available in the real memory.
As a numerical example, assume that the virtual address is 24 bits, and each word is 16 bits, resulting in a total memory capacity of 256 Mbits. There will be 512 bytes to a page, so the total number of pages is 64K. Then, if there is one entry for each page, there would be 64K entries, which would be a reasonable map table if kept in main memory. Then, for every virtual address received, the memory could get a page address, and use that to address itself to get the data. Of course this is a slow process because there are two address cycles per fetch, and that is assuming that the data is in real memory. If not, it must be fetched from virtual memory.
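The arithmetic of this example can be checked with a short sketch. It assumes, consistent with the figures given, that the 24-bit virtual address selects a 16-bit word, so that a 512-byte page holds 256 words. The function `split_virtual_address` is a hypothetical helper added for illustration.

```python
VIRTUAL_ADDRESS_BITS = 24   # from the example
WORD_BITS = 16              # from the example
PAGE_BYTES = 512            # from the example

total_words = 2 ** VIRTUAL_ADDRESS_BITS
total_bits = total_words * WORD_BITS             # total memory capacity in bits
words_per_page = PAGE_BYTES // (WORD_BITS // 8)  # 512 bytes / 2 bytes per word
total_pages = total_words // words_per_page      # one map entry per page


def split_virtual_address(virtual_address):
    """Split a word address into (page number, word offset within the page)."""
    return divmod(virtual_address, words_per_page)


page, offset = split_virtual_address(0x012345)
```

Under these assumptions the upper 16 bits of the virtual address select one of the 64K map entries and the lower 8 bits select the word within the page.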
There are some chip sets that get around this problem to some extent by storing the most significant bits of the most recently used addresses, to limit the size of the conversion table, and to speed up the process. The effect is to cache a part of the address map as well as the data.
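The idea of caching part of the address map can be sketched as a small table of recently used page translations. This is an illustration only, not a description of any particular chip set: the class `TranslationCache`, the least-recently-used replacement policy, and the 256-word page size (carried over from the numerical example) are assumptions.

```python
from collections import OrderedDict


class TranslationCache:
    """Keeps the most recently used virtual-page to real-page entries (sketch)."""

    def __init__(self, page_table, capacity=4, words_per_page=256):
        self.page_table = page_table      # full map, assumed to live in main memory
        self.capacity = capacity          # must be at least 1
        self.words_per_page = words_per_page
        self.entries = OrderedDict()      # virtual page -> real page, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpage, offset = divmod(virtual_address, self.words_per_page)
        if vpage in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpage)          # mark as most recently used
        else:
            self.misses += 1
            # Slow path: consult the full map. A KeyError here would
            # correspond to a page that is not in real memory.
            self.entries[vpage] = self.page_table[vpage]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict least recently used
        return self.entries[vpage] * self.words_per_page + offset


page_table = {0x0123: 7, 0x0001: 3}
tlb = TranslationCache(page_table, capacity=1)
first = tlb.translate(0x012345)    # miss: entry loaded from the full map
second = tlb.translate(0x012346)   # hit: same page, no second address cycle
third = tlb.translate(0x000100)    # miss: evicts the entry for page 0x0123
```

Only a miss pays for the extra look-up in the full map; repeated accesses to the same page are translated from the small table.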
Ultimately, the goal is to create a system which will simultaneously maintain data consistency and high speed. Stated another way, the problem is to optimize the caching of both the address map and the data.
There are two main types of systems used in the prior art. In one system, the processor asks for data by specifying the real address, and the cache, which contains a CAM (content addressable memory) holding the real addresses of the data in its RAM (random access memory), will directly supply the data if it is available. In the other type, the processor asks for data by specifying the virtual address, and the real memory's memory management unit must convert that to a real address before supplying the data. What is needed is a system where the processor can ask for the data by its virtual address, and the cache will supply it directly. Of course, the system must also maintain high speed and consistency of data.
A review of the prior art was published by Katz et al., entitled "Implementing a Cache Consistency Protocol," International Symposium on Computer Architecture (12th: 1985: Boston, Mass.). There is described "The Berkeley Ownership Protocol," which advances the concept of Ownership of data blocks. Owning the block allows the Owner to update the data, and obligates the Owner to provide data to requesting caches, and to update the memory when the block is replaced in the cache. If the data is Owned Exclusively, then the cache can write into it without informing other caches. The UnOwned state carries no responsibilities or privileges; the data cannot be written into without acquiring Ownership, which takes a special set of instructions. In Nonexclusive Ownership, data is written into the local block, and the shared data in all other caches is invalidated. The main difference between this prior system and the invention described below is that in this invention, in addition to providing virtual memory support, any processor can write into its cache at any time, whether the data is shared or not, and automatically become the "master" of that data without the requirement of acquiring Ownership first. Detection of the "shared" state, and a write-through to update other caches, is done only when necessary, in an optimal way that is transparent to the processor.
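The write behavior attributed above to the Berkeley Ownership Protocol can be summarized in a minimal sketch based only on the description in this section. The enumeration `State`, the function `write_action`, and the bus action names are hypothetical, and the assumption that a block ends up Owned Exclusively after every write is an inference that is not stated in the text. The sketch does not model the invention described below.

```python
from enum import Enum


class State(Enum):
    UNOWNED = "UnOwned"
    OWNED_NONEXCLUSIVE = "Owned Nonexclusively"
    OWNED_EXCLUSIVE = "Owned Exclusively"


def write_action(state):
    """Return (bus_action, next_state) for a local write to a block in `state`.

    The next state is an assumption made for this illustration.
    """
    if state is State.OWNED_EXCLUSIVE:
        # The cache can write without informing other caches.
        return None, State.OWNED_EXCLUSIVE
    if state is State.OWNED_NONEXCLUSIVE:
        # Write locally and invalidate the shared copies in other caches.
        return "invalidate_others", State.OWNED_EXCLUSIVE
    # UnOwned: Ownership must be acquired before the write is allowed.
    return "acquire_ownership", State.OWNED_EXCLUSIVE


actions = {state: write_action(state)[0] for state in State}
```

The table built in `actions` makes the contrast explicit: in the prior protocol only a block already Owned Exclusively can be written without a preceding bus action.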