1. Technical Field
The present invention relates to data processing systems and, more particularly, to data processing systems that use relatively high speed cache memory in addition to relatively low speed main memory. More particularly still, the present invention relates to a data processing system having a cache memory that gates off interrupts until a specific number of instructions have been executed, using a quantization execution protocol.
2. Description of the Related Art
Cache memory has long been used in data processing systems to decrease the memory access time for the central processing unit (CPU) thereof. A cache memory is typically a relatively high speed, relatively small memory in which active portions of a program and/or data are placed. The cache memory is typically faster than main memory by a factor of five to ten and typically approaches the speed of the CPU itself. By keeping the most frequently accessed instructions, data, or both in the high speed cache memory, the average memory access time will approach the access time of the cache.
The active program instructions and data may be kept in a cache memory by utilizing the phenomenon known as "locality of reference". The locality of reference phenomenon recognizes that most computer program instruction processing proceeds in a sequential fashion with multiple loops, and with the CPU repeatedly referring to a set of instructions in a particular localized area of memory. Thus, loops and subroutines tend to localize the references to memory for fetching instructions. Similarly, memory references to data also tend to be localized, because table look-up routines or other iterative routines typically refer repeatedly to a small portion of memory.
In view of the phenomenon of locality of reference, a small, high speed cache memory may be provided for storing a block of memory containing data and/or instructions which are presently being processed. Although the cache is only a small fraction of the size of main memory, a large fraction of memory requests over a given period of time will be found in the cache memory because of the locality of reference property of programs.
In a CPU which has a relatively small, relatively high speed cache memory and a relatively large, relatively low speed main memory, the CPU examines the cache when a memory access instruction is processed. If the desired word is found in cache, it is read from the cache. If the word is not found in cache, the main memory is accessed to read that word, and a block of words containing that word is transferred from main memory to cache memory. Accordingly, future references to memory are likely to find the required words in the cache memory because of the locality of reference property.
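The lookup and block transfer described above can be illustrated with a minimal sketch. The direct-mapped organization, the class name DirectMappedCache, the line count, the block size, and the memory contents are all assumptions chosen for illustration; the description above does not prescribe any particular cache organization.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (assumed organization) that loads a whole block on a miss."""

    def __init__(self, num_lines, block_size, main_memory):
        self.num_lines = num_lines
        self.block_size = block_size
        self.main_memory = main_memory
        self.tags = [None] * num_lines    # which block each line currently holds
        self.blocks = [None] * num_lines  # cached copies of the blocks
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number = address // self.block_size
        offset = address % self.block_size
        line = block_number % self.num_lines
        if self.tags[line] == block_number:
            # The word is already in the cache: read it from there.
            self.hits += 1
        else:
            # The word is not in the cache: transfer the whole block
            # containing it from main memory into the cache.
            self.misses += 1
            start = block_number * self.block_size
            self.blocks[line] = self.main_memory[start:start + self.block_size]
            self.tags[line] = block_number
        return self.blocks[line][offset]


# Hypothetical 64-word main memory holding the values 100..163.
main_memory = list(range(100, 164))
cache = DirectMappedCache(num_lines=4, block_size=4, main_memory=main_memory)

# The first read of address 8 misses and loads words 8..11; the rest hit.
values = [cache.read(address) for address in (8, 9, 10, 11, 8)]
```

In this run only the first reference misses; the four references that follow fall in the block that was just transferred, which is the behavior the locality of reference property predicts.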
The performance of cache memory is frequently measured in terms of a "hit ratio". When the CPU refers to memory and finds the word in cache, it produces a "hit". If the word is not found in cache, then it is in main memory and it counts as a "miss". The ratio of the number of hits divided by the total CPU references to memory (i.e., hits plus misses) is the hit ratio. Experimental data obtained by running representative programs has indicated that hit ratios of 0.9 (90%) and higher may be obtained. With such high hit ratios, the memory access time of the overall data processing system approaches the memory access time of the cache memory, and may improve on the memory access time of main memory by a factor of five to ten or more. Accordingly, the average memory access time of the data processing system can be improved considerably by the use of a cache.
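The relationship between hit ratio and average memory access time can be made concrete with a short calculation. The sketch below assumes a simplified model in which a hit costs the cache access time and a miss costs the main memory access time; the function names and the time values (cache at 1 unit, main memory at 10 units) are illustrative assumptions, not measurements.

```python
def hit_ratio(hits, misses):
    """Hits divided by total references to memory (hits plus misses)."""
    return hits / (hits + misses)


def average_access_time(ratio, cache_time, main_time):
    """Simplified model: a hit costs cache_time, a miss costs main_time."""
    return ratio * cache_time + (1.0 - ratio) * main_time


# Assumed figures: 90 hits out of 100 references, main memory ten times
# slower than the cache.
ratio = hit_ratio(hits=90, misses=10)
average = average_access_time(ratio, cache_time=1.0, main_time=10.0)
improvement = 10.0 / average  # relative to using main memory alone
```

Under these assumed figures the average access time is about 1.9 time units, an improvement of slightly more than a factor of five over main memory alone.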
Data processing systems are typically used to perform many independent tasks. When a task is first begun, the hit ratio of the cache is typically low because the instructions to be performed and/or the data to be processed will not be found in the cache. Such a cache is known as a "cold" cache. Then, as processing of a task continues, more and more of the instructions and/or data which are needed may be found in the cache. The cache is then referred to as a "warm" cache because the hit ratio becomes very high.
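The transition from a cold cache to a warm cache can be seen by measuring the hit ratio on each pass of a loop over the same addresses. The following is a sketch under stated assumptions: the cache is assumed large enough to hold the task's entire working set, so nothing is ever evicted, and the function name, address range, and block size are hypothetical.

```python
def hit_ratio_per_pass(addresses, passes, block_size):
    """Return the hit ratio observed on each pass over the same addresses.

    Assumes the cache can hold every block the task touches (no eviction).
    """
    cached_blocks = set()
    ratios = []
    for _ in range(passes):
        hits = 0
        for address in addresses:
            block = address // block_size
            if block in cached_blocks:
                hits += 1
            else:
                cached_blocks.add(block)  # a miss brings the block in
        ratios.append(hits / len(addresses))
    return ratios


# A hypothetical task that loops three times over 32 consecutive words.
loop_addresses = list(range(32))
ratios = hit_ratio_per_pass(loop_addresses, passes=3, block_size=4)
```

The first pass runs against a cold cache and misses once per block, giving a hit ratio of 0.75; the later passes run against a warm cache and hit on every reference.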
In order to maximize the hit ratio, many data processing system architectures allow system control over the use of the cache. For example, the cache may be controlled to store instructions only, data only, or both instructions and data. Similarly, the cache may be controlled to lock a particular line or page in the cache, without allowing overwrites.
Cache memory is often used in high speed data processing system architectures which also often include multiple interrupt levels. As is well known to those having skill in the art, an interrupt may be an "external" interrupt, for example from a keyboard, disk drive, or other peripheral unit, or may be an "internal" interrupt from an internally generated timer. Upon occurrence of an interrupt, the task then being performed (the interrupted task) is suspended and a second (interrupting) task is performed. The interrupted task may be resumed after completion of the interrupting task.
One example of a cache operating system that operates with interrupt requests is the visible caching operating system (VCOS, which is a trademark of AT&T), a real time system having a visible cache. A visible cache is a cache that is visible to the users of the system, unlike conventional hidden caches, which are typically visible only to the operating system. In VCOS, there is only one execution frame and, in that frame, tasks are executed sequentially. VCOS caches are typically small, around one to two k-words, and program code and data are stored in the host system's memory, since the system does not have its own memory beyond the cache. The caches are kept small so that data can be easily transferred from the host memory to the cache without significantly degrading the system's real time performance. All tasks must fit within the cache, as the miss penalty is severe if data has to be fetched from the host memory. The cache load and unload take place sequentially before and after a task's execution. The cost of this is moderated by use of large frame sizes, typically 10 milliseconds (ms), so that the tasks are correspondingly larger, as they must operate on a larger number of samples. The load and unload time is, therefore, not as large a percentage of the execution time as it otherwise would be.
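The effect of frame size on load and unload overhead can be illustrated with simple arithmetic. The sketch below assumes that the load and unload time per frame is fixed by the cache size while a task's execution time grows in proportion to the frame length; the function name and every numeric value are hypothetical and are not taken from VCOS.

```python
def load_unload_ratio(transfer_ms, compute_ms_per_frame_ms, frame_ms):
    """Load plus unload time expressed as a fraction of task execution time.

    transfer_ms: assumed fixed cost of loading and unloading the cache.
    compute_ms_per_frame_ms: assumed execution time needed per ms of frame.
    """
    execution_ms = compute_ms_per_frame_ms * frame_ms
    return transfer_ms / execution_ms


# Hypothetical task: 0.5 ms to load and unload, 0.2 ms of work per ms of frame.
small_frame = load_unload_ratio(0.5, 0.2, frame_ms=1.0)
large_frame = load_unload_ratio(0.5, 0.2, frame_ms=10.0)
```

With these assumed values, the transfer time exceeds the execution time in a 1 ms frame but falls to roughly a quarter of the execution time in a 10 ms frame, which is the moderating effect of large frames described above.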
The tasks that can execute on this type of system are limited by both the cache size and the frame size. Tasks must be able to function in the given execution frame. If they need a shorter frame, they must do multiple frames' worth of work in the one larger frame, and if they need a larger frame, they must either split up the work among multiple frames, or do nothing in some frames. Any frame in which a task is idle is a waste of real time resources. Also, unless a task is executing in its desired frame, or possibly a multiple thereof, it will be inefficient; however, this solution leads to a functioning caching system, which is not trivial. Tasks are limited to the cache size because the data signal processor executes out of host memory space, so there is no effective way to service a cache miss without forfeiting all of the system's real time capabilities.
Yet another solution to the real time caching problem is the strategic memory allocation for real time (smart) cache. The smart cache was developed for use in real time systems that use rate monotonic scheduling; however, the smart cache should work in systems using any preemptive scheduling algorithm. A system using the smart cache can achieve processor utilization within 10% of a system using a normal cache, and the smart cache is deterministic.
The smart caching scheme entails dividing the cache into partitions. These partitions are then allocated among the tasks that are to be run on the processor. One larger partition is reserved as the shared pool and acts as a normal cache. The shared pool allows shared memory to be cached without complicated cache coherency schemes. The shared pool can also be used to cache tasks when there are no available private partitions remaining.
The allocation of partitions is static since, if it were dynamic, some tasks would miss their deadlines while the cache was being reallocated. This means that tasks meant to be run on the processor must be known ahead of time, so that a static analysis can be made to determine which tasks can use which partitions. Partitions can be reallocated dynamically, but doing so is time consuming and can cause tasks to miss their deadlines. The allocation scheme, however, is useful in that it allocates partitions in such a manner that the utility of each cache partition is maximized. Unfortunately, to accomplish this, it is necessary to have detailed information about each task's hit and miss rates given an arbitrarily sized private cache. Once the partitions are allocated, the tasks own their partitions. Essentially, this means that only the task to which a partition is allocated can use that partition, so that data is preserved across preemptions.
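The ownership rule for private partitions, together with the fallback to the shared pool, can be sketched as follows. This is only an illustration under stated assumptions: it hands out partitions in the order the tasks are listed, whereas the smart cache allocation described above maximizes partition utility using per-task hit and miss rate information, which is not modeled here. The function names, task names, and partition counts are hypothetical.

```python
SHARED_POOL = "shared"


def allocate_partitions(requests, num_private):
    """Statically assign private partitions ahead of time.

    requests: list of (task, partitions_wanted) pairs, in an assumed
    priority order. A task whose request cannot be met is cached in the
    shared pool instead.
    """
    allocation = {}
    next_free = 0
    for task, wanted in requests:
        if wanted <= num_private - next_free:
            allocation[task] = list(range(next_free, next_free + wanted))
            next_free += wanted
        else:
            allocation[task] = SHARED_POOL
    return allocation


def may_access(allocation, task, partition):
    """Only the task that owns a private partition may use it."""
    owned = allocation.get(task)
    if owned is None or owned == SHARED_POOL:
        return False
    return partition in owned


# Hypothetical task set with three private partitions available.
requests = [("audio", 2), ("modem", 1), ("logger", 2)]
allocation = allocate_partitions(requests, num_private=3)
```

Here the hypothetical "logger" task falls back to the shared pool because no private partitions remain, while the partitions owned by "audio" cannot be touched by "modem", so their contents survive a preemption.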
Another drawback to the smart cache is that it requires additional instructions in the instruction set to control which partitions are being accessed, either the private ones or the shared ones. This requires that the programmer or the compiler know too much about the actual hardware and implementation, which may change. This makes it difficult to write software that can run on a variety of systems with different caches, as well as on systems that use static random access memory (SRAM) and no cache.
Accordingly, what is needed is a caching scheme that minimizes preemptive priority driven scheduling so that preemption occurs not at a random time but at a critical time, such as when the cache has been loaded with a line or page ready for execution.