Ever since the first computers, programmers have wanted unlimited amounts of fast memory. Processing times have decreased considerably in recent years, but the access times of the different kinds of memory have not improved at the same rate. From the early days of computer science, it has been recognised that considerable advantages can be achieved by organising the memory system as a hierarchy. The use of caches is one of the major performance enhancements of modern microprocessors. The term “cache” is used here to mean “the level of the memory hierarchy between the central processing unit and the main memory”. An important feature of cache memories is fast storage of selected data, taking advantage of locality of access. The basic principle is thus that important and/or frequently used data should be available in memories that are as fast as possible.
Most processors today include at least one cache at chip level, and many also include multi-level cache systems with external caches built from, e.g., one to ten static random access memory (SRAM) chips. Multi-level cache systems and large external caches are very well established for processors running general-purpose applications and commercial workloads. However, multi-level cache systems have not been employed to the same degree for embedded processors (i.e. processors not “visible” to any specific user) and real-time systems.
When working with embedded processors and/or real-time systems, some of the disadvantages of prior-art cache systems become troublesome. There is a lack of determinism, which is particularly problematic in real-time applications. There are also problems in system maintenance. First, standard multi-level caches exhibit unpredictable and varying behaviour with respect to performance and/or delays. In general-purpose processors this is not noticeable, since so many different tasks are normally processed that differences from one occasion to the next do not become apparent. Furthermore, in contrast to real-time processors, general-purpose processors do not have any absolute processing-time deadline to meet. In embedded processors, however, a few processes are typically executed repeatedly, and the operation of the system controlled by the embedded processor often relies on consistent performance. Predictable processing times may therefore be of crucial importance for many applications. Since the behaviour of state-of-the-art cache systems typically depends on the recent history of memory use, one and the same process, executed on two different occasions, may exhibit different processing times, depending on the processing history immediately before the process was started. The performance and delay of a process have to be predictable to a certain extent.
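The history dependence described above can be illustrated with a minimal simulation. The sketch below assumes a fully associative cache with least-recently-used (LRU) replacement, which is only one possible policy; the function name `count_misses`, the cache size and the address values are hypothetical and chosen purely for illustration.

```python
from collections import OrderedDict


def count_misses(cache_lines, history, process):
    """Run `history` first, then count the misses caused by `process`.

    Assumed model: fully associative cache, LRU replacement.
    Each address stands for one whole cache line.
    """
    cache = OrderedDict()

    def access(addr):
        hit = addr in cache
        if hit:
            cache.move_to_end(addr)          # mark as most recently used
        else:
            if len(cache) >= cache_lines:
                cache.popitem(last=False)    # evict least recently used line
            cache[addr] = True
        return hit

    for addr in history:
        access(addr)
    return sum(0 if access(addr) else 1 for addr in process)


# The same process is run on two occasions with different prior histories.
process = [0, 1, 2, 3] * 2
warm = count_misses(4, [0, 1, 2, 3], process)      # history left the data cached
cold = count_misses(4, [10, 11, 12, 13], process)  # history left unrelated data
print(warm, cold)  # prints "0 4"
```

The process and the cache are identical in both runs; only the preceding memory use differs, and that alone changes the number of slow main-memory accesses.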
Furthermore, interactions between the cache system and maintenance of the system, such as background tests or updates in a fault-tolerant computer, may considerably change the performance of the cache system. For instance, the execution of a memory test program, or the copying of a large memory area for a backup or hardware re-integration, can invalidate all content in a cache that is used by the ordinary applications. In real-time applications, the performance of a system has to be guaranteed even while maintenance activities of this type are in progress.
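The effect of such a maintenance activity can be sketched as follows, again assuming an LRU-replaced, fully associative cache. The function `surviving_app_lines` and the sizes used are hypothetical; the copy is modelled simply as a stream of addresses that are each touched once.

```python
from collections import OrderedDict


def surviving_app_lines(cache_lines, app_working_set, copy_lines):
    """Return how many application lines remain cached after a bulk copy.

    Assumed model: fully associative cache, LRU replacement.
    """
    cache = OrderedDict()

    def touch(addr):
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= cache_lines:
                cache.popitem(last=False)    # evict least recently used line
            cache[addr] = True

    for addr in app_working_set:             # application warms the cache
        touch(addr)
    for i in range(copy_lines):              # maintenance copy streams through
        touch(("copy", i))
    return sum(1 for addr in app_working_set if addr in cache)


app = list(range(8))
print(surviving_app_lines(8, app, 0))    # prints 8
print(surviving_app_lines(8, app, 4))    # prints 4
print(surviving_app_lines(8, app, 100))  # prints 0
```

Once the copied area is at least as large as the cache, none of the application's lines remain, so the application restarts from a cold cache.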
In the state of the art, there are two main solutions for overcoming or reducing the drawbacks described above. One is to use a static random access memory (SRAM) and to divide the data between the fast SRAM and the slower memories. The division is visible to the application. Thus, the application developer has to select the data areas that should go into the fast memory, either when writing the code or when configuring the system. This solution might be acceptable for a small application, provided that there are few changes in the software or the underlying hardware. For large applications with continuous development and several hardware platforms with different memory configurations, it is in practice impossible to keep up and create optimal configurations for each combination of application and hardware.
Furthermore, the introduction of run-time linking to support dynamic changes of software in a system makes it even more difficult for an application developer to select appropriate data for storage in the fast and slow memory areas, respectively. Run-time linking, or dynamic linking, supports program updates during the operation of a processor. In such systems, the program routines and variables are not tied to any specific memory addresses at compilation. Instead, the linking is performed dynamically, in order to allow program sequences to be updated. The actual linking is performed by table look-ups when a program is called or a variable is accessed.
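Linking by table look-up can be sketched as below. This is a simplified illustration, not the mechanism of any particular system: the class `LinkTable` and the routines `tariff_v1` and `tariff_v2` are hypothetical names, and a dictionary stands in for the link table.

```python
class LinkTable:
    """Maps symbolic names to their current targets (hypothetical sketch)."""

    def __init__(self):
        self._entries = {}

    def bind(self, name, target):
        # Installing or replacing an entry is the "linking" step.
        self._entries[name] = target

    def call(self, name, *args):
        # Every call goes through a look-up, so callers never hold
        # a fixed address of the routine.
        return self._entries[name](*args)


def tariff_v1(units):
    return units * 2


def tariff_v2(units):
    return units * 3


table = LinkTable()
table.bind("tariff", tariff_v1)
before = table.call("tariff", 10)

table.bind("tariff", tariff_v2)      # update during operation
after = table.call("tariff", 10)

print(before, after)  # prints "20 30"
```

Because the caller refers to the routine only by name, the location of the code is decided at run time, which is what makes a placement chosen in advance by the developer hard to maintain.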
Another approach to solving the problem of cache unpredictability is to lock entries into the cache. A real-time critical routine is executed, and the cache is then locked in order to keep that routine in the cache. This works well for real-time applications with a single critical routine or a few critical routines, but it does not scale to large applications, in which the worst-case behaviour must be guaranteed for a larger body of code.
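The locking approach and its scaling limit can be sketched as follows. The class `LockableCache` is a hypothetical, simplified model: lines are locked directly by address rather than by first executing the routine, and the unlocked part of the cache is assumed to use LRU replacement.

```python
from collections import OrderedDict


class LockableCache:
    """Fully associative cache in which some lines can be pinned (sketch)."""

    def __init__(self, lines):
        self.lines = lines
        self.locked = set()
        self.unlocked = OrderedDict()

    def lock(self, addrs):
        addrs = set(addrs)
        if len(self.locked | addrs) > self.lines:
            raise ValueError("locked set exceeds cache capacity")
        for addr in addrs:
            self.unlocked.pop(addr, None)
        self.locked |= addrs
        # Shrink the unlocked part to the capacity that is left.
        while len(self.unlocked) > self.lines - len(self.locked):
            self.unlocked.popitem(last=False)

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        if addr in self.locked:
            return True                      # locked lines are never evicted
        if addr in self.unlocked:
            self.unlocked.move_to_end(addr)
            return True
        free = self.lines - len(self.locked)
        if free > 0:
            if len(self.unlocked) >= free:
                self.unlocked.popitem(last=False)
            self.unlocked[addr] = True
        return False


cache = LockableCache(4)
cache.lock([0, 1])                           # critical routine's lines
for addr in range(100, 200):                 # unrelated traffic
    cache.access(addr)
print(all(cache.access(a) for a in [0, 1]))  # prints True
```

The locked lines hit regardless of other traffic, but calling `cache.lock([2, 3, 4])` afterwards raises `ValueError`, since the critical code would no longer fit. That capacity limit is the reason the approach does not scale.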
In the APZ processor in exchanges from Telefonaktiebolaget Ericsson, SRAM techniques are used to achieve faster memory access. Entire selected program blocks and their associated variable data are moved to an SRAM in the program memory system, depending on the frequency of use. Furthermore, SRAM and DRAM memory boards are mixed in the data memory system, in order to support a division into performance-critical and less performance-critical data blocks. However, the benefit of such a configuration is limited, since the division is based on a coarse granularity. Program blocks often have sizes on the order of 100 kB, and variable/record data blocks can even reach sizes of megabytes, which reduces the efficiency of the memory division, since only a few blocks can be accommodated in the faster memory.
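The effect of granularity can be made concrete with a small calculation. The 100 kB block size is taken from the text above; the 1 MB fast-memory size and the 64-byte line size are hypothetical figures chosen only for illustration, and `units_that_fit` is an illustrative helper.

```python
def units_that_fit(fast_memory_bytes, unit_bytes):
    """Number of whole placement units that fit in the fast memory."""
    return fast_memory_bytes // unit_bytes


FAST_MEMORY = 1_000_000      # hypothetical fast-memory size (1 MB)
PROGRAM_BLOCK = 100_000      # block size on the order of 100 kB (from the text)
CACHE_LINE = 64              # hypothetical fine-grained unit

print(units_that_fit(FAST_MEMORY, PROGRAM_BLOCK))  # prints 10
print(units_that_fit(FAST_MEMORY, CACHE_LINE))     # prints 15625
```

With block-sized units, only about ten placement decisions are available in this example, and each block occupies fast memory in full even if only a small part of it is used frequently.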
U.S. Pat. No. 6,003,115 discloses a method and apparatus for predictive loading of a cache. This document describes a method of preloading a disk cache with the most frequently used blocks. The preloading is initiated at certain disk access procedures, e.g. at the launch of executable program code. The preloading takes place in units of blocks, the size of which is determined by the size of the disk blocks. The method is not applicable to accesses to main memories of a RAM type.