1. Field of the Invention
This invention relates to an addressable memory interface. More particularly, it relates to a method and apparatus for adaptively overlaying a group of memory addresses to provide an efficient and flexible processor/memory interface.
2. Background of Related Art
Processors today are more powerful and faster than ever, so much so that even memory access time, typically in the tens of nanoseconds, is seen as an impediment to a processor running at its full speed. The CPU time of a processor is the sum of the clock cycles spent executing instructions and the clock cycles used for memory access. While modern processors have improved greatly in instruction execution time, the access times of reasonably priced memory devices have not similarly improved.
Thus, rather than relying on improvements in access speed of memory devices themselves, improved memory accessing methods and processor/memory interface architectures are employed in modern computer systems to minimize the above described bottleneck effect of memory access time.
For example, some processor/memory architectures take advantage of a memory-interleaving scheme in which consecutive data segments are stored across a number of banks of memory to allow parallel access to multiple memory locations and a large segment of data. Another particularly common memory access time enhancing method is memory caching. Caching takes advantage of the antithetical nature of the capacity and speed of a memory device. That is, a bigger (or larger storage capacity) memory is generally slower than a small memory. Also, slower memories are less costly, thus are more suitable for use as a portion of mass storage than are more expensive, smaller and faster memories.
In a caching system, memory is arranged in a hierarchical order of different speeds, sizes and costs. For example, as shown in FIG. 6, a smaller and faster memory, usually referred to as a cache memory 603 is placed between a processor 604 and larger, slower main memory 601. Typically, a hierarchical division is made even within a cache memory, so that there ends up being two levels of cache memories in the system. In this layered cache system, the smaller and faster of the two levels of cache memories, typically called level one or L1, may be a small amount of memory embedded in the processor 604. The second level or L2 cache is typically a larger amount of memory external to the processor 604.
The cache memory may hold a small subset of the data stored in the main memory. The processor needs only a small amount of the data in the main memory to execute individual instructions for a particular application. The subset of memory is chosen based on its immediate relevance, e.g., its likelihood of being used in the near future. This is much like borrowing only a few books at a time from a large collection of books in a library to carry out a large research project. Just as the research may be just as effective and even more efficient if only a few books at a time are borrowed, processing of an application program is efficient if a small portion of the data is selected and stored in the cache memory at any one time.
A cache controller 602 monitors (i.e., "snoops") the address lines of the bus 605 to the processor 604, and whenever a memory access is made by the processor 604, compares the address being accessed by the processor 604 with the addresses of the small amount of data that is stored in the cache memory 603. If data needed by the processor 604 is found in the cache memory 603, a "cache hit" is said to have occurred, and the processor 604 is provided the required data from the faster cache memory 603, analogous to finding the necessary information in the small number of books that were borrowed. If the information needed by the processor 604 is not stored in the cache memory 603, a "cache miss" is said to have occurred, and an access to the slower main memory 601 must be made, analogous to making another trip to the library. As can be expected, a cache miss in the L2 cache memory, which requires access to the slower main memory 601, is more detrimental than a cache miss in the L1 cache memory, which only requires a subsequent access to the slightly slower L2 cache memory.
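The hit/miss check performed by the cache controller can be sketched as follows. This is a minimal illustrative model, not the controller of FIG. 6; the class and method names are assumptions for illustration only.

```python
# Illustrative sketch of a cache controller's hit/miss check.
# The cache holds a small subset of main memory; on a miss, data is
# fetched from the slower main memory and brought into the cache.
class CacheController:
    def __init__(self, main_memory):
        self.main_memory = main_memory      # larger, slower store
        self.cache = {}                     # address -> data (small, fast)
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # "Snoop" the address: compare it against cached addresses.
        if address in self.cache:
            self.hits += 1                  # cache hit: serve from fast memory
            return self.cache[address]
        self.misses += 1                    # cache miss: go to slow main memory
        data = self.main_memory[address]
        self.cache[address] = data          # bring the data into the cache
        return data
```

In this sketch, a first read of an address misses and a repeated read of the same address hits, mirroring the temporal locality behavior described above.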
Obviously, the goal is to increase cache hits (or to reduce cache misses). Typically, this goal is achieved by following what is called the "locality" theory. According to this theory, temporal locality is based on the general axiom that if a particular piece of information was used, the same information is likely to be used again. Thus, data that was once accessed by the processor 604 is brought into the cache 603 to provide faster access during probable subsequent references by the processor 604. According to a second locality theory, known as the spatial locality theory, when information is accessed by the processor 604, information whose addresses are near the accessed information tends to be accessed as well. Thus, rather than storing only the once-accessed data into the cache, a block of data, e.g., a page, in the vicinity of the accessed data is brought into the cache memory.
With every memory access by the processor 604, these locality theories are used to decide which new page or pages of data are to be stored in the cache memory 603. A new page replaces an existing page of data in the cache 603 using a block (or page) replacement strategy, e.g., first-in first-out (FIFO), random, or least recently used (LRU) methods, well known to designers and architects of computer systems.
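The least recently used (LRU) replacement strategy mentioned above can be sketched in a few lines. This is a generic illustration of LRU, not a description of any particular implementation in the system of FIG. 6; the capacity and page identifiers are hypothetical.

```python
from collections import OrderedDict

# Minimal sketch of least-recently-used (LRU) page replacement for a
# fixed-capacity cache. OrderedDict keeps pages in access order,
# oldest first, so the least recently used page is evicted first.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # page id -> data

    def access(self, page, data=None):
        if page in self.pages:
            self.pages.move_to_end(page)    # temporal locality: mark as recent
            return self.pages[page]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        self.pages[page] = data             # bring the new page into the cache
        return data
```

For example, with a capacity of two, accessing pages A, B, A, then C evicts B, since A was touched more recently than B when room was needed for C.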
While the use of cache memory in a processor/memory interface as described above has provided a significant improvement in avoiding memory access time bottlenecks, and in preventing slowdown of a processor otherwise capable of running at a higher speed, the caching system described above suffers from significant drawbacks.
For example, cache thrashing occurs when a frequently used block of data is replaced by another frequently used block, causing repeated fetching and displacement of the same block of data to and from the cache memory 603. Thrashing may occur when the processor 604 is processing a set of instructions that has too many variables (and/or is simply too large) to fit into the cache memory. In this case, for example, when a particular variable is referenced by the processor 604 and is not present in the cache memory 603, a cache miss occurs, and the variable must be retrieved from the main memory 601 and stored in the cache memory 603 for access by the processor 604. However, because the cache memory 603 may already be full due to the storage of the large code segment, another variable must be removed to make room for the variable currently being referenced. When the processor 604 subsequently references the variable that was removed from the cache memory 603, the above cache miss process is repeated. Thus, in this scenario, blocks of data are likely to be constantly fetched and replaced whenever the processor 604 references a particular variable.
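The thrashing pattern described above can be made concrete with a sketch of a direct-mapped cache in which two frequently used addresses map to the same cache line and repeatedly evict each other, so every access misses. All sizes and addresses here are hypothetical.

```python
# Illustrative sketch of cache thrashing in a direct-mapped cache:
# two addresses that map to the same line displace each other on
# every access, so no access ever hits.
NUM_LINES = 4

cache_lines = [None] * NUM_LINES            # each line holds one address tag
misses = 0

def access(address):
    global misses
    line = address % NUM_LINES              # direct-mapped placement
    if cache_lines[line] != address:
        misses += 1                         # conflict miss: line holds the other block
        cache_lines[line] = address

# Addresses 0 and 4 both map to line 0, so alternating accesses thrash:
# 16 accesses produce 16 misses despite both addresses being "frequently used".
for _ in range(8):
    access(0)
    access(4)
```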
The user may be aware of a particular set of information, e.g., common global variables, or set of common program codes, which are frequently referenced by the processor or are referenced by various components or applications in a particular computer system. Unfortunately, conventional processor/memory interface architectures are fixedly defined by a system designer, thus a user cannot remedy the above described problem even if the user is aware of a set of information that is expected to be frequently referenced by the processor.
The size of a large set of instructions (or program) can be reduced significantly by the use of common code segments that are shared with other sets of instructions. The program may include only a reference, e.g., a jump or call instruction, to the common code segment, which is stored separately from the program; the program is thus reduced in size. The reduced-size program may then fit in the available cache memory space, thus avoiding the above described thrashing of the cache memory. Aside from avoiding thrashing, smaller code size generally provides faster execution speed. Thus, a reduction in size (i.e., code compression) in and of itself, even if the program remains too large for the cache memory, increases speed, and thus is generally desirable.
Unfortunately, faster speed cannot be easily realized in conventional processor/memory architectures because, when the reduced-size program is referenced by the processor of the conventional system, portions of the program which reference the common code segment are loaded into the cache, but conventional architecture schemes do not account for the storage of the common code segment itself in faster memory, e.g., the cache memory. When reference is made to the common code segment during execution of the reduced-size program, the segment must be brought from the slower main memory, incurring a cache miss. Thus, even though the user may be aware of the speed advantages of providing common code segments in faster memory, conventional processor/memory architectures do not allow the user to fully realize the benefit of the size reduction of programs.
Furthermore, conventional processor/memory interfaces do not provide efficient context switching, e.g., when an interrupt is triggered. For instance, when an interrupt is requested, the operating system of the computer system preserves the state of the processor 604 by storing the current contents of the registers and the program counter of the processor 604, and allows the processor 604 to run a routine to service the particular interrupt that has occurred. Typically, the interrupt service routine (ISR) is fetched from the main memory 601 or from another memory storage area, e.g., ROM or the BIOS memory.
However, because the service routine was not found in the cache memory when the processor 604 attempted to execute the ISR, a cache miss will occur. Another cache miss (or even an error due to an inability to return to the same data set) may occur when the processor 604 tries to access the original page after the completion of the interrupt service routine, because the routine may have replaced the current page in the cache memory (the page that was being accessed by the processor 604 just prior to the occurrence of the interrupt).
Furthermore, in a multi-tasking environment, e.g., when multiple copies of an application are running simultaneously, each running copy of the application has its own global variable space, each storing global variables which may be common between the running applications. Such redundant storage of common global variables is wasteful of memory, causes the size of the application program to become unnecessarily large, and makes cache thrashing more likely.
There is a need for a more efficient processor/memory architecture to guard against cache misses, page replacement and/or thrashing during an access to a globally used routine or variable, or during context switching, e.g., during an invocation of an interrupt service routine.
There is also a need for a more efficient and faster processor/memory architecture to allow code size reduction and/or memory space savings.
In accordance with the principles of the present invention, a memory aliasing (or overlay) apparatus comprises at least one spare addressable circuit having repeatedly referenced information persistently stored therein, and an overlay control module intercepting a data path between a processor and a plurality of addressable circuits. The overlay control module is adapted to redirect access to said repeatedly referenced information by said processor from said at least one of said plurality of addressable circuits to the at least one spare addressable circuit.
In accordance with the principles of the present invention, a method of providing overlay of at least one location in a plurality of addressable circuits for access by a processor comprises: providing at least one spare addressable circuit; persistently storing repeatedly referenced information in the at least one spare addressable circuit; intercepting a data path between the processor and the plurality of addressable circuits; and redirecting access to the repeatedly referenced information from at least one location of the plurality of addressable circuits by said processor to the at least one spare addressable circuit.
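The overlay method recited above can be sketched in software form as follows. This is a minimal model of the idea only, not the claimed apparatus: the class name, the dictionary-based spare circuit, and all addresses are assumptions made for illustration.

```python
# Sketch of the overlay (aliasing) idea: an overlay control module sits
# on the processor/memory data path and redirects accesses to selected
# addresses into a spare circuit where repeatedly referenced information
# (e.g., an ISR or a common code segment) is persistently stored.
class OverlayControl:
    def __init__(self, main_memory, overlaid):
        self.main_memory = main_memory      # plurality of addressable circuits
        self.spare = dict(overlaid)         # spare circuit: address -> data

    def read(self, address):
        # Intercept the access and redirect if the address is overlaid.
        if address in self.spare:
            return self.spare[address]      # served from the spare circuit
        return self.main_memory[address]    # otherwise, the normal path
```

Under this sketch, an access to an overlaid address never reaches the slower main memory, which is the effect the overlay control module is intended to achieve for repeatedly referenced information.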