In a computer system, the central processing unit (CPU) processes information that is stored in a main memory (RAM or DRAM) within the computer. Typically, the CPU processes the information faster than the information can be accessed and transferred from the main memory to the CPU. As a result, when the CPU requests the information from the main memory, the CPU may remain idle for a substantial time while waiting for the information.
To reduce the amount of time the CPU remains idle, a fast but typically expensive type of memory is used as a cache. The cache is an intermediary storage area between the CPU and the main memory in order to store recently or frequently used information. Access times for the cache are typically substantially faster than access times for the main memory. However, the cache is substantially more costly than the main memory. Therefore, for cost effectiveness, the cache is often utilized in smaller sizes, relative to the less costly main memory.
FIG. 1 illustrates a generic cache mechanism in a computer, including a CPU (10), a main memory (12), a cache (14), and a data bus (16). The data bus (16) provides a means for transferring information between the main memory (12) and the CPU (10). As noted earlier, it is desirable for the CPU (10) to access the information as fast as possible. Therefore, when a piece of information is requested by the CPU (10), the cache (14) is searched for the piece of information first. If the piece of information is stored in the cache (14), then the piece of information is quickly provided to the CPU (10). Otherwise, the piece of information is retrieved from the (relatively slower) main memory (12) and provided to the CPU (10). Also, the piece of information is stored in the cache (14), so that the next time the piece of information is requested, the piece of information can be accessed quickly from the cache (14).
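The lookup flow described above can be sketched as follows. This is a minimal illustrative model, not an implementation of any claimed mechanism; the function and variable names are hypothetical, and the cache is modeled as a simple address-to-data map with no capacity limit.

```python
# Illustrative sketch of the generic cache lookup flow: check the cache
# first, fall back to the (relatively slower) main memory on a miss,
# and fill the cache so that the next request for the same address hits.

def make_cache():
    return {}  # hypothetical model: address -> data

def read(address, cache, main_memory):
    """Return (data, source) for a memory request."""
    if address in cache:                # cache hit: fast path
        return cache[address], "cache"
    # Cache miss: fetch from main memory and store in the cache.
    data = cache[address] = main_memory[address]
    return data, "main_memory"
```

A first request for an address is served from main memory and fills the cache; a repeated request is then served from the cache.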
The granularity of storage in a cache is called a line. A cache line is a collection of bytes, generally at sequential memory locations, which are stored as a unit in the cache. Transfers between a main memory and the cache are usually accomplished via cache line granularity (i.e., one or more cache lines at a time).
When information requested is not found in the cache, a &#8220;cache miss&#8221; occurs. Conversely, if the information is found, there is a &#8220;cache hit.&#8221; Due to the limited size of the cache, information stored in the cache may need to be replaced or swapped out, in order to make room for newly requested information that is not stored in the cache. Various cache mechanisms have been developed with an objective to maximize caching efficiency using methods or cache architectures that result in a low volume of cache misses and/or cache swaps.
Two currently used cache mechanisms are the associative cache and the direct-mapped cache. The direct-mapped cache is effective and relatively inexpensive in comparison to the associative cache, and due to a relatively simple design, a direct-mapped cache can often be economically developed and manufactured. Sometimes, one or more locations in cache memory (i.e., cache lines) may be &#8220;locked,&#8221; in order to prevent a set of information from being swapped out. A traditional direct-mapped cache mechanism may become less effective when used in a locking scenario, because a locked cache line prohibits a portion of memory from being present in the cache. Some associative caches, however, can operate more effectively in a locking scenario, due to a more complex design. Therefore, when locking is required in a design architecture, a more complex and expensive associative cache mechanism may be preferred.
An associative cache includes a number of information storage locations called cache lines. In addition to information retrieved from the main memory, each cache line of an associative cache typically includes a tag bit that indicates whether a particular cache line contains any information at all. FIG. 2, for example, illustrates an associative cache (15), having cache line (30A) through cache line (30Z) that correspond to the main memory (12), including multiple memory blocks (38A) through (38Z). Each memory block represents a unique main memory address. Each cache line can reference one or more memory addresses, depending on cache design.
Typically, when a computer is reset, each tag bit in a cache is set to 0, for example, to indicate that no cache lines are in use. Each time the CPU (10) makes a memory request, one or more cache lines is filled with data, and tag bits for the cache lines that are filled with data are changed to 1, for example, to indicate that the cache lines are in use. For example, referring to FIG. 2, if the CPU (10) makes a request for information stored at memory block (38A) in the main memory (12), a search is performed to determine whether any used cache lines in the associative cache (15) include the information requested by the CPU (10). Failing to find the information, a bus request is issued to fetch the information stored in memory block (38A) from the main memory (12), and store the information in an unused cache line in the associative cache (15). In the associative cache (15), cache lines may be filled in random order.
If information stored in the memory block (38A) is needed later, the information is quickly fetched from the associative cache (15), eliminating the need for a bus operation across the data bus (16) to fetch the information from the (relatively slower) main memory (12). Eventually, more cache lines are filled with data from the main memory (12), as the CPU (10) continues to request retrieval of additional information. If information needed by the CPU (10) appears in the associative cache (15), the CPU (10) can access the information from the associative cache (15) quickly, without making any memory references.
However, when the associative cache (15) is full, a previous cache entry (i.e., information stored in a used cache line) is discarded to make room for a new entry from the main memory (12). In order to avoid performing a linear search on cache lines, an associative cache has special hardware that can search every cache line simultaneously for requested information. This special hardware, however, makes the associative cache relatively costly.
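The associative cache behavior described above (valid bits cleared at reset, lines filled in arbitrary order, and an entry discarded when the cache is full) can be sketched as follows. This is an illustrative software model only; real associative caches search all lines in parallel in hardware, and the random victim selection here is an assumption, not a mechanism stated in the text.

```python
# Illustrative model of a small fully associative cache: each line has a
# valid bit (0 at reset) and stores an (address, data) pair. On a fill,
# any unused line may be taken; when all lines are in use, one entry is
# discarded (a random victim here, as an assumption) to make room.
import random

class AssociativeCache:
    def __init__(self, num_lines):
        self.lines = [{"valid": False, "addr": None, "data": None}
                      for _ in range(num_lines)]

    def lookup(self, addr):
        for line in self.lines:
            if line["valid"] and line["addr"] == addr:
                return line["data"]          # cache hit
        return None                          # cache miss

    def fill(self, addr, data):
        for line in self.lines:
            if not line["valid"]:            # use any unused line
                line.update(valid=True, addr=addr, data=data)
                return
        victim = random.choice(self.lines)   # cache full: discard an entry
        victim.update(valid=True, addr=addr, data=data)
```

Filling a third entry into a two-line cache discards one of the earlier entries to make room.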
A direct-mapped cache is a relatively less costly alternative to the associative cache, in that the direct-mapped cache avoids use of special hardware, by storing information in a cache line that is directly associated with a memory block from which the information is retrieved. For example, referring to the associative cache (15) illustrated in FIG. 2, the cache line (30) may be directly associated with the memory block (38). Accordingly, if the CPU (10) requests information stored at the memory block (38), the information can be retrieved directly from the cache line (30) associated with the memory block (38).
Alternatively, more than one memory block may be associated with (i.e., mapped to) a cache line. For example, if the main memory includes a 4&#215;4 memory block pattern for a total of 16 memory blocks, a cache with four cache lines may suffice, with a group of four memory blocks directly mapped onto each cache line. Direct-mapped caching has a one-to-one association between groups of memory blocks and cache lines. Even though a cache line may be associated with more than one memory block, the cache line can only store information retained in one memory block at a time. Thus, when multiple memory blocks map onto a particular cache line, the memory block currently occupying the particular cache line cannot be determined from the mapping alone. Accordingly, each cache line also includes a tag field that can be used to identify the particular memory block currently stored in the cache line. The tag field can be represented by a binary number, for example.
FIG. 3 illustrates a direct-mapped cache (17) associated with a main memory (12). Memory block (38A) through memory block (38D) are mapped to cache line (30A), memory block (38E) through memory block (38H) are mapped to cache line (30B), and so on. If the CPU (10) requests information in memory block (38B), then cache line (30A) is targeted, as memory block (38A) through memory block (38D) are cached onto cache line (30A). A binary tag field, such as &#8220;00&#8221;, &#8220;01&#8221;, &#8220;10&#8221;, or &#8220;11&#8221;, can be used to denote which of the four memory blocks (memory block (38A) through memory block (38D)) is currently stored in cache line (30A), if any. For example, tag field &#8220;00&#8221; may denote that cache line (30A) references information currently stored in memory block (38A). Depending on the cache line size and the main memory size, the tag field may vary in width. A tag field with a larger number of bits can be used to distinguish between a larger number of memory blocks.
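The mapping in FIG. 3 can be sketched as follows: with 16 memory blocks grouped four to a cache line, the line index and the 2-bit tag field can both be derived from the block number. Note this follows the figure's consecutive grouping (blocks 38A&#8211;38D on line 30A) for illustration; real hardware commonly derives the index from low-order address bits instead, and the function name is hypothetical.

```python
# Sketch of the FIG. 3 arrangement: four consecutive memory blocks map
# onto each cache line, and the tag field (0..3, i.e. "00".."11")
# identifies which block of the group currently occupies the line.

BLOCKS_PER_LINE = 4

def map_block(block):
    """Return (cache line index, tag field) for a memory block number."""
    return block // BLOCKS_PER_LINE, block % BLOCKS_PER_LINE
```

For instance, blocks 0 through 3 (memory blocks 38A&#8211;38D in the figure) all land on line 0 but carry the distinct tags 0 through 3.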
With direct-mapped caches, a collision occurs when multiple memory blocks that map onto a particular cache line are requested at the same time. Collision, in this context, refers to a scenario where the memory block addresses referencing currently requested information happen to map to the same particular cache line. For example, referring to FIG. 3, consider a scenario in which the CPU (10) requests information stored in memory block (38A) and memory block (38B) simultaneously. Both memory block (38A) and memory block (38B) map onto cache line (30A). Thus, there is sufficient room for only one of memory block (38A) or memory block (38B) to be fetched into the direct-mapped cache (17), and not both. Excessive collision can substantially degrade the performance of cache memory. Therefore, alternative cache designs, such as those discussed below, have been implemented.
To minimize collision between memory addresses that map onto the same cache line, a direct-mapped cache can be expanded to include more than one entry per cache line. A direct-mapped cache with multiple entries per cache line is called a &#8220;set associative cache.&#8221; A set associative cache is a hybrid between the direct-mapped cache and the associative cache, in that a set associative cache supports direct mapping, but also requires additional hardware to quickly search multiple entries in a cache line.
FIG. 4 illustrates a set associative cache that has two entries per cache line. For example, in set associative cache (80), cache line (30A) has two entries: entry (90) and entry (92). Cache line (30B) through cache line (30Z) have entry (94) through entry (108). In a set associative cache, such as illustrated in FIG. 4, a collision is avoided even when requested information is stored in two memory blocks mapped onto the same cache line, because each cache line includes an additional entry to accommodate the storage of more than one set of information.
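A single two-entry cache line of the kind shown in FIG. 4 can be sketched as follows. This is an assumed software model: the class name is hypothetical, and evicting the older entry when both are in use is an illustrative policy, not one stated in the text.

```python
# Illustrative model of one cache line in a two-way set associative
# cache: the line holds up to two (tag, data) entries, so two memory
# blocks that map onto the same line can be resident simultaneously.

class TwoWayLine:
    def __init__(self):
        self.entries = []          # up to two (tag, data) pairs

    def lookup(self, tag):
        for t, d in self.entries:
            if t == tag:
                return d           # hit in either entry
        return None                # miss

    def fill(self, tag, data):
        if len(self.entries) == 2:
            self.entries.pop(0)    # both entries used: evict the older
        self.entries.append((tag, data))
```

Two colliding blocks can thus coexist in the line; only a third block forces an eviction.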
For various reasons, including efficiency and predictability, maintaining certain data in cache memory in fixed, easily accessible locations is desirable. For example, one or more cache lines in a cache memory may be exclusively dedicated to, or &#8220;locked&#8221; with, data that are used often, which allows for fast and predictable access to the data. Processing efficiency is thereby increased, because the need for exhaustive searching of cache memory prior to retrieval of information is eliminated. In effect, a portion of cache memory is converted into local memory for a CPU. Usually, cache locking is done by a programmer based on a detailed knowledge of a program's access patterns. For example, a signal processing program that applies a fixed filter to a large body of data uses the filter coefficients of the fixed filter often, whereas each piece of data is used less often. Sharpening filters for image processing often follow this pattern. In such cases, the data memory is usually much larger than the cache memory, so there is typically little long-term locality of access. Thus, locking the filter coefficients in place can produce a large speedup.
A notable aspect of locking schemes is that a locking scheme may reduce cache storage capacity, because a locked region of cache memory becomes unavailable for swapping information to and from main memory. Specifically, when a locking scheme is applied to a direct-mapped cache, the locking scheme creates a repeating &#8220;hole&#8221; in main memory: the memory blocks that map onto the locked cache lines can no longer be cached. For example, referring to the direct-mapped cache illustrated in FIG. 3, if cache line (30A) and cache line (30B) are locked, then a cache request for information residing in memory block (38A) through memory block (38D), and memory block (38E) through memory block (38H), is rendered &#8220;illegal,&#8221; because cache line (30A) and cache line (30B) are locked to prevent the information currently stored there from being swapped out. This scenario produces a low cache hit rate and may lower caching efficiency, especially if information stored in the hole (i.e., memory block (38A) through memory block (38D) and memory block (38E) through memory block (38H)) is frequently needed.
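The locking hole described above can be sketched numerically. Assuming the FIG. 3 arrangement (four consecutive blocks per line) and a four-line cache with lines 0 and 1 locked, every block in the first two groups becomes uncacheable; the names and layout are illustrative assumptions.

```python
# Illustration of the repeating "hole" created by locking in a
# direct-mapped cache: blocks whose line falls in the locked region
# (lines 0 and 1, standing in for cache lines (30A) and (30B))
# can no longer be cached at all.

BLOCKS_PER_LINE = 4
LOCKED_LINES = {0, 1}   # locked cache lines (assumed example)

def cacheable(block):
    """A memory block is cacheable only if its line is not locked."""
    return (block // BLOCKS_PER_LINE) not in LOCKED_LINES
```

Blocks 0 through 7 (the hole) are uncacheable, while blocks 8 and above can still be cached.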
Many circuit designers have abandoned the use of direct-mapped caches in favor of set associative caches in scenarios where cache line locking is needed. Because a set associative cache includes more than one set of entries per cache line, one set of the entries can be used to lock necessary information, and the other sets can be used for caching memory information. However, due to overhead and costs associated with design and manufacture of set associative caches, set associative caches are often relatively more expensive than direct-mapped caches.
In general, in one aspect, the invention relates to a method of managing data in a cache memory. The method comprises mapping a member of a plurality of memory addresses in a main memory onto a first member of a plurality of cache lines, locking the first member of the plurality of cache lines creating a locked cache region and an unlocked cache region, remapping the member of the plurality of memory addresses from the first member of the plurality of cache lines onto a second member of the plurality of cache lines within the unlocked cache region, requesting data stored in the main memory, fetching the data from the locked cache region, if available in the locked cache region, fetching the data from the unlocked cache region, if not available in the locked cache region and available in the unlocked cache region, and fetching the data from the main memory, if not available in the locked cache region and not available in the unlocked cache region.
In general, in one aspect, the invention relates to a method of managing data in a cache memory. The method comprises mapping a member of a plurality of memory addresses in a main memory onto a first member of a plurality of cache lines, locking the first member of the plurality of cache lines creating a locked cache region and an unlocked cache region, remapping the member of the plurality of memory addresses from the first member of the plurality of cache lines onto a second member of the plurality of cache lines within the unlocked cache region, requesting data stored in the main memory, fetching the data from the locked cache region, if available in the locked cache region, fetching the data from the unlocked cache region, if not available in the locked cache region and available in the unlocked cache region, fetching the data from the main memory, if not available in the locked cache region and not available in the unlocked cache region, detecting whether the data is within the unlocked cache region, using a register that maintains a size of the unlocked cache region, and associating with the second member of the plurality of cache lines, a tag field to identify information stored in the second member of the plurality of cache lines.
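The remapping described in these aspects can be sketched as follows. This is a hedged illustration only: the claims do not fix a particular remapping function, so the modulo arithmetic, constants, and names below are assumptions; the essential point shown is that an address whose natural line falls in the locked region is remapped onto a line within the unlocked region, whose size is kept in a register.

```python
# Hypothetical sketch of remapping around a locked cache region: a
# register holds the size of the unlocked region, and a memory address
# whose natural cache line is locked is remapped onto a second line
# within the unlocked region, so the locked lines create no "hole".

NUM_LINES = 8
LOCKED_LINES = 2                           # lines 0..1 hold locked data
UNLOCKED_SIZE = NUM_LINES - LOCKED_LINES   # value kept in the register

def remap(line):
    """Map a natural line index onto a line in the unlocked region."""
    if line < LOCKED_LINES:                # would land on a locked line
        return LOCKED_LINES + (line % UNLOCKED_SIZE)
    return line                            # already in the unlocked region
```

Under this sketch, every line index resolves to the unlocked region, so no memory block is rendered uncacheable by the locking.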
In general, in one aspect, the invention relates to a cache memory management system. The cache memory management system comprises a main memory comprising a plurality of memory addresses, and a cache memory comprising a plurality of cache lines, the plurality of cache lines comprising a locked cache region and an unlocked cache region, wherein a member of the plurality of memory addresses is mapped onto a first member of the plurality of cache lines, wherein the member of the plurality of memory addresses is configured to remap from the first member of the plurality of cache lines onto a second member of the plurality of cache lines, wherein the first member of the plurality of cache lines is within the locked cache region, and the second member of the plurality of cache lines is within the unlocked cache region.
In general, in one aspect, the invention relates to a cache memory management system. The cache memory management system comprises a main memory comprising a plurality of memory addresses, a cache memory comprising a plurality of cache lines, the cache memory comprising a locked cache region and an unlocked cache region, a register maintaining a size of the unlocked cache region, and an address detection mechanism configured to detect whether a referenced member of the plurality of memory addresses is within the locked cache region, wherein a member of the plurality of memory addresses is mapped onto a first member of the plurality of cache lines, wherein the member of the plurality of memory addresses is configured to remap from the first member of the plurality of cache lines onto a second member of the plurality of cache lines, wherein the first member of the plurality of cache lines is within the locked cache region, and the second member of the plurality of cache lines is within the unlocked cache region.
In general, in one aspect, the invention relates to a computer system for managing data in a cache memory. The computer system comprises a processor, a memory, software instructions stored in the memory to cause the computer system to perform mapping a member of a plurality of memory addresses in a main memory onto a first member of a plurality of cache lines, locking the first member of the plurality of cache lines creating a locked cache region and an unlocked cache region, remapping the member of the plurality of memory addresses from the first member of the plurality of cache lines onto a second member of the plurality of cache lines within the unlocked cache region, requesting data stored in the main memory, fetching the data from the locked cache region, if available in the locked cache region, fetching the data from the unlocked cache region, if not available in the locked cache region and available in the unlocked cache region, and fetching the data from the main memory, if not available in the locked cache region and not available in the unlocked cache region.
In general, in one aspect, the invention relates to an apparatus for managing data in a cache memory. The apparatus comprises means for mapping a member of a plurality of memory addresses in a main memory onto a first member of a plurality of cache lines, means for locking the first member of the plurality of cache lines creating a locked cache region and an unlocked cache region, means for remapping the member of the plurality of memory addresses from the first member of the plurality of cache lines onto a second member of the plurality of cache lines within the unlocked cache region, means for requesting data stored in the main memory, means for fetching the data from the locked cache region, if available in the locked cache region, means for fetching the data from the unlocked cache region, if not available in the locked cache region and available in the unlocked cache region, and means for fetching the data from the main memory, if not available in the locked cache region and not available in the unlocked cache region.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.