The arrangement of the memory for a microprocessor has an important role in establishing the overall performance of the computer system. Memory is typically slower than internal Central Processing Unit (CPU) operations, particularly when the CPU is on one chip while the main memory is distributed among a number of other chips. Access to resources on the same chip is much faster than access to resources outside the chip. For this reason, most modern microprocessors have a cache memory formed on the same integrated circuit.
FIG. 1 is a prior art illustration of the memory hierarchy for a microprocessor, adapted from the book Advanced Microprocessors, Second Edition, D. Tabak, McGraw-Hill, Inc. 1995. The microprocessor 10 includes the CPU 12 with its associated registers. Chip 10 also contains the primary cache 14. An optional cache 16 can be positioned off chip. The main memory 18 is the one actually addressed by the CPU. It contains the code and data of the currently running program. Some of this information may also temporarily be stored in the cache. The secondary memory 20, such as magnetic disks and hard drives, is much larger than the main memory. The closer the memory is to the CPU, the more expensive per bit of storage it is, the faster it can be accessed by the CPU, and the smaller its size in bytes.
The high cost of the cache memory is the primary reason why the cache size is limited. Another factor limiting the size of the primary cache is the finite number of resources that can be placed upon the integrated circuit.
Cache operation is based upon the principle of locality. There are two main types of locality:
(1) Temporal locality. If an information item is accessed by the CPU, there is a high probability that it will be accessed again in the near future.
(2) Spatial locality. If an information item is accessed, there is a high probability that other items nearby in the program will be accessed in the near future.
The cache takes advantage of these two types of locality. When an information item is obtained by the CPU from the main memory, it is stored into the cache. It remains in the cache until it is written over by another information item brought in from the main memory. This means that recently accessed data will be in the fast cache. Additionally, when the data is brought in from the main memory, typically a block of data or a "line" is brought in to be stored in the cache. For example, if the CPU operates on 32 bits, a line can be much larger, such as 256 bits or 32 bytes. This means that when data is loaded into the cache, neighboring data in the same "line" is also loaded into the cache.
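The line-fill behavior described above can be illustrated with a minimal sketch. It assumes the 32-byte line of the example and byte addressing; the names `LINE_BYTES`, `line_base` and `same_line` are hypothetical and are not taken from any referenced system.

```python
LINE_BYTES = 32  # assumed: the 256-bit (32-byte) line from the example above


def line_base(addr: int) -> int:
    """Return the address of the first byte of the line that contains addr."""
    return addr - (addr % LINE_BYTES)


def same_line(a: int, b: int) -> bool:
    """True when both byte addresses are brought in by the same line fill."""
    return line_base(a) == line_base(b)
```

Under these assumptions, an access to any byte of a line brings its neighbors in the same line into the cache, which is how spatial locality is exploited.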
In most existing systems, the cache is subdivided into sets. Each set may contain a number of lines. The mapping between the main memory and a cache containing K sets is shown in FIG. 2. Line 0 from main memory is stored in set 0 in the cache, line 1 into set 1, line 2 into set 2, and so on. Note that lines 0, K and 2K are all stored into set 0. In general, line X from the main memory is stored into set X MOD K in the cache.
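The X MOD K mapping can be written out directly. The following is a minimal sketch; the function name `set_for_line` and the value chosen for K are illustrative assumptions.

```python
def set_for_line(line_number: int, num_sets: int) -> int:
    """Return the cache set that holds main memory line X, i.e. X MOD K."""
    return line_number % num_sets


K = 128  # hypothetical number of sets, chosen only for illustration

# Lines 0, K and 2K all compete for set 0.
colliding = [set_for_line(x, K) for x in (0, K, 2 * K)]
```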
This method of mapping, practiced in most existing systems, is called "Set Associative Mapping." Each set in the cache may contain several lines. The set associative mapping that allows L lines to be stored in a set is called the "L-Way Set Associative Mapping." For example, Pentium chips have 2-way set associative mapping.
FIG. 3 is a diagram of an internal cache for the MC680X0 Motorola architecture. This architecture is 4-way set associative. Thus, four lines from the main memory which have the same set number can be stored in the cache. As shown in FIG. 3, the least significant bits of the logical address consist of the page offset. The page offset corresponds to the set number. Lower bits of the offset are used to select the desired word within the lines stored in the cache. The page frame data from the logical address is sent to the address translation cache 24. The address translation cache 24 is usually called a translation look-aside buffer and is used to translate between physical and logical addresses. This translated page data is sent to the comparator 26. The page offset data from the logical address is used to select one of the sets. The tag data for this set in all four of the storage regions is sent to the corresponding comparator. Each comparator indicates whether the addressed line from main memory is stored in the cache. If the line is stored in the cache, a hit signal, hit 0 to hit 3, is generated. These signals are sent to the "logical or" unit 30 to produce the main hit signal sent to the CPU and a line select signal sent to the multiplexer 34.
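The comparison logic of such a 4-way lookup can be sketched in software. This is a simplified model, not the MC680X0 implementation: the field widths `OFFSET_BITS` and `INDEX_BITS` are assumed values, address translation is omitted so the tag is taken straight from the address, and all function names are hypothetical.

```python
OFFSET_BITS = 4  # assumed: 16-byte lines
INDEX_BITS = 6   # assumed: 64 sets
WAYS = 4         # 4-way set associative, as in FIG. 3


def split_address(addr: int):
    """Split an address into (tag, set index, offset within the line)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(tags, valid, addr: int):
    """Model the four comparators; tags and valid are indexed [way][set].

    Returns (hit, way): hit is the OR of the per-way hit signals, and way
    plays the role of the line select signal (None on a miss).
    """
    tag, index, _ = split_address(addr)
    hits = [valid[w][index] and tags[w][index] == tag for w in range(WAYS)]
    hit = any(hits)  # corresponds to the "logical or" of hit 0 to hit 3
    way = hits.index(True) if hit else None
    return hit, way
```

In hardware the four comparisons occur in parallel; the list comprehension here only models their combined result.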
The system shown in FIG. 3 is 4-way associative. The higher the level of associativity, the more complex the logic, but the better the hit ratio. Only a part of a program or data can fit into the cache, because the cache is much smaller than the main memory. When the CPU attempts to access any item of information, the item can be either in the cache, which is called a hit, or not in the cache, which is called a miss. When a miss occurs, the line containing the missing item is loaded into the cache, replacing another line. In L-way associative mapping, there are L candidates, one of which can be replaced in the cache. For example, in a 4-way associative mapping the line can replace one of the four lines having the same set number. The replacement algorithm can be random; first in, first out; or least recently used.
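Hit, miss and replacement behavior can be illustrated with a small simulation. The sketch below assumes least recently used replacement and tracks only line numbers, not data; the class name `SetAssociativeCache` and its interface are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """L-way set associative cache model with least recently used replacement."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One ordered mapping per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, line_number: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        lines = self.sets[line_number % self.num_sets]
        if line_number in lines:
            lines.move_to_end(line_number)  # mark as most recently used
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line_number] = None
        return False


cache = SetAssociativeCache(num_sets=4, ways=2)
# Lines 0, 4 and 8 all map to set 0, so they compete for its two ways.
results = [cache.access(n) for n in (0, 4, 0, 8, 4)]
```

With two ways per set, the access to line 8 evicts line 4, which was used less recently than line 0, so the final access to line 4 misses.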
A brief description of the cache operation is as follows: If there is a hit during a read operation, the accessed item is transferred from the cache into the CPU. The main memory is not involved. If there is a hit during a write operation, there are two options. In the writethrough method, the main memory location is updated together with the cache. This method assures data integrity, but results in frequent bus transfers and memory write operations. In the writeback method, only the cache is updated on a hit; memory is updated only when the updated line is replaced. This method reduces memory bus traffic, but there may be lengthy periods during which memory and the cache hold different values for the same address. The writeback method typically uses a bit in the tag RAM, sometimes called a "dirty bit," to indicate that the memory location has not yet been updated, that is, that the cache holds more recent data than the memory. In many systems, the writethrough and the writeback methods are offered as options to the user. If there is a miss during a read operation, the line containing the missing item is transferred from the memory to the cache, replacing another line. If there is a miss during a write operation, the line is either loaded into the cache or not loaded into the cache, depending upon whether the system is designated as write allocate or no-write allocate.
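The difference between the two write policies can be sketched with a deliberately small model. The assumptions are: a direct-mapped cache holding one word per line, write allocate on a write miss, and a counter of memory writes standing in for bus traffic. The class `TinyCache` and its fields are hypothetical.

```python
class TinyCache:
    """Direct-mapped cache, one word per line, with a selectable write policy."""

    def __init__(self, memory, num_lines: int, write_back: bool):
        self.memory = memory          # list of words acting as main memory
        self.num_lines = num_lines
        self.write_back = write_back  # False selects writethrough
        self.lines = {}               # index -> [address, value, dirty bit]
        self.memory_writes = 0        # counts writes that reach main memory

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def _fill(self, addr):
        """Load addr on a miss, first flushing a dirty line it replaces."""
        index = addr % self.num_lines
        old = self.lines.get(index)
        if old is not None and old[2]:
            self._write_memory(old[0], old[1])
        self.lines[index] = [addr, self.memory[addr], False]
        return self.lines[index]

    def _find(self, addr):
        line = self.lines.get(addr % self.num_lines)
        return line if line is not None and line[0] == addr else None

    def read(self, addr):
        line = self._find(addr) or self._fill(addr)
        return line[1]

    def write(self, addr, value):
        line = self._find(addr) or self._fill(addr)  # write allocate assumed
        line[1] = value
        if self.write_back:
            line[2] = True  # set the dirty bit; memory is updated later
        else:
            self._write_memory(addr, value)  # writethrough updates memory now
```

In this model, repeated writes to the same address reach memory every time under writethrough, but only once, at replacement, under writeback.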
One problem with cache memories is that they are typically harder to test than the main memory. This is because of the relatively complicated addressing logic used with cache memories. Each location of main memory can be accessed with a unique address, and test data can be written in and read out to determine the operation of each memory bit. The testing of the cache memory is more complicated because the data addressed is not necessarily in the cache memory, and because the addresses of data in the cache are stored in a portion of the cache, called the tag RAM, and are only updated as a consequence of a cache miss.
Three main testing methods have been used with cache memory. One method is direct memory access. Additional logic is provided and hardware paths created to provide access to the cache memory directly from input/output (IO) pins. An example of such a system is given in Keeley, U.S. Pat. No. 4,575,792. A problem with this type of method is that a substantial amount of additional hardware paths and control logic is needed. Additionally, the IO timing is often degraded.
The second method is the built-in self test (BIST). The problem with this method is that there is typically poor visibility for the test. The built-in self test is usually a go/no-go type of test in which errors can be detected but the location and data patterns of these errors are not identified. This reduces the usefulness of the built-in self test as a debugging tool. An additional problem with the built-in self test is that it has poor flexibility because the test patterns or vectors used are fixed.
The third type of test is a functional test performed under the programmed control of the CPU. The test pattern and test sequences are flexible and can be modified by the test software. A disadvantage of the functional test method is that there is typically poor test coverage and the tag RAM portion cannot be tested directly. Typically, the main memory used with a microprocessor system is significantly smaller than the largest possible logical address for the microprocessor. This means that in order to test the higher significant bits in the tag RAM field of the cache, a very large tester memory must be used. For example, to test the tag RAM, first a read would occur causing a miss. The data must then be read in from the main memory to be stored in the cache. In order to test the higher order bits of the tag RAM, main memory at a very high address must be used.
It is desired to have an improved method and apparatus for functional testing of the cache of the microprocessor.