The main memory (especially DRAM) of a computer system is generally separate from the microprocessor integrated circuit chip and organized with memory cells disposed in an array structure for quick access. It is extremely difficult to mass-produce memory chips that do not contain at least some defective memory cells. Scrapping memory chips with defective memory cells can be expensive, and therefore methods of bypassing those defects have been developed.
One popular technique for bypassing memory chip defects is adding redundant rows and/or columns, along with a reallocation circuit, to the memory chip. After manufacture, each memory cell, row and/or column on the chip is tested for defects. If a defect is found, the reallocation circuit is used to substitute one of the redundant rows/columns for the row/column containing the defect. This substitution is performed by reassigning the addresses of the defective memory rows/columns to the redundant memory rows/columns. This redundancy technique has the advantage of greatly enhancing chip yield, which lowers chip cost, because chips with relatively few defects are usable and need not be scrapped. However, the drawback to this redundancy technique is that the added reallocation circuit and redundant memory rows/columns take up valuable space on the memory chip.
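By way of illustration only, the address substitution described above can be modelled in software as a small lookup table. The following is a minimal sketch, not a description of any actual reallocation circuit; the names RowRemapper, mark_defective and physical_row are hypothetical, and spare rows are assumed to sit at addresses just past the normal array.

```python
class RowRemapper:
    """Hypothetical software model of a reallocation circuit with spare rows."""

    def __init__(self, num_rows, num_spares):
        self.num_rows = num_rows
        # Assumption: spare rows occupy physical addresses just past the array.
        self.free_spares = list(range(num_rows, num_rows + num_spares))
        self.remap = {}  # defective row address -> spare row address

    def mark_defective(self, row):
        """Assign a spare to a row that failed test; return False if none is left."""
        if row in self.remap:
            return True  # already substituted
        if not self.free_spares:
            return False  # more defective rows than spares
        self.remap[row] = self.free_spares.pop(0)
        return True

    def physical_row(self, row):
        """Translate a requested row address to the row actually accessed."""
        return self.remap.get(row, row)


remapper = RowRemapper(num_rows=8, num_spares=2)
remapper.mark_defective(3)          # row 3 failed post-manufacture test
print(remapper.physical_row(3))     # access is redirected to a spare row
print(remapper.physical_row(2))     # a good row is accessed unchanged
```

In this sketch a chip remains usable as long as mark_defective never returns False, which mirrors the observation that chips with relatively few defects need not be scrapped.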
In addition to system main memory, cache memory (especially SRAM) is typically used in computer systems for increasing memory access speeds. Cache memory is used to temporarily retain data that the processor will likely require and access in executing current operations, especially data that is repeatedly and/or sequentially accessed. Providing cache memory, which can be accessed much faster than the system main memory, substantially increases the execution speed of individual processes and therefore of the overall computer system. This speed improvement results from SRAM being faster than DRAM and from the cache being located closer to the processor. Cache memory is typically accessed using the memory addresses of the data requested by the processor, which addresses correspond to cache contents. If the requested data is contained in the cache memory, that data is supplied from the cache memory to the processor. If the requested data is not contained in the cache memory, then that data is supplied to the processor from the system main memory.
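The hit and miss behavior described above can be sketched as follows. This is an illustrative model only: the function name read and the use of dictionaries keyed by address are assumptions, and the sketch deliberately omits any policy for loading missed data into the cache, since none is described here.

```python
def read(address, cache, main_memory):
    """Return (data, source) for a processor read request.

    Hypothetical model: `cache` and `main_memory` are dicts keyed by address.
    """
    if address in cache:
        # Hit: the requested data is supplied from the cache memory.
        return cache[address], "cache"
    # Miss: the data is supplied from the slower system main memory.
    return main_memory[address], "main memory"


main_memory = {0x10: "A", 0x20: "B"}
cache = {0x10: "A"}                 # only address 0x10 is currently cached

print(read(0x10, cache, main_memory))
print(read(0x20, cache, main_memory))
```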
Typically, cache memory resides between the processor and the system main memory. More recently, cache memory (such as SRAM) has been formed on-board the microprocessor chip itself in order to increase the speed of cache memory access. One such microprocessor chip 1 is partially illustrated in FIG. 1, and includes a processor 2, a data cache memory 3, a decoder 4, a store buffer 5 and a comparator 6.
The processor 2 sends fetch instructions to retrieve data from the cache memory 3, and authorizes data to be written to cache memory 3. Reading data from, and writing data back to, cache memory 3 takes several steps. For example, in a microprocessor with a six-stage pipeline, there are six steps for cache memory access and rewrite: 1) a fetch instruction step where the processor 2 requests data, 2) a decode step where the decoder of the processor decodes the fetched instruction, 3) an address generation step where an address of the requested data is generated, 4) a memory access step where the requested data is read from cache memory 3 using cache address decoder 4 and is sent to the processor 2, 5) an execution step where the processor 2 executes an operation using the requested data, and 6) a write step where data generated by the execution step of the processor 2 is written back to the cache memory 3.
For maximum processor performance, it is important that the data retrieval operations (the fetch, decode, address generation and memory access steps) be performed as quickly as possible. If the processor 2 is forced to wait to retrieve data from cache memory 3, e.g. because other data is currently being written to cache memory 3, then the processor 2 enters a wait state, which decreases processor performance. In order to prevent processor 2 from entering a wait state, store buffer 5 is provided to temporarily store data to be written to cache memory 3. The data stored in store buffer 5 is written to cache memory 3 only when data retrieval from cache memory 3 is not taking place. By giving writes to cache memory 3 a lower priority than data retrieval from cache memory 3, performance of the processor 2 is increased.
In the example illustrated in FIG. 1, a multi-entry, FIFO (first in, first out) store buffer 5 is used. Each buffer entry S1, S2, S3 and S4 stores data to be written to cache memory 3, and has a corresponding store buffer tag entry 7 with the address of the stored data. Data to be written to cache memory 3 is initially received and stored in the S1 buffer entry with the corresponding address stored in the first entry of the store buffer tag 7. When the next data is sent for cache memory storage, it is received and stored in the S1 buffer entry, and the former contents of the S1 buffer entry are written into the S2 buffer entry, and so on. When cache memory access is available (e.g. when the processor is not retrieving data from cache memory), data stored in store buffer 5 is written to cache memory on a first in, first out basis (i.e. highest S level buffer entry first). Therefore, even though data may be continually sent to cache memory for storage therein, writing such data to cache memory does not interfere with or delay data retrieval from cache memory by the processor, and processor waiting is avoided.
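The operation of such a FIFO store buffer can be sketched in software as shown below. This is a simplified model under stated assumptions: the class name StoreBuffer, the methods push and drain_one, and the cache_busy flag are all hypothetical, and the behavior when all four entries are occupied is not described above, so the sketch simply raises an error in that case.

```python
from collections import deque


class StoreBuffer:
    """Hypothetical model of a four-entry FIFO store buffer with address tags."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        # Each entry is (tag address, data). Index 0 models S1 (newest);
        # the last index models the highest occupied S level (oldest).
        self.entries = deque()

    def push(self, address, data):
        """New data enters at S1; existing entries shift to S2, S3, and so on."""
        if len(self.entries) >= self.capacity:
            # Assumption: overflow handling is not described in the text.
            raise OverflowError("store buffer full")
        self.entries.appendleft((address, data))

    def drain_one(self, cache, cache_busy):
        """Write the oldest entry to the cache, but only when no read is in progress."""
        if cache_busy or not self.entries:
            return False
        address, data = self.entries.pop()  # highest S level first
        cache[address] = data
        return True


cache = {}
buf = StoreBuffer()
buf.push(0x100, "first")
buf.push(0x104, "second")
buf.drain_one(cache, cache_busy=True)    # a read is in progress: nothing written
buf.drain_one(cache, cache_busy=False)   # cache idle: oldest entry is written
print(cache)
```

Because drain_one does nothing while cache_busy is true, writes in this model never delay a read, which is the low write priority described above.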
Comparator 6 gives microprocessor 1 both bypass and merge capabilities by comparing addresses in tag entries 7 with addresses of data being fetched by, and sent from, processor 2. More specifically, the bypass function allows access to data stored in the store buffer 5 that has not yet been written to the cache memory. If the address of the data being fetched by processor 2 matches the address of data stored in the store buffer 5 (as determined by comparator 6), then that data is read out directly from the appropriate entry of the store buffer 5 to the processor 2. Thus, the processor 2 is not forced to wait until the requested data is written to cache memory 3 from the store buffer 5 before retrieving that data. The merge function allows the rewriting of data in the store buffer 5 before it has been written to cache memory 3. If the address of new data sent from processor 2 for cache memory storage matches the address of data still stored in the store buffer 5, then instead of creating another buffer entry, the buffer entry containing the old data is simply rewritten with the new data.
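The bypass and merge functions can be illustrated with the following sketch, in which the tag comparison performed by comparator 6 is modelled as a simple address match over the buffer entries. The function names store and load and the list-of-pairs representation are assumptions made for illustration, and buffer capacity is ignored for brevity.

```python
def store(buffer, address, data):
    """Merge: rewrite a pending entry with a matching tag, else add a new entry.

    Hypothetical model: `buffer` is a list of [tag address, data] pairs,
    with index 0 standing in for buffer entry S1.
    """
    for entry in buffer:
        if entry[0] == address:      # tag match found by the comparator
            entry[1] = data          # old data is rewritten with the new data
            return "merged"
    buffer.insert(0, [address, data])
    return "new entry"


def load(buffer, cache, address):
    """Bypass: read pending data directly from the store buffer on a tag match."""
    for tag, data in buffer:
        if tag == address:
            return data, "store buffer"
    return cache.get(address), "cache"


cache = {0x200: "old"}
buffer = []
store(buffer, 0x200, "new")          # not yet written to the cache
store(buffer, 0x200, "newer")        # same address: merged into the same entry
print(len(buffer))
print(load(buffer, cache, 0x200))    # bypass returns the pending data
```

In this model the merge step keeps at most one pending entry per address, so the bypass lookup never has to choose between several matching entries.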
The problem with the microprocessor chip 1 described above is that it does not provide a means for bypassing defective memory rows and/or columns in the cache memory 3. If cache memory defects exist, the microprocessor has to be scrapped in its entirety, which is expensive given the complexity and cost of microprocessor chips. There is a need for a microprocessor that can bypass defects in the cache memory area thereof.
One solution is to re-direct data requests for defective cache memory portions to the computer's main memory, as is disclosed in U.S. Pat. Nos. 5,551,004 and 5,708,789. However, processor speed and performance suffer because of the relatively long time it takes to access data from system main memory as compared to cache memory, especially if there are many defective cache memory portions.
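The performance penalty of such redirection can be illustrated with a simple weighted average. The sketch below is not taken from the cited patents; the function name and the latency figures are purely hypothetical, and it assumes for simplicity that every request not redirected is served from the cache.

```python
def average_access_time(cache_time, main_time, redirected_fraction):
    """Average access time when a fraction of requests is redirected to main memory.

    All parameters are hypothetical illustration values, in arbitrary time units.
    Assumption: every request that is not redirected is served from the cache.
    """
    if not 0.0 <= redirected_fraction <= 1.0:
        raise ValueError("redirected_fraction must be between 0 and 1")
    return (1.0 - redirected_fraction) * cache_time + redirected_fraction * main_time


# Hypothetical figures: cache access costs 1 unit, main memory access 50 units.
for fraction in (0.0, 0.01, 0.1):
    print(fraction, average_access_time(1.0, 50.0, fraction))
```

With these assumed figures, redirecting one request in ten raises the average access time from 1 unit to roughly 5.9 units, which shows why the penalty grows quickly as the number of defective cache memory portions increases.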