1. Field of Invention
This invention relates generally to computerized data processors and more specifically to the memory subsystems of such processors.
2. Discussion of Related Art
Computer data processors are widely used in modern electronic systems.
Some are designed for specialized functions. One example is a digital signal processor (DSP). A digital signal processor is configured to quickly perform complex mathematical operations used in processing of digital signals.
One important use of digital signal processors is in chips that control cellular telephones and other portable electronic devices. Fast computation is important in these applications. However, because these data processors are used in devices that derive power from a battery, it is desirable for the data processors to use as little power as possible.
FIG. 1 shows a high-level block diagram of a computerized data processor. FIG. 1 may represent a general purpose computerized data processor or it could represent a special purpose data processor, such as a digital signal processor. FIG. 1 illustrates a processor chip 100. Within processor chip 100 is a processor core 110. In operation, processor core 110 reads instructions from memory and then performs functions dictated by the instructions. In many cases, these instructions operate on data that is also stored in memory. When an operation performed by processor core 110 manipulates data, the data is read from memory and results are generally stored in memory after the instruction is executed.
FIG. 1 shows that processor chip 100 includes a level 1 instruction memory 112 and a level 1 data memory 116. Both the instruction memory 112 and data memory 116 are controlled by a memory management unit 114. Instruction memory 112 and data memory unit 116 each contain memory that stores information accessed by processor core 110 as instructions or data, respectively.
The level 1 memory is the fastest memory in a computerized system. The area required on an integrated circuit chip to implement large amounts of level 1 memory generally makes it impossible to build a processor chip with enough level 1 memory to store all the instructions or all the data needed to run a program. Therefore, a computer system includes level 2 or level 3 memory. Level 3 memory is generally very slow. Disk drives or tapes or other bulk storage devices are generally used to implement level 3 memory. Level 2 memory is typically semiconductor memory that is slower than level 1 memory. Level 2 memory might be located off-chip. In some cases, level 2 memory is implemented on processor chip 100, but is slower than level 1 memory. For example, level 1 memory might be static random access memory (SRAM) and level 2 memory might be dynamic random access memory (DRAM).
The computer system of FIG. 1 includes off-chip memory 150, which could be level 2 or level 3 memory. Processor chip 100 includes a memory interface 132 through which instructions or data can be read from or written into memory 150.
In designing a computerized data processing system where speed of operation is a concern, an effort is made to use level 1 memory as much as possible. Semiconductor chip 100 is configured so that memory operations involving instructions or data pass first through instruction memory 112 or data memory 116, respectively. If the needed instruction or data is not located within those units, those units can access memory interface 132 through internal bus interface 130. In this way, processor core 110 receives the required instruction or data regardless of whether it is stored on-chip or off-chip.
To make maximum use of on-chip memory, a memory architecture called a cache is often used. A cache stores a small amount of information in comparison to what can be stored in level 2 or level 3 memory. The cache stores a copy of information contained in certain level 2 or level 3 memory locations.
In the following description, a cache operating in connection with level 2 off-chip memory will be explained. However, a cache can also be used with on-chip memories or off-chip level 3 memories. Also, a cache will be explained in terms of data read from memory. It should be appreciated, though, that a cache can store information to be written into off-chip memory and that, in operation of a computer system, a cache would be used for both read and write operations.
FIG. 2 shows in block diagram form a cache 200. Control circuitry is not explicitly shown. However, it is well known in the art that semiconductor circuits, including those relating to memories, contain timing and control circuits so that the circuitry achieves the desired operation.
Cache 200 may represent a cache within instruction memory unit 112 or a cache storing data within data memory unit 116. The physical architecture of the cache does not depend on the type of data stored in the cache. In operation, processor core 110 generates an address on address line 202. The address is shown to have an X portion and a Y portion. Each portion of the address is made up of some number of the total bits in the address. The X portion and the Y portion of the address together define the address of the smallest “item” of information that cache 200 stores.
An “item” of information in a cache may be an individual word or byte. However, most semiconductor memories are organized in rows. Time is required to set up the memory to access any row. Once the memory is set up to access the row, the incremental time to read another location in the row is relatively small. For this reason, when information is read from off-chip memory to store in a cache, an entire row is often read from the memory and stored in the cache. Little additional time is required to store an entire row, but significant time savings result if a subsequent memory operation needs to access another location in the row. In this case, the “item” stored in the cache corresponds to an entire row in the off-chip memory. Additional address bits are applied to the cache 200 to select a particular piece of information from the item. For simplicity, FIG. 2 shows address lines to access an “item” but does not show additional circuitry or address lines that may be present to access a particular memory location within any item.
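By way of illustration only, the division of an address into an X portion, a Y portion, and the additional bits that select a location within an item might be sketched as follows. The field widths, the placement of the Y bits below the X bits, and the names `OFFSET_BITS`, `Y_BITS` and `split_address` are assumptions made for this sketch; the description above does not specify them.

```python
# Illustrative field widths; none of these values come from the description.
OFFSET_BITS = 4   # assumed: 16 addressable locations within one item (row)
Y_BITS = 6        # assumed: 64 locations in each of the tag and data arrays


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (x, y, offset) for a memory address.

    offset: additional low-order bits selecting a location within an item
    y:      bits selecting a location in the tag array and data array
    x:      remaining high-order bits, later stored as the tag
    """
    offset = addr & ((1 << OFFSET_BITS) - 1)
    y = (addr >> OFFSET_BITS) & ((1 << Y_BITS) - 1)
    x = addr >> (OFFSET_BITS + Y_BITS)
    return x, y, offset


x, y, offset = split_address(0x12345)
print(x, y, offset)
```

Under these assumed widths, two addresses that differ only in their offset bits fall within the same item, so a second access to the same row can be served from the item already held in the cache.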
FIG. 2 shows that cache 200 contains a tag array 210 and a data array 220. Each location 222_1 . . . 222_N in data array 220 can store an “item”. Tag array 210 contains corresponding locations 212_1 . . . 212_N. The locations in tag array 210 indicate whether an item is stored in the corresponding location in data array 220 and, if so, which memory address the item is associated with. Each of the locations 212_1 . . . 212_N has two fields (not numbered). A first field stores an indication of whether valid data is stored in the corresponding location in data array 220. This field is sometimes called the “data valid” field. The second field in each of the locations 212_1 . . . 212_N identifies the address in level 2 memory whose contents are stored in the cache. This field is sometimes called the “tag” field.
To simplify the construction and increase the speed of operation of the cache 200, the locations within cache 200 in which the information for any level 2 off-chip memory location may be stored are constrained. As shown, the Y portion of the address bits of each external memory address is applied to tag array 210 and data array 220. The Y portion of the address bits is used to select one of the locations within these arrays. If information from a level 2 memory location having those Y bits is stored in the cache, it is stored at the selected location. To indicate that information has been stored in the data array, the data valid field in the corresponding location in the tag array is set.
Because many external addresses have the same values for their Y bits but different values for the X bits, the information stored in the data array may correspond to any one of these external addresses. The tag field in the tag array stores the X bits of the address that is being represented by the information stored in the cache.
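The relationship between the tag array and the data array described above may be sketched, for a single way, roughly as follows. This is a simplified software model, not the circuitry of FIG. 2. The names `TagEntry`, `SingleWayCache` and `fill`, and the choice of six Y bits, are hypothetical and chosen only for illustration. Addresses here are item addresses, that is, the X and Y portions only.

```python
from dataclasses import dataclass

Y_BITS = 6  # assumed width of the Y portion; not specified in the description


@dataclass
class TagEntry:
    valid: bool = False  # the "data valid" field
    tag: int = 0         # the "tag" field: X bits of the stored item's address


class SingleWayCache:
    """Hypothetical model of one way: a tag array paired with a data array."""

    def __init__(self) -> None:
        n = 1 << Y_BITS
        self.tag_array = [TagEntry() for _ in range(n)]
        self.data_array = [None] * n

    def fill(self, item_addr: int, item) -> int:
        """Store an item at the location selected by the Y bits."""
        y = item_addr & ((1 << Y_BITS) - 1)
        x = item_addr >> Y_BITS
        self.tag_array[y] = TagEntry(valid=True, tag=x)
        self.data_array[y] = item
        return y


cache = SingleWayCache()
# 0x0C5 and 0x145 share the same Y bits (5) but have different X bits (3, 5),
# so the second fill lands at the same location and replaces the first.
loc_a = cache.fill(0x0C5, "item A")
loc_b = cache.fill(0x145, "item B")
```

In this sketch the tag field is what distinguishes the two external addresses after the fact: the selected location holds only one item at a time, and the stored X bits record which address that item belongs to.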
To determine whether cache 200 stores information for a specific address in level 2 memory, the Y bits are used to access a particular location in tag array 210. If the data valid field in that location is set, the tag field in the location addressed by the Y address bits is applied to comparator 230. A second input to comparator 230 comes from the X bits on address line 202. If the X bits match, then the location within data array 220 addressed by the same Y bits can be used in place of making an access to external memory.
Where information already stored in cache 200 can be used in place of making an access to level 2 memory, it is said that the access resulted in a cache “hit.” Conversely, where the cache does not store information corresponding to the external address being accessed, a “miss” is said to occur.
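The hit and miss determination for a single way can be sketched as below. The sketch models level 2 memory as a Python dictionary and fills the cache on a miss. The names `lookup` and `read`, the fill-on-miss behavior, and the six-bit Y portion are assumptions made for illustration rather than details taken from the description.

```python
Y_BITS = 6                      # assumed width of the Y portion
NUM_LOCATIONS = 1 << Y_BITS

valid = [False] * NUM_LOCATIONS  # "data valid" fields of the tag array
tags = [0] * NUM_LOCATIONS       # "tag" fields of the tag array
data = [None] * NUM_LOCATIONS    # data array


def lookup(item_addr: int):
    """Return (hit, item) for an item address made of X and Y bits."""
    y = item_addr & (NUM_LOCATIONS - 1)
    x = item_addr >> Y_BITS
    # Comparator: the stored tag must be valid and must equal the X bits.
    if valid[y] and tags[y] == x:
        return True, data[y]
    return False, None


def read(item_addr: int, level2: dict):
    """Serve a read from the cache, going to level 2 memory on a miss."""
    hit, item = lookup(item_addr)
    if not hit:
        item = level2[item_addr]             # slower external access
        y = item_addr & (NUM_LOCATIONS - 1)
        valid[y], tags[y], data[y] = True, item_addr >> Y_BITS, item
    return hit, item


level2 = {0x0C5: "item A", 0x145: "item B"}
print(read(0x0C5, level2))  # first access: miss, then filled
print(read(0x0C5, level2))  # second access: hit
```

Because 0x0C5 and 0x145 share the same Y bits in this sketch, reading 0x145 afterwards would displace the first item, and a later read of 0x0C5 would miss again.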
To increase the chance of a “hit,” cache 200 is constructed with multiple “ways.” A way is sometimes also called a bank. In the illustration of FIG. 2, two ways 210A and 210B are shown in tag array 210 and a corresponding two ways, 220A and 220B, are shown for data array 220. Each way is addressed by the Y bits of the address as described above. However, because the tag array can store a different tag in each way for the same Y values, having two ways allows two locations with the same Y bits to be stored in the cache. Being able to store twice as many values nearly doubles the chances of a “hit” and therefore reduces the time required for memory access.
A cache can have any number of ways. Adding more ways decreases average memory access time, but also increases the amount of high speed memory required to implement the cache.
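The effect of the hit rate on average memory access time can be illustrated with the conventional textbook model, in which the average time is the hit time plus the miss rate multiplied by the miss penalty. This model and the function name `average_access_time` are not taken from the description, and the numbers in the example are hypothetical values chosen only to show the trend.

```python
def average_access_time(hit_time: float, miss_rate: float,
                        miss_penalty: float) -> float:
    """Conventional model: hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 1 cycle per hit, 50 extra cycles per miss.
fewer_ways = average_access_time(hit_time=1, miss_rate=0.10, miss_penalty=50)
more_ways = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=50)
print(fewer_ways, more_ways)
```

Under this model, a lower miss rate reduces the average only so long as the hit time itself does not grow as ways are added.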
To ensure that adding ways does not increase memory access time, comparator 230 contains circuitry to simultaneously compare the values in the tag fields in all the ways with the X address bits of the applied address. The output of comparator 230 indicates whether there is a match between the X bits of the applied address and the X bits at the location in any of the ways of the tag array addressed by the Y bits.
The output of comparator 230 also indicates in which way the match was found. The output of comparator 230 is provided to multiplexer 240. Multiplexer 240 selects the output of the appropriate way when there is a cache hit.
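A two-way lookup of the kind described above might be modeled in software as follows. The hardware compares the tags of all ways simultaneously; the loop below is a sequential stand-in for that parallel comparison. The names `make_way`, `install`, `compare_tags` and `lookup`, and the six-bit Y portion, are hypothetical. The sketch also leaves the choice of which way to fill to the caller, since no replacement policy is described here.

```python
Y_BITS = 6     # assumed width of the Y portion
NUM_WAYS = 2   # two ways, as in the illustration of FIG. 2


def make_way() -> dict:
    n = 1 << Y_BITS
    return {"valid": [False] * n, "tag": [0] * n, "data": [None] * n}


ways = [make_way() for _ in range(NUM_WAYS)]


def install(way_index: int, item_addr: int, item) -> None:
    """Place an item in a caller-chosen way at the Y-selected location."""
    y = item_addr & ((1 << Y_BITS) - 1)
    way = ways[way_index]
    way["valid"][y], way["tag"][y], way["data"][y] = True, item_addr >> Y_BITS, item


def compare_tags(x: int, y: int):
    """Comparator model: return the index of the matching way, or None."""
    for index, way in enumerate(ways):  # performed in parallel in hardware
        if way["valid"][y] and way["tag"][y] == x:
            return index
    return None


def lookup(item_addr: int):
    """Return (hit, item); the final indexing step plays the multiplexer."""
    y = item_addr & ((1 << Y_BITS) - 1)
    x = item_addr >> Y_BITS
    hit_way = compare_tags(x, y)
    if hit_way is None:
        return False, None
    return True, ways[hit_way]["data"][y]


install(0, 0x0C5, "item A")  # X = 3, Y = 5
install(1, 0x145, "item B")  # X = 5, Y = 5: same Y bits, other way
print(lookup(0x0C5), lookup(0x145), lookup(0x185))
```

In this sketch both items with the same Y bits remain resident at once, which a single way could not do, while a third address with those Y bits (0x185) still misses.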
It would be desirable to provide a cache from which items can be quickly read with low power.