1. Field of the Invention
This invention relates in general to memories and more specifically to memory pipelining in an integrated circuit.
2. Description of the Related Art
A memory array typically provides output data bits during a read access that correspond to an incoming address. The output data bits can be organized in a number of different ways including individual bits, bytes, words, and lines. In embedded memory systems, it is possible to have applications where the processor (or another user) requires some portion of the output data bits sooner than other portions of the output data bits. For example, the user might prefer to have a small portion of the bits in the first phase of the clock cycle and the remainder of the bits in the second phase of the clock cycle. Hence, there is a need for a memory that will optimally fulfill the needs of the user in a manner that offers high performance while minimizing power and area.
Additionally, some cache memories are organized into an associative structure. In an associative structure, the blocks of storage locations are accessed as arrays having rows (often referred to as “sets”) and columns (often referred to as “ways”). When a cache is searched for bytes residing at an address, a number of bits from the address are used as an “index” into the cache. The index selects a particular set within the array, and therefore the number of address bits required for the index is determined by the number of sets configured into the cache. The act of selecting a set via an index is referred to as “indexing.” The addresses associated with bytes stored in the multiple ways of a set are examined to determine if any of the addresses stored in the set match the requested address. If a match is found, the access is said to be a “hit,” and the cache provides the associated bytes. If a match is not found, the access is said to be a “miss.” When a miss is detected, the bytes are transferred from the memory system into the cache. The addresses associated with bytes stored in the cache are also stored. These stored addresses are referred to as “tags” or “tag addresses.”
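The indexing and tag comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular cache: the geometry (64 sets, 4 ways, 32-byte blocks) and the names `split_address`, `SetAssociativeCache`, `lookup`, and `fill` are assumptions introduced here, and the replacement policy that picks a way on a miss is left to the caller.

```python
# Hypothetical geometry, chosen only for illustration.
NUM_SETS = 64      # rows ("sets"): 6 index bits
NUM_WAYS = 4       # columns ("ways")
LINE_BYTES = 32    # bytes per block: 5 offset bits

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5
INDEX_BITS = NUM_SETS.bit_length() - 1      # 6


def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self):
        # tags[set][way] holds the stored tag, or None if the way is empty.
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.data = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

    def lookup(self, addr):
        """Return (hit, way, block); way and block are None on a miss."""
        tag, index, _ = split_address(addr)
        # The index selects one set; every way's tag in that set is compared.
        for way in range(NUM_WAYS):
            if self.tags[index][way] == tag:
                return True, way, self.data[index][way]
        return False, None, None

    def fill(self, addr, block, way):
        """After a miss, store the block and its tag in the given way."""
        tag, index, _ = split_address(addr)
        self.tags[index][way] = tag
        self.data[index][way] = block
```

With 64 sets the index needs six address bits, which matches the statement that the number of index bits is determined by the number of sets.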
The blocks of memory configured into a set form the columns of the set. Each block of memory is referred to as a “way”; multiple ways comprise a set. The way is selected by providing a way value to the cache. The way value is determined by examining the tags for a set and finding a match between one of the tags and the requested address. A cache designed with one way per set is referred to as a “direct-mapped cache.” In a direct-mapped cache, the tag must be examined to determine if an access is a cache hit, but the tag examination is not required to select which bytes are transferred to the outputs of the cache. Since only an index is required to select bytes from a direct-mapped cache, the direct-mapped cache is a “linear array” requiring only a single value to select a storage location within it.
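The direct-mapped case can be sketched the same way. In the Python illustration below, which assumes a hypothetical 64-set cache with 32-byte blocks and invented names (`DirectMappedCache`, `read`, `fill`), the block is selected by the index alone, and the tag comparison only reports whether that block is a hit.

```python
# Hypothetical geometry, chosen only for illustration.
NUM_SETS = 64
OFFSET_BITS = 5    # 32-byte blocks
INDEX_BITS = 6     # 64 sets


class DirectMappedCache:
    def __init__(self):
        # One way per set, so each array is a simple linear array.
        self.tags = [None] * NUM_SETS
        self.data = [None] * NUM_SETS

    def read(self, addr):
        """Return (hit, block). The block does not depend on the tag."""
        index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        block = self.data[index]          # selected by the index alone
        hit = self.tags[index] == tag     # tag check only qualifies the result
        return hit, block

    def fill(self, addr, block):
        """Store a block and its tag at the location the index selects."""
        index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
        self.tags[index] = addr >> (OFFSET_BITS + INDEX_BITS)
        self.data[index] = block
```

Because `block` is read before `hit` is known, the data path does not wait on the tag comparison, which is the access-time advantage over a set-associative organization.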
The hit rate in a data cache is important to the performance of a data processing system because, when a miss is detected, the data must be fetched from the memory system, and the microprocessor will quickly become idle while waiting for the data to be provided. Set-associative caches require more access time than direct-mapped caches since the tags must be compared to the requested address and the resulting hit information must then be used to select which data bytes should be conveyed out of the data cache. As the clock frequencies of data processing systems increase, there is less time to perform the tag comparison and way selection. The problem is further compounded for processors using a wider data path (for example, 64-bit versus 32-bit). In order to reduce the time to perform the tag comparison and way selection, some cache memories use a speculative way prediction scheme for way selection. In these schemes, the predicted way depends on a lookup and comparison of only a portion of the tag. For instance, a tag way array may be configured as an array having an upper tag portion and a lower tag portion. The lower tag might be used initially to determine the predicted way. The prediction would be validated later, once the upper tag is accessed and compared. The organization and timing of such a tag array can have a clear impact on speed, area, and power consumption. Thus, there is a need for an improved structure and method for accessing the upper tag portion and the lower tag portion of a tag way array.
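A speculative way prediction of this kind can be sketched in Python as follows. The sketch is only an illustration under stated assumptions: the 4-bit lower tag width, the rule of taking the first way whose lower tag matches, and the names `split_tag`, `predict_way`, and `validate` are all hypothetical and are not taken from any particular design.

```python
LOWER_TAG_BITS = 4   # hypothetical width of the lower tag portion


def split_tag(tag):
    """Return (upper, lower) portions of a full tag."""
    lower = tag & ((1 << LOWER_TAG_BITS) - 1)
    upper = tag >> LOWER_TAG_BITS
    return upper, lower


def predict_way(set_tags, tag):
    """Predict a way by comparing only the lower tag portion.

    set_tags lists the full stored tag for each way of one set, or None
    for an empty way. Assumption: the first partial match is predicted.
    """
    _, lower = split_tag(tag)
    for way, stored in enumerate(set_tags):
        if stored is not None and split_tag(stored)[1] == lower:
            return way
    return None


def validate(set_tags, tag, way):
    """Confirm the prediction later by comparing the upper tag portion."""
    if way is None:
        return False
    upper, _ = split_tag(tag)
    return split_tag(set_tags[way])[0] == upper
```

For example, with `set_tags = [0x1A3, None, 0x2B7, 0x5A3]`, a request for tag `0x5A3` is predicted to be in way 0 because the lower portions match, and the later upper-tag comparison rejects that prediction. This is the case in which the speculative selection must be corrected.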
Additionally, there are applications where the entire address is not immediately available at the beginning of the memory access. For example, the entire address may be known except for the least significant address bit(s). Furthermore, it is possible that only a portion of the output data bits are required in the first clock phase and the rest are required at a later time when the entire address is known including the least significant address bit(s). Hence, there is a need for an improved structure and method for accessing memories in such applications.
Skilled artisans appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve the understanding of the embodiments of the present invention.