Among the characteristics of a memory system or subsystem that are often of importance is what may be termed "memory performance," including, in this context, a measure of how long it takes to obtain from memory a desired portion of the data stored therein. The appropriate metric for assessing such performance can depend on, or be affected by, certain characteristics of a typical data request. For this reason, a memory system or subsystem that is appropriate for one use (e.g., as general-purpose memory in a personal computer) may be less than optimal for another purpose (such as frame storage in a network switch). Often a memory designed for a particular purpose represents a balance among numerous factors in addition to performance, such as component cost, flexibility of use, complexity of design, time to market, and similar factors.

Many previous memory system or subsystem designs have involved selection of dynamic random access memory (DRAM), static random access memory (SRAM), or a combination of the two. While DRAM is typically the less expensive option, it generally provides slower access and poorer performance than the (relatively more expensive) SRAM or "fast" memories. In typical previous uncached DRAM systems, all portions of the data in a given data request were read directly from the DRAM. Many DRAMs are configured to reduce these access-time and/or performance disadvantages, such as by using fast page mode or other data access techniques. Even with such techniques, however, DRAM suffers access-time and/or performance disadvantages compared to, e.g., SRAM. One such disadvantage arises, at least in part, from the period of time required to access or open a row (or "page") of DRAM memory. Thus, when a DRAM receives an instruction to provide data from a "new" row, there will typically be a delay (a "page-opening" or "row access" delay) before memory words in that row are output from the memory.
Typically, once such delay is past, access to each successive word in the row ("access cycle" times) is relatively rapid (on the order of the performance found in many SRAMs). By way of example, a typical DRAM may have a page-opening delay equal to about five access cycles, i.e., about equal to the time required to output five data words (or access five successive columns) once the page-opening delay has passed.
Although the page-opening delay represents, in general, a disadvantage of DRAM, if a typical data request accesses, e.g., all or a substantial portion of a row (which may contain, for example, 1024 8-bit words or more), the five-access-cycle delay represents a relatively small portion of the request. However, in applications where the typical data request is much smaller, a relatively greater portion of all data access operations will be devoted to page-opening delays. For example, for an access of a full 1024-word row, only about 0.5% (5/1024) of the access time is consumed by the "unproductive" row access delay. However, if the average access is for, e.g., 100 words, then, over a period of time, about 5% (5/100) of total memory access time is devoted to "unproductive" page-opening delays. Thus, a memory or memory subsystem design which may be suitable for one purpose may incur significant performance penalties when applied to a different use. Accordingly, it would be useful to provide a memory system or memory subsystem which can reduce the effect of DRAM page-opening delays, particularly when average data accesses are substantially smaller than a row.
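The overhead arithmetic above can be sketched as a short illustrative model. This is not a description of any particular DRAM device; the five-cycle page-opening delay is the assumed figure used in the example above, and one access cycle per word is assumed.

```python
# Illustrative model of the page-opening overhead arithmetic above.
# Assumption (from the example in the text, not a specific datasheet):
# the page-opening delay equals five access cycles.
PAGE_OPEN_CYCLES = 5

def open_delay_fraction(words_per_access: int) -> float:
    """Fraction of access time spent on the 'unproductive' page-opening
    delay, relative to the words productively transferred."""
    return PAGE_OPEN_CYCLES / words_per_access

# A full 1024-word row: 5/1024, about 0.5% overhead.
print(open_delay_fraction(1024))
# A 100-word average access: 5/100, i.e. 5% overhead.
print(open_delay_fraction(100))
```

As the model shows, the overhead fraction grows inversely with the size of the typical access, which is why small, scattered accesses are disproportionately penalized.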
One previous approach to improving overall memory system or subsystem performance has been data caching. In data caching, selected portions of the data stored in the (slower) DRAM are duplicated in a smaller but faster SRAM cache. In typical previous DRAM/SRAM "cache" systems, each data access obtained data directly from the cache only. Typically, in such systems, if the requested data did not reside in the cache, the row of DRAM containing the desired data was loaded into the cache (over-writing, e.g., the least-recently-used row of the SRAM), and the desired data was then obtained from the cache. Thus, in many previous DRAM/SRAM systems, all data for a given request was read (directly) from the SRAM cache. Such a system is useful when there is a way of predicting which of the data in the DRAM is likely to be used in the future (so that this data can be loaded into the SRAM "ahead of time"). Such predictions are often based on data co-location, i.e., a finding that, in many systems, access of certain data is followed by requests for relatively closely-located, often successive, data. Many cache systems routinely duplicate, in SRAM, entire rows or "pages" of DRAM, since a request for data in a given row of DRAM is very often followed by requests for other data in the same row.
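The caching scheme described above can be sketched as a minimal software model. This is an illustrative sketch of the generic miss-fills-row, LRU-evicting cache behavior described in the text, not any particular product's design; the 1024-word row size and two-row cache capacity are assumed for illustration.

```python
from collections import OrderedDict

ROW_SIZE = 1024  # assumed words per DRAM row (illustrative figure from the text)

class RowCache:
    """Sketch of a DRAM/SRAM cache in which every read is served from the
    SRAM cache; on a miss, the entire DRAM row containing the address is
    copied in, evicting the least-recently-used row."""

    def __init__(self, dram, num_rows):
        self.dram = dram            # backing store: a flat list of words
        self.num_rows = num_rows    # SRAM capacity, in rows
        self.rows = OrderedDict()   # row number -> row data, in LRU order

    def read(self, addr):
        row = addr // ROW_SIZE
        if row not in self.rows:    # miss: pay the page-opening delay once
            if len(self.rows) >= self.num_rows:
                self.rows.popitem(last=False)   # evict least-recently-used row
            start = row * ROW_SIZE
            self.rows[row] = self.dram[start:start + ROW_SIZE]
        self.rows.move_to_end(row)  # mark this row most-recently used
        return self.rows[row][addr % ROW_SIZE]  # all data served from the cache
```

The model makes the co-location assumption explicit: once a row is loaded, subsequent reads within that row hit the cache, but randomly scattered reads trigger a full-row load on nearly every access.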
Although SRAM caching can be useful in improving performance in situations where such co-location of data is common, its benefits are diminished (or eliminated) when the degree of co-location is relatively low (or when successive data requests are for relatively more random locations). Furthermore, data caches typically involve certain overhead costs, including the cost of assuring data coherency, i.e., assuring that the data in the cache accurately duplicates the most recent or valid data in the DRAM (e.g., in light of memory write operations that may have been performed). Accordingly, it would be useful to provide for improvements in memory or memory subsystem performance which are effective even when there is little or no data co-location, and in which the overhead involved in maintaining coherency is reduced or eliminated.
One example of a device in which previous or standard memory systems or subsystems have relatively lower performance is a network switch. Typically, a network switch must store and access a plurality of "frames." A frame may contain, e.g., between 512 and 12,144 bits and will typically be substantially shorter than a DRAM row. Also, it is common to request frames from memory in a relatively random order, such that co-location of data is relatively low (as compared to many other uses of memory systems, such as the main memory of a typical personal computer). Accordingly, it would be useful to provide a memory system or memory subsystem which provides relatively high performance when used in a network switch.