1. Field of the Invention
This invention relates to computer systems, and, more particularly, to memory subsystem hardware to support flash memory.
2. Description of the Related Art
In order to increase the performance of computer systems, system designers may use a hierarchical arrangement of storage devices to take advantage of the memory locality typically exhibited by computer programs. Memory locality, as used herein, refers to the tendency of computer programs to frequently access the same or related storage locations, either within a relatively short period of time or within close proximity to one another. For example, paging may be used in a virtual memory implementation to bring frequently accessed data into main memory from an auxiliary storage device one page at a time. It is assumed that accessing main memory is faster than accessing an auxiliary device, although the auxiliary device may store data at a lower cost per unit of storage. Alternatively, or in addition, a copy of data that is frequently accessed may be stored in a cache made up of faster devices having a relatively small total capacity. Data may be stored in a cache one cache line at a time, where a cache line is typically smaller than a page.
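By way of illustration only, the relationship between pages and cache lines may be sketched as simple address arithmetic. The page size of 4096 bytes, the cache line size of 64 bytes, and the function name `locate` are assumptions chosen for the example and are not taken from any particular system.

```python
PAGE_SIZE = 4096        # bytes per page (assumed for illustration)
CACHE_LINE_SIZE = 64    # bytes per cache line (assumed for illustration)

# Number of cache lines that together cover one page.
LINES_PER_PAGE = PAGE_SIZE // CACHE_LINE_SIZE


def locate(address):
    """Return (page number, cache line index within the page, byte offset within the line)."""
    page = address // PAGE_SIZE
    line_in_page = (address % PAGE_SIZE) // CACHE_LINE_SIZE
    offset = address % CACHE_LINE_SIZE
    return page, line_in_page, offset


if __name__ == "__main__":
    print(LINES_PER_PAGE)
    print(locate(0x12345))
```

Under these assumed sizes, two addresses that fall in the same page may still fall in different cache lines, which is why a cache miss may be satisfied from main memory without a page fault.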
Generally speaking, modern computer systems use a hierarchy of memory devices including one or more levels of cache memory in which data is duplicated one cache line at a time, a region of main memory in which data is stored one page at a time, and various levels of auxiliary storage. Cache memory may be coupled closely to a processing unit and/or included within a processing unit. Main memory is generally coupled directly to one or more processing units via a fast system bus. Auxiliary storage, unlike the more directly coupled memory devices, may be coupled to one or more processing units as part of an I/O system hierarchy via a general purpose I/O interface system such as Parallel ATA (PATA), Serial ATA (SATA), Small Computer System Interface (SCSI), Universal Serial Bus (USB), Peripheral Component Interconnect (PCI), and the like. When a desired page is not found in main memory (referred to as a page fault), the page may be retrieved from auxiliary storage using a technique known as Direct Memory Access (DMA). In DMA, an auxiliary device may directly access memory, transferring a page to or from main memory independently of the processing units. However, most data transfers within the memory hierarchy require the involvement of a central processing unit.
In addition to the above considerations, computer systems may implement various types of parallel processing, for example by providing multiple processing units within the system (also referred to as multi-core processors), or by integrating multiple discrete systems or subsystems together via a network or other type of interconnect to create a still more complex parallel system. In multi-core systems that provide access to shared memory, the possibility exists that two or more independent, concurrently executing processor tasks may attempt to concurrently access the same addressable location in memory. For example, one task may attempt to write the location at the same time the other attempts to read it. Absent some technique, commonly referred to as a coherence protocol, to predictably order or regulate such concurrent memory accesses, unpredictable or erroneous execution behavior may result. For example, the two tasks mentioned may produce different computational results depending on the order in which the write occurs relative to the read, an order which otherwise might be completely random. Similar problems may occur if different processing units in a multi-core system attempt to locally cache shared data.
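The order dependence described above may be illustrated with a small deterministic model. This is a sketch only: the function `run`, the location name `"x"`, and the values 0 and 1 are hypothetical, and the two tasks are modeled as an explicit sequence of steps rather than as real concurrent threads.

```python
def run(order):
    """Apply one write and one read to a shared location in the given order.

    `order` is a sequence of the strings "write" and "read". The function
    returns the value observed by the reading task.
    """
    memory = {"x": 0}          # shared location, initially 0
    observed = None
    for step in order:
        if step == "write":
            memory["x"] = 1    # the writing task stores a new value
        elif step == "read":
            observed = memory["x"]  # the reading task samples the location
    return observed


if __name__ == "__main__":
    print(run(["write", "read"]))
    print(run(["read", "write"]))
```

The reading task observes the new value when the write is ordered first and the initial value otherwise, so a result computed from the read depends on an ordering that, absent regulation, is not controlled by either task.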
Coherence issues may also be present when auxiliary devices have access to memory via DMA or otherwise. When a processing unit accesses a particular memory location, it stores the current value from the particular location in a cache. If updates to the cache are not propagated to the auxiliary device, the next time the particular location is accessed, a stale version of the data may be loaded from the auxiliary device, overwriting the updates. Computer systems use various coherence protocols to ensure that no processing unit is operating with an out-of-date copy of data. In DMA implementations, cache coherence may be maintained in hardware or software that flushes cache lines that have been invalidated by a DMA access.
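The stale-data scenario described above may be sketched with a simplified model of a cache backed by an auxiliary device. The class `Model`, its methods, and the values `"old"` and `"new"` are hypothetical and chosen for illustration; the sketch does not represent any particular coherence protocol or hardware mechanism.

```python
class Model:
    """A single cache in front of an auxiliary device (illustrative only)."""

    def __init__(self):
        self.device = {0: "old"}  # contents of the auxiliary device
        self.cache = {}           # cached copies: address -> value
        self.dirty = set()        # addresses updated in cache but not on the device

    def load(self, addr):
        """Load from the device into the cache, replacing any cached value."""
        self.cache[addr] = self.device[addr]
        self.dirty.discard(addr)
        return self.cache[addr]

    def store(self, addr, value):
        """Update the cached copy only; the device is not touched."""
        self.cache[addr] = value
        self.dirty.add(addr)

    def flush(self, addr):
        """Propagate a pending cached update to the device."""
        if addr in self.dirty:
            self.device[addr] = self.cache[addr]
            self.dirty.discard(addr)


def reload_after_update(propagate):
    """Update a cached location, optionally flush it, then reload from the device."""
    m = Model()
    m.load(0)
    m.store(0, "new")
    if propagate:
        m.flush(0)
    return m.load(0)


if __name__ == "__main__":
    print(reload_after_update(propagate=False))
    print(reload_after_update(propagate=True))
```

In this model, the reload returns the stale value and discards the update when the update is not propagated first, and returns the updated value when it is.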
It may be desirable to add low cost storage devices such as flash memory to a computer system's memory hierarchy to lower overall system cost and increase performance compared to auxiliary devices such as hard disk storage. Unfortunately, without DMA access between these storage devices and main memory, cache coherence issues may arise. In addition, these storage devices may require specialized, multi-instruction access operations. Implementing these access operations using a central processing unit may be inefficient. Consequently, what are needed are systems and methods for incorporating flash memory-type storage devices in a computer system's memory hierarchy that account for these issues.