1. Field of the Invention
This invention relates generally to semiconductor devices. More particularly, this invention relates to techniques for performing pipelined memory operations in memory devices.
2. Description of the Related Art
The need for high performance memory systems has grown with the demand for higher performance central processing units and graphics processing units. High performance has two aspects that are important in memory system design. The first aspect is high throughput (sometimes termed effective or sustainable bandwidth). Many processor and graphics units perform a large number of operations per second and place a proportionally high rate of memory requests upon the memory system. For example, a graphics system may require that a large number of pixels in a display be updated in a frame time. Commonly, a graphics display may have a million pixels and require an update 70 to 100 times per second. If each pixel requires computation on about 10 to 16 bytes of memory for every frame, this translates to a throughput requirement of about 0.7 to 1.6 Gigabytes/second. Thus, a memory subsystem in a graphics application must be able to handle a high rate of memory requests. Another aspect of these memory requests is that they have a reference pattern that exhibits poor locality. This leads to a requirement that the requests from the graphics application be specifiable at the required throughput for the requests.
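The throughput figure above follows directly from the stated pixel count, update rate, and bytes per pixel. The following is a minimal sketch of that arithmetic; the function name is illustrative only, and a Gigabyte is assumed to mean 10^9 bytes.

```python
def required_throughput_gb_per_s(pixels, updates_per_second, bytes_per_pixel):
    """Memory traffic in Gigabytes/second (assuming 1 Gigabyte = 10**9 bytes)."""
    return pixels * updates_per_second * bytes_per_pixel / 1e9

# Figures taken from the text: one million pixels, 70 to 100 updates
# per second, and 10 to 16 bytes of memory touched per pixel per frame.
low = required_throughput_gb_per_s(1_000_000, 70, 10)
high = required_throughput_gb_per_s(1_000_000, 100, 16)

print(f"{low:.1f} to {high:.1f} Gigabytes/second")
```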
The second aspect of high performance is low service time for the application, where service time is the time for the memory system to receive and service a request under the load of the given application. An example of an application where service time is important is the case of a processor making a memory request that misses its cache and requires a memory operation to service the miss in the midst of other memory traffic. During the time of the miss, the processor may be stalled waiting for the response. A processor with a 4 ns cycle time may have to wait 20 cycles or more to receive a response to its request depending on the service time of the memory system, thus slowing down the processor. Memory requests from the processor also have poor locality of reference due to the use of processor caches. This implies a requirement that the request be fully specifiable at the time the request is made so that the request can enter the memory system without delay. Thus, there is a need for low service time for a memory request.
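To put the stall in absolute terms, the cycle count above can be converted to time. This is a small sketch using only the two figures given in the text; the function name is illustrative.

```python
def stall_time_ns(cycle_time_ns, stall_cycles):
    """Time the processor spends waiting, in nanoseconds."""
    return cycle_time_ns * stall_cycles

# Figures from the text: 4 ns cycle time, a wait of 20 cycles (or more).
stall = stall_time_ns(4, 20)
print(f"Processor stalled for at least {stall} ns per cache miss")
```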
Another important factor for improving memory speed is memory core technology. Memory systems that support high performance applications do so with a given memory core technology, where the term memory core refers to the portion of the memory device comprising the storage array and support circuitry. An example of a memory core 100 is shown in FIG. 1 and is discussed in more detail below. One of the more important properties of the memory core is the row cycle time (tRC), which is shown in FIG. 4. Typically, the row cycle time is fairly slow, being on the order of 60 to 80 ns. However, a large amount of data, on the order of 1 KByte or more, is accessed from the storage array in this time, implying that the storage array is capable of high throughput. The reference streams for the applications discussed above, though, do not need large amounts of data with fairly slow cycle times. Instead, the pattern is to access small amounts of data with very short cycle times. Another important property is the column cycle time (tPC), which is shown in FIG. 7. Once a memory core has performed a row access and obtained the 1 KByte or so of row data, one or more column cycles are required to obtain some or all of the data. The construction of the core is such that a reference stream that sequentially accesses some or all of the row data is best, rather than a reference stream that moves to another row and then returns to the first row. Again, the reference streams of practical applications do not fit this pattern. The application reference stream has very poor spatial locality, moving from row to row and accessing only some small portion of the data in each row, making poor use of the relatively high column cycle rate that is possible. Thus, an interface system is required in the memory device to help adapt the high throughput and low service time demands of the application reference stream to the properties of the memory core.
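The gap between what the storage array can deliver and what a poorly localized reference stream actually uses can be illustrated numerically. The sketch below takes the row size and row cycle times from the text, but the 16 bytes used per row access is a hypothetical figure chosen only for illustration, as is the simplification that a new row is opened once per row cycle time.

```python
ROW_BYTES = 1024  # "on the order of 1 KByte" of row data, per the text

def bandwidth_gb_per_s(bytes_transferred, time_ns):
    """Bytes per nanosecond is numerically equal to 10**9 bytes per second."""
    return bytes_transferred / time_ns

# Raw array throughput if every byte of each row were used.
peak_at_60ns = bandwidth_gb_per_s(ROW_BYTES, 60)
peak_at_80ns = bandwidth_gb_per_s(ROW_BYTES, 80)

# Hypothetical: the application uses only 16 bytes of each row before
# moving on to another row (assumed value, not from the text).
USED_BYTES_PER_ROW = 16
effective_at_80ns = bandwidth_gb_per_s(USED_BYTES_PER_ROW, 80)

print(f"Array throughput: {peak_at_80ns:.1f} to {peak_at_60ns:.1f} GB/s")
print(f"Throughput actually used: {effective_at_80ns:.1f} GB/s")
```

Under these assumptions most of the data fetched in each row cycle goes unused, which is the mismatch the interface system is meant to address.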
One of the primary limitations in current memory technology to adapt to the application reference stream is not enough resources, including bank and control resources, in a memory device. By introducing enough resources into the device and operating these resources in a concurrent or pipelined fashion, such a memory device can meet or exceed the current demands without substantially increasing the cost of the memory device.
Another property of memory cores is that they have greatly increased in capacity, with 256 Megabit or larger devices being feasible in current and foreseeable technology. For cost and other reasons, it is desirable to deliver the high performance demanded from a single memory device. The benefit of using a single memory device is that the performance of the memory system does not depend so much on the presence of multiple devices, which increase cost, increase the size of incremental additions to the memory system (granularity), increase the total power required for the memory system and decrease reliability due to multiple points of failure. Total power in the memory system is reduced with a single memory device because power is dissipated only in the single device which responds to a memory request, whereas, in a memory system with multiple devices responding to a memory request, many devices dissipate power. For example, for a fixed size application access and fixed memory core technology, a multiple device system with N components will access N times as many memory bits, consuming N times the power to access a row.
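The N-fold scaling in the example above can be written out as a simple proportional model. This is only a sketch: the row size of 8192 bits (1 KByte), the device count of 8, and the unit energy per bit are assumed values, and the model ignores any power that does not scale with the number of bits accessed.

```python
def row_access_cost(devices_responding, row_bits_per_device, energy_per_bit=1.0):
    """Return (bits accessed, relative energy) for one row access.

    energy_per_bit is an arbitrary unit; only ratios are meaningful.
    """
    bits = devices_responding * row_bits_per_device
    return bits, bits * energy_per_bit

ROW_BITS = 8192  # assumed: 1 KByte of row data per device

bits_single, energy_single = row_access_cost(1, ROW_BITS)
bits_multi, energy_multi = row_access_cost(8, ROW_BITS)  # N = 8, assumed

print(f"Bits accessed: {bits_single} vs {bits_multi}")
print(f"Relative row access energy: {energy_multi / energy_single:.0f}x")
```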
In view of the foregoing, it would be highly desirable to provide improved memory systems. Ideally, the improved memory systems would provide high performance and improved memory core technology.
A single high performance memory device having a large number of concurrently operated resources is described. The concurrently operated resources include bank resources and control resources. Added bank resources in the memory device permit multiple banks to be operated concurrently to both reduce service time and increase throughput for many applications, especially ones with poor locality of reference. Added control resources operating concurrently in a high frequency pipeline break up a memory operation into steps, thus allowing the memory device to have high throughput without an adverse effect on service time. A single memory device delivering high performance may be combined with additional memory devices to increase the storage capacity of the memory system, while maintaining or improving performance compared to that of the single memory device.
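The effect of breaking a memory operation into pipelined steps can be illustrated with a generic pipeline timing model. This sketch is not taken from the described device: the four stages, the 10 ns stage time, and the assumption that every stage takes equal time with no stalls are all hypothetical.

```python
def unpipelined_total_ns(n_requests, n_stages, stage_ns):
    """Each request must finish all stages before the next one starts."""
    return n_requests * n_stages * stage_ns

def pipelined_total_ns(n_requests, n_stages, stage_ns):
    """A new request enters every stage time once the pipeline is running."""
    if n_requests == 0:
        return 0
    return (n_stages + n_requests - 1) * stage_ns

STAGES, STAGE_NS = 4, 10  # hypothetical values

# Service time of a single request is the same in both cases.
single_plain = unpipelined_total_ns(1, STAGES, STAGE_NS)
single_piped = pipelined_total_ns(1, STAGES, STAGE_NS)

# Time to complete 100 back-to-back requests differs greatly.
batch_plain = unpipelined_total_ns(100, STAGES, STAGE_NS)
batch_piped = pipelined_total_ns(100, STAGES, STAGE_NS)

print(f"Single request: {single_plain} ns vs {single_piped} ns")
print(f"100 requests: {batch_plain} ns vs {batch_piped} ns")
```

In this idealized model the time to service one request is unchanged, while the time to complete a long stream of requests approaches one stage time per request.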