As memory devices of all types have evolved, continuous strides have been made in improving their performance in a variety of respects. For example, the storage capacity of memory devices has continued to increase at geometric proportions. This increased capacity, coupled with the geometrically higher operating speeds of electronic systems containing memory devices, has made high memory device bandwidth ever more critical. One application in which memory devices, such as dynamic random access memory (“DRAM”) devices, require a higher bandwidth is their use as system memory in computer systems. As the operating speed of processors has increased, processors are able to read and write data at correspondingly higher speeds. Yet conventional DRAM devices often do not have the bandwidth to read and write data at these higher speeds, thereby slowing the performance of conventional computer systems. This problem is exacerbated by the trend toward multi-core processors and multiple processor computer systems. It is currently estimated that computer systems operating as high-end servers are idle as many as 3 out of every 4 clock cycles because of the limited data bandwidth of system memory devices. In fact, the limited bandwidth of DRAM devices operating as system memory can reduce the performance of computer systems to as low as 10% of the performance of which they would otherwise be capable.
Various attempts have been made to increase the data bandwidth of memory devices. For example, wider internal data buses have been used to transfer data to and from arrays with a higher bandwidth. However, doing so usually requires that write data be deserialized and read data serialized at the memory device interface. Another approach has been simply to scale up the size of memory devices or, conversely, to shrink their feature sizes, but, for a variety of reasons, scaling has been unable to keep up with the geometric increase in the demand for higher data bandwidth. Proposals have also been made to stack several integrated circuit memory device dice in the same package, but doing so threatens to create a large number of other problems that must be overcome.
One potential problem with stacking memory device dice on top of each other is that it may create timing skews between the signals transmitted to or from each of the memory devices. Because the distance between each memory device and the interface of the package differs from die to die, the time required for signals to travel to and from each memory device inherently varies. This can be a considerable problem because the stack may contain a large number of dice, for example, eight memory devices. Additionally, because of process, temperature and supply voltage variations, the timing characteristics of the memory devices may differ even if they are fabricated on the same wafer. An example of such signal timing skews is illustrated in FIG. 1, which shows the period during which read data signals are considered valid at a package interface for each of four stacked DRAM device dice DRAM0-DRAM3. This data valid period is sometimes referred to as a data “eye.” As shown therein, the read data for DRAM2 is valid first, followed by DRAM0, DRAM1 and finally DRAM3. The period during which the read data from all of the DRAM dice are valid, i.e., the composite eye 8, is almost nonexistent. Therefore, it would be very difficult for a memory access device, such as a memory controller or processor, to capture the read data using a single clock signal, particularly as the operating speeds and resulting data transfer rates of memory devices continue to increase.
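The composite eye is the intersection of the per-die valid windows: it opens when the latest die's data becomes valid and closes when the earliest die's data stops being valid. The following Python sketch computes it. The window values are hypothetical and purely illustrative (they are not read from FIG. 1), and the function name composite_eye is an assumption made for this example.

```python
from typing import Dict, Optional, Tuple

# (start_ps, end_ps): interval during which one die's read data is valid
Window = Tuple[float, float]


def composite_eye(windows: Dict[str, Window]) -> Optional[Window]:
    """Return the interval in which every die's read data is valid, or None.

    The composite eye opens at the latest start and closes at the
    earliest end; if those cross, no common capture window exists.
    """
    start = max(s for s, _ in windows.values())
    end = min(e for _, e in windows.values())
    return (start, end) if end > start else None


# Hypothetical 300 ps eyes, ordered as described for FIG. 1
# (DRAM2 first, then DRAM0, DRAM1 and DRAM3); values are illustrative only.
eyes = {
    "DRAM2": (0.0, 300.0),
    "DRAM0": (80.0, 380.0),
    "DRAM1": (160.0, 460.0),
    "DRAM3": (270.0, 570.0),
}

print(composite_eye(eyes))  # (270.0, 300.0): only 30 ps of each 300 ps eye is shared
```

Even though each die individually offers a 300 ps eye in this example, a single capture clock would have only the shared 30 ps in which to sample all four dice.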
In the past, the problem of signal skews from different memory devices was greatly alleviated by transmitting a respective read strobe signal from each memory device along with its read data. The strobe signal is then used by the memory access device to capture the read data. Insofar as differences in the timing of read data from each of the memory devices are substantially matched by differences in the timing of the strobe signals, transitions of the strobe signals are substantially centered in the data eye from each memory device, thereby allowing the memory access device to successfully capture the read data from each of the memory devices. As the operating speed of memory devices has continued to increase, even this approach has proven insufficient. As a result, techniques have been developed to adjust the timing of the strobe signals, either by adjusting their transmit times at the memory devices or by delaying them by adjustable amounts in the memory access device. Alternatively, the timing of each of the bits of the read data can be adjusted relative to the timing of the read strobe signal. An example of a memory device that adjusts the timing of each bit of read data in this manner is described in U.S. Pat. No. 6,882,304.
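The variant in which each strobe is delayed by an adjustable amount in the memory access device can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of U.S. Pat. No. 6,882,304 or of any real controller: it assumes a tapped delay line with a fixed, hypothetical 25 ps step, and that each die's strobe edge arrives before the center of that die's data eye. The names DieTiming, delay_taps and capture_margin, and all timing values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DieTiming:
    name: str
    eye_start_ps: float    # read data from this die becomes valid
    eye_end_ps: float      # read data from this die stops being valid
    strobe_edge_ps: float  # arrival of this die's strobe edge, before added delay


def delay_taps(d: DieTiming, tap_ps: float) -> int:
    """Whole taps of added strobe delay that put the edge nearest the eye center."""
    center = (d.eye_start_ps + d.eye_end_ps) / 2.0
    needed = center - d.strobe_edge_ps
    if needed < 0:
        # A delay line can only add delay, never remove it.
        raise ValueError(f"{d.name}: strobe edge already lies past the eye center")
    return round(needed / tap_ps)


def capture_margin(d: DieTiming, taps: int, tap_ps: float) -> float:
    """Smaller of the setup and hold margins once the delay is applied."""
    edge = d.strobe_edge_ps + taps * tap_ps
    return min(edge - d.eye_start_ps, d.eye_end_ps - edge)


TAP_PS = 25.0  # hypothetical delay-line step
dies: List[DieTiming] = [
    DieTiming("DRAM0", 80.0, 380.0, 95.0),
    DieTiming("DRAM2", 0.0, 300.0, 10.0),
]

for die in dies:
    taps = delay_taps(die, TAP_PS)
    print(f"{die.name}: {taps} taps, margin {capture_margin(die, taps, TAP_PS):.0f} ps")
# DRAM0: 5 taps, margin 140 ps
# DRAM2: 6 taps, margin 140 ps
```

Each die ends up with its own tap setting, which illustrates why per-device timing adjustment multiplies the amount of adjustment circuitry as the number of dice grows.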
The conventional approach of adjusting the timing between a read strobe signal and read data signals could be used for stacked memory device dice. However, doing so would require an extensive amount of timing adjustment circuitry in each memory device, thereby reducing the area of each memory device die available for providing memory capacity. Adjusting the timing between a read strobe signal and read data signals in each memory device die would also require sending a read strobe signal from each memory device. Furthermore, although the timing problems have been discussed with respect to read data signals, essentially the same type of problems can exist with write data signals, command signals and address signals. If a separate strobe signal were transmitted to or from each memory device for each of these types of signals, the packaged memory devices would need a large number of strobe terminals. For example, if eight memory device dice were stacked, one strobe terminal per die for each of the four signal types (read data, write data, command and address) would require 32 terminals. Yet it is generally considered undesirable to unduly increase the number of terminals in a memory device because of the lack of available area in a memory device package and the large number of conductors that would be required in the bus or circuit board on which the memory device is mounted.
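The terminal count above is the number of stacked dice multiplied by the number of signal types that each need their own strobe. The short Python sketch below makes that scaling explicit; it assumes one terminal per strobe, and the name strobe_terminals is hypothetical.

```python
# The four signal types for which timing skew is discussed above.
SIGNAL_TYPES = ("read data", "write data", "command", "address")


def strobe_terminals(num_dice: int) -> int:
    """Package terminals needed for one strobe per die per signal type,
    assuming each strobe occupies a single terminal."""
    return num_dice * len(SIGNAL_TYPES)


for dice in (2, 4, 8):
    print(f"{dice} dice: {strobe_terminals(dice)} strobe terminals")
# 2 dice: 8 strobe terminals
# 4 dice: 16 strobe terminals
# 8 dice: 32 strobe terminals
```

The count grows linearly with stack height, so every additional die adds another set of strobe terminals to the package.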
Therefore, a need exists for a method and apparatus to minimize problems and limitations resulting from timing skews between signals transmitted to or from stacked memory device dice in a manner that maximizes the area of a die available for memory capacity and does not unduly increase the number of required terminals.