1. Field of the Invention
The invention relates to semiconductors and more particularly to memory devices such as Synchronous Dynamic Random Access Memory devices.
2. Discussion of Related Art
Conventional Dynamic Random Access Memory (DRAM), of the type that has been used in PCs since the original IBM PC, is said to be asynchronous. This refers to the fact that the operation of the memory is not synchronized to the system clock but depends entirely on the timing inherent in the memory device regardless of the frequency of the system clock.
For example, referring to FIG. 1, a system 100 has a processor 101 that is coupled to a memory controller 104 by way of an address bus 106 and a bi-directional data bus 108. The memory controller 104 is, in turn, coupled to an asynchronous type memory device 110 by way of both the address bus 106 and the data bus 108. In order to access the memory device 110 in what is referred to as either a READ or a WRITE operation, a specific procedure must be followed. Typically, the processor 101 generates a specific memory address request (also referred to as a memory page request) corresponding to the location in the memory device 110 where data (or memory page) required by the processor 101 is stored. The memory address request is passed to the memory controller 104 by way of the address bus 106.
In conventional memory systems, the memory controller 104 generates the appropriate memory access signals, which the memory device 110 decodes to identify the memory location where the requested data is stored. Once accessed, the stored data is output to the data bus 108 to be read by the processor 101 or whatever other device requested it. It should be noted that since the above-described operations are performed asynchronously with regard to the system clock, the processor 101 is usually required to wait for the appropriate data to be made available. These wait states degrade effective processor performance since the processor 101 cannot complete a desired operation without the requisite data from the memory device 110.
More specifically, during, for example, a READ operation, the processor 101 generates an address request corresponding to the memory location in the memory device 110 at which the required data is stored. Since all memory chips hold their contents in a logical “square” of memory cells 112 in the form of rows 114 and columns 116, reading data stored in, for example, the memory cell 112a requires that a row 114a first be activated using what is referred to as a “Row Address Select” (or “Row Address Strobe”, “/RAS”) signal that is provided by the memory controller 104. Specifically, /RAS is a signal sent to a DRAM that tells it that an associated address is a row address. Typically, the /RAS signal is based upon a “lower half” of the address request provided by the processor 101. When received and properly decoded, the /RAS signal causes the data in the entire row 114a to be transferred to a sense amp 118 after a period of time required for the selected row to stabilize.
Once the selected row has stabilized and the data in the selected row is transferred to the sense amp 118, the memory controller 104 further decodes the address request, forming what is referred to as a “Column Address Select” (“/CAS”) signal, which, when sent to a DRAM, tells it that an associated address is a column address. The /CAS signal causes column select circuitry (not shown) to select the specific cell (in this case 112a) in the memory array that contains the desired data. The data stored in the cell 112a is then sent out from the sense amp 118 to the data bus 108, where the processor 101 or other device that requested the data can read it. It should be noted that the data bus 108 is a bi-directional data bus since, during a WRITE operation, the processor 101 provides data to be stored in the memory device 110.
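The READ sequence described above (open a row with /RAS, latch it in the sense amp, then pick one column with /CAS) can be sketched as a small behavioral model. This is a minimal sketch, not a circuit model: the array size, the names `AsyncDramModel`, `split_address`, `ras` and `cas`, and the choice of taking the row from the lower half of the address bits (following the text's description) are illustrative assumptions.

```python
# Behavioral sketch of a DRAM access using /RAS followed by /CAS.
# All names and sizes are hypothetical and chosen only for illustration.
from typing import List, Optional, Tuple

ROW_BITS = 4  # assumed: 16 rows
COL_BITS = 4  # assumed: 16 columns


def split_address(addr: int) -> Tuple[int, int]:
    """Split a flat address into (row, column).

    Following the text, the row address comes from the lower half of the
    address bits and the column address from the upper half.
    """
    row = addr & ((1 << ROW_BITS) - 1)
    col = (addr >> ROW_BITS) & ((1 << COL_BITS) - 1)
    return row, col


class AsyncDramModel:
    def __init__(self) -> None:
        # Cells arranged as a logical "square" of rows and columns.
        self.cells: List[List[int]] = [[0] * (1 << COL_BITS) for _ in range(1 << ROW_BITS)]
        self.sense_amp: Optional[List[int]] = None  # contents of the open row

    def ras(self, row: int) -> None:
        """/RAS: transfer the entire selected row into the sense amp."""
        self.sense_amp = list(self.cells[row])

    def cas(self, col: int) -> int:
        """/CAS: select one column from the row latched in the sense amp."""
        if self.sense_amp is None:
            raise RuntimeError("/CAS issued before a row was opened with /RAS")
        return self.sense_amp[col]

    def write(self, addr: int, value: int) -> None:
        row, col = split_address(addr)
        self.ras(row)
        self.sense_amp[col] = value      # update the latched row
        self.cells[row][col] = value     # and write it back to the array

    def read(self, addr: int) -> int:
        row, col = split_address(addr)
        self.ras(row)                    # the row must be opened first
        return self.cas(col)


dram = AsyncDramModel()
dram.write(0x5A, 7)
print(split_address(0x5A), dram.read(0x5A))
```

The model makes the ordering constraint explicit: a column can only be selected from a row that has already been latched into the sense amp, which is why the row access dominates the access time discussed next.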
FIG. 2 is a timing diagram 200 illustrating the above-described READ operation. The performance of the memory device 110 is based upon several critical timing paths. One is the duration of time between the falling edge of the /RAS signal and the availability of data at the data bus 108 (referred to as access time from /RAS, or tRAC). Another critical timing path, referred to as access time from /CAS, or tCAC, is defined as the duration of time from the falling edge of /CAS to the output of data to the data bus 108. Any and all of these delays, also referred to as memory latency, degrade system performance since the speed of the DRAM is directly related to the slowest critical path.
Usually, the worst-case latency in any DRAM is specified by the row access time tRAC, which is itself composed of several components, at least two of which are directly related to data line length (and therefore chip size and bit density) and the associated capacitive loading (referred to as RC delay). One such component is referred to as bit line sensing latency, which is defined as the time for the data stored in a memory cell to be detected by the corresponding sense amp. This bit line sensing latency is affected by many factors, including bit line architecture, the RC of the sense amp drive line, the cell-to-bit line capacitance ratio, and sense amp topology. Another component that substantially contributes to overall memory latency is referred to as output driving latency, defined as the time required for the data to be propagated from the sense amp to the output node (again, an RC-type delay).
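The decomposition of the row access time described above can be summarized as follows. This is an illustrative breakdown, not a formula from the original text: t_other is an assumed catch-all for the remaining components (such as address decoding), and the second relation is the standard Elmore-delay approximation for a uniform distributed RC line of length ℓ with resistance r and capacitance c per unit length. It shows why wire-dominated RC delays grow with the square of line length.

```latex
\begin{align}
t_{RAC} &\approx t_{\text{sense}} + t_{\text{drive}} + t_{\text{other}} \\
t_{\text{line}}(\ell) &\approx \tfrac{1}{2}\, r\, c\, \ell^{2}
\end{align}
```

Here t_sense is the bit line sensing latency and t_drive the output driving latency; both contain line-length-dependent RC terms of the second form.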
Conventional attempts to reduce tRAC generally strive to reduce these two components by way of various circuit and layout techniques. In the case of bit line sensing latency, since the cell-to-bit line capacitance ratio directly impacts the bit line sensing delay, increasing this ratio reduces the bit line sensing latency (by providing a higher memory cell drive current). Typically, this approach is practiced either by increasing memory cell capacitance (by increasing cell size) or by putting fewer memory cells on a single bit line. Unfortunately, both of these approaches increase overall cell area, which reduces cell density, resulting in larger chips with lower bit density and a concomitant increase in cost.
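The effect of the cell-to-bit line capacitance ratio can be illustrated with the standard charge-sharing approximation for a bit line precharged to VDD/2, where the signal presented to the sense amp is (VDD/2)·Cs/(Cs + Cb). This is a minimal, hedged sketch: the supply voltage, cell capacitance, per-cell bit line loading, and fixed wiring capacitance are hypothetical values, and the function names are illustrative.

```python
# Charge-sharing sketch: how the cell-to-bit-line capacitance ratio sets the
# read signal. All numeric values are hypothetical and for illustration only.


def bitline_capacitance(cells_per_bitline: int,
                        load_per_cell_f: float,
                        fixed_wiring_f: float) -> float:
    """Approximate bit line capacitance: per-cell loading plus fixed wiring."""
    return cells_per_bitline * load_per_cell_f + fixed_wiring_f


def read_signal(vdd: float, c_cell: float, c_bitline: float) -> float:
    """Signal on a bit line precharged to VDD/2 after sharing charge with a
    cell that stored VDD: (VDD/2) * Cs / (Cs + Cb)."""
    return (vdd / 2) * c_cell / (c_cell + c_bitline)


VDD = 3.3              # volts (assumed)
C_CELL = 30e-15        # farads, baseline cell capacitance (assumed)
LOAD_PER_CELL = 0.5e-15  # farads of bit line load per attached cell (assumed)
FIXED_WIRING = 50e-15  # farads of fixed wiring/sense amp load (assumed)

baseline = read_signal(VDD, C_CELL, bitline_capacitance(256, LOAD_PER_CELL, FIXED_WIRING))
fewer_cells = read_signal(VDD, C_CELL, bitline_capacitance(128, LOAD_PER_CELL, FIXED_WIRING))
bigger_cell = read_signal(VDD, 40e-15, bitline_capacitance(256, LOAD_PER_CELL, FIXED_WIRING))

for label, v in [("256 cells per bit line", baseline),
                 ("128 cells per bit line", fewer_cells),
                 ("40 fF cell, 256 cells", bigger_cell)]:
    print(f"{label}: {v * 1000:.0f} mV")
```

With these assumed values the baseline signal is about 0.24 V, halving the number of cells per bit line raises it to about 0.34 V, and a larger 40 fF cell gives about 0.30 V. Both improvements correspond to the two conventional approaches in the text, and both cost area.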
Although the asynchronous DRAM memory device 110 works well in lower speed memory bus systems despite these circuit delays, it is not nearly as suitable for use in high-speed (greater than 66 MHz) memory systems, since each READ operation and WRITE operation cannot be any faster than the memory latency, which is typically on the order of 5-7 clock cycles. In order to service these high-speed systems, therefore, a relatively new and different kind of RAM, referred to as Synchronous DRAM, or SDRAM, has been developed. The SDRAM differs from earlier types of DRAM in that it is tied to the system clock and therefore does not run asynchronously as do standard DRAMs. Since SDRAM is tied to the system clock and is designed to READ or WRITE from memory in what is referred to as a burst mode (after the initial READ or WRITE latency) at 1 clock cycle per access (zero wait states), the SDRAM is able to operate at bus speeds up to 100 MHz or even higher. By running at the system clock, no wait states are typically required by the processor (after initial setup), resulting in higher system speeds.
SDRAM accomplishes its faster access using a number of internal performance improvements. One is a “burst mode” capability, which allows the SDRAM to transfer data from multiple cells without cycling the /CAS line, thereby limiting the CAS latency to the first few clock cycles of the burst read. This operation is what makes SDRAM “faster” than conventional DRAM even though the actual internal operations are essentially the same. By way of example, a 4 cycle burst READ can be accomplished in 8 clock cycles (5,1,1,1), where “5” represents the initial READ latency of 5 clock cycles, whereas to read the same data, a standard DRAM would require 20 clock cycles (5,5,5,5). Another internal improvement is related to the organization of the SDRAM memory core. Using what is referred to as a multi-bank architecture, the memory cells that constitute the storage elements of the SDRAM are grouped into memory banks, each of which is selected (or not) based upon a bank select signal. In this way, while one bank of the multi-bank SDRAM is outputting data during a READ, for example, another bank is being activated, such that there is effectively no latency in accessing any bank after initial startup.
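The clock-cycle arithmetic in the burst example can be reproduced with a few lines of code. This is a minimal sketch under the text's own simplification that every standard DRAM access costs the same 5 cycles as the initial SDRAM latency; the function names, the 100 MHz clock used to convert cycles to time, and the idealized two-bank interleaving model are assumptions for illustration.

```python
# Cycle-count sketch for the burst READ example (5,1,1,1 vs 5,5,5,5).
# The 100 MHz clock and the ideal interleaving model are assumptions.
from typing import List

CLOCK_MHZ = 100  # assumed bus clock, i.e. 10 ns per cycle


def access_pattern(initial_latency: int, burst_length: int, per_subsequent: int) -> List[int]:
    """Clock cycles spent on each access of a burst."""
    return [initial_latency] + [per_subsequent] * (burst_length - 1)


def cycles_to_ns(cycles: int) -> float:
    return cycles * 1000 / CLOCK_MHZ


def single_bank_bursts(n_bursts: int, latency: int = 5, burst_length: int = 4) -> int:
    """No overlap: every burst pays the initial latency again."""
    return n_bursts * sum(access_pattern(latency, burst_length, 1))


def interleaved_bursts(n_bursts: int, latency: int = 5, burst_length: int = 4) -> int:
    """Idealized multi-bank case: while one bank bursts, the next bank is
    activated, so only the first burst pays the initial latency."""
    return latency + n_bursts * burst_length - 1


sdram = access_pattern(5, 4, 1)  # burst mode: /CAS is not cycled after the first access
dram = access_pattern(5, 4, 5)   # every access pays the full latency

print(sdram, sum(sdram), cycles_to_ns(sum(sdram)))
print(dram, sum(dram), cycles_to_ns(sum(dram)))
print(single_bank_bursts(2), interleaved_bursts(2))
```

For two back-to-back bursts the idealized interleaved case needs 12 cycles instead of 16, which is the sense in which bank interleaving hides activation latency after startup.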
FIG. 3 shows a prior art multi-bank SDRAM 300. The SDRAM 300 includes a number of memory banks, bank 0, bank 1, bank 2, and bank 3, each of which must be capable of supplying a full data word across the I/Os 302, 304, 306, and 308. In the case of, for example, a ×32 architecture (i.e., the SDRAM 300 is coupled to a 32 bit data bus), each of the memory banks bank 0 through bank 3, when selected, must provide an 8 bit data word to each of the outputs 302-308 during, for example, a READ operation, in order for a total of 32 bits to be transferred to a 32 bit data bus.
For example, the bank 0 has stored therein a requested 32 bit data word D in the form of 8 bit data words D1, D2, D3, and D4. During an exemplary READ operation, a bank select signal activates the bank 0 and (after the memory location at which the data word D1 is stored has been appropriately accessed) a sense amp 310 coupled to the bank 0 outputs the data word D1 to the output 302 by way of a data line 312. In a similar manner, after appropriate decoding, the sense amp 310 outputs the data word D2 to the output 304 by way of a data line 314, the data word D3 to the output 306 by way of a data line 316, and the data word D4 to the output 308 by way of a data line 318. In all cases, the period of time between the output of the data word Di from the sense amp 310 and its receipt at the corresponding output is referred to as the output drive delay time ti. For example, the time it takes the data word D3 to travel from the sense amp 310 to the output 306 is referred to as the output drive delay time t3.
Since each of the output drive delay times ti is directly dependent on line length, it is important that the data lines 312-318 be as short as possible. For example, the output drive delay time t4 will be substantially greater than the output drive delay time t1 simply because the data line 318 is substantially longer than the data line 312. Since the overall performance of the SDRAM 300 is dictated by the slowest critical path, the longest data line will effectively dictate the overall speed performance of the SDRAM 300. Additionally, the skew between the various outputs 302-308 will also be directly related to the relative output drive delay times ti.
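The dependence of the output drive delay times ti on data line length, and the resulting skew, can be sketched numerically. This is a hedged illustration: it models each data line as a driver resistance charging a uniform distributed RC line (a linear driver term plus the quadratic Elmore wire term), and the driver resistance, per-millimeter wire parameters, and the lengths assigned to the data lines 312-318 are hypothetical, not taken from FIG. 3.

```python
# Output-drive-delay sketch for data lines of different lengths.
# Driver resistance, wire parameters and line lengths are hypothetical.

R_DRIVER = 1_000.0   # ohms (assumed)
R_PER_MM = 100.0     # ohms per mm of data line (assumed)
C_PER_MM = 0.2e-12   # farads per mm of data line (assumed)


def drive_delay_s(length_mm: float) -> float:
    """Elmore-style delay: driver charging the line, plus the line's own RC."""
    c_line = C_PER_MM * length_mm
    r_line = R_PER_MM * length_mm
    return R_DRIVER * c_line + 0.5 * r_line * c_line


# Hypothetical lengths for data lines 312, 314, 316, 318 (outputs 302-308).
line_lengths_mm = {"t1": 0.5, "t2": 2.0, "t3": 4.0, "t4": 6.0}
delays = {name: drive_delay_s(length) for name, length in line_lengths_mm.items()}

slowest = max(delays, key=delays.get)           # path that limits overall speed
skew_s = max(delays.values()) - min(delays.values())

for name, d in delays.items():
    print(f"{name}: {d * 1e9:.3f} ns")
print(f"slowest path: {slowest}, output skew: {skew_s * 1e9:.3f} ns")
```

With these assumed values t4 comes out at roughly 1.56 ns against roughly 0.10 ns for t1, so the longest data line sets both the worst-case path and nearly all of the output skew.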
Therefore, what is required is a compact memory architecture suitable for providing high-speed memory access.
According to the present invention, methods, apparatus, and systems are disclosed for providing a high-performance multi-bank synchronous dynamic random access memory. In one aspect of the invention, a memory device is described having a plurality of input/outputs (I/Os) coupled to a memory core. The memory core includes a plurality of memory cells, coupled to the plurality of I/Os, that are arranged to store data in the form of at least a first data word and a second data word. The memory core also includes a first bank segment arranged to store a first portion of the first data word and a second bank segment arranged to store a first portion of the second data word. The first bank segment and the second bank segment are logically separate and distinct, such that the first bank segment is activated only in response to a first bank select signal and the second bank segment is activated only in response to a second bank select signal.
The memory core also includes a first segmented sense amp coupled to the first bank segment by way of a first bit line and to the second bank segment by way of a second bit line, the first bit line being shorter than the second bit line. The first segmented sense amp responds to the first bank select signal by sensing the first portion of the first data word using the first bit line, and responds to the second bank select signal by sensing the first portion of the second data word using the second bit line, such that a first bit line sensing delay associated with the first bit line is less than a second bit line sensing delay associated with the second bit line. The memory core further includes a first data line coupling the first segmented sense amp to the nearest one of the plurality of I/Os.
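The bank-segment arrangement summarized above can be sketched as a small behavioral model: each bank select signal activates only its own segment, the shared segmented sense amp reads through that segment's bit line, and the sensing delay grows with the selected bit line's length. This is a minimal sketch, not the disclosed circuit; the class names, the signal names `BS1`/`BS2`, the linear delay-per-millimeter model, the bit line lengths, and the I/O names and positions used to pick the nearest I/O are all hypothetical.

```python
# Behavioral sketch of two bank segments sharing one segmented sense amp.
# All names, lengths, positions and the linear delay model are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BankSegment:
    bitline_length_mm: float                               # length of this segment's bit line
    words: Dict[int, int] = field(default_factory=dict)    # address -> stored word portion


class SegmentedSenseAmp:
    def __init__(self, segments: Dict[str, BankSegment], position_mm: float,
                 sense_ns_per_mm: float) -> None:
        self.segments = segments                 # keyed by bank select signal name
        self.position_mm = position_mm           # location along the die
        self.sense_ns_per_mm = sense_ns_per_mm   # simplified linear sensing-delay model

    def sense(self, bank_select: str, addr: int) -> Tuple[int, float]:
        """Activate only the selected segment and sense through its bit line."""
        segment = self.segments[bank_select]     # unselected segments stay idle
        delay_ns = segment.bitline_length_mm * self.sense_ns_per_mm
        return segment.words[addr], delay_ns

    def nearest_io(self, io_positions_mm: Dict[str, float]) -> str:
        """Pick the I/O closest to this sense amp for its data line."""
        return min(io_positions_mm,
                   key=lambda io: abs(io_positions_mm[io] - self.position_mm))


amp = SegmentedSenseAmp(
    segments={
        "BS1": BankSegment(bitline_length_mm=0.3, words={0: 0xA1}),  # first segment, short bit line
        "BS2": BankSegment(bitline_length_mm=0.9, words={0: 0xB2}),  # second segment, longer bit line
    },
    position_mm=1.0,
    sense_ns_per_mm=2.0,
)
first, d1 = amp.sense("BS1", 0)
second, d2 = amp.sense("BS2", 0)
io = amp.nearest_io({"IO_A": 0.8, "IO_B": 3.0, "IO_C": 5.0, "IO_D": 7.0})
print(hex(first), d1, hex(second), d2, io)
```

Keying the segments by bank select signal mirrors the requirement that each segment is activated only by its own select, and the nearest-I/O lookup reflects the short data line from the segmented sense amp to its closest I/O.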
In one embodiment, a computing system is disclosed. The computing system includes a memory device having distributed memory bank segments coupled to associated segmented sense amps. The segmented sense amps provide reduced bit line sensing delays due to the shortened bit lines corresponding to each of the bank segments. The computing system also includes a processor, coupled to a memory controller by way of a bi-directional data bus and an address bus, for executing instructions stored in the memory device.
In another embodiment, a method is provided for accessing requested data from a memory device having a first bank segment and a second bank segment, each of which is located proximate to its associated I/Os. In the method, a memory address request based upon the requested data is generated. A first bank select signal is provided to the memory device based upon the memory address request. In response to the first bank select signal, a segmented sense amp is enabled to sense the first portion of the data word, while in response to a second bank select signal, the segmented sense amp is enabled to sense the second portion of the data word. The first portion of the data word is sensed from a memory cell in the first bank segment such that a first bit line sensing delay is commensurately reduced.