Not Applicable
Not Applicable
1. Field of the Invention
A computer system needs memory to store the instructions and data that are processed by the central processing unit (CPU). In a typical computer, the CPU communicates with the memory via a bus, which sets a limit on the amount of parallel information (width) that can move in a single time unit. Memory comes in special chips known as DRAM (dynamic random access memory), which are packaged together in industry-standard modules. The chips are arranged in a line on a printed circuit board (PCB), with the memory chips either on a single side (SIMM, or Single In-Line Memory Module) or on both sides (DIMM, or Dual In-Line Memory Module). There are industry-standard machines for connecting memory chips to a board and then soldering the connections between the chips' input/output (I/O) pins and the metal circuits printed on the board, because the cost increases and the yield decreases when the same work is done by hand, even by skilled (and thus more expensive) labor. The size of the memory chip on a SIMM or DIMM has become standardized to increase the manufacturability and lower the cost of the completed memory module. A memory module PCB, also known as a 'memory bank', will have a set of connectors between its on-board circuitry and the computer system bus; additionally, it often has subordinate governing and special logic circuits to help manage the I/O flow between the memory module and the rest of the computer system.
There are two goals that designers, manufacturers, and users of memory modules strive for. While the amount of memory that can be stored on a single memory chip has been increasing steadily, the amount of memory necessary for smooth and fast operation of a computer system has also been increasing. Individual memory banks have gone through several generations in the past decade alone, moving from 8 megabytes to 16 megabytes to 32 megabytes to 64 megabytes per module, with 128 megabyte modules on the commercial horizon. Over roughly the same time, however, the minimum system RAM for a normal personal computer (PC) has shifted from 640K to 65MB. So one goal is to maximize, for the capacity of a given generation of memory chip, the total amount of memory that can be contained on a memory bank, when the memory bank is a SIMM (or DIMM) incorporating a number of such memory chips, within the same geometric limits for the memory banks.
At the same time that the amount of memory desired has increased, the price for memory has decreased (almost exponentially), just as the price paid for a computer of a given power has also decreased. This has created some unpleasant trade-offs for computer manufacturers and for memory bank manufacturers. For the same amount of memory, each year they will get less money, meaning that they constantly need ways to provide more memory for less cost. For this reason, anything that increases the manufacturing yield of a memory bank with a particular memory capacity, and anything that decreases the cost of manufacturing a memory bank, is a useful and valuable advance. Thus, decreasing the density of the on-board circuitry that a memory bank needs to provide a given memory capacity is a desirable result, since it decreases the cost and increases the yield of production. Two design elements that increase the cost, and decrease the yield, of producing a memory bank are, first, increasing the number of pin connectors needed to attain a given number of I/O connections between the memory bank and the bus; and second, requiring manual assembly and soldering of a given number of pin connections between the memory chips and the PCB of the memory bank. The more pins that must be connected, the fewer banks can be manufactured in a given amount of time, and the lower the manufacturing yield will be (since there are that many more chances for a pin connection to be inaccurately made). If the number of connections required can be cut by a significant percentage, then the overall productivity for a given memory bank capacity will be increased.
Technological advances are costly to implement, and for certain implementations it is desirable to use less costly technology interchangeably with system configurations which can use higher technology. For example, sometimes it is desirable to use 16-meg chips with systems that support 64-meg technology. In such a configuration, a 64 or 72 bit wide data bus using 64-megabit (8M×8) chips can be used. If the system is designed for 8M×8 chips, the JEDEC standard is for a 12×11 address scheme (i.e., 12 row address bits and 11 column address bits). In such a scheme, only one bank is required to read all 64 or 72 bits, and thus only a single RAS signal is needed. However, 64 megabit chips all utilize 3.3 volt technology, which for several reasons is quite expensive; thus, while fewer chips can be used to store the same information, these fewer chips in the aggregate are more expensive than 16 megabit chips manufactured in 5-volt technology. For example, eight 8×8 chips can be used to store the same amount of information as is stored in thirty-two 4×4 chips. However, the thirty-two 4×4 chips are much cheaper in the aggregate than the eight 8×8 chips, and thus for many applications, even though more chips are involved, it is desirable to use the 5-volt technology and thirty-two 4×4 chips. As a result, computer designers have struggled to increase the amount of memory accessible within a given physical format or module, even though there has been a more costly alternative of increasing the memory bank's capacity by buying higher-priced but higher-capacity individual memory chips for that bank.
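The capacity comparison in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that the "8×8" and "4×4" chips are organized as 8M×8 (64 megabit) and 4M×4 (16 megabit) respectively, consistent with the chip sizes named earlier in the paragraph.

```python
# Illustrative arithmetic only; all helper names here are hypothetical.
MEG = 2 ** 20

def chip_bits(depth_meg: int, width_bits: int) -> int:
    """Capacity in bits of a DRAM chip organized as (depth_meg M) x width_bits."""
    return depth_meg * MEG * width_bits

def addressable_locations(row_bits: int, col_bits: int) -> int:
    """Number of locations reachable with the given row/column address widths."""
    return 2 ** (row_bits + col_bits)

def chips_per_bank(bus_width_bits: int, chip_width_bits: int) -> int:
    """Chips that must be read in parallel to fill the data bus once."""
    return bus_width_bits // chip_width_bits

big = chip_bits(8, 8)    # assumed 8M x 8 organization -> 64 megabit
small = chip_bits(4, 4)  # assumed 4M x 4 organization -> 16 megabit

# Eight large chips hold exactly as much as thirty-two small chips.
total_megabytes = 8 * big // (8 * MEG)

# A 12 x 11 address scheme reaches 8M locations, matching the 8M depth.
locations = addressable_locations(12, 11)

# On a 64-bit bus, eight x8 chips form one bank; thirty-two x4 chips form two.
banks_with_big = 8 // chips_per_bank(64, 8)
banks_with_small = 32 // chips_per_bank(64, 4)
```

Under these assumptions both configurations store 64 megabytes, but the 4-bit-wide chips need two banks on a 64-bit bus, which is why the cheaper configuration cannot get by with a single RAS signal.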
Moreover, bus limitations are now one of the bottlenecks in computer design. The more off-bus intelligence that can be designed into the memory bank, the lower the overhead load on the system for managing memory addressing. A number of methods have already been designed to deal with part of this problem.
For example, a computer's memory typically includes one or more memory banks (or memory components) connected in parallel such that each memory component stores one set of data, such as a word or double word, per memory address. The memory controller communicates with, and interprets commands from, the CPU to the memory modules. For example, the CPU issues a command and an address which are received and translated by the memory controller. The memory controller, in turn, applies appropriate command signals and row and column addresses to the memory modules. Examples of such commands include a row address strobe (RAS), column address strobe (CAS), write enable (WE), and possibly a clock signal (CLK). (The line or bar over the acronym for a signal generally indicates that the active state for the particular signal is a logical low value.) In response to the commands and addresses, data is transferred between the CPU and the memory modules. Each time a memory-changing update instruction must be issued, however, the computer's cycle overhead is increased. Techniques that do not require the CPU to manage all details of memory storage decrease this overhead and thus indirectly increase performance; therefore, design elements of a memory bank that decrease the demand on the CPU to issue memory change updates indirectly increase performance.
Secondly, because the majority of program execution through the CPU is sequential in nature (operation 1, operation 2, operation 3 . . . , operation 50), program execution very often proceeds along a row of memory. When in page mode, the memory controller compares the row address of the memory location currently being accessed with the row address for the next memory access. If the row addresses are the same (known as a "page hit"), then the memory controller continues asserting the RAS control signal at the end of a current bus cycle. Because the memory already has the correct row address, the new column address can be immediately transferred to the memory without requiring a RAS/CAS delay. Design elements on memory banks that support an approach of automatically flowing along the memory addresses, without requiring CPU activity or RAS/CAS delays, again decrease system overhead costs and thereby indirectly increase performance.
Thirdly, methods that can use extended data out (EDO) DRAMs (which are faster) improve upon fast page mode (FPM) DRAMs. In FPM DRAMs, the CAS high-to-low transition latches the column address, while the CAS low-to-high transition turns off an output buffer of the RAM. EDO DRAMs instead separate the two functions of the CAS signal. The low-to-high transition of CAS no longer turns off the output buffer. This change provides an extended time during which the output data is valid, hence the "extended data out" name. EDO memory allows the CPU to sample the output data even while an address for a subsequent data transfer operation is being set up for the next read cycle. Any design method that supports a memory bank approach that allows further extension of this approach will be valuable.
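The page-mode comparison described above can be sketched as a simple model of the memory controller's decision. This is an illustration under stated assumptions rather than any particular controller's logic: the function names are hypothetical, and the 11-bit column width is borrowed from the 12×11 address scheme mentioned earlier.

```python
from typing import List, Optional, Tuple

COL_BITS = 11  # assumed column-address width (from the 12 x 11 scheme)

def split_address(addr: int, col_bits: int = COL_BITS) -> Tuple[int, int]:
    """Split a flat address into (row, column) parts."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)

def count_page_hits(addresses: List[int],
                    col_bits: int = COL_BITS) -> Tuple[int, int]:
    """Return (hits, misses) for a sequence of accesses in page mode.

    A hit means the row address equals that of the previous access, so RAS
    stays asserted and only a new column address is sent. A miss means a
    new row must be opened, incurring the RAS/CAS delay.
    """
    open_row: Optional[int] = None
    hits = misses = 0
    for addr in addresses:
        row, _col = split_address(addr, col_bits)
        if row == open_row:
            hits += 1
        else:
            misses += 1
            open_row = row
    return hits, misses

# Sequential execution across two full rows: only two row openings occur.
sequential = count_page_hits(list(range(4096)))
# Alternating between two rows defeats page mode entirely.
alternating = count_page_hits([0, 2048, 1, 2049])
```

In this model a purely sequential walk over 4096 locations pays the row-opening delay only twice, which is the behavior the paragraph relies on when it says that sequential execution proceeds along a row of memory.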
Fourthly, Burst EDO (BEDO) DRAMs improve upon the good idea used in EDO DRAMs (leaving data valid even after CAS goes high). Most current CPUs typically access BEDO DRAMs in four-cycle bursts at four adjacent memory locations to fill a cache memory. Recognizing this typical access operation, a BEDO DRAM quickly provides the following three addresses itself after receiving the first address. BEDO DRAMs typically include a two-bit counter which provides three column addresses after the first received column address. The memory controller and CPU thus avoid the tight timing requirements of providing multiple addresses at appropriate times to the DRAM device. As a result, the "dead" time occurring between the appearance of each bit, byte, word, set or "group" of valid data at the output pins of the BEDO DRAM device is reduced, as compared with EDO and FPM DRAMs. For example, where an FPM DRAM requires an initial five clock cycles to provide a first data group, and three clock cycles for each of three subsequent data groups (i.e., "5-3-3-3 bursting"), BEDO DRAMs can provide bursting at rates of up to 5-1-1-1 or less. The longer that a 'burst' can be extended, the longer, of course, before the memory update instruction overhead is incurred. Memory bank design that supports such approaches will be valued.
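The two-bit counter and the burst timings described above can be illustrated with a short sketch. The names are hypothetical, and the wrap-around of the counter within a four-location group is an assumption made for the illustration; the text states only that the counter supplies three column addresses after the first.

```python
from typing import List, Sequence

def burst_columns(first_col: int, burst_len: int = 4) -> List[int]:
    """Column addresses for one burst.

    The low bits are stepped by a small counter (two bits for a burst of
    four). The counter is ASSUMED to wrap within the aligned group.
    """
    mask = burst_len - 1
    base = first_col & ~mask
    return [base | ((first_col + i) & mask) for i in range(burst_len)]

def burst_cycles(pattern: Sequence[int]) -> int:
    """Total clock cycles for a burst given its per-group cycle counts."""
    return sum(pattern)

FPM_PATTERN = (5, 3, 3, 3)   # "5-3-3-3 bursting"
BEDO_PATTERN = (5, 1, 1, 1)  # "5-1-1-1" bursting

aligned = burst_columns(0x10)    # burst starting on a group boundary
unaligned = burst_columns(0x12)  # burst starting mid-group (wraps)
saved = burst_cycles(FPM_PATTERN) - burst_cycles(BEDO_PATTERN)
```

With these figures a four-group burst takes 14 clock cycles at 5-3-3-3 and 8 at 5-1-1-1, so the internally generated addresses save 6 cycles per burst.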
Finally, an even faster form of DRAM is synchronous DRAM (SDRAM). FPM, EDO, and BEDO DRAM are asynchronous DRAM devices because they do not require a clock input signal. The memory controller for asynchronous devices receives the system clock signal and operates as a synchronous interface with the CPU so that data is exchanged with the CPU at appropriate edges of the clock signal. Memory controllers for SDRAM devices are necessarily simpler because the SDRAM devices and the CPU both operate based on a clock signal. To achieve optimum performance with a CPU, the SDRAM device must be synchronized with the CPU. But the more memory that can be connected with the same clock device, the higher the overall SDRAM performance can be, again due to the decreased overhead.
As the speed of DRAM devices increases, other bottlenecks within computer systems arise. For example, as DRAM devices are operated at faster clock rates, the memory controllers to which they are coupled often cannot exchange data between the CPU and the memory device quickly enough. Additionally, both BEDO DRAM and SDRAM devices have comparatively strict timing requirements compared to FPM and EDO DRAM devices. The strict timing requirements of BEDO DRAM demand a strict relation between generating an edge of CAS and when data is valid for reading or writing to the memory device. During each read cycle, CAS must fall during the middle of the period when data is to be read from the BEDO DRAM. For example, there is very little room for time delay or skew between the system clock and the CAS control signal supplied to the BEDO DRAM when the BEDO DRAM is operated in the 5-1-1-1 burst mode. As a result, designers must design their computer systems, or other applications, with minimum trace lengths on circuit boards to reduce propagation delays, and employ other methods to minimize skew between the system clock and command signals based on the clock. One memory controller chip set by Intel is believed to accommodate BEDO DRAM; however, such a chip set likely still requires the designer to be subject to the strict timing requirements of BEDO memory. Similarly, SDRAM devices require strict timing of data transfers with the SDRAM device in relation to the system clock signal. As a result of such strict requirements of BEDO DRAM and SDRAM devices, computer system designers and other users of DRAM devices have difficulty implementing such higher speed DRAM devices into their applications, despite the increased performance of such devices. As a result, system designers have accepted and employed lower speed DRAM devices in exchange for looser timing requirements in their designs, despite the speed and other benefits of BEDO and SDRAM devices.
Another problem has been fitting the increasing density of memory within given physical constraints. A related problem has been providing memory within close enough physical location that timing difficulties, as detailed above, do not arise from greatly differing circuit lengths between differing memory banks. As the speed of memory access increases, this latter problem has worsened. As a result, designers must design their computer systems, or other applications, with minimum trace lengths on circuit boards to reduce propagation delays, and employ other methods to minimize skew between the system clock and command signals based on the clock. With the new Burst Extended Data Out (BEDO) chips, where memory access can automatically shift along more than one memory chip or module, comparatively strict timing (and thus circuit length) requirements exist.
Moreover, the denser the connecting lines between a memory bank and the bus, and between the PCB and the individual memory chips, have to be, the greater the heat density, and the more difficult the manufacturability, of a particular memory bank becomes. The more that a design allows the memory banks to share I/O ports, the fewer pin connectors are required, decreasing manufacturing difficulty and cost, and increasing yield, for a given memory density.
The invention described below meets both of the goals mentioned above, and thus provides a significant advantage over the current state of the art for memory modules. First, for a given generation of memory chip, it doubles or quadruples the amount of memory that can be addressed in a single memory module using a single edge connector to the computer system bus, by allowing simultaneous access to a stepped fractional portion of the shared memory capacity aboard the module for each I/O operation or cycle, while requiring substantially fewer pin connectors for a given PCB. Instead of using 168 pins for a standard bank, it need only use 90 pins, and only half the number of control connectors (32) as in the standard approach (64), because the control connectors are shared between memory banks.
Secondly, because the control lines are shared between banks, and thus only half the number are needed, the length of the lines is shortened between the I/O pins connecting to the system bus and the I/O pins connecting to the individual memory chips.
Thirdly, and again for a given generation of memory chip, the memory bank can be manufactured using industry-standard PCB assembly machinery rather than manual assembly to connect the memory chips' pins to the PCB. Because the banks are flexibly connected with supporting elements, the completed memory unit is sturdier and thus easier to use in further assembly onto the computer motherboard.
Fourthly, because the memory chips are mounted onto standard PCBs rather than directly on top of each other, the yield is increased. Instead of using 100 memory chips to produce 40 directly-double-stacked memory banks, 100 memory chips are used to produce 50 double-bank memory banks. Furthermore, the number of steps required is a quarter that for other memory-bank production methods using stacking technology.
Finally, because the method allows the use of industry-standard surface-mount machinery for automated assembly onto standard PCBs, rather than requiring dual-pass or manual mount approaches, the manufacturing costs and yields are comparable to those of memory banks with half the given capacity, for a particular generation of memory chip.
2. Description of the Related Art
Various means to deal with the memory constraints have been provided. Some memory modules have built-in memory delays which copy the CAS signal (as seen by the BEDO) to control memory. Others have built separate memory controllers to control the signals from the motherboard to the memory board. None of these has addressed the need for increased memory density in a given physical locality, i.e., in a particular, single memory card attached to the motherboard and bus of the personal computer.
The concept of a stackable memory card, as seen in U.S. Pat. No. 5,963,464, ignores two significant problems and one lesser problem, all of which are solved by the subject of this application. First, in that patent the orientation of the initial memory bank must be horizontal, and there must be space above the motherboard sufficient to contain the entire stack of 'secondary' boards. Secondly, the additional wiring length (and distance) increases linearly with the addition of each memory card, which will create timing and control problems dependent upon the memory addressing used by the CPU. Finally, as the heat of the memory chips rises, the risk of chip failure from overheating increases with the addition of each stacked memory card.
The use of a Field Effect Transistor (FET) or other switching device to isolate individual memory modules, and even memory chips, from each other on multiple banks of memory in U.S. Pat. No. 5,953,215 addresses solely the desire to reduce the perceived capacitance and resistance from the presence of multiple memory chips. The switching devices in that invention require additional control signals out of the addressing methods to turn the switches on or off, and limit the memory throughput of the entire memory module to the lower limit set by the capacitance limit chosen by the module designer.
In the preferred embodiment of this invention, at least two chipsets of industry-standard SIMM or DIMM RAM memory chips are first each mounted, using industry-standard surface mount technology, on a PCB with a chip and bank selecting controller chip, wherein the first bank contains an edge connector for linking the combined memory banks to the computer system, and the two banks share the data, address, and control lines from the edge connector to the memory chipsets. Address commands directed from the CPU are decoded to direct the memory I/O flow to the proper bank and row(s) to be addressed, with automatic updating through the capacity of both banks to accommodate the programmed sequence. Burst lengths may be 1, 2, 4, or 8 locations, or a full page, with a burst terminate option. Bank (i.e., column) addresses may be changed on each clock cycle to achieve high-speed, fully random access. Moreover, while one bank is being addressed, another, connected bank can be precharged to hide or eliminate the need for address update control information from the CPU. The PCBs are physically linked together by snap connectors which support the combination at four points, providing physical stability and adequate separation for heat dispersion from the memory chips. Because the banks are filled simultaneously (rather than serially), for a given address flow the number of row- or chip-change instructions will be reduced to the extent that multiple operations do not overflow the capacity, which is effectively doubled over that of a single bank. Because the distance between one bank and the next will be less than the distance along the bus between one module and the next, the time delay for addressing memory will be decreased and the overall capacity within a given motherboard's layout will be increased.
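The decoding step described above, in which an address is directed to the proper bank and row, might be modeled as follows. This is a hypothetical sketch and not the controller chip's actual logic: the bit layout (one bank-select bit above 12 row bits and 11 column bits), the names, and the validation helper are all assumptions made for illustration.

```python
from typing import NamedTuple

ROW_BITS = 12   # assumed, from the 12 x 11 address scheme
COL_BITS = 11   # assumed, from the 12 x 11 address scheme
BANK_BITS = 1   # assumed: one select bit for the two shared-line banks

# Burst lengths named in the text; "full page" is modeled separately.
VALID_BURST_LENGTHS = (1, 2, 4, 8)

class Decoded(NamedTuple):
    bank: int
    row: int
    col: int

def decode(addr: int) -> Decoded:
    """Split a flat address into bank, row, and column fields."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return Decoded(bank, row, col)

def is_valid_burst(length: int, full_page: bool = False) -> bool:
    """Accept a full-page burst or one of the fixed burst lengths."""
    return full_page or length in VALID_BURST_LENGTHS

first_bank = decode((5 << COL_BITS) | 7)
second_bank = decode((1 << (ROW_BITS + COL_BITS)) | (5 << COL_BITS) | 7)
```

Under this assumed layout the two example addresses differ only in the bank field, so the shared address and control lines carry the same row and column values to either bank and only the select bit changes.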
Finally, because the two banks share addressing, control, and data lines (with the controller circuit coordinating the flow between the banks), the density of leads on each bank is reduced for a given memory capacity, thereby increasing the manufacturability, yield, and longevity of a module with a given memory capacity.