Dynamic random access memories (DRAM's) are used as main memory in many of today's computer systems. One of the factors that have led to the popularity of the DRAM has been the DRAM's simple cell structure. Because each DRAM storage cell consists essentially of a single capacitor (together with an access transistor), it is possible to pack a very large number of storage cells into a very small amount of chip space. Consequently, with DRAM technology, it is possible to manufacture very high-density, low cost memories.
With reference to FIG. 1, there is shown a functional diagram of a typical DRAM 100. As shown in FIG. 1, a DRAM 100 comprises a plurality of memory cells 102 arranged in a plurality of rows and columns. Each row of memory cells 102 is coupled to one of the wordlines 104 of the DRAM 100, and each column of memory cells 102 is coupled to one of the bitlines 106. By specifying a wordline 104 and a bitline 106, a particular memory cell 102 can be accessed.
To enable access to the various memory cells 102, there is provided a row decoder 112 and a column decoder 114. The row decoder 112 receives a row address on a set of address lines 116, and a row address strobe (RAS) signal on a control line 118, and in response to these signals, the row decoder 112 decodes the row address to select one of the wordlines 104 of the DRAM 100. The selection of one of the wordlines 104 causes the data stored in all of the memory cells 102 coupled to that wordline 104 to be loaded into the sense amplifiers (sense amps) 108. That data, or a portion thereof, may thereafter be placed onto the data bus 110 (for a read operation).
What portion of the data in the sense amps 108 is actually placed onto the data bus 110 in a read operation is determined by the column decoder 114. More specifically, the column decoder 114 receives a column address on the address lines 116, and a column address strobe (CAS) signal on a control line 120, and in response to these signals, the column decoder 114 decodes the column address to select one or more of the bitlines 106 of the DRAM 100. The number of bitlines 106 selected in response to a single column address may differ from implementation to implementation, and is referred to as the base granularity of the DRAM 100. For example, if each column address causes sixty-four bitlines 106 to be selected, then the base granularity of the DRAM 100 is eight bytes. Defined in this manner, the base granularity of the DRAM 100 refers to the amount of data that is read out of or written into the DRAM in response to each column address/CAS signal combination (i.e. each CAS or column command).
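By way of illustration only, the relationship between the number of selected bitlines and the base granularity may be sketched as follows. The function name is hypothetical, and the sketch assumes that each selected bitline carries one bit and that granularity is expressed in whole bytes.

```python
def base_granularity_bytes(bitlines_per_column_address: int) -> int:
    """Bytes moved per column (CAS) command, assuming one bit per selected bitline."""
    if bitlines_per_column_address <= 0 or bitlines_per_column_address % 8 != 0:
        raise ValueError("expected a positive multiple of 8 bitlines")
    return bitlines_per_column_address // 8


# The example from the text: sixty-four bitlines per column address.
print(base_granularity_bytes(64))  # 8
```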
Once the appropriate bitlines 106 are selected, the data in the sense amps 108 associated with the selected bitlines 106 are loaded onto the data bus 110. Data is thus read out of the DRAM 100. Data may be written into the DRAM 100 in a similar fashion. A point to note here is that in a typical DRAM, the address lines 116 are multiplexed. Thus, the same lines 116 are used to carry both the row and column addresses to the row and column decoders 112, 114, respectively. That being the case, a typical memory access requires at least two steps: (1) sending a row address on the address lines 116, and a RAS on the control line 118; and (2) sending a column address on the address lines 116, and a CAS on the control line 120.
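The two-step access over multiplexed address lines may be sketched as follows. This is a minimal sketch under stated assumptions: the flat cell index, the `column_bits` parameter, and the function name are hypothetical conveniences for illustration and are not taken from the figures.

```python
from typing import List, Tuple


def multiplexed_access(cell_index: int, column_bits: int) -> List[Tuple[str, int]]:
    """Return the (strobe, address) pairs sent in order over the shared address lines.

    Assumes the upper bits of a flat cell index form the row address and the
    lower `column_bits` bits form the column address.
    """
    row = cell_index >> column_bits
    column = cell_index & ((1 << column_bits) - 1)
    # Step 1: row address with RAS. Step 2: column address with CAS.
    return [("RAS", row), ("CAS", column)]


steps = multiplexed_access(0b10110110, column_bits=4)
print(steps)  # [('RAS', 11), ('CAS', 6)]
```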
A timing diagram illustrating the various steps carried out during typical DRAM read cycles is shown in FIG. 2. As shown, to initiate a read cycle, a row address 208(1) is placed onto the address lines 116, and a RAS signal 202(1) is asserted on the RAS control line 118. Then, a column address 210(1) is sent onto the address lines 116, and a CAS signal 204(1) is asserted on the CAS control line 120. A short time thereafter, the data 206(1) stored at the locations indicated by the row address 208(1) and the column address 210(1) appear on the data bus 110. Data is thus extracted from the DRAM 100. After the first set of data 206(1) disappears from the data bus 110, a second read operation may be initiated. Like the first read operation, the second read operation begins with a row address 208(2) on the address lines 116, and a RAS signal 202(2) on the RAS control line 118. Then, a column address 210(2) is sent onto the address lines 116, and a CAS signal 204(2) is asserted on the CAS control line 120. A short time thereafter, the data 206(2) stored at the locations indicated by the row address 208(2) and the column address 210(2) appear on the data bus 110. The second read operation is thus completed. Additional successive reads may be carried out in a similar fashion.
Notice from the timing diagram of FIG. 2 that, for individual read cycles, there is substantial idle time between successive data sets 206 on the data bus 110. During this idle time, the data bus 110 is not utilized and no data is being transferred. The more idle time there is, the lower the utilization rate of the data bus 110, and the lower the utilization rate, the longer it will take for an external component (such as a CPU) to extract data from the DRAM 100. Since almost all operations of a computer require the use of memory, the longer it takes to get data from a memory, the slower the performance of the overall computer system. Thus, low bus utilization can have a direct negative impact on the overall performance of a computer system.
To improve data bus utilization, several techniques have been developed. One such technique involves the use of a “burst” mode of operation. Basically, in burst mode, rather than implementing just one column access for each RAS command, a plurality of column accesses are carried out for each RAS command. This results in consecutively accessing multiple sets of data from the same row of a DRAM 100. A timing diagram illustrating the operation of burst mode is shown in FIG. 3. More specifically, FIG. 3 depicts two burst mode read cycles, with each read cycle being directed to a different row of the DRAM.
To initiate a burst mode read cycle, a row address 308(1) is placed onto the address lines 116, and a RAS signal 302(1) is asserted on the RAS control line 118. Then, a column address 310(1) is sent onto the address lines 116, and a CAS signal 304(1) is asserted on the CAS control line 120. In response to the column address 310(1) and the CAS signal 304(1), the DRAM internally generates a plurality of additional column addresses. These additional column addresses are generated based upon the column address 310(1) that is provided, and a predetermined scheme. For example, the additional column addresses may be generated by incrementing the provided column address 310(1), decrementing the column address 310(1), or by manipulating the column address 310(1) in some other manner. The number of additional column addresses generated by the DRAM depends upon the burst length that the DRAM is implementing. In the example shown in FIG. 3, the burst length is four; thus, three additional column addresses are generated by the DRAM.
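The internal generation of additional column addresses may be sketched as follows for the incrementing and decrementing schemes mentioned above. This is an illustrative sketch only: the function name, the `num_columns` parameter, and the wrap-around at the end of the row are assumptions, since the predetermined scheme is not further specified here.

```python
from typing import List


def burst_column_addresses(start: int, burst_length: int,
                           num_columns: int, step: int = 1) -> List[int]:
    """Provided column address followed by (burst_length - 1) generated ones.

    step=+1 increments, step=-1 decrements. Wrapping modulo `num_columns`
    is an assumed behavior for addresses that run past the end of the row.
    """
    return [(start + i * step) % num_columns for i in range(burst_length)]


# Burst length of four, as in FIG. 3: one provided address plus three generated.
print(burst_column_addresses(5, 4, num_columns=256))           # [5, 6, 7, 8]
print(burst_column_addresses(5, 4, num_columns=256, step=-1))  # [5, 4, 3, 2]
```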
As the provided column address 310(1) is received, and as each additional column address is generated, they are applied by the DRAM to access a particular set of data. These addresses are applied in succession so that multiple sets of data are accessed from the same row of the DRAM. A short time after the application of each column address, data 306 stored at the locations indicated by the row address 308(1) and the applied column address starts to appear on the data bus 110. Because this data 306 is extracted from the DRAM 100 in response to consecutive applications of column addresses, there is no idle time between the sets of data 306(1)-306(4) on the data bus 110. As a result, data bus utilization is improved.
After the first set of data 306(1)-306(4) disappears from the data bus 110, a second burst mode read operation may be initiated. Like the first read operation, the second read operation begins with a row address 308(2) on the address lines 116, and a RAS signal 302(2) on the RAS control line 118. Then, a column address 310(2) is sent onto the address lines 116, and a CAS signal 304(2) is asserted on the CAS control line 120. In response, the DRAM generates three additional column addresses, and applies the provided column address 310(2) and the additional column addresses in succession to access multiple sets of data from the same row. Shortly after each column address is applied, data 306 stored at the locations indicated by the row address 308(2) and the applied column address appears on the data bus 110. Because this data 306 is extracted from the DRAM 100 in response to consecutive applications of column addresses, there is again no idle time between the sets of data 306(5)-306(8) on the data bus 110. The second read operation is thus completed. Additional successive reads may be carried out in a similar fashion.
Several aspects of burst mode operation should be noted. First, notice that burst mode significantly increases output data granularity. More specifically, because a burst mode memory request involves multiple column accesses, the data extracted from the DRAM in response to a burst mode request is not just one base granularity in size, but rather is a multiple of the base granularity, where the multiple is equal to the burst length. Thus, in the example shown in FIG. 3, the output data granularity of the DRAM is four times the base granularity. This may pose a problem in some implementations. For example, in some applications, a CPU may wish to access only one base granularity of data at a time. If burst mode is implemented in such an application, then all of the data after the first granularity will be dropped by the CPU. In such a case, even though data bus utilization is improved by the use of burst mode, overall system efficiency is not improved because the extra data from the memory is not used. From an efficiency point of view, the end result is the same as if burst mode were not implemented at all. In such applications, burst mode does not provide a useful solution.
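The effect of burst length on output data granularity, and on the fraction of transferred data that is actually used, may be sketched as follows. The function names are hypothetical, and the eight-byte base granularity is borrowed from the sixty-four bitline example given earlier; the sketch assumes the requester keeps only the first granules of each burst.

```python
def output_granularity_bytes(base_granularity_bytes: int, burst_length: int) -> int:
    """Data returned per burst mode request: burst length times base granularity."""
    return base_granularity_bytes * burst_length


def useful_fraction(burst_length: int, granules_wanted: int = 1) -> float:
    """Fraction of a burst that the requester keeps; the remainder is dropped."""
    used = min(granules_wanted, burst_length)
    return used / burst_length


# Burst length of four (as in FIG. 3) with an assumed 8-byte base granularity.
print(output_granularity_bytes(8, 4))  # 32
print(useful_fraction(4))              # 0.25
```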
A second aspect to note is that burst mode eliminates data bus idle time only so long as the same row is being accessed. As soon as a different row is accessed, a significant amount of idle time is introduced on the data bus 110, as shown in FIG. 3. Thus, in applications where access of the DRAM switches from row to row on a regular basis (as is often the case), there is a substantial amount of idle time on the data bus 110, even if burst mode is implemented.
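The idle time that remains when each request opens a different row may be estimated as follows. This is a rough sketch under stated assumptions: a single bank, every request directed to a new row, and one burst delivered per row cycle. The parameter names and the 80 ns and 10 ns values are illustrative assumptions and are not read from FIG. 2 or FIG. 3.

```python
def bus_utilization(row_cycle_ns: float, granule_time_ns: float,
                    burst_length: int = 1) -> float:
    """Fraction of time the data bus carries data when every access changes rows.

    Assumes one burst of `burst_length` granules per row cycle on a single bank.
    """
    busy_ns = burst_length * granule_time_ns
    return min(1.0, busy_ns / row_cycle_ns)


print(bus_utilization(80, 10))                  # 0.125 (individual read cycles)
print(bus_utilization(80, 10, burst_length=4))  # 0.5   (burst length of four)
```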
To further improve data bus utilization, burst mode may be implemented in conjunction with a multi-bank DRAM to achieve full data bus utilization. In a multi-bank DRAM, the DRAM is divided into multiple “banks”, which may be viewed as “virtual memories” within the DRAM. Each bank may be accessed individually, and each bank has its own set of sense amps. However, all banks share the same data bus. A block diagram of a sample multi-bank DRAM 400 is shown in FIG. 4. While FIG. 4 shows a DRAM 400 having four banks 404, it should be noted that more or fewer banks may be implemented if so desired. The basic concept behind a multi-bank DRAM 400 is that higher bus utilization may be achieved by “interleaving” or alternating memory requests between the different banks 404. By interleaving the memory requests, it is possible to initiate memory access to one bank (e.g. 404(1)) while another bank (e.g. 404(2)) is busy delivering data onto the data bus. By doing so, the data bus idle time of one bank is used advantageously by the other bank to put data onto the data bus 410. Because all banks 404 are using the same data bus 410, interleaving the memory requests in this way makes it possible to keep the data bus 410 constantly filled, even when different rows are being accessed.
To illustrate how burst mode and interleaving may be used to achieve full data bus utilization, reference will now be made to the timing diagram of FIG. 5. In FIG. 5, a RAS1 signal is used to indicate a RAS signal applied to bank 1 404(1), while a RAS2 signal is used to indicate a RAS signal applied to bank 2 404(2), and so on. Likewise, a CAS1 signal indicates a CAS signal being applied to bank 1 404(1), while a CAS2 signal indicates a CAS signal being applied to bank 2 404(2), and so on.
As shown in FIG. 5, a read operation from bank 1 404(1) of the DRAM 400 is initiated by first sending an asserted RAS signal 502(1) and a row address 508(1) to bank 1 404(1). Then, at a later time, an asserted CAS signal 504(1) and a column address 510(1) are sent to bank 1 404(1). A short time thereafter, data associated with the row address 508(1) and the column address 510(1) are sent onto the data bus 410 (FIG. 4) by the sense amps 408(1) of bank 1 404(1). In the timing diagram shown in FIG. 5, it is assumed that bank 1 404(1) implements a burst mode length of two. Thus, in response to the one column address 510(1), two sets of data 506(1), 506(2) are outputted onto the data bus 410 by bank 1 404(1).
After the RAS signal 502(1) is sent to bank 1 404(1) but before the CAS signal 504(1) is sent to bank 1 404(1), an asserted RAS signal 502(2) and a row address 508(2) are sent to bank 2 404(2) of the DRAM 400. In addition, an asserted CAS signal 504(2) and a column address 510(2) are sent to bank 2 404(2) at a later time. In response to these signals, bank 2 404(2) outputs data associated with the row address 508(2) and the column address 510(2) onto the data bus 410 using the sense amps 408(2). As was the case with bank 1 404(1), bank 2 404(2) also implements a burst mode length of two. As a result, two sets of data 506(3), 506(4) are outputted onto the data bus 410 by bank 2 404(2) in response to the one column address 510(2). These sets of data 506(3), 506(4) immediately follow the sets of data 506(1), 506(2) outputted by bank 1; thus, there is no idle time between the data sets.
After the RAS signal 502(2) is sent to bank 2 404(2) but before the CAS signal 504(2) is sent to bank 2 404(2), an asserted RAS signal 502(3) and a row address 508(3) are sent to bank 3 404(3). A short time thereafter, an asserted CAS signal 504(3) and a column address 510(3) are sent to bank 3 404(3). In response, bank 3 404(3) outputs data associated with row address 508(3) and column address 510(3) onto the data bus 410 using the sense amps 408(3). As was the case with bank 1 404(1) and bank 2 404(2), bank 3 404(3) also implements a burst mode length of two. As a result, two sets of data 506(5), 506(6) are outputted onto the data bus 410 by bank 3 404(3) in response to the one column address 510(3). These sets of data 506(5), 506(6) immediately follow the sets of data 506(3), 506(4) outputted by bank 2; thus, there is no idle time between the data sets.
To finish the example, after the RAS signal 502(3) is sent to bank 3 404(3) but before the CAS signal 504(3) is sent to bank 3 404(3), an asserted RAS signal 502(4) and a row address 508(4) are sent to bank 4 404(4). A short time later, an asserted CAS signal 504(4) and a column address 510(4) are sent to bank 4 404(4). In response, bank 4 404(4) outputs data associated with row address 508(4) and column address 510(4) onto the data bus 410 using the sense amps 408(4). As was the case with the other banks, bank 4 404(4) implements a burst mode length of two. As a result, two sets of data 506(7), 506(8) are outputted onto the data bus 410 by bank 4 404(4) in response to the one column address 510(4). These sets of data 506(7), 506(8) immediately follow the sets of data 506(5), 506(6) outputted by bank 3; thus, there is no idle time between the data sets.
While bank 4 404(4) is being accessed, access of bank 1 404(1) may again be initiated with a RAS signal and a row address, as shown, to extract more data from that bank. This process of interleaving accesses between the various banks 404 may continue indefinitely to continually access data from the DRAM 400. By combining burst mode operation with a multi-bank DRAM 400 as shown in this example, it is possible to achieve full data bus utilization.
In practice, a certain number of banks are needed to achieve full data bus utilization in a particular DRAM, where the number of banks needed is determined by certain timing parameters of the DRAM. One relevant timing parameter is the minimum time required between consecutive RAS signals to the same bank. This parameter, denoted herein as Trc, is often referred to as the RAS cycle time. Another relevant parameter is the amount of time it takes to place one base granularity of data onto the data bus. This parameter is denoted herein as Dt. To determine the number of banks needed to achieve full data bus utilization, Trc is divided by n*Dt where n is the burst length. If, for example, Trc is 80 ns and Dt is 10 ns and the DRAM is implementing a burst length of two, then the number of banks needed is 80 ns/20 ns or four. With these timing parameters (which are typical) and four banks, a DRAM can achieve full data bus utilization.
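The bank count calculation above, together with a simple check of the resulting data bus occupancy, may be sketched as follows. This is a minimal sketch rather than a model of any particular DRAM: the function names, the round-robin request order, the fixed RAS-to-data delay of 50 ns, and the rounding up of a non-integer ratio are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def banks_needed(trc_ns: int, dt_ns: int, burst_length: int) -> int:
    """Trc / (n * Dt), rounded up (the rounding is an assumption)."""
    slot_ns = burst_length * dt_ns
    return -(-trc_ns // slot_ns)  # ceiling division on integers


def interleaved_utilization(num_banks: int, trc_ns: int, dt_ns: int,
                            burst_length: int, num_requests: int,
                            ras_to_data_ns: int = 50) -> float:
    """Data bus utilization for round-robin requests across the banks.

    Each RAS is issued as early as possible, subject to (a) one burst slot
    after the previous RAS and (b) Trc after the last RAS to the same bank.
    """
    slot_ns = burst_length * dt_ns
    last_ras: List[Optional[int]] = [None] * num_banks
    prev_ras: Optional[int] = None
    bursts: List[Tuple[int, int]] = []
    for k in range(num_requests):
        bank = k % num_banks
        ras = 0 if prev_ras is None else prev_ras + slot_ns
        if last_ras[bank] is not None:
            ras = max(ras, last_ras[bank] + trc_ns)
        last_ras[bank] = ras
        prev_ras = ras
        start = ras + ras_to_data_ns  # assumed fixed delay from RAS to first data
        bursts.append((start, start + slot_ns))
    busy_ns = sum(end - start for start, end in bursts)
    span_ns = bursts[-1][1] - bursts[0][0]
    return busy_ns / span_ns


print(banks_needed(80, 10, 2))                    # 4
print(interleaved_utilization(4, 80, 10, 2, 12))  # 1.0
```

With fewer banks than the calculated number (for example, two banks under the same timing), the same-bank Trc constraint forces gaps between bursts and the computed utilization falls below one.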
While the combination of burst mode and a multi-bank DRAM 400 makes it possible to achieve 100% data bus utilization in a memory, this implementation does not come without its drawbacks. One significant drawback is that it still relies upon burst mode to achieve full data bus utilization. Because of this reliance, this implementation suffers from the same shortcoming as that experienced in regular burst mode. Namely, it increases the data granularity of the DRAM 400. Notice from the timing diagram of FIG. 5 that instead of outputting just one base granularity of data per access to each bank, the DRAM 400 outputs two (it is two in the example shown in FIG. 5 but it could be more than two in other implementations). This increase in data granularity can lead to inefficiency.
As noted previously, in some applications, an external component (such as a CPU) may wish to access only one base granularity of data at a time. In such applications, any data provided after the first base granularity will be dropped. If burst mode is implemented in such an application, at least half of the data provided by the DRAM 400 will be dropped, which means that at most 50% efficiency can be achieved. Thus, even though burst mode combined with a multi-bank DRAM 400 may achieve 100% data bus utilization in such an application, it does not improve overall system efficiency because the extra data is not used. Consequently, the burst mode/multi-bank DRAM combination does not provide a complete solution for all possible applications.
As an alternative to the burst mode/multi-bank DRAM combination, a plurality of separate DRAM's may be implemented to achieve full data bus utilization. By interleaving memory requests between separate DRAM's instead of between separate banks within a single DRAM, it is possible to achieve full data bus utilization without requiring an increase in data granularity. This result comes with extra cost and complexity, however. To illustrate how separate DRAM's may be used to achieve full data bus utilization, reference will be made to FIGS. 6 and 7. Specifically, FIG. 6 shows a block diagram of a sample multi-DRAM implementation, while FIG. 7 shows a timing diagram for several read cycles of the implementation of FIG. 6.
As shown in FIG. 6, the sample implementation comprises a plurality of separate DRAM's 604(1), 604(2) (more than two may be implemented if so desired), with each DRAM 604 having its own separate command lines 608(1), 608(2), and address lines 606(1), 606(2). Both DRAM's share the same data bus 620, and both have 2 banks. In addition to the DRAM's 604(1), 604(2), the implementation further comprises a controller 602 for controlling the interleaving of requests between the DRAM's 604(1), 604(2). It is the responsibility of this controller 602 to manage the interleaving of memory requests such that: (1) the data bus 620 is used as fully as possible; and (2) there is no bus contention on the data bus 620.
To illustrate how the implementation of FIG. 6 can be used to achieve full data bus utilization, reference will be made to the timing diagram of FIG. 7. For clarity purposes, RAS1,1 is used in FIG. 7 to indicate a RAS command applied to DRAM 1 604(1), bank 1, while RAS2,1 is used to indicate a RAS command applied to DRAM 2 604(2), bank 1, and so on. Likewise, CAS1,1 is used to indicate a CAS command being applied to DRAM 1 604(1), bank 1, while CAS2,1 is used to indicate a CAS command being applied to DRAM 2 604(2), bank 1.
As shown in FIG. 7, to extract data from the DRAM's 604(1), 604(2), the controller 602 first initiates a read operation on DRAM 1 604(1), bank 1. This is carried out by sending a RAS command 702(1) and a row address 706(1) to DRAM 1 604(1), bank 1. Then, a CAS command 704(1) and a column address 706(2) are sent to DRAM 1 604(1), bank 1. A short time thereafter, data 720(1) associated with the row address 706(1) and the column address 706(2) are outputted onto the data bus 620 (FIG. 6) by DRAM 1 604(1), bank 1. While the CAS command 704(1) and the column address 706(2) are being sent to DRAM 1 604(1), bank 1, the controller 602 also sends a RAS command 712(1) and a row address 716(1) to DRAM 2 604(2), bank 1. Thereafter, a CAS command 714(1) and a column address 716(2) are sent to DRAM 2 604(2), bank 1. In response to these signals, DRAM 2 604(2), bank 1, outputs data 720(2) associated with the row address 716(1) and the column address 716(2) onto the data bus 620. This set of data 720(2) immediately follows the set of data 720(1) outputted by DRAM 1 604(1), bank 1; thus, there is no idle time between the data sets.
While the CAS command 714(1) and the column address 716(2) are being sent to DRAM 2 604(2), bank 1, a RAS command 702(2) and row address 706(3) are sent to DRAM 1 604(1), bank 2, to initiate another read cycle. Thereafter, a CAS command 704(2) and a column address 706(4) are sent to DRAM 1 604(1), bank 2. In response, DRAM 1 604(1), bank 2, outputs data 720(3) associated with the row address 706(3) and the column address 706(4) onto the data bus 620. This set of data 720(3) immediately follows the set of data 720(2) outputted by DRAM 2 604(2), bank 1; thus, there is again no idle time between the data sets. To finish the example, while the CAS command 704(2) and the column address 706(4) are being sent to DRAM 1 604(1), bank 2, the controller 602 initiates a read cycle on DRAM 2 604(2), bank 2, by sending a RAS command 712(2) and a row address 716(3) to DRAM 2 604(2), bank 2. Thereafter, a CAS command 714(2) and a column address 716(4) are sent to DRAM 2 604(2), bank 2. In response to these signals, DRAM 2 604(2), bank 2, outputs data 720(4) associated with the row address 716(3) and the column address 716(4) onto the data bus 620. This set of data 720(4) immediately follows the set of data 720(3) outputted by DRAM 1 604(1), bank 2; hence, there is no idle time between the data sets. Additional read operations may be carried out in this manner to continue extracting data from the DRAM's 604(1), 604(2). As this example illustrates, by interleaving memory requests between multiple banks of multiple DRAM's, it is possible to achieve full data bus utilization, and because no burst mode is implemented, data granularity is not increased.
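The order in which the controller 602 visits the DRAM's and banks in this example, and the overlap of each CAS command with the next RAS command, may be sketched as follows. This is an illustrative sketch only: the function names and the notion of discrete command slots are assumptions, and actual command timing is governed by FIG. 7.

```python
from typing import List, Optional, Tuple

Target = Tuple[int, int]  # (DRAM number, bank number)


def interleave_order(num_drams: int, banks_per_dram: int) -> List[Target]:
    """Alternate between DRAM's first, then advance to the next bank."""
    return [(dram, bank)
            for bank in range(1, banks_per_dram + 1)
            for dram in range(1, num_drams + 1)]


def command_slots(targets: List[Target]) -> List[Tuple[Optional[Target], Optional[Target]]]:
    """Pair each RAS with the CAS of the previous target, issued in the same slot."""
    slots = []
    for k in range(len(targets) + 1):
        ras_target = targets[k] if k < len(targets) else None
        cas_target = targets[k - 1] if k > 0 else None
        slots.append((ras_target, cas_target))
    return slots


order = interleave_order(num_drams=2, banks_per_dram=2)
print(order)  # [(1, 1), (2, 1), (1, 2), (2, 2)]
for ras_target, cas_target in command_slots(order):
    print("RAS ->", ras_target, "| CAS ->", cas_target)
```

In every slot of this sketch that carries both commands, the RAS and the CAS are directed to different DRAM's, which is consistent with each DRAM having its own command and address lines.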
While the multi-DRAM implementation is able to achieve full data bus utilization without increasing data granularity, it does so at a significant cost. First, due to the tight timing constraints, the DRAM's shown in FIG. 6 are very fast and very expensive. With multiple DRAM's being required to implement the system of FIG. 6, the cost of the memory system can be prohibitive. Compared to a single DRAM memory system, this multi-DRAM system can cost several times as much. Also, the multi-DRAM implementation is limited in its application. By its very nature, it can be implemented only in a multi-DRAM environment. In the many applications in which it is desirable to implement just one DRAM, the multi-DRAM implementation cannot be used. In addition, this implementation can add substantial complexity to the memory access process. Because the controller 602 must concurrently control multiple DRAM's, the memory access process is much more complex and difficult to manage than in a single DRAM implementation. Overall, there is a significant price to pay for the functionality provided by the multi-DRAM configuration, and in many implementations, this price is prohibitive. Hence, the multiple DRAM approach does not provide a viable solution for all applications.