The present invention relates to a semiconductor integrated circuit having memory blocks; and, the invention relates more particularly to a technique for improving the throughput of a data read operation invoked in response to a read access request, which is useful for the application to a semiconductor integrated circuit used as a cache memory, including DRAMs mounted along with logic circuits.
A memory hierarchy of a storage device, when viewed in terms of the temporal and spatial locality of an information reference, typically comprises memories of a plurality of levels having different access speeds and capacities. Typically, a main memory is provided in the form of a DRAM (Dynamic Random Access Memory) having a low per-bit cost; and, in a memory level closer to the processor or CPU (Central Processing Unit), there is a cache memory comprising an SRAM (Static Random Access Memory), or the like. A cache memory is a memory for holding data that the processor has recently used, or data located near such data, by exploiting temporal and spatial locality, so as to provide a read throughput that is better than that obtained from a lower level memory.
After the completion of the present invention, the inventor of the present invention became aware of the presence of Japanese laid-open patents JP-A-2-297791 and JP-A-6-195261. The descriptions provided in these specifications are directed to a dynamic-type memory (DRAM) and a static-type memory (SRAM) on a single chip semiconductor substrate, and to the use of the DRAM and SRAM together as a cache memory. However, the objects and the configuration thereof are not described in those specifications.
The present inventor considered the possibility of mounting a large number of DRAM modules having a relatively low access speed along with logic circuits, and using the arrangement as a cache memory. The discussion included, for example, a semiconductor integrated circuit mounted with DRAM modules, which can be used as a level 3 (L3) cache memory for a microprocessor having built-in level 1 (L1) and level 2 (L2) cache memories.
According to the investigation by the present inventor, when an attempt is made to reduce an apparent memory read cycle by mounting a large number of DRAM modules together and making them capable of parallel operation, consideration has to be given to providing some way of preventing competition among data output actions caused by the parallel operation. In such a case, when a data buffer is employed in order to prevent data competition, it has been found that buffering the data even where there is no data competition is inefficient.
When the data processing efficiency of a processor is considered, the most significant object would be the improvement in the throughput of read operations invoked in response to a read access by the processor. Here, a read operation of a cache memory may sometimes involve a copy-back operation (or write-back) necessitated by a write access by the processor, and such a read operation would not be required to have a high throughput in most cases. This is because the copy-back operation is an operation for accommodating data into a main memory for replacing a dirty cache line in the case of a cache miss. Accordingly, it has been found by the present inventor that, when considering the use of the invention as a cache memory, it is necessary to avoid an excessive expansion of the logic scale of the logic circuitry by differentially weighting the importance of improving the read throughput according to the purpose of the read data.
For a write access by a processor, there is not much significance in accelerating a write operation which has occurred in response to a write access request; however, when the data processing efficiency of the processor is of concern, it is necessary to allow the processor to be released from the write operation within a short period of time after the reception of the write access request. Especially, in the case of a DRAM, a refreshing action of the stored data is required at every refreshing interval, and the reception of the write access request should not be delayed by such a refreshing action.
An object of the present invention is to provide a semiconductor integrated circuit having a configuration in which data buffers are employed for avoiding data competition caused by the parallel operation of plural memory blocks, thereby improving the throughput of read operations.
Another object of the present invention is to provide a semiconductor integrated circuit which can improve the throughput of read operations without entailing excessive expansion in the logic scale of its logic circuitry.
Still another object of the present invention is to provide a semiconductor integrated circuit which can readily accept write access requests regardless of the internal memory operation state.
The above and further objects and novel features of the present invention will be more clearly understood by reading the detailed description of the present invention in conjunction with the attached figures.
The following briefly sets forth a summary of representative embodiments of the present invention among those covered herein.
[1] In order to avoid data competition caused by the parallel operation of plural memory blocks, read buffers are employed to improve the throughput of read operations. To this end, a semiconductor integrated circuit has a configuration comprising a plurality of memory blocks (BNK0-BNK7) capable of parallel operation, an external interface means (I/F1) capable of externally inputting write data and externally outputting read data, read buffers (RB0-RB3), each capable of retaining read data read out from a memory block in response to an external output-incapable state in which the read data cannot be externally outputted from the external interface means, and selecting means for selecting either read data read out from a memory block or read data read out from a read buffer and for feeding it to the external interface means, while the external output-incapable state is not present.
According to the above configuration, if a read operation is performed from one of the memory blocks that are capable of parallel operation while read data from another memory block is being externally outputted from the external interface means, this read data would cause a resource competition at the point of its external output, so that it is temporarily stored in a read buffer, and then the read data is enabled for external output from the read buffer after the prior data outputting action terminates. Therefore, even if there is a read access request that would cause resource competition during the read data output operation, a read operation may be started without having the later request wait, and this read data may be externally outputted as soon as the risk of the resource competition is resolved; thus, the throughput of the read data outputting operations may be improved.
If there is no resource competition when data is read out from a memory block, the read data is externally outputted directly from the external interface means without the intervention of a read buffer, so that useless temporary buffering of the data may be avoided when there is no data competition; and, in this point, the present invention contributes to the improvement in the throughput of the read data outputting operations.
A read buffer may be constituted by a memory having a smaller capacity and a higher speed than that of the memory blocks. For example, when the memory blocks are formed by DRAM modules, then the read buffers may be constituted by SRAM modules.
When the above configuration is viewed in terms of control, the semiconductor integrated circuit comprises a plurality of memory blocks (BNK0-BNK7) capable of parallel operation, read buffers (RB0-RB3) capable of holding read data read out from the aforementioned memory blocks, an external interface means (I/F1) capable of externally outputting the read data outputted from the read buffers or the read data outputted from the memory blocks, and a controlling means (MCNT) to control the read buffers to hold the read data in response to an external-output-incapable state, in which the read data read out from the memory block cannot be externally outputted from the external interface means, and to control either the read data read out from the memory block or the read data read out from the read buffer to be outputted from the external interface means when the above output-incapable state is resolved.
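By way of illustration only, the selection between the direct output path and the read buffer path described in [1] may be modeled by the following minimal Python sketch. The function name simulate_read_path, the fixed burst length per output, and the cycle numbering are assumptions introduced here and are not taken from the specification; the finite number of read buffers (RB0-RB3) and the details of the controlling means (MCNT) are not modeled.

```python
def simulate_read_path(ready_times, burst_cycles):
    """Model which path each read takes to the external interface.

    ready_times  : list of (data_ready_cycle, bank_name), sorted by cycle
                   (hypothetical input format).
    burst_cycles : cycles the external interface is occupied per output
                   (assumed constant for simplicity).
    Returns a list of (bank_name, path, output_start_cycle).
    """
    port_free_at = 0  # first cycle at which the external interface is idle
    log = []
    for ready, bank in ready_times:
        if ready >= port_free_at:
            # No output competition: bypass the read buffer.
            path, start = "direct", ready
        else:
            # External-output-incapable state: hold the data in a read
            # buffer and output it once the earlier output has finished.
            path, start = "read_buffer", port_free_at
        port_free_at = start + burst_cycles
        log.append((bank, path, start))
    return log


if __name__ == "__main__":
    for entry in simulate_read_path([(0, "BNK0"), (2, "BNK1"), (10, "BNK2")], 4):
        print(entry)
```

In this example, with an output burst of four cycles and read data becoming ready at cycles 0, 2, and 10, only the second read passes through a read buffer; the first and third are outputted directly.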
[2] In order to allow the reception of an external write access request regardless of the internal memory operation state, the semiconductor integrated circuit comprises a plurality of memory blocks (BNK0-BNK7) capable of parallel operation, an external interface means (I/F1) capable of externally inputting write data, and write buffers (WB0-WB3) for receiving and holding the write data inputted to the external interface means, and for supplying the write data to the memory blocks after the memory blocks are write-enabled.
During the internal operation of a memory block, such as the refreshing of stored data or a data read operation etc., even if there is a write access request thereto, the write data may be buffered into a write buffer beforehand, so that the processor attempting the write access can be released from the write access operation within a short period of time. Where the data processing efficiency by a processor etc. is concerned, the acceleration of the speed of write processing on the memory side in response to that write access is not so important; however, the above feature contributes to the improvement in the data processing efficiency of the whole system, since it does not hold the write access request by the processor waiting.
A write buffer may be formed by a memory etc. having a smaller capacity and higher speed than that of the memory blocks; and, in a similar manner as the prior case, when the memory blocks are formed by, for example, DRAM modules, then the write buffers may be formed by SRAM modules.
When the above configuration is viewed in terms of control, the semiconductor integrated circuit comprises an external interface means (I/F1) capable of externally inputting write data, write buffers (WB0-WB3) for receiving the write data inputted to the external interface means, a plurality of memory blocks (BNK0-BNK7) to which the write data is supplied from the write buffers, and controlling means (MCNT) to control the write buffer to store the write data supplied to the external interface means in response to an external access request, and to have the write data from the write buffer supplied to a memory block once that target memory block is write-enabled.
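The write buffering described in [2] may likewise be sketched, under stated assumptions, as follows. The class name WriteBufferModel, its methods, and the representation of a busy memory block as membership in a set are hypothetical constructs for illustration; the count of four buffers merely mirrors the reference numerals WB0-WB3, and the rejection of a request when all buffers are full is an assumed behavior not described in the specification.

```python
from collections import deque


class WriteBufferModel:
    """Illustrative model: accept writes at once, supply them to banks later."""

    def __init__(self, num_buffers=4):
        self.num_buffers = num_buffers
        self.pending = deque()  # (bank, address, data) awaiting write-enable
        self.banks = {}         # bank -> {address: data}
        self.busy = set()       # banks currently refreshing or being read

    def accept_write(self, bank, address, data):
        """Return True if the request is accepted immediately."""
        if len(self.pending) >= self.num_buffers:
            return False  # assumed behavior when every write buffer is full
        self.pending.append((bank, address, data))
        return True

    def drain(self):
        """Supply buffered data to every write-enabled bank; return the count."""
        still_waiting = deque()
        written = 0
        while self.pending:
            bank, address, data = self.pending.popleft()
            if bank in self.busy:
                # Target bank not yet write-enabled: keep the data buffered.
                still_waiting.append((bank, address, data))
            else:
                self.banks.setdefault(bank, {})[address] = data
                written += 1
        self.pending = still_waiting
        return written
```

In this model, a write directed to a memory block that is being refreshed is accepted without delay, and the requester is released; the data reaches the memory block on a later call to drain once the block is no longer busy.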
[3] A semiconductor integrated circuit having a combined configuration of both read and write buffers comprises a plurality of memory blocks (BNK0-BNK7) capable of parallel operation, an external interface means (I/F1) capable of externally inputting write data and externally outputting read data, write buffers (WB0-WB3) for receiving and holding the write data inputted to the external interface means and for supplying the write data to the respective memory blocks after the memory blocks are write-enabled, read buffers (RB0-RB3) capable of holding read data read out from the memory blocks in response to an external-output-incapable competition state in which the data cannot be externally outputted from the external interface means, and selection means for selecting either read data read out from a memory block or read data read out from a read buffer and for feeding it to the external interface means.
[4] An application as a cache memory connectable to both the lower level and higher level of the memory hierarchy is assumed. In this case, a semiconductor integrated circuit comprises a plurality of memory blocks (BNK0-BNK7) capable of parallel operation, a first external interface means (I/F1) capable of externally inputting write data and externally outputting read data, and a second external interface means (I/F2) capable of externally inputting write data and externally outputting read data. Furthermore, the semiconductor integrated circuit also comprises write buffers (WB0-WB3) for receiving and holding the write data inputted to the first or second external interface means and for supplying the write data to the respective memory blocks after the memory blocks are write-enabled, read buffers (RB0-RB3) for holding the read data to be outputted from the second external interface and the read data to be outputted from the first external interface means, which is in competition so that it cannot be outputted from the first external interface, and selection means for selecting either read data read out from a memory block or the read data read out from the read buffer and for supplying it to the first external interface means when the output-incapable competition state is resolved.
In this configuration, the first external interface means is connected to the higher level of the memory hierarchy, and the second external interface means is connected to the lower level of the memory hierarchy. The basic operations of the read buffers and write buffers in response to a read/write access request by a processor are identical to the prior description. It should be noted that the read data outputted to the lower level of the memory hierarchy via the second external interface means would be provided only through the read buffers. This is because all the outputs of read data to the lower level of the hierarchy are assumed to involve reading operations for copy-back (or write-back) associated with write access by the processor. Since a copy-back operation is an operation to have data stored into the main memory in order to replace a dirty cache line in the case of a cache miss, and since, in many cases, a high throughput is not demanded for such reading actions, data paths that bypass the read buffers and logic circuits thereto for enabling the direct output of read data from the second external interface means are omitted so as to prevent a meaningless expansion of the logic scale of the circuitry.
When the above semiconductor integrated circuit is applied to a multi-processor system, another processor would be connected to the lower level of the memory hierarchy, so that it is possible that the semiconductor integrated circuit operates also in response to the access by this other processor. In order to allow this, the first and the second external interface means may be made capable of externally inputting access requests and access addresses for the memory blocks individually.
In addition, in consideration of the resource competition when read data is supplied from the lower level to the higher level of the memory hierarchy via the semiconductor integrated circuit, the utility of the semiconductor integrated circuit as a cache memory would be maximized when it further includes a memory buffer capable of receiving and holding the data from the second external interface means, and of externally outputting the data it held from the second external interface means.
[5] When the memory blocks are formed by DRAMs, for example, the minimization of the access time of the DRAMs may be achieved also by a known page mode or static column mode. Moreover, in order to reduce the apparent access time of a memory block constituted by a DRAM, an input of data is parallel-converted and an output of data is serial-converted. That is, the semiconductor integrated circuit includes memory blocks each comprising a memory cell array, a row selection circuit, a column selection circuit, a serial-parallel converter circuit, a write amplifier, a main amplifier, and a parallel-serial converter circuit. The memory cell array includes a plurality of memory cells, each including a selection terminal connected to a word line and a data input/output terminal connected to a bit line. The row selection circuit selects a word line specified by a row address signal in synchronization with a clock signal in response to the change in a row address strobe signal. The column selection circuit selects a plurality of bit lines specified by a column address signal in a parallel manner in synchronization with a clock signal in response to a change in a column address strobe signal. The serial-parallel converter circuit converts the write data serially inputted from the write buffer into parallel data in synchronization with the clock signal. The write amplifier outputs in parallel the output of the serial-parallel converter circuit to the plurality of bit lines selected by the column selection circuit. The main amplifier amplifies the parallel data outputted in parallel from the plurality of bit lines selected by the column selection circuit. The parallel-serial converter circuit converts the parallel data supplied from the main amplifier into serial data in synchronization with the clock signal and outputs it to the read buffer and the selection means.
The column address strobe signal, which changes in a cycle period that is n times (n is an integer equal to or greater than 2) the cycle of the clock signal, is inputted to the memory block and, during every cycle in which the column address signal changes, a plurality of serial data that have been read out from the memory cell array and parallel-serial-converted in synchronization with the clock signal are outputted from the memory block; or, the parallel data that have been inputted into the memory block in synchronization with the clock signal and serial-parallel-converted are written into the memory cell array. In this way, by the use of this access specification in which the column address strobe signal is changed once in n-cycles of the clock signal, the acceleration of the operation speed of the memory may be attempted.
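The relationship between the column address strobe cycle and the clock cycle described above may be illustrated by the following minimal Python sketch. The function names serialize_reads and group_serial_writes are hypothetical, each data unit is represented abstractly as a list element, and the latency of the conversion itself is ignored; these are simplifying assumptions rather than features of the described circuit.

```python
def serialize_reads(parallel_reads, n):
    """Parallel-serial conversion on the read side.

    parallel_reads : one list of n data units per column address strobe cycle.
    Returns (clock_cycle, data_unit) pairs, one unit per clock cycle.
    """
    timeline = []
    for cas_index, units in enumerate(parallel_reads):
        if len(units) != n:
            raise ValueError("each column access must yield n data units")
        for k, unit in enumerate(units):
            # The array is accessed once per n clocks; the output advances
            # by one data unit on every clock.
            timeline.append((cas_index * n + k, unit))
    return timeline


def group_serial_writes(serial_units, n):
    """Serial-parallel conversion on the write side: n units per array write."""
    if len(serial_units) % n != 0:
        raise ValueError("serial input must be a multiple of n data units")
    return [serial_units[i:i + n] for i in range(0, len(serial_units), n)]


if __name__ == "__main__":
    print(serialize_reads([["a", "b", "c", "d"], ["e", "f", "g", "h"]], 4))
    print(group_serial_writes(list("abcdefgh"), 4))
```

With n = 4, the memory cell array is accessed only once every four clock cycles, yet one data unit is transferred on every clock cycle, which is the sense in which the apparent access time is reduced.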
The serial data input path for the serial-parallel converter circuit and the serial data output path for the parallel-serial converter circuit are preferably separately provided. In a read operation, after the data is read out from the memory cell array in response to the change in the column address strobe signal, serial data is outputted from the memory block after a time period required for the parallel-serial conversion; however, for a write operation, the conversion of the serial data inputted to the memory block into parallel data must be completed prior to the writing of the parallel data into the memory cell array in response to the change in the column address strobe signal. At this point, when the write operation is instructed sequentially after a read operation, it is likely that the sequential input operation of the serial data into the memory block for the write operation has to be performed in parallel with the output operation of the serial data from the memory block for the read operation. In other words, there is a high probability that the serial data output timing from the memory block and the serial data input timing into the memory block overlap. The separate provision of a serial data input path and a serial data output path for a memory block as previously mentioned makes it possible to prevent the collision of data even when such overlapping of the operations occurs; thus, efficient processing can be achieved.
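The timing overlap discussed above may be worked through with the following minimal Python sketch. The function names, the half-open cycle intervals, and the specific latency values are assumptions chosen for illustration; the specification does not give numerical timings.

```python
def serial_windows(read_cas, write_cas, n, conv_latency):
    """Return (read_output_window, write_input_window) as half-open intervals.

    read_cas     : clock cycle of the column access for the read.
    write_cas    : clock cycle of the column access for the write.
    n            : data units transferred serially per column access.
    conv_latency : assumed cycles from the read column access until the
                   first serial data unit leaves the memory block.
    """
    # Read: serial output begins after the conversion latency.
    read_out = (read_cas + conv_latency, read_cas + conv_latency + n)
    # Write: serial input must be complete before the write column access.
    write_in = (write_cas - n, write_cas)
    return read_out, write_in


def windows_overlap(a, b):
    """True if two half-open intervals [start, end) share at least one cycle."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    # A write column access issued one strobe cycle (n clocks) after a read.
    read_out, write_in = serial_windows(read_cas=0, write_cas=4, n=4, conv_latency=1)
    print(read_out, write_in, windows_overlap(read_out, write_in))
```

Under these assumed values, the read output occupies cycles 1 to 4 and the write input occupies cycles 0 to 3, so the two serial transfers coincide; a shared serial path would see a collision, whereas separate input and output paths do not.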
[6] Where the propagation delay of read data is of concern, the semiconductor integrated circuit may employ the following layout. For example, a center-pad type chip is assumed, in which the bonding pads for signal input/output, or external connection electrodes, such as bump electrodes, are provided at the center region of a chip. In this case, memory blocks are disposed on the opposing sides of the semiconductor chip with a spacing therebetween. Provided between the opposing memory blocks are read buffers capable of holding read data read out from the respective memory blocks and write buffers capable of holding write data to be fed into the respective memory blocks. In the proximity of the read and write buffers, an external interface means is provided. External connection electrodes are provided in proximity to the external interface means. A write buffer receives and holds write data inputted to the external interface means, and when a corresponding memory block is write-enabled, supplies the write data to the memory block. A read buffer is capable of holding read data read out from a corresponding memory block in response to an external-output-incapable state in which the data cannot be externally outputted from the external interface means.