(1) Field of the Invention
The present invention relates to a semiconductor memory device, and more particularly to a reading system used in a high speed semiconductor memory device having a burst mode.
(2) Description of the Related Art
In recent years, the speed difference between CPUs and the DRAMs (Dynamic Random Access Memories) used as main memories has become a problem. The performance of DRAMs has not improved sufficiently to keep pace with CPUs, whose speeds have improved dramatically. For this reason, in computer systems using high speed CPUs, a high speed cache memory, whose capacity is small compared with that of the main memory, is provided either inside or outside the CPU chip to absorb this speed difference.
The cache memory holds a copy of a part of the data stored in the main memory. The copied data is handled in units, each consisting of a plurality of data having consecutive addresses, and such a unit is called a "page".
Normally, the CPU accesses the cache memory. When the desired data is absent from the cache memory, the CPU copies new data from the main memory to the cache memory. In this case, the copy is made in units of a page.
Thus, to serve the cache memory, the memory element constituting the main memory of the system is required to have a function of inputting/outputting, at high speed, a stream of data having consecutive addresses.
In the above case, a method is employed wherein, in the memory, the designation of only a leading address enables the inputting/outputting of the data stream containing such an address in synchronization with the reference clock signal inputted from the outside. This method is called a "burst transfer" method, and the length of the data that can be inputted/outputted by the designation of one address is called a "burst length". A typical example of the memory with which the burst transfer is performed is a synchronous DRAM.
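By way of illustration only, the burst transfer described above may be sketched as follows. The function name burst_addresses and the simple sequential ordering without wrap-around are assumptions made for this sketch; they are not taken from any particular device.

```python
def burst_addresses(leading_address, burst_length):
    """Return the addresses transferred in one burst.

    Only the leading address is designated from the outside; the remaining
    addresses are generated internally, one per reference clock cycle.
    Assumption: plain sequential ordering with no wrap-around.
    """
    return [leading_address + offset for offset in range(burst_length)]


if __name__ == "__main__":
    # One address designation yields a stream of four data (burst length 4).
    for clock, address in enumerate(burst_addresses(8, 4)):
        print(f"clock {clock}: address {address}")
```

In this sketch, a single designation of address 8 with a burst length of four covers addresses 8 through 11.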
Normally, with the general purpose DRAM (a DRAM having a fast page mode), the time interval from the designation of an address to the time when the data is outputted to the outside, that is, the address access time, is on the order of 20 ns (corresponding to 50 MHz) even in a high speed device.
In the synchronous DRAM, the time required for the processing of one data is basically the same as that for the general purpose DRAM. However, this is speeded up to above 100 MHz by multiplexing the internal processes and internally processing a plurality of data simultaneously whereby the apparent data processing time of one data is shortened. The frequency of the data input/output under such a state, that is, the frequency of the reference clock, is called a "burst transfer frequency".
However, the address access times of the synchronous DRAM and the general purpose DRAM are basically the same; that is, the internal processing time for a single data is unchanged. This means that the time required from the inputting of the command of a request for read-out to the outputting of the data normally spans a plurality of reference clock cycles.
The number of reference clocks from the time when the command input of the request for read-out is made to the time when the output data is outputted to the outside is called a "/CAS (CAS (Column Address Strobe) bar) latency" (throughout the specification, the symbol "/" indicates an inversion, and "/CAS" indicates an inverse of CAS and the CAS is at a low active level).
The synchronous DRAM is normally provided with a memory circuit for setting performance conditions which is called a "mode register" and which enables the setting of a condition such as /CAS latency by a predetermined mode register setting command inputted from the outside.
The /CAS latency is arranged to be settable from the outside for the following reason. When other circuits or the substrate wiring cannot operate with reference clocks at the highest burst transfer frequency of the synchronous DRAM, the synchronous DRAM is used with the frequency of the reference clocks lowered. In such a case, it is possible to shorten the time up to the outputting of the first data by lowering the /CAS latency, within the extent that the relation between the reference clock period and the /CAS latency satisfies the address access time. That is, in the case where the reference clock frequency is low, there is no necessity of increasing the /CAS latency. Conversely, for increasing the highest burst transfer frequency under a given address access time, it is necessary to increase the /CAS latency.
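The relation between the reference clock period and the /CAS latency may be illustrated by the following minimal sketch. It assumes a simplified condition, namely that the /CAS latency multiplied by the reference clock period must be at least the address access time; the function name min_cas_latency and the numerical values other than the 20 ns address access time are hypothetical.

```python
def min_cas_latency(access_time_ns, clock_period_ns):
    """Smallest /CAS latency (in reference clocks) covering the access time.

    Simplified assumption for illustration:
        cas_latency * clock_period_ns >= access_time_ns
    Integer nanoseconds are used so that the ceiling division is exact.
    """
    return -(-access_time_ns // clock_period_ns)


if __name__ == "__main__":
    access_time_ns = 20  # address access time on the order of 20 ns
    for period_ns in (20, 10, 5):  # 50 MHz, 100 MHz, 200 MHz
        latency = min_cas_latency(access_time_ns, period_ns)
        print(f"clock period {period_ns} ns -> /CAS latency {latency}")
```

Under this assumption, lowering the reference clock frequency permits a smaller /CAS latency, and raising it requires a larger one, for the same address access time.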
Examples of prior art multiplex internal processing technologies include a pipeline method and a prefetch method.
FIG. 1 is a timing chart for use in explaining a read operation in a typical conventional pipeline method. FIG. 1 shows timing waveforms in an example wherein the read operation is performed twice with the number of pipeline stages being four, the /CAS latency being four, and the burst length being four.
In the pipeline method, a series of the internal processes is divided into a plurality of stages, and the information relating to one data is processed at each of the stages sequentially in accordance with the reference clocks.
The conventional example shown in FIG. 1 relates to a four stage pipeline method, consisting of a first stage in which an internal column address YADD is generated, a second stage in which a pre-decode column address signal PYADD is generated by pre-decoding the internal column address YADD, a third stage in which the data at the address designated by the signal PYADD is read out to a data input/output bus IOBUS, and a fourth stage in which the data on the data input/output bus IOBUS is outputted to the outside of the chip from a DQ pin, the number of the stages thus totaling four.
That is, the address Aa0 (refer to the inputted address signal ADD) of the first data, which is processed at the first stage for the generation of the internal column address YADD during the clock cycle T1-T2 of the reference clock ICLK (an internal clock generated from the external clock signal CLK), is processed at the second stage during the next cycle T2-T3 of the reference clock ICLK. Simultaneously therewith, the address Aa1 of the second data is processed at the first stage. In this way, the processes in the respective stages are carried out simultaneously, so that as many data as there are stages are processed in parallel.
Since all stages are being controlled by respective reference clocks ICLK, there is no possibility for information relating to a plurality of data to be present simultaneously in one stage so that, as a result, each data is outputted in synchronization with the reference clock ICLK without involving any internal collisions.
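The stage-by-stage progress of the addresses through the pipeline may be sketched as follows. This is an illustrative model only; the function name pipeline_schedule is hypothetical, and each stage is assumed to take exactly one cycle of the reference clock.

```python
def pipeline_schedule(addresses, n_stages=4):
    """Return, for each clock cycle, the address held by each stage.

    Row index 0 corresponds to the first clock cycle; within a row,
    position 0 is the first stage. None marks an idle stage.
    Assumption: every stage takes exactly one reference clock cycle.
    """
    n_cycles = len(addresses) + n_stages - 1
    schedule = []
    for cycle in range(n_cycles):
        row = []
        for stage in range(n_stages):
            index = cycle - stage  # address that entered `stage` cycles ago
            row.append(addresses[index] if 0 <= index < len(addresses) else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    burst = ["Aa0", "Aa1", "Aa2", "Aa3"]  # burst length four
    for cycle, row in enumerate(pipeline_schedule(burst)):
        print(f"T{cycle + 1}-T{cycle + 2}: {row}")
```

In this model no stage ever holds more than one address in a given cycle, which corresponds to the absence of internal collisions noted above.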
FIG. 2 is a timing chart for use in explaining a read operation in a conventional prefetch method. FIG. 2 shows timing waveforms in an example wherein the read operation is performed twice with the number of parallel processes (prefetch number) being two, the /CAS latency being three, and the burst length being four.
In the prefetch method, the internal processes are carried out in parallel, the data are prefetched by inputs/outputs, and a parallel-to-serial data conversion is performed. That is, a plurality of paths for the internal processing of the data are provided, and the same process is carried out essentially at the same time for a plurality of data. However, since the data cannot be outputted simultaneously, a plurality of data simultaneously processed are first subjected to a parallel-to-serial conversion, and the resulting serial data are sequentially outputted in accordance with the reference clocks.
That is, for outputting the data after the conversion, the same number of the reference clocks as the number of the parallel processes before the conversion is required. Thus, where the processing before the parallel-to-serial conversion is carried out with the same number of the reference clocks as the number of parallel processes, it is possible to output the data uninterruptedly.
As seen in FIG. 2, the operation from the take-in of the external address ADD to the reading of the data out to the data input/output bus IOBUS is performed in the two reference clock cycles T1-T3. At this time, two bits of data, Da0 and Da1, are read out, of which the data Da0 is outputted to the outside during the cycle T3-T4 of the reference clocks, and the data Da1 is outputted during the cycle T4-T5.
As explained above, the pipeline method and the prefetch method are methods for increasing the maximum burst transfer frequency in a memory having a burst function such as a synchronous DRAM.
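The timing of the parallel-to-serial conversion may be sketched as follows. The sketch assumes that the parallel processing of each group takes the same number of reference clocks as the prefetch number and that successive groups are processed back to back; the function name prefetch_output_clocks is hypothetical, and the timings shown for Da2 and Da3 follow from these assumptions rather than from the passage above.

```python
def prefetch_output_clocks(n_data, prefetch=2, first_clock=1):
    """Map each data name to the clock edge Tn at which its output begins.

    Assumptions for illustration:
    - each group of `prefetch` data is processed in parallel over
      `prefetch` reference clocks;
    - groups are processed back to back;
    - after processing, the group is serialised at one data per clock.
    """
    clocks = {}
    for i in range(n_data):
        group, lane = divmod(i, prefetch)
        group_start = first_clock + group * prefetch
        ready = group_start + prefetch  # parallel read-out to IOBUS completes
        clocks[f"Da{i}"] = ready + lane  # serial output, one data per clock
    return clocks


if __name__ == "__main__":
    for name, clock in prefetch_output_clocks(4).items():
        print(f"{name}: outputted during T{clock}-T{clock + 1}")
```

With a prefetch number of two and a burst length of four, the data appear in consecutive cycles starting from T3, so that the output is uninterrupted.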
In the pipeline method, in order to enhance the maximum burst transfer frequency, the number of stages of the pipeline is increased and the processing time in each stage is reduced, thereby increasing the degree of parallel processing. However, because of the relationship with the internal processing in the DRAM, the places at which the stages may be separated are limited. Also, it is necessary for the minimum reference clock cycle to be matched to the stage which requires the longest time. Further, since the overhead of the circuits interconnecting the stages increases, the number of stages in practice is limited to three to four.
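The dependence of the minimum reference clock cycle on the slowest stage may be illustrated by the following sketch. The stage delays used are hypothetical values chosen to total 20 ns, and the function name max_burst_frequency_mhz is likewise an assumption; the overhead of the circuits interconnecting the stages is ignored.

```python
def max_burst_frequency_mhz(stage_delays_ns):
    """Maximum burst transfer frequency for a pipeline with given stage delays.

    The minimum reference clock cycle equals the delay of the slowest stage.
    Inter-stage circuit overhead is ignored in this simplified sketch.
    """
    min_cycle_ns = max(stage_delays_ns)
    return 1000.0 / min_cycle_ns


if __name__ == "__main__":
    unbalanced = [3, 4, 8, 5]  # hypothetical delays, total 20 ns
    balanced = [5, 5, 5, 5]    # same total, evenly divided
    print(max_burst_frequency_mhz(unbalanced), "MHz")
    print(max_burst_frequency_mhz(balanced), "MHz")
```

For the same total processing time, the unevenly divided pipeline is held to the frequency permitted by its 8 ns stage, which is why the limited choice of separation points restricts the attainable frequency.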
Also, for enhancing the maximum burst transfer frequency in the prefetch method, the number of data to be processed in parallel is increased. For this reason, the same number of identical circuits as the number of parallel processes are required, and this means that the circuit scale becomes large and, for realizing this, the chip area also becomes large.
In the prefetch method, the data must be outputted in units of the number of parallel processes, so that a quantity of data smaller than one such unit cannot be outputted.
For the above reasons, an increase in the degree of the parallel processes leads to the lowering of the functional freedom, and a computer system utilizing this suffers from the lowering of performance characteristics. Thus, the number of the parallel processes, that is, the multiplicity of the parallel processes, is limited to about two.
In the two methods explained above, the data processing can be speeded up by increasing the multiplicity of the data. However, there are limits to the degree of the multiplicity in each of the two methods, and consequently the maximum burst transfer frequency is also limited.