This application relies for priority on Korean Application No. 2002-18806, filed Apr. 6, 2002, the contents of which are incorporated herein in their entirety by reference.
The present invention relates in general to semiconductor devices and, in particular, to data output circuits and methods in high-speed synchronous semiconductor devices.
Recent high-speed graphic memories require very high operating speeds of about 500 MHz. Accordingly, the recent trend in memory access has been towards adopting a Column Address Strobe (CAS) latency of 7 and a 4-bit pre-fetch technique instead of the existing 2-bit pre-fetch technique. CAS latency is defined as the number of clock cycles from a read command or column address to data output; that is, the output data appears a number of clock cycles equal to the CAS latency after the read command is issued. For convenience of explanation, a CAS latency of n (where n is a natural number greater than or equal to 1) is indicated as CLn.
Double Data Rate (DDR) memories for inputting and outputting two data groups during one clock cycle are widely in use to achieve high-speed data input and output. DDR memories process data on both the rising and falling edges of clock signals. 4-bit pre-fetching in DDR memories represents simultaneous preparation of 4 bits, which means the number of activated column select lines (CSLs) is doubled, and a CSL activation period is 2 clock cycles (tCK), where tCK is used to indicate units of a clock cycle.
In general, in 4-bit pre-fetch memories, a data output pin outputs four data groups during two clock cycles, i.e., 2 tCK. In the 4-bit pre-fetching systems, CSLs are active during two clock cycles, so a read command can be applied every two clock cycles. Thus, the minimum time interval (tCCD) between read commands is 2 tCK.
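The relationship between pre-fetch depth and the minimum read-command interval can be illustrated with a short sketch. The function name `min_read_interval_tck` and its parameters are hypothetical and do not appear in the specification; the sketch assumes only that a DDR data output pin transfers two bits per clock cycle, as described above.

```python
def min_read_interval_tck(prefetch_bits: int, bits_per_clock: int = 2) -> int:
    """Clock cycles a data pin needs to output one pre-fetched burst.

    Assumption: a new read command cannot usefully be applied until the
    previous burst has left the pin, so this value equals tCCD in tCK units.
    """
    if prefetch_bits % bits_per_clock:
        raise ValueError("pre-fetch depth must be a multiple of bits per clock")
    return prefetch_bits // bits_per_clock


tccd_4bit = min_read_interval_tck(4)  # 4-bit pre-fetch on a DDR pin
tccd_2bit = min_read_interval_tck(2)  # 2-bit pre-fetch, shown for comparison
print(tccd_4bit, tccd_2bit)
```

Under these assumptions the 4-bit case gives the 2 tCK interval stated above, while the 2-bit case is included only for comparison.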
High-speed memories usually adopt a wave-pipeline system to achieve a long CAS latency of about CL7. Typically, sixteen latches per data output pin are required to enable a CL7 system to operate properly even at a low frequency and to accomplish a 4-bit pre-fetching system. The number of latches for each data output pin per bit is calculated by dividing the maximum CAS latency by the minimum time interval between read commands (tCCD), i.e., from the formula ‘maximum CAS latency/tCCD’. If the maximum CAS latency is CL7 and tCCD is 2 tCK, 3.5 (CL7/2) latches are required. Since half a latch cannot be formed, four latches are required per bit. In 4-bit pre-fetching memories, each data output pin outputs four-bit data in response to a single read command, thus requiring a total of 16 latches per data output pin.
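The latch-count arithmetic above can be checked with a minimal sketch. The function name `latches_per_pin` and its parameter names are illustrative assumptions, not terms from the specification.

```python
import math


def latches_per_pin(max_cas_latency: int, tccd_tck: int, prefetch_bits: int) -> int:
    """Latches needed per data output pin in a wave-pipeline scheme.

    Per bit: maximum CAS latency / tCCD, rounded up because a fractional
    latch cannot be formed. The total is that count times the pre-fetch depth.
    """
    per_bit = math.ceil(max_cas_latency / tccd_tck)
    return per_bit * prefetch_bits


# CL7, tCCD = 2 tCK, 4-bit pre-fetch: ceil(3.5) = 4 latches per bit
print(latches_per_pin(7, 2, 4))
```

For CL7 with tCCD of 2 tCK and 4-bit pre-fetching, this reproduces the total of 16 latches per data output pin.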
FIG. 1 is a circuit diagram of a conventional data output circuit 100 used in semiconductor devices. The data output circuit 100 uses the wave-pipeline system for realizing CAS latency of 7 (CL7), tCCD of 2 tCK and the 4-bit pre-fetching technique. The conventional data output circuit 100 includes a total of 16 latches 111 through 118 and 121 through 128. FIG. 1 shows bit line sense amplifiers B/L S/A, data sense amplifiers Data S/A and a burst data ordering unit 200, connected to the data output circuit 100. The data stored in a memory cell is carried on a bit line (not shown) when a word line (not shown) is active. The data is sensed and amplified by a bit line sense amplifier (B/L S/A). Data on an activated column select line CSLj (where j is a natural number from 1 to 4) among the data sensed by a bit line sense amplifier B/L S/A is transmitted to a data sense amplifier (Data S/A) and is amplified by the data sense amplifier (Data S/A). Since the data output circuit 100 adopts the 4-bit pre-fetching system, four CSLs are activated at the same time in response to a single read command. The data of the bit line sense amplifiers (B/L S/A) corresponding to the four activated column select lines CSL1, CSL2, CSL3 and CSL4 are amplified by data sense amplifiers (Data S/A) and ordered properly by the burst data ordering unit 200 and simultaneously output to respective four latches out of the latches 111 through 118 and 121 through 128 in the data output circuit 100.
The conventional data output circuit 100 of FIG. 1 adopts a 2-stage multiplexing scheme to multiplex the data output from the latches 111 to 118 and 121 through 128. That is, in the first stage 130, odd data and even data are multiplexed separately. Thereafter, two groups of data obtained after the multiplexing in the first stage are multiplexed in the second stage 140. Odd data denotes data output in association with the rising edge of a clock signal, and even data denotes data output in association with the falling edge of a clock signal.
According to the above-described 2-stage multiplexing of data, the number of junctions for each of multiplexing nodes DOFi and DOSi is reduced from 16 to 8 in the first stage 130. Compared to multiplexing of the outputs of 16 latches at one stage, the 2-stage data multiplexing as shown in FIG. 1 reduces the load on the multiplexing nodes DOFi and DOSi. However, the load on each of the multiplexing nodes DOFi and DOSi is still large, which results in a limit in bandwidth.
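As a rough illustration of the junction counts discussed above, the following sketch divides the latch outputs evenly among the first-stage multiplexing nodes. The function name `junctions_per_node` is hypothetical, and the even split is an assumption that matches the odd/even grouping of FIG. 1.

```python
def junctions_per_node(total_latches: int, first_stage_nodes: int) -> int:
    """Latch outputs tied to each multiplexing node, assuming an even split."""
    if total_latches % first_stage_nodes:
        raise ValueError("latches must divide evenly among the nodes")
    return total_latches // first_stage_nodes


single_stage = junctions_per_node(16, 1)  # all 16 latches on one node
two_stage = junctions_per_node(16, 2)     # odd node DOFi and even node DOSi
print(single_stage, two_stage)
```

The count says nothing about the resulting delay; it only shows that eight junctions still share each of the nodes DOFi and DOSi.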
FIG. 2 is a data output timing diagram of the conventional data output circuit 100 of FIG. 1. The operation of the conventional data output circuit 100 will now be described with reference to FIGS. 1 and 2.
Four data bits SDIOF1, SDIOF2, SDIOS1 and SDIOS2 are simultaneously output from the burst data ordering unit 200 and sequentially received by their corresponding bit latches. The first data bit SDIOF1 is sequentially fed into the first to fourth latches 111 to 114 one latch at a time, the second data bit SDIOF2 is sequentially fed into the fifth to eighth latches 115 to 118 one latch at a time, the third data bit SDIOS1 is sequentially fed into the ninth to twelfth latches 121 to 124 one latch at a time, and the fourth data bit SDIOS2 is sequentially fed into the thirteenth to sixteenth latches 125 to 128 one latch at a time.
At this time, input control signals DLj (j is a natural number from 1 to 4) control the input of the first to fourth data bits SDIOF1, SDIOF2, SDIOS1 and SDIOS2 to the latches. Multiplexing control signals CDQFj and CDQSj (j is a natural number in the range of 1 to 8) determine which latch outputs its data to the odd multiplexing node DOFi and the even multiplexing node DOSi.
The data of the first to eighth latches 111 to 118 are output to the odd multiplexing node DOFi when their corresponding multiplexing control signals CDQFj are active. The data of the ninth to sixteenth latches 121 to 128 are output to the even multiplexing node DOSi when their corresponding multiplexing control signals CDQSj are active. The data in the odd multiplexing node DOFi and the data in the even multiplexing node DOSi are multiplexed to output data DOUT in response to an odd clock signal CLKDQF and an even clock signal CLKDQS, respectively.
Referring to FIG. 2, as the four multiplexing control signals CDQF1, CDQS1, CDQF2 and CDQS2 are sequentially activated, the data of the first latch 111 is output to the odd multiplexing node DOFi, then the data of the ninth latch 121 is output to the even multiplexing node DOSi, then the data of the fifth latch 115 is output to the odd multiplexing node DOFi, and then the data of the thirteenth latch 125 is output to the even multiplexing node DOSi. The data in the odd multiplexing node DOFi is multiplexed to output data DOUT in response to the odd clock signal CLKDQF, and the data in the even multiplexing node DOSi is multiplexed to output data DOUT in response to the even clock signal CLKDQS. Thus, 4-bit data is continuously output via each data output pin over two cycles of a clock signal CLK.
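The output order described above can be sketched as a small behavioral model. This is not a circuit-level description: the class `ConventionalOutput`, its methods, and the rotating slot index are assumptions introduced for illustration. Only the latch reference numerals, the data bit names, the node names, and the output order of the first burst (111, 121, 115, 125) come from the description.

```python
# Latch reference numerals from FIG. 1, grouped by the data bit feeding them.
LATCH_GROUPS = {
    "SDIOF1": [111, 112, 113, 114],
    "SDIOF2": [115, 116, 117, 118],
    "SDIOS1": [121, 122, 123, 124],
    "SDIOS2": [125, 126, 127, 128],
}
# Order in which the four bits of one burst reach DOUT.
OUTPUT_ORDER = ["SDIOF1", "SDIOS1", "SDIOF2", "SDIOS2"]


class ConventionalOutput:
    """Behavioral model of the 16-latch wave-pipeline output (illustrative)."""

    def __init__(self):
        self.latches = {}
        self.write_slot = 0  # assumed: selected by input control signal DLj
        self.read_slot = 0   # assumed: selected by the CDQFj/CDQSj signals

    def load(self, burst):
        """Store four simultaneously pre-fetched bits, one latch per group."""
        for name, value in burst.items():
            self.latches[LATCH_GROUPS[name][self.write_slot]] = value
        self.write_slot = (self.write_slot + 1) % 4

    def drain(self):
        """Return (latch, node, bit) tuples in the order they reach DOUT."""
        out = []
        for name in OUTPUT_ORDER:
            latch = LATCH_GROUPS[name][self.read_slot]
            node = "DOFi" if name.startswith("SDIOF") else "DOSi"
            out.append((latch, node, self.latches[latch]))
        self.read_slot = (self.read_slot + 1) % 4
        return out


circuit = ConventionalOutput()
circuit.load({"SDIOF1": 1, "SDIOF2": 0, "SDIOS1": 1, "SDIOS2": 1})
for latch, node, bit in circuit.drain():
    print(latch, node, bit)
```

The model makes the loading problem visible: every one of the eight odd latches drives DOFi and every one of the eight even latches drives DOSi, even though only one latch per node is selected at a time.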
In the conventional data output circuit 100 as described above, the outputs of 8 latches 111 to 118 for odd data are multiplexed to one node DOFi, and the outputs of 8 latches 121 to 128 for even data are multiplexed to one node DOSi. Accordingly, a heavy load is put on each of the nodes DOFi and DOSi, thereby limiting bandwidth. This large load on each of the nodes DOFi and DOSi lengthens the time taken for data to propagate from the latches to the nodes DOFi and DOSi.
Connecting eight junctions to each node degrades the speed at which data develops on the node for detection. This degradation of the data developing speed lengthens the period of time tDF from when data appears on the node DOFi shown in FIG. 2 to a rising edge of the clock signal CLKDQF and the period of time tDS from when data appears on the node DOSi shown in FIG. 2 to a rising edge of the clock signal CLKDQS.
Therefore, a large load on the multiplexing nodes is a factor in delaying the data access time tAA, which denotes the period of time from the clock at which a read command with a column address is applied to when output data appears on an output data pad.
To solve the above-described problems, it is an object of the present invention to provide a data output circuit in synchronous semiconductor devices that can improve the frequency characteristics and the data access time (tAA) by reducing the load on the internal nodes in a synchronous semiconductor device.
Another object of the present invention is to provide a data output method for synchronous semiconductor devices that can improve the frequency characteristics and the data access time (tAA) by reducing the load on the output nodes in a synchronous semiconductor device.
In one aspect, the invention is directed to a data output circuit and method in a synchronous semiconductor device for providing a set of data bits as an output. The data output circuit includes a first-stage latch unit for receiving a first bit of the data bits in response to a first control signal, a second-stage latch unit for receiving a second bit of the data bits in response to the first control signal, and a buffering latch unit interposed between the first-stage latch unit and the second-stage latch unit. The buffering latch unit receives the second bit from the second-stage latch unit and forwards the second bit to the first-stage latch unit in response to a second control signal.
In one embodiment, the synchronous semiconductor device is a wave pipelined operating device. The first-stage latch unit can receive the first bit of the data bits and the second-stage latch unit can receive the second bit of the data bits simultaneously. In one embodiment, the first control signal is enabled before the second control signal is enabled, and then the first control signal is disabled in response to the second control signal.
In one embodiment, the first-stage latch unit comprises a plurality of latches for receiving a plurality of the data bits. The second-stage latch unit can also comprise a plurality of latches for receiving a plurality of the data bits.
In one embodiment, the first-stage latch unit is coupled to an output node. The first-stage latch unit forwards the first bit of the data bits to the output node. The data output circuit can also include a plurality of switches between the first-stage latch unit and the output node for switching the data bits to the output node.
The data output circuit can also include a first plurality of switches between the buffering latch unit and the first-stage latch unit for enabling data to be forwarded from the buffering latch unit to the first-stage latch unit. A second plurality of switches can also be included between the second-stage latch unit and the buffering latch unit to enable data to be forwarded from the second-stage latch unit to the buffering latch unit. The switches can be controlled by the second control signal.
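The data flow summarized above can be sketched as a minimal behavioral model. The class `StagedOutput` and its method names are hypothetical, and the sketch assumes that both sets of switches pass data within a single activation of the second control signal; the summary does not specify the relative timing of the two switch groups, so a real circuit may sequence them differently.

```python
class StagedOutput:
    """Illustrative model: only the first-stage latch drives the output node."""

    def __init__(self):
        self.first_stage = None   # coupled to the output node
        self.buffering = None     # interposed between the two stages
        self.second_stage = None  # holds the later bit until it is forwarded

    def first_control(self, first_bit, second_bit):
        """First control signal: both stages receive their bits simultaneously."""
        self.first_stage = first_bit
        self.second_stage = second_bit

    def second_control(self):
        """Second control signal: second stage -> buffering -> first stage.

        Assumption: both switch groups transfer during the same activation.
        """
        self.buffering = self.second_stage
        self.first_stage = self.buffering

    def output_node(self):
        """Value the first-stage latch presents to the output node."""
        return self.first_stage


stage = StagedOutput()
stage.first_control(first_bit=1, second_bit=0)
print(stage.output_node())  # first bit is available at the output node
stage.second_control()
print(stage.output_node())  # second bit has been forwarded to the first stage
```

In this arrangement the second-stage and buffering latches never connect to the output node directly, which is how the model reflects the stated aim of reducing the load on that node.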