1. Field of the Invention
The present invention relates to a data receiver and, more particularly, to a data parallelizing receiver, which reduces a time taken to perform a data error correction process and output a command after the command is input when the number of input bits to be parallelized is increased in a system requiring a high data rate.
2. Description of Related Art
In recent years, in order to increase the transmission rate of signals as chip-to-chip operating frequencies rise, data transmission systems have adopted a point-to-point signal transmission method in which, whenever data is input or output, the data transmission system forms its own transmission path to prevent other data transmission systems from using the transmission path. Also, in order to lower the speed at which internal commands must be processed, data transmission systems have employed serialization/parallelization methods.
Furthermore, the operating latency of a chip that outputs a signal in response to a command must be reduced in order to improve the performance of a system. In particular, high-speed data transmission systems increasingly adopt an error correction process that detects and checks signal transmission errors occurring during an interface operation.
A typical method of detecting data errors is based on cyclic redundancy check (CRC) calculation. According to the CRC calculation method, a redundancy bit is added to data to be transmitted through a data transmitting terminal to generate a system code, the system code is transmitted, and a CRC calculation is performed through a data receiving terminal in a data packet to detect a data error.
The CRC calculation method is performed using software or hardware. When the CRC calculation is performed using software, many process operations are required, thereby resulting in wasteful use of a microprocessor. For this reason, CRC calculation using hardware has become increasingly common.
The most common CRC calculation method using hardware is to adopt a shift register. Although it is efficient to use the shift register in a system having a low data rate, the use of the shift register has a technical limit for processing data at a high speed of several hundreds of Mbps to several tens of Gbps.
To CRC-process each bit of data at a speed of several tens of Gbps, as in a conventional method, a data transmission system must also process clock signals at a speed of several tens of Gbps. Accordingly, as the data rate increases, a conventional system requires, as an external line driver, an expensive high-speed device operating at a speed of several tens of Gbps, and processes data in parallel in the preceding blocks. However, as the data transmission system processes more data in parallel, designing circuits becomes more difficult.
FIG. 1 is a circuit diagram of a conventional data parallelizing receiver.
Referring to FIG. 1, the conventional data parallelizing receiver includes a clock generator 5, a plurality of input signal receivers 10-1 to 10-6, a command decoder 30, a command queue 40, a multiplexer MUX, a plurality of CRC calculators 50-1 to 50-8, a data error determiner NOR, a delayer 20, and an error command selector 60.
Although the conventional data parallelizing receiver is illustrated as including a plurality of input signal receivers 10-1 to 10-6 and a plurality of CRC calculators 50-1 to 50-8, the data parallelizing receiver may include a single input signal receiver and a single CRC calculator. Therefore, for brevity, only a first input signal receiver 10-1 and a first CRC calculator 50-1 will be chiefly described in detail.
The input signal receiver 10-1 includes a demultiplexer 12-1 and a data parallelizer 14-1. The demultiplexer 12-1 includes a plurality of input buffers B1 to B4, and the data parallelizer 14-1 includes a plurality of flip-flops D1 to D4 and a data framer 16-1. The command decoder 30 includes a command decoding unit 32 and a first flip-flop D6. The CRC calculator 50-1 includes a CRC detector 52-1, which includes a plurality of XOR gates, and a second flip-flop D7-1. The error command selector 60 includes a single AND gate AND and a third flip-flop D9.
For brevity, it is assumed that 6 lanes for data transceiving paths are connected to an external interface, four commands may be fetched at one time, and 9 serial data are externally transmitted as 5 packets a0˜a8 to e0˜e8.
Functions of the respective blocks of the conventional data parallelizing receiver will now be described with reference to FIG. 1.
The clock generator 5 receives a system clock signal sys_ck for controlling operation of the entire data parallelizing receiver and generates a plurality of sampling clock signals s_ck[3:0] and a framing clock signal f_ck. The sampling clock signals s_ck[3:0] are quartered, delayed for predetermined times, and synchronized in order to sample a plurality of serial data, and the framing clock signal f_ck is divided into ninths and synchronized in order to frame a plurality of sampled parallel data.
The demultiplexer 12-1 of the input signal receiver 10-1 receives 9 serial data as five packets a0˜a8 to e0˜e8 through 6 lanes, demultiplexes the received data, delays the demultiplexed data by a predetermined time, and outputs four parallel data through each of the 6 lanes.
The data parallelizer 14-1 of the input signal receiver 10-1 receives the 4 parallel data through each of the 6 lanes. The flip-flops D1 to D4 sample data bits of the received parallel data in synchronization with the sampling clock signals s_ck[3:0] and output a parallel receiving data signal para_D[4:1]. The data framer 16-1 aligns the sampled data bits in packet units in synchronization with the framing clock signal f_ck and outputs a framing data signal frame_D[8:0].
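The 1:4 demultiplexing and 9-bit framing described above can be illustrated with a short behavioral sketch. This is only a software model under stated assumptions, not the circuit of FIG. 1: the function names demux_1_to_4 and frame are hypothetical, clocking is ignored, and bits are represented by labels such as "a0" so that the alignment is visible.

```python
def demux_1_to_4(serial):
    """Split one lane's serial stream into successive 4-wide words (para_D)."""
    return [serial[i:i + 4] for i in range(0, len(serial), 4)]


def frame(words, packet_len=9):
    """Re-align 4-wide words into packet_len-wide packets (frame_D)."""
    buffer, packets = [], []
    for word in words:
        buffer.extend(word)
        # Emit a packet as soon as a full one has accumulated.
        while len(buffer) >= packet_len:
            packets.append(buffer[:packet_len])
            buffer = buffer[packet_len:]
    return packets


# One lane carrying five 9-bit packets a0..a8 to e0..e8 (labels, not values).
serial = [f"{p}{i}" for p in "abcde" for i in range(9)]
words = demux_1_to_4(serial)
packets = frame(words)
```

Because 9 is not a multiple of 4, packet boundaries fall inside the 4-wide words, which is why the sketch needs a buffer to realign them into packet units.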
In the command decoder 30, the command decoding unit 32 receives aligned data from the data parallelizer 14-1 and decodes the received data to detect the kinds of commands. Thus, the first flip-flop D6 outputs decoded commands in synchronization with the framing clock signal f_ck.
In the CRC calculator 50-1, the CRC detector 52-1 receives the aligned data from the data parallelizer 14-1, adds a redundancy bit to the received data to generate a system code, and transmits the system code. A CRC calculation is performed in packet units through a data receiving terminal to detect a data error so that the validity of an input command packet is inspected. Thereafter, the second flip-flop D7-1 outputs a CRC calculation data signal CRC[7:0] in synchronization with the framing clock signal f_ck.
In the CRC calculation, a logic XOR is performed on data to be transmitted through a data transmitting terminal, according to a predetermined generator polynomial to transform the data, a CRC code corresponding to the transformed data is written, the CRC code is added to the transformed data to generate final data, and the final data is transmitted. The final data is divided through a data receiving terminal to detect whether the received data has an error depending on the presence or absence of the remainder. Since the CRC calculation is known to one of ordinary skill, a further detailed description thereof will be omitted.
The data error determiner NOR receives plural-bit CRC calculation data signals CRC[7:0] from a plurality of CRC calculators 50-1 to 50-8 and performs a logic NOR on the plural-bit CRC calculation data signals CRC[7:0]. Thus, when a data error is detected and even a single-bit data signal is applied at a high level, the data error determiner NOR outputs an error determination signal /ERR at a low level.
As the bit number of the aligned data output by the data parallelizer 14-1 increases, the number of XOR gates required by the CRC detectors 52-1 to 52-8 increases, which lengthens the delay of the CRC calculation. Since this delay is longer than the delay time taken for the command decoding unit 32 to decode a command, the delayer 20 delays the command by a predetermined time so that a sufficient data setup time and a sufficient data hold-time may be ensured after the command is decoded.
The command queue 40 stores commands output by the command decoder 30 according to functions of execution commands, puts the commands on standby for a predetermined time according to a data transmission protocol, and outputs the commands.
The multiplexer MUX receives the decoded commands from the command decoding unit 32 and the command queue 40 and directly outputs the commands decoded by the command decoding unit 32 or outputs the commands, which are put on standby in the command queue 40 for the predetermined time, in response to a selection signal “sel” according to the data transmission protocol.
In the error command selector 60, the AND gate AND receives the commands, which are directly output from the command decoding unit 32 through the multiplexer MUX or put on standby for the predetermined time in the command queue 40, together with the error determination signal /ERR output from the data error determiner NOR, and performs a logic AND on the commands and the error determination signal /ERR. The third flip-flop D9 outputs an AND result in synchronization with the framing clock signal f_ck. Thus, only when the first CRC calculator 50-1 inputs a valid command packet without a data error, does the error command selector 60 selectively output the AND result to a core logic circuit.
FIG. 2 is a circuit diagram of the plurality of CRC calculators 50-1 to 50-8 and the data error determiner NOR of the conventional data parallelizing receiver shown in FIG. 1.
Referring to FIG. 2, the CRC calculators 50-1 to 50-8 and the data error determiner NOR include multistage XOR gates 1X_11 to 8X_L1, a plurality of flip-flops D7-1 to D7-8, and a NOR gate NOR. In FIG. 2, input signals “fxy” applied to input terminals of first-stage XOR gates 1X_11 to 1X_1a denote y-th-bit data signals of 9-bit framing data signals frame_D[8:0] output through x-th lanes of the 6 lanes, respectively.
In the first CRC calculator 50-1, the first-stage XOR gates 1X_11 to 1X_1a of the multistage XOR gates receive combinations f00 to f57 of input signals obtained by a CRC calculation algorithm out of 54-bit framing merge signals f_merge[53:0] of the 5 respective packets that are output through the 6 lanes, perform a logic XOR on the combinations f00 to f57, and output XOR results. Second-stage XOR gates 1X_21 to 1X_2b receive the output signals of the first-stage XOR gates 1X_11 to 1X_1a and perform a logic XOR on the received signals again. In this process, an L-stage XOR gate 1X_L1 finally outputs a first CRC calculation data signal CRC[0].
In the above-described method, the multistage XOR gates 1X_11 to 8X_L1 receive output signals of front-stage XOR gates, perform a logic XOR on the received signals, and output second to eighth CRC calculation data signals CRC[7:1].
The flip-flops D7-1 to D7-8 receive respective bits of the CRC calculation data signals from the multistage XOR gates 1X_11 to 8X_L1 and output the received bits of the CRC calculation data signals in synchronization with the framing clock signal f_ck.
The NOR gate NOR receives the CRC calculation data signals CRC[7:0], which are synchronized with the framing clock signal f_ck, from the flip-flops D7-1 to D7-8, performs a logic NOR on the received signals, and finally outputs the error determination signal /ERR.
Operation of the CRC calculators 50-1 to 50-8 and the data error determiner included in the conventional data parallelizing receiver will now be described with reference to FIG. 2.
In a conventional CRC detection process, data is transformed according to predetermined rules through a data transmitting terminal, a CRC code corresponding to the transformed data is written and added to the transformed data to generate final data, and the final data is transmitted. Thereafter, the final data is received and checked through a data receiving terminal to detect data errors.
A specific logic multiplication is performed on the data to be transmitted through the data transmitting terminal to transform the data, and the received data is divided through the data receiving terminal to determine whether the data has an error depending on the presence or absence of the remainder.
In this case, a generator polynomial for writing the CRC code is previously determined. The data to be transmitted is multiplied by the generator polynomial through the data transmitting terminal to transform the data, the transformed data is divided by the generator polynomial again through the data transmitting terminal to obtain a quotient and the remainder corresponding to the CRC code, and the CRC code is added to the transformed data to generate final data to be transmitted. The final data is received and divided by the generator polynomial again through the data receiving terminal. Thus, when the remainder is 0, it is determined that no data error has occurred.
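The divide-and-check procedure above can be sketched as a bit-serial calculation. The sketch makes several assumptions that the text does not state: it uses the generator polynomial that the description gives for FIG. 2 (X8+X5+X3+X2+X+1, written 0x2F with the leading term implicit), processes bits most-significant first, starts from an all-zero register, and follows the common construction in which the message is shifted by the CRC width before division. The names crc_serial and append_crc are hypothetical.

```python
POLY = 0x2F   # x^8 + x^5 + x^3 + x^2 + x + 1, x^8 term implicit (assumed encoding)
WIDTH = 8


def crc_serial(bits):
    """Remainder after shifting the bits through an 8-bit register, one per step."""
    reg = 0  # assumed all-zero initial state
    for bit in bits:
        feedback = ((reg >> (WIDTH - 1)) & 1) ^ bit
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return reg


def append_crc(bits):
    """Transmit side: append the 8-bit remainder (MSB first) to the message."""
    crc = crc_serial(bits)
    return list(bits) + [(crc >> i) & 1 for i in range(WIDTH - 1, -1, -1)]


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
codeword = append_crc(message)

# Receive side: a zero remainder means no error was detected.
received_ok = crc_serial(codeword) == 0

# Flipping any single bit leaves a non-zero remainder.
corrupted = list(codeword)
corrupted[5] ^= 1
received_bad = crc_serial(corrupted) == 0
```

With these conventions, received_ok is true for the untouched codeword and received_bad is false for the corrupted one, matching the "remainder is 0" test described in the text.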
CRC calculation methods using hardware may be classified into a serial CRC calculation method and a parallel CRC calculation method. The serial CRC calculation method is performed by shifting data bit by bit using a plurality of shift registers. By comparison, as shown in FIG. 2, in the parallel CRC calculation method, a plurality of XOR gates receive the combinations f00 to f57 of the input signals obtained by a software algorithm for CRC calculation and make CRC calculation of plural-bit data signals at one time.
In FIG. 2, the parallel CRC calculation method, which is faster than the serial CRC calculation method, is performed. According to a generator polynomial (X⁸+X⁵+X³+X²+X+1), a first CRC calculation data signal CRC[0] is f00 ^ f10 ^ f30 ^ f41 . . . f47 ^ f58, and an eighth CRC calculation data signal CRC[7] is f00 ^ f10 ^ f20 ^ f40 . . . f37 ^ f57. Here, a symbol ^ denotes an XOR operator. Since a process of obtaining the CRC calculation data signals CRC[0] and CRC[7] using a software algorithm and calculation values of the remaining CRC calculation data signals CRC[6:1] are known to one of ordinary skill, a further detailed description thereof will be omitted here.
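One way a software algorithm might derive such XOR combinations is to exploit the linearity of the CRC: the response of a serial CRC to each single input bit shows which output bits that input feeds. The sketch below assumes an all-zero initial register, MSB-first processing, and an arbitrary mapping of the 54 inputs to list positions, so the tap lists it produces are not claimed to match the terms listed for FIG. 2. The names crc_serial, xor_taps, and crc_parallel are hypothetical.

```python
POLY, WIDTH, N_BITS = 0x2F, 8, 54  # polynomial encoding and bit order are assumptions


def crc_serial(bits):
    """Bit-serial reference: one register shift per input bit."""
    reg = 0
    for bit in bits:
        feedback = ((reg >> (WIDTH - 1)) & 1) ^ bit
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= POLY
    return reg


def xor_taps(n_bits=N_BITS):
    """taps[k] lists the input positions that are XORed to form CRC[k]."""
    taps = [[] for _ in range(WIDTH)]
    for i in range(n_bits):
        unit = [0] * n_bits
        unit[i] = 1
        response = crc_serial(unit)  # contribution of input bit i alone
        for k in range(WIDTH):
            if (response >> k) & 1:
                taps[k].append(i)
    return taps


def crc_parallel(bits, taps):
    """Evaluate all eight CRC bits at once as plain XOR reductions."""
    crc = 0
    for k, positions in enumerate(taps):
        parity = 0
        for i in positions:
            parity ^= bits[i]
        crc |= parity << k
    return crc


taps = xor_taps()
sample = [(i * 7 + 3) % 5 % 2 for i in range(N_BITS)]  # arbitrary test pattern
```

Each tap list corresponds to one multistage XOR network: the hardware evaluates all eight reductions in the same cycle instead of shifting 54 times.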
The NOR gate NOR receives the obtained CRC calculation data signals CRC[7:0], performs a logic NOR on the received signals CRC[7:0], and finally outputs the error determination signal /ERR that is CRC[0]# CRC[1]# CRC[2] . . . # CRC[7]. Here, a symbol # denotes a NOR operator.
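The NOR reduction can be written out as a one-line behavioral model. The function name err_n is hypothetical; it returns the active-low signal /ERR as 1 (no error) or 0 (error).

```python
def err_n(crc_bits):
    """/ERR = NOR of all CRC bits: high only when every CRC bit is low."""
    return 0 if any(crc_bits) else 1


no_error = err_n([0, 0, 0, 0, 0, 0, 0, 0])   # all CRC bits low
one_bit_set = err_n([0, 0, 0, 1, 0, 0, 0, 0])  # a single high bit signals an error
```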
Accordingly, the conventional CRC calculators 50-1 to 50-8 and the data error determiner NOR receive combinations of input signals, which are obtained by the CRC calculation algorithm out of the 54-bit framing merge signals f_merge[53:0] of the 5 respective packets that are output through the 6 lanes, in parallel at one time, gradually perform a logic XOR on the received combinations in proportion to the bit number of data, and perform CRC calculations. Thus, when a data error is detected and even a single bit of the 8-bit CRC calculation data signal CRC[7:0] is applied at a high level, the data error determiner NOR outputs the error determination signal /ERR at a low level.
FIG. 3 is a timing diagram illustrating operation of the conventional data parallelizing receiver shown in FIG. 1.
Referring to FIG. 3, the operation of the conventional data parallelizing receiver is related to the input and output of a system clock signal sys_ck, a sampling clock signal s_ck[3:0], a serial input data signal ser_Dx (x is one of 1 to 6), a parallel receiving data signal para_D[4:1], a framing data signal frame_Dx[8:0] (x is one of 1 to 6), a framing clock signal f_ck, a framing merge signal f_merge[53:0], a CRC calculation data signal CRC[7:0], and an error determination signal /ERR.
Similarly, for brevity, it is assumed that 6 lanes for data transceiving paths are connected to an external interface, four commands may be fetched at one time, and each of 9 serial data is externally transmitted as 5 packets a0˜a8 to e0˜e8. Also, it is assumed that the sampling clock signal s_ck[3:0] is generated by quartering the system clock signal sys_ck and the framing clock signal f_ck is generated by dividing the system clock signal sys_ck into ninths.
The system clock signal sys_ck is toggled at periods of a unit interval (UI).
The system clock signal sys_ck is received and quartered to generate the sampling clock signals s_ck[3:0], each of which is delayed by one UI and toggled at periods of 4 UIs.
9 serial data a0˜a8 to e0˜e8 of five packets are sequentially received through the external 6 lanes and loaded on the serial input data signal ser_Dx.
The demultiplexer 12-1 sequentially receives 9 serial data a0˜a8 to e0˜e8 of the 5 packets and demultiplexes each of the received data in a ratio of 1:4. Thereafter, the flip-flops D1 to D4 of the data parallelizer 14-1 sample the demultiplexed data in synchronization with the sampling clock signal s_ck[3:0], fetch the sampled data through 4 data lines at periods of a UI, and load parallel data a0˜a3, a4˜a7, . . . on the parallel receiving data signal para_D[4:1].
The data framer 16-1 of the data parallelizer 14-1 receives the parallel receiving data signal para_D[4:1], sequentially aligns the received data in 18 data lines in packet units at periods of a UI in synchronization with the framing clock signal f_ck, and loads parallel data a0˜a8 to e0˜e8 on the framing data signal frame_Dx[8:0].
The system clock signal sys_ck is received and divided into ninths to generate the framing clock signal f_ck, which is toggled at periods of 9 UIs.
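The relationship between the divided clocks can be modeled by counting unit intervals. This is a simplified sketch that assumes s_ck[k] is offset from s_ck[0] by k UIs and that both derived clocks have an edge at UI 0; the function names are hypothetical and no analog timing is modeled.

```python
def sampling_phase(t):
    """Index k of the sampling clock s_ck[k] whose active edge falls in UI t."""
    return t % 4  # four phases, each repeating every 4 UIs


def framing_edge(t):
    """True when the framing clock f_ck has its active edge in UI t."""
    return t % 9 == 0  # f_ck repeats every 9 UIs


phases = [sampling_phase(t) for t in range(8)]
frame_edges = [t for t in range(36) if framing_edge(t)]
```

Under these assumptions the two clocks realign only every 36 UIs, the least common multiple of 4 and 9, which is one reason the framer must realign 4-wide samples into 9-bit packets.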
The framing data signals frame_Dx[8:0], which are aligned in packet units, are received through the 6 respective lanes and merged in synchronization with the framing clock signal f_ck so that the framing merge signals f_merge[53:0] are sequentially loaded on a 54-bit data bus.
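The merge of six 9-bit framing signals into one 54-bit bus can be sketched as a simple concatenation. The position of each lane within f_merge is an assumption here (lane x, bit y is placed at index 9x + y); the source does not give the actual bit assignment, and the names merge_lanes and f are hypothetical.

```python
LANES, BITS = 6, 9


def merge_lanes(frames):
    """Concatenate six 9-bit lane packets into one 54-entry list (f_merge)."""
    if len(frames) != LANES or any(len(lane) != BITS for lane in frames):
        raise ValueError("expected 6 lanes of 9 bits each")
    return [bit for lane in frames for bit in lane]


def f(merged, x, y):
    """Look up signal fxy: bit y of lane x (assumed index 9*x + y)."""
    return merged[BITS * x + y]


# Labels instead of bit values, so the mapping is visible.
frames = [[f"f{x}{y}" for y in range(BITS)] for x in range(LANES)]
merged = merge_lanes(frames)
```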
The CRC calculator 50-1 receives the framing merge signal f_merge[53:0] and makes a predetermined CRC calculation to detect a data error, so that the CRC calculation data signals CRC[7:0] are sequentially loaded on an 8-bit data bus.
The error determination signal /ERR is maintained at a high level until the data error determiner NOR receives the CRC calculation data signals CRC[7:0] and performs a logic NOR on the CRC calculation data signals CRC[7:0]. Thus, when a data error is detected and even a single-bit data signal of the 8-bit CRC calculation data signal CRC[7:0] is applied at a high level, the error determination signal /ERR is disabled to a low level.
Operation of the conventional data parallelizing receiver will now be described with reference to FIGS. 1 through 3.
Similarly, for brevity, it is assumed that 6 lanes for data transceiving paths are connected to an external interface, four commands may be fetched at one time, and each of 9 serial data is externally transmitted as 5 packets a0˜a8 to e0˜e8. Also, it is assumed that the sampling clock signal s_ck[3:0] is generated by quartering the system clock signal sys_ck and the framing clock signal f_ck is generated by dividing the system clock signal sys_ck into ninths.
When the conventional data parallelizing receiver externally and sequentially receives 9 serial data a0˜a8 to e0˜e8 as 5 packets through 6 lanes, the demultiplexer 12-1 receives the 9 serial data a0˜a8 to e0˜e8, demultiplexes each of the 9 serial data a0˜a8 to e0˜e8 in a ratio of 1:4, delays the demultiplexed data by a predetermined time, and outputs 4 parallel data through each of the lanes.
The clock generator 5 receives the system clock signal sys_ck, which is toggled at periods of a UI to control the operation of the data parallelizing receiver, and generates a plurality of sampling clock signals s_ck[3:0] and a framing clock signal f_ck. The system clock signal sys_ck is quartered to generate the sampling clock signals s_ck[3:0], which are delayed by 1 UI at periods of 4 UIs and synchronized in order to sample a plurality of serial data a0˜a8 to e0˜e8. Also, the system clock signal sys_ck is divided into ninths to generate the framing clock signals f_ck, which are synchronized at periods of 9 UIs in order to frame a plurality of sampled parallel data.
The 4 flip-flops D1 to D4 of the data parallelizer 14-1 receive 4 parallel data through each of the lanes, sample bits of the parallel data in synchronization with the sampling clock signal s_ck[3:0], sequentially fetch the sampled data through 4 data lines at periods of a UI, and output pipeline-type parallel receiving data signals para_D[4:1].
Also, the data framer 16-1 of the data parallelizer 14-1 receives the 4 fetched parallel receiving data signals para_D[4:1], sequentially aligns the received data signals in 18 data lines in packet units at periods of a UI, and outputs 9 parallel data through each of the lanes as pipeline-type framing data signals frame_Dx[8:0].
The framing data signals frame_Dx[8:0], which are transmitted through each of the 6 lanes, are merged in synchronization with the framing clock signal f_ck so that framing merge signals f_merge[53:0] are sequentially loaded on the 54-bit data bus.
Each of the respective CRC calculators 50-1 to 50-8 receives the framing merge signals f_merge[53:0] through a plurality of XOR gates of the corresponding one of the CRC detectors 52-1 to 52-8. Thereafter, a logic XOR is performed on the framing merge signals f_merge[53:0] according to a predetermined generator polynomial through the data transmitting terminal to transform data, a CRC code corresponding to the transformed data is written and added to the transformed data to generate final data, and the final data is transmitted. The final data is divided through the data receiving terminal to detect whether the received data has a data error depending on the presence or absence of the remainder. After that, the second flip-flops D7-1 to D7-8 sequentially output the CRC calculation data signals CRC[7:0] to an 8-bit data bus in synchronization with the framing clock signal f_ck.
The data error determiner NOR receives plural-bit CRC calculation data signals CRC[7:0] from the CRC calculators 50-1 to 50-8 and performs a logic NOR on the CRC calculation data signals CRC[7:0]. Thus, when a data error is detected and even a single bit of the CRC calculation data signal CRC[7:0] is applied at a high level, the data error determiner NOR outputs the error determination signal /ERR at a low level.
Meanwhile, the command decoding unit 32 receives a 54-bit framing merge signal f_merge[53:0], decodes the framing merge signal f_merge[53:0] to detect the kinds of input commands, and outputs decoded commands. The first flip-flop D6 receives the decoded commands and outputs the decoded commands in synchronization with the framing clock signal f_ck.
The command queue 40 receives the decoded commands, which are output in synchronization with the framing clock f_ck, temporarily stores the commands according to functions of execution commands, puts the commands on standby for a predetermined time according to a data transmission protocol, and outputs the commands.
The multiplexer MUX receives the decoded commands from the first flip-flop D6 and the command queue 40 and directly outputs the commands decoded by the command decoding unit 32 or outputs the commands, which are put on standby in the command queue 40 for the predetermined time, in response to a selection signal “sel” according to the data transmission protocol. The selection signal “sel” is generated by performing a logic combination of a plurality of control bit signals using the command decoding unit 32.
That is, when the selection signal “sel” is at a low level, the multiplexer MUX directly outputs the commands decoded by the command decoding unit 32, and when the selection signal “sel” is at a high level, the multiplexer MUX outputs the commands, which are put on standby according to the data transmission protocol.
The error command selector 60 receives selected commands from the multiplexer MUX. Thus, only when the data error determiner NOR inputs a valid command packet without a data error, does the third flip-flop D9 selectively output a final output signal to the core logic circuit of the data parallelizing receiver in synchronization with the framing clock f_ck.
Specifically, the AND gate AND directly receives the commands from the command decoding unit 32 through the multiplexer MUX, or receives the commands put on standby for the predetermined time in the command queue 40, and further receives the error determination signal /ERR from the data error determiner NOR. Thus, only when the data error determiner NOR detects no data error and the error determination signal /ERR is applied at a high level, does the AND gate AND perform a logic AND on the commands and the error determination signal /ERR and output an AND result to the third flip-flop D9. The third flip-flop D9 then finally outputs the AND result in synchronization with the framing clock signal f_ck to the core logic circuit of the data parallelizing receiver.
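The selection and gating path can be summarized in a small behavioral sketch. It models only the combinational logic (the flip-flop D9 and all timing are omitted), treats a command as a list of bits, and uses hypothetical function names.

```python
def select_command(decoded, queued, sel):
    """Multiplexer: sel low passes the decoded command, sel high the queued one."""
    return queued if sel else decoded


def gate_command(command_bits, err_n):
    """AND each command bit with /ERR, so an error (err_n = 0) blanks the command."""
    return [bit & err_n for bit in command_bits]


decoded = [1, 0, 1, 1]  # illustrative command bits, not from the source
queued = [0, 1, 1, 0]

passed = gate_command(select_command(decoded, queued, sel=0), err_n=1)
blocked = gate_command(select_command(decoded, queued, sel=1), err_n=0)
```

In this model a command reaches the core logic unchanged only while /ERR is high; when /ERR is low every output bit is forced low.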
However, as the bit number of the framing merge signals f_merge[53:0] applied to the XOR gates of the CRC detectors 52-1 to 52-8 of the CRC calculators 50-1 to 50-8 increases, the number of XOR gates required by the CRC detectors 52-1 to 52-8 increases, and the resulting CRC calculation delay becomes longer than the delay time taken for the command decoding unit 32 to decode a command.
Therefore, it is necessary to ensure a sufficient data setup time and a sufficient data hold-time after the command is decoded. Accordingly, the data parallelizing receiver must further include the delayer 20, which delays the command by a predetermined time directly before the command is decoded. However, the use of the delayer 20 leads to an increase in a command input latency, which becomes more problematic as the number of input bits to be parallelized in a high-speed data transmission system increases.
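How the XOR delay grows with the number of parallelized bits can be estimated with a rough model. The sketch assumes a balanced tree of 2-input XOR gates, which the source does not specify, and counts only gate levels rather than actual propagation time; xor_tree_depth is a hypothetical name.

```python
import math


def xor_tree_depth(n_inputs, fan_in=2):
    """Number of gate levels needed to XOR n_inputs signals in a balanced tree."""
    depth = 0
    while n_inputs > 1:
        n_inputs = math.ceil(n_inputs / fan_in)  # each level merges fan_in signals
        depth += 1
    return depth


depth_27 = xor_tree_depth(27)    # about half of a 54-bit word
depth_54 = xor_tree_depth(54)    # the full 54-bit framing merge signal
depth_108 = xor_tree_depth(108)  # a hypothetical doubled bus width
```

Under this model the depth rises by one level each time the number of XORed inputs doubles, so a wider parallel bus lengthens the CRC path while the command decoding path stays comparatively short.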