1. Field of the Invention
The present invention is generally directed to transceiver applications involving transmitting and receiving parallel data. More specifically, the present invention is directed to receiving and regenerating parallel data words which have been broken into smaller data words and serially transmitted over multiple channels.
2. Background
In many transceiver applications, large parallel data words such as sixty-four bit words are broken up on the transmitting chip into smaller parallel data words such as eight bit words. As illustrated in FIG. 1, the smaller parallel data words are then serially transmitted over multiple channels at a higher speed. Thus, the number of output pins on the transmit chip and input pins on the receive chip is reduced, for example, from sixty-four to eight. On the receiving chip, the high speed serial data is then appropriately deserialized to regenerate the original sixty-four bit word.
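The word break-up and regeneration described above can be sketched in a few lines. This is only an illustrative model, not a description of any particular transceiver: the function names, the assignment of the least significant byte to the first channel, and the bit-0-first serial order are all assumptions made for the example.

```python
def split_word(word64):
    """Break a 64-bit word into eight 8-bit words (assumed: low byte on channel 0)."""
    return [(word64 >> (8 * ch)) & 0xFF for ch in range(8)]


def serialize(byte):
    """Emit the eight bits of one word serially (assumed: bit 0 first)."""
    return [(byte >> i) & 1 for i in range(8)]


def deserialize(bits):
    """Rebuild an 8-bit word from eight correctly framed serial bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def regenerate(channel_words):
    """Regroup one 8-bit word per channel into the original 64-bit word."""
    return sum(word << (8 * ch) for ch, word in enumerate(channel_words))


word = 0x0123456789ABCDEF                      # hypothetical test value
channels = [serialize(b) for b in split_word(word)]
received = regenerate([deserialize(bits) for bits in channels])
```

With perfectly aligned and framed channels, `received` equals `word`. The two problems discussed below arise precisely because real channels provide neither guarantee.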
Although smaller parallel data words are being transmitted over multiple channels, on a system level the transceiver should function as a single-channel, sixty-four bit transceiver. To achieve this functionality, receivers in each of the higher speed channels recover clock based on the incoming serial data and generate eight bit parallel data words timed to a channel word clock. The word clock is a division of the recovered clock. The eight bit data words from each of the channels are then transferred from their own channel word clock domain to a single receiver clock domain to form the originally transmitted sixty-four bit data word. However, since the eight bit parallel data words travel on separate channels, the skew between these channels can create problems. The first problem involves aligning the eight bit parallel data words across parallel data channels so they are properly regrouped with other eight bit parallel words, and the second problem involves framing the serial data in each separate channel into eight bit parallel data words.
The first problem encountered in regenerating parallel data words from multiple channels is word alignment across parallel data channels, which is generally illustrated by the timing diagram in FIG. 2. The timing diagram shows the problem of aligning data on the same clock edge across multiple channels using two channels as an example. A transmitter breaks up a sixteen bit word into eight bit words A0-A7 and A8-A15, which are serialized and transmitted over channel one and channel two respectively. The receiver is supposed to regenerate the sixteen bit word on its own receiver clock. However, the channel one word clock and the channel two word clock on the two receive channels may not be exactly in phase, and the skew between the two channels may cause the wrong bits from channel two to be re-timed and grouped with the wrong bits from channel one. The timing diagram of FIG. 2 shows how the misalignment of received data and the out-of-phase channel word clocks between receive channels can cause an incorrect regrouping of the eight bit words from each channel. It is apparent from the diagram that in re-timing the eight bit word outputs from each channel, the re-timing edge of the receiver clock has missed the correct group of channel two data bits, A8-A15, and has instead regenerated a sixteen bit word containing correct channel one data but erroneous channel two data.
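A simplified model may help make the misgrouping concrete. In the sketch below, time is measured in bit periods, each channel holds a word on its parallel output for one word period, and the receiver clock samples both channels at the same instant. The word labels beyond A0-A7 and A8-A15, the skew values, and the sample time are hypothetical and are not taken from FIG. 2.

```python
def word_at(time, words, skew, word_period=8):
    """Word held on a channel's parallel output at `time` (in bit periods).

    `skew` delays the whole channel; None means no word is valid yet.
    """
    index = (time - skew) // word_period
    return words[index] if 0 <= index < len(words) else None


# Successive eight bit words on each channel (labels are illustrative).
ch1 = ["A0-A7", "B0-B7", "C0-C7"]
ch2 = ["A8-A15", "B8-B15", "C8-C15"]


def regroup(sample_time, skew1, skew2):
    """Sixteen bit word captured by one receiver clock edge."""
    return word_at(sample_time, ch1, skew1), word_at(sample_time, ch2, skew2)


aligned = regroup(10, skew1=0, skew2=0)   # ("B0-B7", "B8-B15")
skewed = regroup(10, skew1=0, skew2=3)    # ("B0-B7", "A8-A15")
```

In the skewed case the same receiver clock edge pairs a channel one word with a channel two word that belongs to a different sixteen bit word, which is the kind of incorrect regrouping the timing diagram depicts.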
Prior methods for solving the problem of aligning data words across multiple serial data channels include the use of slave channel architecture as illustrated by the block diagram of FIG. 3. The block diagram of FIG. 3 depicts a typical receive deserializer circuit using two serial data channels as an example. Briefly, in a typical single channel receive deserializer circuit as represented by the master channel 300 of FIG. 3, a sampling flip flop 302 receives serial data and samples it with the rising edge of the recovered clock. The recovered clock runs at the data rate frequency and is aligned to the serial data transition edge by a clock recovery module 304 so that all the sampling edges are in the middle of the data windows. The recovered clock is the source for eight phase clocks generated by a clock generator 306. The serial data is sampled by the eight phases to generate eight bits. The eight bits are finally re-timed on one of the phases, the channel word clock, to form a parallel data word.
In the multi-channel receive deserializer circuit which uses slave channel architecture to align data words across channels, as illustrated in the block diagram of FIG. 3, a single channel is chosen as the master channel 300. The master channel 300 performs clock recovery 304 using a local clock and serial data it receives. The recovered clock from the master channel 300 is then also used by the receivers in all the slave channels 310 to sample serial data input to each channel, thereby properly aligning all the sampled serial data across channels on the same clock edge.
However, the use of slave channel architecture to solve the problem of aligning data words across multiple serial data channels has limitations which often necessitate implementing rigorous and costly design standards when designing and fabricating these circuits. Using slave channel architecture requires that the skew between the serial data inputs across the channels be tightly controlled. The timing diagram of FIG. 4 illustrates the significant problem encountered when using the slave channel architecture of FIG. 3. Since the use of slave channel architecture employs just one clock recovery module 304 in a master channel 300 and uses the recovered clock to deserialize data in all the channels, any skew between serial data received in a slave channel 310 and serial data received in the master channel 300 directly reduces the setup/hold margin available at the slave channel 310 sampling flip flop 312. Rxd1 of FIG. 4 represents serial data input to the master channel 300 of FIG. 3. The clock recovery module 304 generates recovered clock by aligning the negative edge of the local clock with the data transition edge in order to ensure that a sufficient setup/hold margin exists at the master channel 300 sampling flip flop 302 when retiming the data with the positive edge of the recovered clock. However, as shown in the timing diagram of FIG. 4, the skew between the rxd2 serial data from the slave channel 310 and the rxd1 serial data from the master channel 300 reduces the net setup/hold margin at the slave channel 310 sampling flip flop 312 by the amount of skew. If sufficient skew exists between the master channel 300 and any slave channel 310, the setup/hold margin remaining in the slave channel 310 can be too small to permit the sampling flip flop 312 to re-time the rxd2 serial data to the recovered clock from the master channel 300.
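The margin loss can be illustrated with simple arithmetic. The sketch below assumes an idealized recovered clock whose sampling edge falls exactly half a bit period after the master data transition, and treats positive skew as slave data arriving later than master data. The function names and all numeric values (in picoseconds) are hypothetical and chosen only for illustration.

```python
def slave_margins(bit_period, skew, setup, hold):
    """Return (setup_margin, hold_margin) at the slave sampling flip flop.

    All arguments are in the same time unit. Assumes the sampling edge sits
    half a bit period after the master channel's data transition.
    """
    half = bit_period // 2
    setup_margin = half - skew - setup   # late slave data eats into setup time
    hold_margin = half + skew - hold     # and correspondingly adds to hold time
    return setup_margin, hold_margin


def can_retime(bit_period, skew, setup, hold):
    """True if both margins are non-negative, so the flip flop can re-time the data."""
    return min(slave_margins(bit_period, skew, setup, hold)) >= 0


# Hypothetical 400 ps bit period with 40 ps setup and hold requirements.
no_skew = slave_margins(400, 0, 40, 40)      # (160, 160)
some_skew = slave_margins(400, 150, 40, 40)  # (10, 310)
too_much = can_retime(400, 250, 40, 40)      # False
```

Under these assumptions the smaller of the two margins shrinks by exactly the amount of skew, and once the skew exceeds the available margin the slave channel can no longer sample its data reliably with the master's recovered clock.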
Once the serial data across the parallel data channels is aligned to the same clock, the second problem of framing the data into the proper parallel words in each channel must also be solved. When framing serial data into an eight bit parallel word in a single channel, a simple receiver deserializer demultiplexes sampled serial data and regenerates the eight bit parallel data words sent by a transmitter. However, the receiver has no information as to which bit of the eight bit parallel word is bit zero, the least significant bit (LSB), or which bit is bit seven, the most significant bit (MSB). Thus, information regarding the boundary of the eight bit parallel word has been lost in its transmission. The result is incorrectly framed parallel data words at the receiver which contain some bits belonging to the previous eight bit word or which contain some bits belonging to the next eight bit word.
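The loss of word boundary information can be shown with a small model. In this sketch a deserializer groups every eight consecutive serial bits into a word but starts at an arbitrary bit offset. The transmitted word values, the bit-0-first serial order, and the function names are assumptions made for the example.

```python
def to_bits(words):
    """Serialize 8-bit words into one bit stream (assumed: bit 0 first)."""
    return [(w >> i) & 1 for w in words for i in range(8)]


def deserialize(bits, offset):
    """Group the stream into 8-bit words, starting `offset` bits into it."""
    usable = bits[offset:]
    return [
        sum(bit << i for i, bit in enumerate(usable[k:k + 8]))
        for k in range(0, len(usable) - 7, 8)
    ]


sent = [0xA5, 0x3C, 0xF0]                   # hypothetical transmitted words
bits = to_bits(sent)

framed = deserialize(bits, offset=0)        # [0xA5, 0x3C, 0xF0]
misframed = deserialize(bits, offset=3)     # [0x94, 0x07]
```

With a nonzero offset each regenerated word mixes bits from two adjacent transmitted words, so none of the output words matches what the transmitter sent.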
Referring again to FIG. 3, the block diagram further illustrates a commonly used method for solving the data framing problem which will be discussed with reference to the master channel 300 only, as a single channel example. This prior method includes the use of additional storage elements to store the last received eight bit word in order to create a new sixteen bit word from the last word and the current word. The received serial data initially includes a predefined, eight bit data reference pattern and is retimed and aligned on the positive edge of the recovered clock by the master channel sampling flip flop 302. The demultiplexer 308 deserializes the retimed serial data into an eight bit parallel data word using a channel word clock from the clock generator 306. An array of eight storage flip flops 314 stores or effectively delays the eight bit word, which is then combined with the next or current eight bit word coming from the demultiplexer 308. A comparator 316 searches through the new sixteen bit word for the received reference pattern using its own preset reference pattern and identifies the location of the received reference pattern within the sixteen bit word to a sixteen-to-eight multiplexer 318. The sixteen-to-eight multiplexer 318 then selects these bits as the correct eight bits to be framed on the channel word clock and output as received data.
The timing diagram of FIG. 5 further illustrates this prior method for framing parallel data as implemented by the single master channel 300 of FIG. 3. Where bits B7-B0 represent a received predefined reference pattern, the diagram indicates the combination of current and last data which forms a sixteen bit data word containing this received reference pattern. The bit locations of the reference pattern within the sixteen bit data word are found through multiple comparisons made by the comparator 316 of FIG. 3. Once located by the comparator 316, these bit locations are selected by the multiplexer 318 as containing the correct eight bits to be framed on the channel word clock and output as received data.
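The prior framing method can be approximated in software as follows. This is a behavioral sketch only, not a model of the actual circuit: the reference pattern value, the test data, the bit-0-first ordering, and every function name are hypothetical. The last and current eight bit words are concatenated into a sixteen bit window, the window is searched for the reference pattern, and the matching bit position is then used to select eight bits from each later window.

```python
REFERENCE = 0xB5  # hypothetical predefined eight bit reference pattern


def find_position(last, current, reference=REFERENCE):
    """Comparator role: locate the reference pattern in the sixteen bit window."""
    window = last | (current << 8)          # earlier-received bits in low byte
    for position in range(9):               # candidate start bits 0 through 8
        if (window >> position) & 0xFF == reference:
            return position
    return None


def select(last, current, position):
    """Multiplexer role: pick eight bits of the window starting at `position`."""
    return ((last | (current << 8)) >> position) & 0xFF


def misframe(words, offset):
    """Test helper: deserialize a stream of words starting `offset` bits late."""
    stream = sum(w << (8 * i) for i, w in enumerate(words))
    return [(stream >> (offset + 8 * k)) & 0xFF for k in range(len(words) - 1)]


sent = [0x00, REFERENCE, 0x3C, 0xF0, 0x00]  # reference pattern, then data
raw = misframe(sent, offset=3)              # incorrectly framed words
position = find_position(raw[0], raw[1])
framed = [select(raw[k], raw[k + 1], position) for k in range(len(raw) - 1)]
```

In this example `position` is 5 and `framed` recovers the reference pattern followed by the two data words. Note that each output word depends on both the stored word and the word that follows it, so no output can be produced until the next word has arrived.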
This and other prior methods of framing parallel data in single data channels can present significant costs in time and materials. For example, it is apparent from the timing diagram of FIG. 5 that framing parallel data by the method presented in FIG. 3 introduces unwanted latency. There is a necessary penalty of one word clock associated with this prior method of forming a sixteen bit data word to locate the predefined eight bit reference pattern.
Additionally, increased scrutiny of the block diagram circuit of FIG. 3 indicates the complexity of the circuitry required to implement the prior framing method, as illustrated in FIG. 6. A circuit framework for the multiplexer 318 and comparator 316 blocks of FIG. 3 is presented in FIG. 6. Though not intended as a complete representation of these circuit blocks, the depiction in FIG. 6 shows the significant hardware required to implement the comparator 316 block of FIG. 3. Nine different sets of bit locations exist within the sixteen bit word where the predefined eight bit reference pattern might be encountered. For example, the predefined eight bit reference pattern could be located in bit locations 0-7, 1-8, 2-9, 3-10, 4-11, 5-12, 6-13, 7-14, or 8-15. It is therefore necessary to dedicate nine sets of eight comparators each, typically operational amplifiers or logic gates, to search these locations in order that the multiplexer 318 can select the correct location for framing the parallel data.
The disadvantages apparent in this and other prior methods of regenerating parallel data words from multiple channels therefore include the costs related to both aligning data across multiple data channels and framing the data within each channel. The use of prior slave channel architecture to solve the problem of aligning data words across multiple serial data channels requires that the skew between the serial data inputs across the channels be tightly controlled, which necessitates the use of rigorous and costly design standards when designing and fabricating circuits. The use of the prior methods of framing parallel data creates costs which include requirements for additional data storage elements, complex comparator and multiplexer circuits, and unwanted latency inherent to these methods.
Accordingly, there exists a need for an efficient, simple and low latency method for regenerating, in a deserializer circuit, parallel data words which have been broken up and serially transmitted across multiple data channels.