Communication developments in the last decade have demonstrated what seems to be a migration from parallel data input/output (I/O) interface implementations to a preference for serial data I/O interfaces. Some of the motivations for preferring serial I/O over parallel I/O include reduced system costs through reduction in pin count, simplified system designs, and scalability to meet the ever-increasing bandwidth requirements of today's communication needs. Serial I/O solutions will most probably be deployed in nearly every electronic product imaginable, in applications including IC-to-IC interfacing, backplane connectivity, and box-to-box communications.
Although the need for increased communication bandwidth continues to drive future designs, support for other communication attributes, such as synchronized, low latency modes of operation, remains important as well. Furthermore, achieving bit alignment, otherwise known as comma detection, while maintaining a substantially constant latency is becoming increasingly important as advanced communication protocols requiring deterministic latency, such as the Open Base Station Architecture Initiative (OBSAI) and the Common Public Radio Interface (CPRI), are developed.
High-speed serial input/output (I/O) interfaces that support gigabit data rates often divide data communication into various protocol layers. For example, communication layer protocols at the PHY layer may be divided into the physical media attachment (PMA) layer and the physical coding sublayer (PCS). The PMA layer, for example, provides the core analog interface between the host integrated circuit (IC) and the outside world. The PCS provides various coding and word alignment features that are critical for interfacing with the media access control (MAC) layer.
In programmable logic devices (PLDs), such as a field programmable gate array (FPGA), the PMA and PCS PHY layers may be implemented within the hard core of the FPGA, whereas the MAC layer and the higher link transaction layers may be implemented within the programmable logic fabric of the FPGA. Clock and data recovery is performed by the PMA on the serial data received, whereby a bit clock having substantially the same frequency as the received serial data is extracted. A byte clock that is phase coherent with the bit clock is then generated to propagate the multiple-bit data words that have been created from the received serial data.
The PCS often incorporates a byte alignment block, or barrel shifter, that is commonly used to modify the data alignment within each multiple bit data word. The PCS may further incorporate a first-in, first-out (FIFO) that functions as an elastic buffer, so that phase and/or frequency differences in the off-chip data rate and the on-chip clocking rate(s) may be mitigated.
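By way of illustration only, the word alignment function of a barrel shifter may be modeled in software as follows. The sketch below is a simplified behavioral model, not a description of any particular PCS implementation. The 7-bit marker pattern, the 10-bit word length used in the usage note, and the function names find_comma_offset and barrel_shift are assumptions introduced solely for the example.

```python
from typing import List

# Hypothetical 7-bit alignment marker ("comma"); chosen for illustration only.
COMMA = "0011111"


def find_comma_offset(bits: str, word_len: int, comma: str = COMMA) -> int:
    """Return the bit offset (0..word_len-1) of the first comma in the stream.

    The offset is taken modulo the word length, since only the position of
    the comma relative to a word boundary matters for alignment.
    """
    idx = bits.find(comma)
    if idx < 0:
        raise ValueError("comma pattern not found in stream")
    return idx % word_len


def barrel_shift(bits: str, offset: int, word_len: int) -> List[str]:
    """Discard `offset` leading bits, then cut the stream into whole words.

    Any trailing bits that do not fill a complete word are dropped.
    """
    aligned = bits[offset:]
    n_words = len(aligned) // word_len
    return [aligned[i * word_len:(i + 1) * word_len] for i in range(n_words)]
```

For example, if a received stream consists of three stray bits followed by the 10-bit words 0011111010 and 1010101010, find_comma_offset reports an offset of 3, and barrel_shift applied with that offset recovers the two words on their original boundaries.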
For various reasons, the PMA, the PCS, and the programmable logic fabric within an FPGA represent three separate clock domains. Thus, in order to maintain reliable data propagation across each clock domain, care must be taken to maintain a known timing relationship between the propagated data and associated sampling instances. Such a known timing relationship becomes increasingly difficult to maintain in a low latency mode of operation, since the elastic buffer in the PCS layer is bypassed.
One prior art approach to maintaining a known timing relationship across each clock domain in a low latency mode of operation is disclosed in Davidson et al., U.S. patent application Ser. No. 11/040,423 filed Jan. 21, 2005, which is incorporated herein by reference in its entirety. In particular, a master clock signal is generated and is distributed across all clock domains to maintain synchronization. The master clock signal is a byte clock signal that is derived from the incoming serial data stream via a clock and data recovery (CDR) circuit in the PMA layer. Once derived, the byte clock signal is routed to the root of a global clock tree within the programmable logic fabric of the FPGA, where it is then delivered to the PCS, as well as other portions of the programmable logic fabric, via one or more global clock trees.
Data alignment within each data word is achieved without the need for a barrel shifter through the use of a fine phase adjustment implemented within the data path between the PMA and PCS layers. The fine phase adjustment is effected by slipping the data alignment between the PMA and PCS layers in each data word by one or more successive phase offsets of 2 unit intervals (UI), i.e. bit periods, until a satisfactory data alignment is reached.
Taking a 40-bit data word, for example, each phase offset results in a 5% phase change, since 2 bit periods relative to 40 bit periods is equal to 5%. Thus, a 360*0.05=18 degree phase offset is implemented for each 2 UI phase adjustment. In achieving the desired data alignment, however, a 2 UI change in the data latency is inherently created in the data path between the PMA and PCS layers for each phase adjustment. Thus, not only is latency introduced within the data path, but the magnitude of latency created also becomes non-deterministic, or at least non-constant, which is intolerable for certain communication protocols such as OBSAI and CPRI.
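By way of illustration only, the dependence of the added latency on the initial misalignment may be modeled as follows. The sketch assumes that each slip reduces the misalignment by exactly 2 UI and that alignment is reached when the misalignment is zero modulo the word length. The slip direction, the function name slips_to_align, and its parameters are assumptions introduced solely for the example.

```python
from typing import Optional, Tuple


def slips_to_align(misalignment_ui: int,
                   word_len: int = 40,
                   step_ui: int = 2) -> Optional[Tuple[int, int]]:
    """Return (number of slips, latency change in UI) needed to align.

    Each slip is assumed to move the word boundary by `step_ui` bit periods.
    Returns None if the misalignment cannot be removed with that step size
    (for instance, an odd misalignment with a 2 UI step).
    """
    offset = misalignment_ui % word_len
    for slips in range(word_len):  # more than enough to visit every reachable offset
        if offset == 0:
            return slips, slips * step_ui
        offset = (offset - step_ui) % word_len
    return None
```

Under these assumptions, an initial misalignment of 6 UI requires 3 slips and changes the data path latency by 6 UI, whereas an initial misalignment of 38 UI requires 19 slips and changes the latency by 38 UI. The latency after alignment therefore varies with the arbitrary bit position at which the link happens to start.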
Furthermore, as gigabit transceiver designs advance, a 2 UI phase adjustment in the data path could become prohibitive. For example, if the data word length were to be decreased from 40 bits to 10 bits, then a 2 UI phase adjustment would represent a 20% phase offset, i.e., 2 bit periods relative to 10 bit periods. Thus, as the data word length decreases, a corresponding increase in phase sensitivity to the UI data alignment technique is created.
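By way of illustration only, the relationship between data word length and the phase offset of a single slip may be computed as follows. The function name slip_phase_offset and its parameters are assumptions introduced solely for the example; the arithmetic is the ratio of slip size to word length described above.

```python
from typing import Tuple


def slip_phase_offset(word_len_bits: int, slip_ui: int = 2) -> Tuple[float, float]:
    """Return the phase offset of one slip as (percent of a word period, degrees).

    One word period spans `word_len_bits` unit intervals, so a slip of
    `slip_ui` unit intervals is slip_ui / word_len_bits of a full cycle.
    """
    percent = slip_ui * 100 / word_len_bits
    degrees = slip_ui * 360 / word_len_bits
    return percent, degrees
```

For a 40-bit data word, the function yields 5 percent and 18 degrees per 2 UI slip; for a 10-bit data word, it yields 20 percent and 72 degrees.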
Efforts continue, therefore, to provide a solution that allows a synchronous, low latency mode of operation while, at the same time, providing a data alignment mode that maintains a constant latency or, at least, minimizes changes in latency. Both modes are required to interoperate with one another, while providing known clock and data relationships across all clock domains to ensure reliable data propagation throughout the communication system.