High-speed data communication links usually exploit input-output (IO) signaling techniques that require a significant “static” power consumption part during operation, independent of the actual data rate. In most cases an important part of this power consumption is associated with the need to drive the line termination in order to obtain reliable signaling behavior across a transmission line. A commonly used driver structure is a differential pair, which is resistively loaded. Other line driving solutions, which provide a termination resistance at one end or at both ends, are possible as well. An example of the latter is the full-bridge driver structure with source-series termination and far-end line termination. This lower-power solution can be found, for example, in “Embedded Low-Cost 1.2 Gb/s Inter-IC Serial Data Link in 0.35 μm CMOS technology”, G. W. den Besten, Proc. IEEE International Solid-State Circuits Conference, pp 251-252, February 2000 and in some Mobile Industry Processor Interface (MIPI) high-speed interface implementations.
The Mobile Industry Processor Interface (MIPI) Alliance is an open membership organization that includes leading companies in the mobile industry that share the objective of defining and promoting open specifications for interfaces in mobile terminals. MIPI Specifications establish standards for hardware and software interfaces between the processors and peripherals typically found in mobile terminal systems. By defining such standards and encouraging their adoption throughout the industry value chain, the MIPI Alliance intends to reduce fragmentation and improve interoperability among system components, benefiting the entire mobile industry. The MIPI Alliance is intended to complement existing standards bodies such as the Open Mobile Alliance and 3GPP, with a focus on microprocessors, peripherals and software interfaces.
The terminated IO signaling techniques, indicated above, have in common that they consume DC power during operation, independent of the actual transmitted data rate (“Pay-per-Time”). This contrasts with (low-speed) CMOS non-terminated IO technology with rail-to-rail swings that only consumes power during signal transitions (“Pay-per-Signal Transition”).
In order to achieve high power efficiency (energy/bit), a terminated link should typically be operated at the high end of the transmission speed range allowed by the design. The power consumption of digital circuitry scales with frequency, whereas the static power consumption per bit decreases as the data rate increases. Therefore, if there is a need for such high-speed IO, there is in many (probably most) cases more bandwidth available than actually needed. This favors burst-mode communication (packets), as it is usually not attractive to run a link at a lower rate, or to keep it in standby all of the time, especially if the bandwidth requirements are much lower than the available link bandwidth. Between transmission bursts the link can be powered down to reduce power consumption. However, some overhead time is always required to start and stop transmission, in addition to the time required for the data payload transmission. In order to maintain power-efficient operation, even with short data bursts, the start and stop overhead must be as small as possible.
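The energy-per-bit argument can be illustrated with a small numerical sketch. It assumes a simple model that is not taken from the text: a constant static power for driving the termination, plus a fixed dynamic energy per bit for the digital circuitry. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def energy_per_bit(static_power_w, dynamic_energy_j, data_rate_bps):
    """Energy per bit (J) for a link that draws static power continuously.

    Assumed model (illustrative only): the static termination power is
    spread over all bits sent per second, while the dynamic energy per
    bit of the digital circuitry is constant (its power scales with rate).
    """
    if data_rate_bps <= 0:
        raise ValueError("data rate must be positive")
    return static_power_w / data_rate_bps + dynamic_energy_j


if __name__ == "__main__":
    STATIC_POWER_W = 5e-3      # hypothetical static driver power
    DYNAMIC_ENERGY_J = 1e-12   # hypothetical dynamic energy per bit
    for rate_bps in (100e6, 500e6, 1e9, 2e9):
        e = energy_per_bit(STATIC_POWER_W, DYNAMIC_ENERGY_J, rate_bps)
        print(f"{rate_bps / 1e6:6.0f} Mb/s: {e * 1e12:6.2f} pJ/bit")
```

Under this model the static share of the energy per bit shrinks in inverse proportion to the data rate, which is why running the link fast and in bursts is attractive.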
In many cases the detection of the appearance of a data burst in the receiver (RX) is not the main bottleneck. Depending on the stand-by line state, this can, for example, be done with DC line level-detection (MIPI D-PHY), differential amplitude detection (USB 2.0), periodic polling, activity detection, or edge detection. [The acronym “D-PHY” is MIPI's name for their serial interface that supports up to four lanes at rates of up to 1 Gbit/sec per lane, based on a 1.2 Volt, source-synchronous scalable low-voltage signaling technology using a nominal swing of 200 mV]. Alternatively, if another means of communication exists next to the high-speed (HS) transmission, e.g., slower and/or asynchronous communication via the same link that does not require a lot of stand-by power, a message, command or codeword can be used to identify the start of the data burst. The problem is that, if everything is powered down, starting up and getting into the “ready-for-transmission/reception-state” (especially the process of locking and synchronizing the clocks) can take a lot of time. The reason for this is that conventional high-speed data communication solutions require that the clocks be stable before (reliable) transmission is possible.
Furthermore, there are two main kinds of clocking solutions for high-speed serial interfaces, referred to as source-synchronous and embedded clock. A big advantage of the source-synchronous solution over the embedded-clock solution is that data and clock (or strobe) signal together contain all necessary information. Frequency may vary over a large range as long as signal integrity is maintained. For conventional embedded-clock solutions the frequency is assumed to be stable during transmission and the data stream itself must include sufficient clock information in order to synchronize the receiver in a reliable way. However, embedded clock solutions can run at higher rates because there is no issue with matching the transmission-path for data and clock/strobe. On the other hand, the embedded clock receiver needs clock and data recovery (CDR), whereas the source-synchronous solution merely requires simple data slicing with the provided clock.
Conventional embedded-clock type solutions can be subdivided in several categories.
A first category relates to the use of a synchronous full-rate or half rate bit clock, or any other lower frequency clock with a fixed and known frequency ratio (e.g., byte or word clock) that is transmitted from the transmitter (TX) to the receiver (RX). It is not kept phase-synchronized with the data. TX and RX share the same clock frequency (or a known and fixed ratio between their clock frequencies), and RX only needs to carry out the phase alignment (and clock multiplication in case a lower frequency fixed-ratio clock is transmitted).
A second category employs a receiver that does not obtain a reference clock signal from the transmitting side, but locks to the embedded clock in the data stream and thus recovers both clock and data information from it. This is possible if the data stream is properly encoded so as to include sufficient clock information. For binary transmission this can, for instance, be achieved with 8B10B codes. An 8B10B code is a line code that maps 8-bit symbols to 10-bit symbols in order to achieve DC-balance and bounded disparity, and yet provide enough state changes so as to allow clock recovery owing to a reduced inter-symbol interference (see, e.g., U.S. Pat. No. 4,486,739). In order to avoid false locking on (sub)harmonics, either some locking aid must be provided, or the data encoding must implicitly provide sufficient frequency information (e.g., Manchester code). For coding efficiency reasons the use of locking aids is preferred in many cases. Locking aids can, for instance, include a local receiver reference clock, which helps to get close to the data rate, and/or a training sequence in the data stream.
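As an illustration of an encoding that implicitly carries frequency information, the following minimal sketch encodes a bit sequence with a Manchester code and checks the longest run of identical line symbols. The bit-to-symbol convention (1 as low-to-high, 0 as high-to-low) is an assumption made for this example, and the function names are hypothetical.

```python
def manchester_encode(bits):
    """Encode bits as half-bit line symbols.

    Assumed convention (illustrative): 1 -> (0, 1), 0 -> (1, 0), so every
    bit period contains a transition in its middle.
    """
    symbols = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        symbols.extend((0, 1) if bit else (1, 0))
    return symbols


def longest_run(symbols):
    """Length of the longest run of identical consecutive symbols."""
    longest = current = 0
    previous = None
    for s in symbols:
        current = current + 1 if s == previous else 1
        longest = max(longest, current)
        previous = s
    return longest


if __name__ == "__main__":
    data = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    line = manchester_encode(data)
    print("line symbols:", line)
    print("longest run :", longest_run(line))
    print("efficiency  :", len(data) / len(line))
```

Because every bit period contains a transition, the line never stays constant for more than two half-bit symbols, regardless of the data. The price is two line symbols per data bit, i.e. a coding efficiency of 50%, compared with 80% for an 8B10B code. This is the efficiency trade-off that makes locking aids attractive.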
A third category has a receiver that does not receive a reference clock signal from the transmitting side, but the transmitter and the receiver each have a local reference clock, whose frequencies are known to be close to one another (e.g., a difference in frequency in the order of a few hundred parts per million), but not exactly equal (i.e., plesiochronous clocks). The receiver clock remains locked to the local reference and data is recovered in the digital domain by over-sampling the data stream. Note that, if the receiver clock signal locks on the local reference before data is transmitted, and then synchronizes to the data stream with a training sequence before actual payload data transmission occurs, the local reference clock functions as a locking aid and this is covered under the solution of the second category.
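To give a feel for what a frequency difference of a few hundred parts per million means, the following sketch estimates how many bit periods pass before the receiver's sampling phase has drifted by a given fraction of a bit period (unit interval, UI). It assumes a constant frequency offset and ignores jitter. The function name and the example offsets are hypothetical.

```python
def bits_until_slip(offset_ppm, slip_ui=1.0):
    """Bit periods until the sampling phase has drifted by slip_ui UI.

    Assumes a constant relative frequency offset (in ppm) between the
    transmitter and receiver reference clocks and no jitter.
    """
    if offset_ppm <= 0:
        raise ValueError("offset must be positive")
    return slip_ui / (offset_ppm * 1e-6)


if __name__ == "__main__":
    for ppm in (100, 300, 500):  # hypothetical offsets
        half = bits_until_slip(ppm, slip_ui=0.5)
        full = bits_until_slip(ppm)
        print(f"{ppm:4d} ppm: half-UI drift after {half:8.0f} bits, "
              f"full-UI slip after {full:8.0f} bits")
```

Under these assumptions the phase drifts slowly relative to the bit rate, but it does drift continuously, so the digital recovery logic has to keep tracking the data phase throughout a burst.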
The solutions according to the second and third categories require fewer connections than those of the first category (source-synchronous solutions), as these embedded clock solutions do not need a separate clock signal to be transmitted. However, for solutions of the second category the synchronization becomes more complicated because phase synchronization needs to take place and the receiver must first lock on to the proper frequency before reliable data reception is possible. The solutions of the third category can start-up rather fast using the knowledge that the reference frequencies are very close, provided that the clock signals are operational. However, the solutions of the third category conventionally require availability of nearly equal reference frequencies at both ends. This might not be trivial to implement and may require additional reference (probably crystal) oscillators in the system. The solutions of the first category are less attractive than the solutions of the second and third categories, because more connections are needed and because the costs are higher in terms of IO power.
If start-up time is important and the reference frequency is (as is usual) much lower than the data rate, the clock multiplier in both the transmitter and the receiver must typically be operating and stable before actual data transmission can take place. In practice this means that in many cases the clocks are kept running most of the time, because clock-multiplication solutions (commonly a delay-locked loop (DLL) or a phase-locked loop (PLL)) cannot start up and become sufficiently accurate fast enough. Keeping these functions awake while operating at high frequencies may consume considerable power.
Consider a conventional communication system, which is to transmit a burst of data starting-up from a fully powered-down state. First, the transmitter clock generation must be started. Transmission can be started when clock frequency and phase have stabilized. The receiver will stay in a powered-down state until it observes a certain indication that a data burst will arrive soon. This can for example be achieved by any of the methods described earlier in this document. A separate sideband signal is undesirable because of additional wires being needed in that case. After detecting some indication of an upcoming data transmission, the receiver clock generation must be started-up and some time is required to obtain a stable frequency and stable phase for the clock signal. In the time between the start of transmission and the moment that reliable reception is possible, a training sequence needs to be transmitted to synchronize the receiver. Although the start-up time of the transmitter and receiver clock generation procedures may be (partially) overlapping, and additional measures can be taken for faster acquisition, the start-up time in conventional systems will remain relatively long because it is practically bounded by at least a clock start-up, stabilization of both frequency and phase, and the synchronization time of a PLL or DLL. Fast power-down after data transmission has completed is typically not a serious problem.
A low reference frequency, and therefore a high clock-multiplication factor, is desirable for reasons of power and electromagnetic interference (EMI). This results in a slow synchronization process, because the loop bandwidth of the clock multiplier must in turn be significantly lower than the reference frequency for stability reasons. For example, for a clock-multiplier PLL with an input reference clock of, say, 10 MHz, a loop bandwidth smaller than 1 MHz is realistic, which will typically result in a phase settling time larger than 10 μs. Lower reference frequencies and/or a lower loop bandwidth for enhanced phase filtering properties further increase start-up time. Frequency acquisition time comes on top of this phase synchronization time. This can easily lead to a start-up time in the order of 10-100 μs or even longer. For example, for a 2 Gb/s data transmission, 100 μs is equivalent to 200,000 bits, which implies that transmission of short data bursts becomes highly inefficient (“Pay-per-Time”). Higher reference frequencies can reduce start-up time, but not by orders of magnitude without running into severe power and EMI problems.
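The arithmetic behind the 200,000-bit figure, and its effect on short bursts, can be sketched as follows. The data rate and start-up time are the values quoted above; the payload sizes and function names are hypothetical, and the efficiency measure (payload time divided by total active time) is a simplification that ignores stop overhead.

```python
def startup_overhead_bits(data_rate_bps, startup_time_s):
    """Number of bit periods that elapse during the start-up time."""
    return data_rate_bps * startup_time_s


def burst_efficiency(payload_bits, data_rate_bps, startup_time_s):
    """Fraction of the active time spent transmitting payload.

    Simplified model: active time = start-up time + payload time.
    """
    payload_time_s = payload_bits / data_rate_bps
    return payload_time_s / (payload_time_s + startup_time_s)


if __name__ == "__main__":
    RATE_BPS = 2e9      # 2 Gb/s, as in the text
    STARTUP_S = 100e-6  # 100 us, as in the text
    overhead = startup_overhead_bits(RATE_BPS, STARTUP_S)
    print(f"start-up overhead: {overhead:.0f} bit periods")
    for payload_bits in (1e3, 1e4, 1e5, 1e6, 1e7):  # hypothetical bursts
        eff = burst_efficiency(payload_bits, RATE_BPS, STARTUP_S)
        print(f"payload {payload_bits:10.0f} bits: efficiency {eff:6.1%}")
```

In this model a burst must carry as many bits as the start-up overhead just to reach 50% efficiency, which illustrates why short bursts are so costly when the start-up time is in the order of 100 μs.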
The start-up duration problems are strongly correlated with the fact that conventional communication systems implicitly assume an underlying accurate absolute time-base for both transmitter and receiver. Although this results in a system that is easy to understand, it is not efficient.