As high-speed digital (discrete-time) systems come to rely more on distributed processing techniques, the reliable transfer (crossing) of data samples from one clock domain to another becomes increasingly critical. Multiple clock domains arise in digital systems when differences in sampling clock sources, whether intentional (e.g., different input/output frequencies) or unintentional (e.g., latencies, timing skews, jitter), cause various processing operations to take place at different rates and/or at uncertain time intervals. The asynchronous data transfers associated with clock domain crossing (CDC), where data samples are generated (i.e., by a transmitting register) in one clock domain and sampled (i.e., by a receiving register) in another clock domain, are subject to errors when a transition from one data sample to the next happens at or near the sampling instant of the receiving register. For reliable transfer to a receiving register, a data sample must maintain a constant (stable) value for a short period of time prior to sampling (i.e., a required setup time must be met) and for a short period of time subsequent to sampling (i.e., a required hold time must be met). If either the required setup time or the required hold time is violated, the receiving (sampling) register can enter what conventionally is referred to as a metastable state, in which the digital outputs of the register float in an invalid state for an unpredictable period of time before eventually settling into a random state. Metastability is a particular concern in digital processing systems because not only do transitions between data samples take an increased amount of time (i.e., increased latency), but the transitions also result in data samples having an unpredictable value. Certain techniques and apparatuses are conventionally utilized to minimize or prevent metastability errors in asynchronous data transfers between clock domains.
These techniques and apparatuses include cascaded flip-flops, handshaking first-in first-out (FIFO) memories, and dual-port FIFO memories. Because the operating frequency of these conventional techniques is limited, however, a need exists for improved methods of clock domain crossing and rate-conversion.
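The setup/hold requirement can be illustrated with a short behavioral sketch. It assumes ideal (jitter-free) clocks, integer time units, and data transitions that occur exactly on transmit-clock edges; the names `TimingWindow`, `violates_window`, and `risky_edges` are hypothetical and do not correspond to any particular timing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingWindow:
    """Hypothetical receiving-register requirements, in integer time units."""
    setup: int  # input must be stable this long before the sampling edge
    hold: int   # input must stay stable this long after the sampling edge


def violates_window(transition: int, edge: int, w: TimingWindow) -> bool:
    """True if a data transition lands strictly inside (edge - setup, edge + hold)."""
    return edge - w.setup < transition < edge + w.hold


def risky_edges(tx_period: int, rx_period: int, w: TimingWindow, n_rx: int) -> list[int]:
    """Indices of receive-clock edges whose setup/hold window contains a
    transmit-side transition (transitions assumed at multiples of tx_period)."""
    risky = []
    for j in range(n_rx):
        edge = j * rx_period
        # Only transmit transitions close to this edge can violate its window.
        k_first = (edge - w.setup) // tx_period
        k_last = (edge + w.hold) // tx_period
        if any(violates_window(k * tx_period, edge, w) for k in range(k_first, k_last + 1)):
            risky.append(j)
    return risky


# Transmit period 1000, receive period 1010 (about a 1% frequency offset).
print(risky_edges(1000, 1010, TimingWindow(setup=50, hold=30), 101))
# prints [0, 1, 2, 3, 4, 98, 99, 100]
```

With a 1% frequency offset in this example, the transmit transitions drift through the receiving register's forbidden window in bursts roughly once every 100 receive edges, which is why an unprotected crossing fails intermittently rather than always or never.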
When the various elements of a digital system/network operate at the same sampling rate, but separate clock domains result from uncertainties in sampling instants (e.g., uncertainties in sampling phase such as those caused by latencies, timing skews, jitter, etc.), such a system conventionally is referred to as being mesochronous. Elastic buffers typically are placed at various data transfer locations (i.e., data interfaces) in mesochronous systems to absorb the differences between an actual (instantaneous) sampling rate and an average sampling rate. FIG. 1 gives a block diagram of a basic elastic buffer (e.g., buffer 10A), which conventionally is utilized to absorb any random, zero-mean fluctuations between the rate of data sample transmission in one clock domain (i.e., a write clock domain established by WCLK on line 3A) and the rate of data sample reception in a second clock domain (i.e., a read clock domain established by RCLK on line 4A). Under the control of the enable lines (e.g., lines 6A-C) from write counter 7A, which is typically implemented as a conventional ring counter, input data samples received on line 1A are sequentially written into registers 5A-C. Then, under the control of select line 8 from read counter 7B, output data samples are coupled onto output line 2A via conventional multiplexer 9, which is configured on each read cycle to select the oldest sample from the register bank formed by registers 5A-C. Retrieving the oldest sample from the register bank minimizes the chance of a simultaneous read and write operation to the same register.
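A behavioral model of the register-bank elastic buffer of FIG. 1 may help make the counter arrangement concrete. This is a sketch under simplifying assumptions: the write ring counter is modeled as an index whose one-hot decode stands in for the enable lines, the read counter index stands in for the multiplexer select, and the caller interleaves write and read calls in the order the two clock edges would arrive. `ElasticBuffer` and its methods are hypothetical names.

```python
class ElasticBuffer:
    """Behavioral sketch of a register-bank elastic buffer (cf. registers 5A-C)."""

    def __init__(self, depth: int = 3) -> None:
        self.regs = [None] * depth  # register bank
        self.wr = 0                 # write ring counter position
        self.rd = 0                 # read counter position (multiplexer select)

    @property
    def write_enables(self) -> list[bool]:
        """One-hot enable lines, as a ring counter would drive them."""
        return [i == self.wr for i in range(len(self.regs))]

    def write(self, sample) -> None:
        """On a write-clock edge, load the enabled register and rotate the ring."""
        self.regs[self.wr] = sample
        self.wr = (self.wr + 1) % len(self.regs)

    def read(self):
        """On a read-clock edge, select the oldest unread register."""
        sample = self.regs[self.rd]
        self.rd = (self.rd + 1) % len(self.regs)
        return sample


buf = ElasticBuffer(depth=3)
buf.write(0)
buf.write(1)  # pre-fill so the read counter trails the write counter
out = []
for s in range(2, 8):
    out.append(buf.read())
    buf.write(s)
print(out)  # [0, 1, 2, 3, 4, 5]
```

Keeping the read counter behind the write counter is what makes the selected register the oldest one, so a read and a write rarely target the same register on the same cycle.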
Additionally, the transfer of data samples across two separate clock domains often is synchronized using first-in, first-out (FIFO) memory structures, such as circuit 10B of FIG. 2. Conventional FIFO circuit 10B uses a dual-port, random-access memory (e.g., dual-port RAM 15) so that one port can be used to write (store) a data sample into a memory location (address), while a second port is used to read (retrieve) a data sample from a different memory location. Write operations occur on intervals defined by output 3B (WCLK) of write clock oscillator 13A, and read operations occur on potentially different intervals defined by output 4B (RCLK) of read clock oscillator 13B. The output of write address counter 12A determines the memory address that is accessed during a write operation, and the output of read address counter 12B determines the memory address that is accessed during a read operation. Conventionally, potential conflicts between write and read access are avoided by creating a lag (offset) between the outputs of the write and read address counters, so that write access to a particular memory location occurs in advance of read access to that same memory location (i.e., write operations occur ahead of read operations). Synchronizers like FIFO circuit 10B, however, can experience problems with metastability in plesiochronous systems, where the different elements of the system operate at nearly, but not exactly, the same sampling rate (i.e., the transfer of data samples between clock domains is almost, but not perfectly, synchronous). More specifically, if write operations take place less frequently than read operations, eventually an attempt will be made to read from a location of memory that has yet to be written (i.e., write address counter 12A falls behind read address counter 12B).
Conversely, if write operations take place more frequently than read operations, eventually an attempt will be made to write to a location of memory that has yet to be read (i.e., the write address counter overtakes the read address counter). At a sampling instant when either the write address counter or the read address counter is overtaken by the other, metastability errors can result when a write operation and a read operation occur to the same memory location at the same time.
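The slip behavior of a dual-port FIFO in a plesiochronous system can be reproduced with a small event-driven sketch. It assumes ideal clocks with integer periods, an initial write/read lag expressed as a number of pre-written samples, and coincident edges resolved write-first; the function `first_slip` and its parameters are hypothetical.

```python
from typing import Optional, Tuple


def first_slip(depth: int, write_period: int, read_period: int,
               initial_fill: int, max_time: int) -> Optional[Tuple[str, int]]:
    """Simulate free-running write/read address counters (addresses are the
    counts modulo `depth`). Return ("underflow" | "overflow", time) at the first
    access to a location the other port has not finished with, or None."""
    writes, reads = initial_fill, 0  # lag (offset) between the two counters
    t_w, t_r = write_period, read_period
    while min(t_w, t_r) <= max_time:
        if t_w <= t_r:  # next event is a write edge (ties: write first)
            if writes - reads >= depth:
                return ("overflow", t_w)   # would overwrite an unread location
            writes += 1
            t_w += write_period
        else:           # next event is a read edge
            if writes - reads <= 0:
                return ("underflow", t_r)  # would read an unwritten location
            reads += 1
            t_r += read_period
    return None


print(first_slip(8, 100, 99, 4, 100_000))   # ('underflow', 39699): reads slightly faster
print(first_slip(8, 99, 100, 4, 100_000))   # ('overflow', 39600): writes slightly faster
print(first_slip(8, 100, 100, 4, 100_000))  # None: equal rates, no slip
```

Reducing the frequency offset only postpones the collision; with any nonzero offset between free-running counters, one eventually overtakes the other, which motivates monitoring the FIFO depth.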
One of the conventional methods used to overcome metastability errors in plesiochronous systems is synchronizer circuit 20A, shown in FIG. 3A. A primary difference between FIFO circuit 10B and synchronizer circuit 20A is that the FIFO depth is monitored and used to adjust the rate of read access. Via line 15A, depth indicator 19A calculates the difference (i.e., depth) between the value at the output of a write address counter and the value at the output of a read address counter, both of which are internal to FIFO circuit 17 (these internal write and read address counters perform the same function as external write address counter 12A and external read address counter 12B in FIG. 2). Before the depth of FIFO circuit 17 reaches a value of zero, which would indicate that the write and read address counters are overlapping (i.e., the write address location is equal to the read address location), the frequency of tunable oscillator 13C is adjusted via line 15B, to advance the read address counter (e.g., by increasing the frequency on clock line 4C) or to retard the read address counter (e.g., by decreasing the frequency on clock line 4C). Conventionally, the FIFO depth is maintained at a level which can accommodate the expected variations in the phase (and therefore the instantaneous frequency) of the write clock on line 3C. As shown in FIG. 3A, synchronizer 20A also includes a clock recovery function (e.g., circuit 18), which extracts a write clock on line 3C from the incoming data samples on line 1C. If the incoming data samples (e.g., the data samples on line 1C) transition to new values on irregular intervals, this clock recovery function can help prevent metastability errors at input register 14A. In conventional systems in which the incoming data samples transition to new values on regular (consistent) intervals, this feature typically is absent.
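The depth-monitoring loop can be sketched in discrete time. This sketch assumes the tunable oscillator is modeled as a phase accumulator whose increment (clock edges per simulation tick) can be changed every tick, and that the adjustment is a simple proportional correction toward a target depth. The source does not specify a control law, so the gain and all names (`run_depth_servo`, `gain`, `target_depth`) are hypothetical.

```python
def run_depth_servo(write_rate: float, nominal_read_rate: float, depth: int,
                    target_depth: int, gain: float, ticks: int) -> list[int]:
    """Return the FIFO fill level after each tick; raise RuntimeError on a slip."""
    w_phase = r_phase = 0.0
    writes, reads = target_depth, 0         # start at the target fill level
    read_rate = nominal_read_rate           # tunable-oscillator frequency
    history = []
    for _ in range(ticks):
        w_phase += write_rate               # write clock (not adjustable)
        if w_phase >= 1.0:
            w_phase -= 1.0
            writes += 1
        r_phase += read_rate                # read clock (adjustable)
        if r_phase >= 1.0:
            r_phase -= 1.0
            reads += 1
        fill = writes - reads               # depth indicator output
        if not 0 <= fill <= depth:
            raise RuntimeError(f"FIFO slip: fill level {fill}")
        history.append(fill)
        # Deeper FIFO -> raise the read frequency; shallower -> lower it.
        read_rate = nominal_read_rate + gain * (fill - target_depth)
    return history


# Write clock runs 1% fast relative to the nominal read clock.
levels = run_depth_servo(0.101, 0.100, depth=8, target_depth=4, gain=0.001, ticks=20_000)
print(min(levels), max(levels), levels[-1])
```

With `gain=0.0` the same call eventually raises `RuntimeError`, because the fill level drifts upward by about one sample every 1000 ticks. A purely proportional correction settles with a small steady-state offset from the target depth; a practical design might add integral action, which is outside what the text describes.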
Another conventional method used to overcome metastability errors in plesiochronous systems is synchronizer circuit 20B, shown in FIG. 3B. Rather than adjusting the frequency of a read clock oscillator, synchronizer 20B uses a technique that is conventionally referred to as pulse stuffing (or bit stuffing), in which data samples are replicated or deleted in a predictable manner to prevent the internal write and read address counters of FIFO 17 from overlapping. More specifically, if write operations take place less frequently than read operations, a particular data sample will be read from memory twice (i.e., data sample repetition occurs when the write address counter starts falling behind the read address counter). Conversely, if write operations take place more frequently than read operations, a particular data sample is skipped over and not read (i.e., data sample deletion occurs when the write address counter starts moving ahead of the read address counter). Before the depth of FIFO 17 reaches a value of zero, which would indicate that the write and read address counters are overlapping, depth indicator 19B, in conjunction with pulse modulator 16, adds or deletes pulses from read cycle clock 4D. When a pulse is deleted from read cycle clock 4D via control line 15C, the read address counter effectively is retarded relative to the write address counter (and also relative to output data clock 4E), and a data sample is read from the current memory location a second time. When a pulse is added to read cycle clock 4D via control line 15C, the read address counter is advanced relative to the write address counter (and also relative to output data clock 4E), and a data sample is not read from the current memory location. Conventionally, the FIFO depth is maintained at a level which can accommodate the expected variations in the phase (and therefore the instantaneous frequency) of the write clock on line 3C.
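Pulse stuffing can be sketched at the sample level rather than the clock-pulse level. In this hedged model, the read side produces exactly one output per output-clock edge, and the FIFO depth observed just before each read decides whether the read counter is held back (a sample is repeated) or pushed ahead (a sample is skipped). The thresholds `low` and `high` and the class `StuffingFifo` are hypothetical.

```python
from collections import deque


class StuffingFifo:
    """Sample-level sketch of depth-controlled sample repetition and deletion."""

    def __init__(self, low: int, high: int) -> None:
        if not 0 <= low < high or high < 2:
            raise ValueError("need 0 <= low < high and high >= 2")
        self.buf = deque()
        self.low, self.high = low, high
        self.last = None          # most recently output sample
        self.repeats = 0
        self.deletions = 0

    def write(self, sample) -> None:
        self.buf.append(sample)

    def read(self):
        depth = len(self.buf)
        if depth <= self.low:
            # Writes are falling behind: hold the read counter (pulse deleted),
            # so the previous sample is output a second time.
            self.repeats += 1
            return self.last
        if depth >= self.high:
            # Writes are running ahead: advance the read counter an extra step
            # (pulse added), so one sample is skipped.
            self.buf.popleft()
            self.deletions += 1
        self.last = self.buf.popleft()
        return self.last


f = StuffingFifo(low=1, high=4)
f.write(0)
f.write(1)
out = [f.read(), f.read()]   # second read repeats sample 0
f.write(2)
out.append(f.read())
for s in (3, 4, 5, 6):
    f.write(s)
out.append(f.read())         # depth 5 >= high, so sample 2 is skipped
print(out, f.repeats, f.deletions)  # [0, 0, 1, 3] 1 1
```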
To avoid altering the content of a data transmission (e.g., altering through the replication or deletion of data samples), special data samples are sometimes inserted into the transmission stream at strategic locations (e.g., locations defined by frame boundaries). At these strategic locations, the special data samples are replicated or deleted as necessary for synchronization, and then are subsequently removed from the transmission stream by processors able to recognize these special data samples.
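The use of special data samples at frame boundaries can be illustrated with a short sketch. It assumes a single reserved fill symbol (`FILL`, a hypothetical sentinel) placed after every frame; rate adjustment replicates or deletes only fill symbols, and the receiver strips them, so the payload passes through unchanged. The helper names are likewise hypothetical.

```python
FILL = object()  # hypothetical reserved "special data sample"


def frame_stream(frames):
    """Place one fill symbol at each frame boundary."""
    out = []
    for frame in frames:
        out.extend(frame)
        out.append(FILL)
    return out


def adjust_fill(stream, change: int):
    """Replicate (change > 0) or delete (change < 0) up to |change| fill
    symbols, touching nothing but fill symbols."""
    out = []
    for sym in stream:
        if sym is FILL and change > 0:
            out.extend((FILL, FILL))  # replicate this fill symbol
            change -= 1
        elif sym is FILL and change < 0:
            change += 1               # delete this fill symbol
        else:
            out.append(sym)
    return out


def strip_fill(stream):
    """Receiver side: remove every recognized fill symbol."""
    return [s for s in stream if s is not FILL]


tx = frame_stream([[1, 2, 3], [4, 5], [6]])
longer = adjust_fill(tx, +2)   # buffer replicates two fill samples
shorter = adjust_fill(tx, -2)  # buffer deletes two fill samples
print(len(tx), len(longer), len(shorter))  # 9 11 7
print(strip_fill(longer) == strip_fill(shorter) == [1, 2, 3, 4, 5, 6])  # True
```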
Since the first use of elastic buffers for clock domain crossing and synchronization (i.e., rate-conversion), advances in the prior art have primarily addressed methods for preventing simultaneous write and read access to the same memory location (i.e., the advances have addressed the issue of buffer metastability). These advances include: 1) U.S. Pat. No. 3,093,815, issued in 1963, which discloses a means for monitoring buffer depth to adjust a receive data clock; 2) U.S. Pat. No. 4,002,844, issued in 1977, which discloses a means for inserting data samples (i.e., justification bits) to synchronize a system (i.e., a multiplexing system) having multiple clock domains; 3) U.S. Pat. No. 4,172,538, issued in 1979, which discloses a means for multiplexing the outputs of write/read address counters to prevent simultaneous reading and writing of the same storage location; 4) U.S. Pat. No. 5,583,894, issued in 1996, which discloses a means for dynamically adjusting read/write address counters so that potential overlaps (slips) occur at frame boundaries; and 5) U.S. Pat. No. 7,366,207, issued in 2008, which discloses a means for disabling the dynamic adjustment of read/write address counters during periods when data samples transition to new values at uncertain times (i.e., transition during periods of high jitter). The present inventor has not identified any advances in elastic buffering methods and apparatuses that address a means for increasing the sampling rates at which rate-conversion buffers are able to operate (e.g., the rate/frequency at which data samples can be transferred across clock domains). More specifically, the conventional multirate processing techniques that have been utilized in linear systems to distribute processing operations across multiple parallel paths apparently have not been adapted for use in clock domain crossing applications.
Conventional methods for multirate processing have been utilized in the implementation of linear circuits, including transversal (i.e., finite-impulse-response) and recursive (i.e., infinite-impulse-response) filter structures. In a conventional multirate system, a processing operation is decomposed (e.g., via polyphase decomposition) into multiple parallel processing paths. In effect, each of the parallel paths operates at a reduced sampling rate (i.e., a sub-rate) and generates the subset of data samples that would be obtained if the complete (i.e., full-rate) set of data samples were to be subsampled at a particular sample-time offset (i.e., at a particular subsampling phase). Thereby, the subset of output samples from each parallel path represents a different polyphase component of a complete set of data samples. The ratio of the effective sampling rate (i.e., the full-rate associated with the complete set of data samples) to the parallel subsampling rate (i.e., the sub-rate associated with each parallel processing path) conventionally is referred to as the polyphase decomposition factor, and is generally equal to the number of parallel processing paths. More specifically, the operation of a processing function after polyphase decomposition by m is such that: 1) the data samples from the first parallel output correspond to the subsamples taken every mth (full-rate) sample-time period (i.e., subsampling by m), starting with the first sample (i.e., the outputs of the first parallel path are the even subsamples for m=2); and 2) the data samples from the mth parallel output correspond to the subsamples taken every mth (full-rate) sample-time period, starting with the mth sample (i.e., the outputs of the second parallel path are the odd subsamples for m=2). Because of discontinuities introduced by data sample repetition and deletion, rate-conversion (elastic) buffers do not operate as linear circuits.
Therefore, conventional methods for multirate processing and polyphase decomposition cannot be applied directly to improve the operating rates of these buffers.
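For contrast, the polyphase decomposition that does work for linear filters can be sketched directly. The sketch below, with hypothetical helper names, splits a sequence into m sub-rate components (the first path receives the even samples and the second path the odd samples when m = 2) and computes a transversal (FIR) filter as m parallel paths built from sub-filters h_r[i] = h[m*i + r]. The identity it relies on holds because the filter is linear and time-invariant.

```python
def polyphase(x, m):
    """Component p holds x[p], x[p + m], x[p + 2m], ... (subsampling by m)."""
    return [x[p::m] for p in range(m)]


def interleave(components):
    """Inverse of polyphase(): merge m sub-rate components back to full rate."""
    m = len(components)
    out = [None] * sum(len(c) for c in components)
    for p, c in enumerate(components):
        out[p::m] = c
    return out


def fir(h, x):
    """Full-rate transversal filter: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_polyphase(h, x, m):
    """Same output as fir(h, x), computed on m parallel sub-rate paths."""
    hp = polyphase(h, m)  # sub-filters h_r[i] = h[m*i + r]
    xp = polyphase(x, m)  # sub-rate inputs x_q[n] = x[m*n + q]
    paths = []
    for p in range(m):
        y = [0] * len(xp[p])
        for r in range(m):
            # y[m*n + p] needs x[m*(n - i) + p - r]; when r > p the index
            # wraps into component p - r + m, one sub-rate period earlier.
            src, delay = (xp[p - r], 0) if r <= p else (xp[p - r + m], 1)
            for n in range(len(y)):
                y[n] += sum(hp[r][i] * src[n - delay - i]
                            for i in range(len(hp[r]))
                            if 0 <= n - delay - i < len(src))
        paths.append(y)
    return interleave(paths)


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
h = [2, 7, 1, -8, 2]
print(polyphase(list(range(8)), 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(all(fir_polyphase(h, x, m) == fir(h, x) for m in (1, 2, 3, 4)))  # True
```

A buffer that occasionally repeats or deletes a sample has no fixed impulse response, so there is no corresponding set of sub-filters to distribute across parallel paths.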