Ongoing demands for more-complex circuits have led to significant achievements realized through very-large-scale integration of circuits on small areas of silicon wafer. These complex circuits are often designed as functionally-defined blocks that operate on a sequence of data and then pass that data on for further processing.
Data from such functionally-defined blocks can be passed in small or large amounts between individual integrated circuits (or “chips”), within the same chip, and between more remotely-located communication circuit arrangements and systems. Regardless of the configuration, the communication typically requires closely-controlled interfaces to ensure that data integrity is maintained and that chip-set designs are sensitive to practicable limitations in terms of implementation space and available operating power.
With the increased complexity of circuits, there has been a commensurate demand for increasing the speed at which data is passed between the circuit blocks. Many of these high-speed communication applications can be implemented using parallel data interconnect transmission in which multiple data bits are simultaneously sent across parallel communication paths. Such “parallel bussing” is a well-accepted approach for achieving data transfers at high data rates.
A typical system might include a number of modules (i.e., one or more cooperatively-functioning chips) that interface to and communicate over a parallel data bus, for example, in the form of a cable, other interconnect and/or via an internal bus on a chip. A transmitting module transmits data over the bus synchronously with a clock on the transmitting module. In this manner, the transitions on the parallel signal lines leave the transmitting module in a synchronous relationship to each other and/or to a clock on the transmitting module. At the other end of the parallel data interconnect, the receiving module receives the data on a parallel data bus. In such systems, the received signals (and where applicable, the receive clock) should have a specific phase relationship in order to provide proper data recovery.
Many integrated circuits (ICs) today include more than one clock domain; therefore a data-transmitting module might be operating in one clock domain at a first clock frequency, while a data-receiving module is operating in another clock domain at a different (and perhaps non-synchronous) second clock frequency. The interface between clock domains is a clock domain boundary, or a clock domain crossing where information crosses the boundary. Clock signal path distance is typically limited to confine clock signal “skew” effects within tolerable limits; therefore, a clock domain generally correlates with a compact geographical region of an IC.
Where transmitting and receiving modules reside in different clock domains, the instantaneous rate at which data are transmitted in one clock domain may not match the instantaneous rate at which data are used (i.e., consumed) in another domain. To accommodate data rate differences, a discrete buffering device is conventionally used between the clock domains. Data are clocked into the buffering device according to a source domain (i.e., write) clock, and clocked out of the buffering device according to a receive domain (i.e., read) clock.
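The buffering behavior described above can be illustrated with a short behavioral sketch. It is an assumption-laden model, not a description of any particular buffering device: the function name `simulate_buffer`, the integer clock periods, and the fixed `depth` are hypothetical, and the model ignores setup/hold timing, metastability, and skew. It shows only that words written on one clock and read on another emerge in order, and how far the buffer fills when the two rates differ.

```python
from collections import deque


def simulate_buffer(write_period, read_period, n_words, depth):
    """Behavioral model of a buffer written on one clock and read on another.

    Periods are integer time units (hypothetical). Returns a tuple of
    (words read in order, peak occupancy, write edges stalled by a full buffer).
    """
    fifo = deque()
    received = []
    next_word = 0      # next word the write domain wants to send
    peak = 0           # highest occupancy observed
    stalls = 0         # write edges on which the buffer was full
    t_write = 0        # time of the next write clock edge
    t_read = 0         # time of the next read clock edge

    while len(received) < n_words:
        if next_word < n_words and t_write <= t_read:
            # Write clock edge: store a word if there is room.
            if len(fifo) < depth:
                fifo.append(next_word)
                next_word += 1
                peak = max(peak, len(fifo))
            else:
                stalls += 1
            t_write += write_period
        else:
            # Read clock edge: remove a word if one is available.
            if fifo:
                received.append(fifo.popleft())
            t_read += read_period
    return received, peak, stalls


# Writer faster than reader: the buffer absorbs the rate difference.
words, peak, stalls = simulate_buffer(write_period=2, read_period=3,
                                      n_words=10, depth=4)
print(words == list(range(10)), peak, stalls)
```

When the read clock is faster than the write clock (for example, periods of 3 and 2), the same model never holds more than one word at a time, which is the case in which the buffer serves mainly to cross the domain boundary rather than to absorb a rate mismatch.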
Conventional buffering devices require that data, address and write-enable inputs meet certain setup and hold timing requirements with respect to a write clock, requiring the storage element be located near the write clock domain to maintain required timing relationships. At the same time, output (read) data from the conventional buffering device typically becomes valid with some non-zero delay after a read address changes. If a read clock frequency is not extremely low, a read address counter should be located near the multiplexing function inside the buffering device; therefore, the buffering device must simultaneously be located near the read clock domain to ensure data integrity. Typically, the buffering device is physically located very near the clock domain boundary so that neither clock domain is over-extended. Locating the buffering device within one of the clock domains requires extending a clock signal from the other clock domain into the “foreign” clock domain to reach the buffering device. This practice increases skew concerns for the over-extended clock signal.
Skew is a time delay or offset between any two signals. There is often an anticipated amount of time skew between transmitted data signals themselves and between address/data signals and a clock signal at the destination. Skew can be caused by a number of phenomena including, for example, transmission delays introduced by the capacitive and inductive loading of the signal lines of the parallel interconnect, variations in the input/output driver source, intersymbol interference and variations in the transmission lines' impedance and length. Regardless of which phenomena cause the skew, the skew presents a serious integrity issue for the data being communicated and, in many applications, for the overall communication system.
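As a rough numerical illustration of why skew threatens data recovery, the following sketch computes the worst-case skew across a set of parallel lines and checks whether a single capture clock edge still satisfies setup and hold requirements for every line. All names and numbers are hypothetical (times are in picoseconds), and the simplified model assumes every line changes once per bit period and is stable in between.

```python
def bus_skew_ps(arrival_ps):
    """Worst-case skew across the bus: latest minus earliest arrival time."""
    return max(arrival_ps) - min(arrival_ps)


def capture_is_safe(arrival_ps, clock_edge_ps, setup_ps, hold_ps, bit_period_ps):
    """Check one capture edge against setup and hold for all lines.

    All lines hold the current bit from the latest arrival until the
    earliest line changes again, one bit period after its own arrival.
    """
    latest_valid = max(arrival_ps)                   # last line to settle
    earliest_change = min(arrival_ps) + bit_period_ps  # first line to move on
    return (latest_valid + setup_ps <= clock_edge_ps
            and clock_edge_ps + hold_ps <= earliest_change)


# Hypothetical arrival times for a four-line bus.
aligned = [100, 130, 160, 145]
skewed = [100, 130, 900, 145]

print(bus_skew_ps(aligned), capture_is_safe(aligned, 600, 150, 100, 1000))
print(bus_skew_ps(skewed), capture_is_safe(skewed, 600, 150, 100, 1000))
```

With these assumed values, the aligned bus has 60 ps of skew and the capture edge at 600 ps is safe, whereas the skewed bus has 800 ps of skew and its slowest line violates the setup requirement at the same edge.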
Implementing integrated circuits using a plurality of clock domains is desirable for a variety of reasons. Accordingly, improving data communication over parallel busses between clock domains permits more practicable and higher-speed parallel bussing applications which, in turn, can directly serve the demand for high-speed circuits while maintaining data integrity in the presence of skew-causing phenomena. Various aspects of the present invention address the above-mentioned deficiencies and also provide for communication methods and arrangements that are useful for other applications as well.