The invention is generally related to data transmission over source synchronous communications interfaces, e.g., interfaces compatible with the PCI-X standard, among others.
As computers and computer processors increase in performance, data transfer performance becomes a significant bottleneck on overall system performance, as no processor can run at maximum efficiency if it cannot quickly obtain the data upon which it operates. In particular, the interface technology used to communicate data between a processor and various peripheral devices, such as disk drives and other storage components, display adaptors, and network adaptors, can be a significant source of such a bottleneck. In addition, in high performance computers such as servers, mainframe computers, and midrange computers, such bottlenecks are particularly problematic due to the significant amount of data that needs to be transmitted between components to handle the heavy workloads to which such computers are commonly subjected.
For communicating with peripheral components, a number of interconnect technologies have been developed over the years. One such interconnect technology is the Peripheral Component Interconnect (PCI) standard, which was originally developed to interface peripheral cards mounted in slots to a local processor in a computer, but which has since been refined through various follow-up standards to support higher bandwidth applications such as in servers and other high performance computers. One such follow-up standard is the PCI-X standard, which in Version 1.0 implemented the use of split transactions, as well as a higher clock frequency, to increase the overall bandwidth of a PCI bus.
While the PCI-X standard offers substantially greater performance than the original PCI standard, greater performance is still desirable. One limitation to future gains in performance with conventional PCI-compatible standards, however, is the system synchronous nature of the communications over a PCI-compatible bus. In particular, with system synchronous communications, a common clock is used system-wide by all components resident on a PCI bus to synchronize the transmission of data across the bus. Due predominantly to the skewing of signals relative to one another as a result of varying propagation delays between components, the maximum transmission frequency obtainable through system synchronous communications is inherently limited. Moreover, in some high performance environments the distance between components can be relatively long and can vary from component to component, so signal propagation delays can further limit the maximum transmission frequency in a system synchronous communication configuration.
In other communications standards, such as a number of existing Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM) and Accelerated Graphics Port (AGP) standards, source synchronous data communication is utilized in lieu of system synchronous communication. Source synchronous data communication relies on the component that serves as the source of a data signal to also provide a clock signal (often referred to as a data strobe signal) that is used by a target for the data to capture such data as it is being transferred over a data line to the target. By providing both the data and the clock from the same source, it is assumed that the signal propagation delays of the data and clock to a target will be more closely matched, and as a result, higher clock and data transmission frequencies may be used with reduced risk of transmission errors.
In many source synchronous systems, it is desirable to resynchronize received data in a target with a common or system clock signal, particularly where the received data is transmitted at a common clock-multiplied rate such as double data rate (DDR) or quad data rate (QDR). When data is transmitted at a clock-multiplied rate, multiple samples (or sub-phases) of a data signal are taken within a given cycle of the common clock signal (also referred to as a data phase). Thus, for example, if a common clock signal operates at 33 MHz, a DDR source synchronous data signal would have two sub-phases per data phase, providing an effective 66 MHz transmission rate for the data signal. Likewise, a QDR source synchronous data signal would have four sub-phases per data phase, providing an effective 133 MHz transmission rate.
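The rate figures quoted above can be reproduced with a short calculation. The sketch below is illustrative only. It assumes that the "33 MHz" common clock is the nominal 33.33 MHz PCI clock and that rates are quoted truncated to whole megahertz, which would explain why four sub-phases are quoted as 133 MHz rather than 132 MHz. The function name effective_rate_mhz and the constant NOMINAL_CLOCK_MHZ are hypothetical and do not come from any standard.

```python
def effective_rate_mhz(common_clock_mhz: float, sub_phases: int) -> float:
    """Effective transfer rate when `sub_phases` samples are taken per data phase."""
    return common_clock_mhz * sub_phases


# Assumption: the "33 MHz" clock is the nominal 33.33 MHz value.
NOMINAL_CLOCK_MHZ = 100.0 / 3.0

ddr_rate = effective_rate_mhz(NOMINAL_CLOCK_MHZ, 2)  # two sub-phases per data phase
qdr_rate = effective_rate_mhz(NOMINAL_CLOCK_MHZ, 4)  # four sub-phases per data phase

# Truncate to whole MHz, as the quoted figures appear to do.
print(int(NOMINAL_CLOCK_MHZ), int(ddr_rate), int(qdr_rate))  # prints "33 66 133"
```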
To perform such resynchronization, input staging logic is typically provided in a target or receiver component to temporarily latch received data until such data can be resynchronized with a common clock. In addition, where clock-multiplied source synchronous transmission is used, samples from multiple sub-phases of each data phase of the data signal must be temporarily stored so that all of the samples can be output in parallel at the relatively slower rate of the common clock signal. Thus, with a 33 MHz common clock, two samples of a DDR data signal, and four samples of a QDR data signal, must be output by an input staging register for each cycle of the common clock.
One particular implementation of input staging logic useful in connection with the AGP standard includes two latching stages to resynchronize source synchronous data to the internal common clock of a target device. Each latching stage has a set of staging latches to store the data from all of the sub-phases for one data phase (i.e., two latches per stage for DDR data, and four latches per stage for QDR data). A sequencer circuit, which is clocked by the data strobe signals provided to the input staging logic, and which functions in much the same manner as a circular buffer, sequentially latches data on the data line into each staging latch in each stage of the input staging logic. For each data phase, a multiplexer coupled to the staging latches, and a common clock-driven resynchronizing latch coupled to the output of the multiplexer, latch in parallel all of the data stored in the staging latches in one of the latching stages. The multiplexer is controlled during each subsequent data phase to sequentially route the data stored in the various latching stages to the resynchronizing latch.
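The data flow just described can be pictured with a purely behavioural model. The following is a minimal sketch, not a description of any actual AGP receiver: it models no setup or hold timing, it assumes that the common clock samples a stage only after all of that stage's sub-phases have been latched, and every name in it (TwoStageInputStaging, strobe, common_clock, receive) is hypothetical.

```python
from typing import List, Optional


class TwoStageInputStaging:
    """Behavioural model only: no setup/hold timing is simulated."""

    def __init__(self, sub_phases: int, stages: int = 2) -> None:
        self.sub_phases = sub_phases  # 2 for DDR, 4 for QDR
        self.stages = stages
        self.latches: List[List[Optional[int]]] = [
            [None] * sub_phases for _ in range(stages)
        ]
        self.write_index = 0  # sequencer position, advanced by the data strobe
        self.read_stage = 0   # multiplexer select, advanced by the common clock

    def strobe(self, value: int) -> None:
        """Sequencer: latch one sub-phase sample, walking the latches circularly."""
        stage, slot = divmod(self.write_index, self.sub_phases)
        self.latches[stage][slot] = value
        self.write_index = (self.write_index + 1) % (self.stages * self.sub_phases)

    def common_clock(self) -> List[Optional[int]]:
        """Multiplexer plus resynchronizing latch: capture one whole stage in parallel."""
        word = list(self.latches[self.read_stage])
        self.read_stage = (self.read_stage + 1) % self.stages
        return word


def receive(samples: List[int], sub_phases: int) -> List[List[Optional[int]]]:
    """Feed one data phase of samples at a time, then clock out the parallel word."""
    staging = TwoStageInputStaging(sub_phases)
    words = []
    for start in range(0, len(samples), sub_phases):
        for value in samples[start:start + sub_phases]:
            staging.strobe(value)
        words.append(staging.common_clock())
    return words


qdr_words = receive(list(range(8)), sub_phases=4)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
ddr_words = receive(list(range(6)), sub_phases=2)  # [[0, 1], [2, 3], [4, 5]]
```

In the DDR run, the third data phase reuses the first latching stage, which is the circular buffer behaviour attributed to the sequencer.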
While the aforementioned two stage input staging logic is suitable for many DDR and QDR source synchronous applications, in some applications the setup time (i.e., the time for data to propagate to a stable state at the input of a latch) for latching the last sub-phase of a data phase into a resynchronizing latch may be difficult to meet, particularly at higher frequencies. As an example, many standards such as various PCI standards require that data be available to a target device within two common clock cycles from when it is driven on a bus by a source device. As such, the use of conventional two stage input staging logic in such situations may be problematic at higher frequencies, requiring either a restriction on clock frequency (and thus, on overall bandwidth), or more stringent requirements on signal path lengths and switching times to adequately meet setup requirements for resynchronization.
Given the ever-present desire to maximize bandwidth and maintain design flexibility, restrictions on clock frequency and constraints on circuit designs are generally not preferred. A need therefore exists for an improved manner of latching source synchronous data into a receiver that does not suffer from the aforementioned limitations of conventional two stage input staging logic.
The invention addresses these and other problems associated with the prior art by providing a circuit arrangement, program product and method that in one aspect utilize three stage input staging logic to receive source synchronous data in a source synchronous communications system. By incorporating a third latching stage into input staging logic, critical timing parameters such as the setup time for latching a last sub-phase of a data phase are more easily met, thus improving maximum operational frequency and reducing the criticality of signal path lengths, switching times, and other potential sources of delay.
Consistent with one aspect of the invention, source synchronous input staging logic is configured to receive a source synchronous data signal. The input staging logic includes first, second and third latching stages respectively configured to store a plurality of data values from the source synchronous data signal, and sequencer logic coupled to the first, second and third latching stages and configured to sequentially route data values from the source synchronous data signal to the first, second and third latching stages. In addition, a common clock synchronizing circuit is coupled to the input staging logic and is configured to synchronize data from the source synchronous data signal to a common clock signal by sequentially latching data values from the first, second and third latching stages.
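One plausible way to picture the added margin is to count how long a completed latching stage remains intact before a circularly advancing sequencer returns to overwrite it. The sketch below is an illustrative assumption rather than a timing analysis of the claimed circuit: it counts data strobe edges only, ignores real setup and hold times, and uses the hypothetical function name intact_strobes.

```python
def intact_strobes(stages: int, sub_phases: int) -> int:
    """Count strobe edges from the edge that completes stage 0 up to and
    including the edge on which the sequencer first overwrites stage 0."""
    total = stages * sub_phases
    edge = sub_phases - 1  # edge that writes the last latch of stage 0
    count = 0
    while True:
        edge += 1
        count += 1
        if (edge % total) // sub_phases == 0:  # sequencer is back in stage 0
            return count


# QDR (four sub-phases per data phase)
two_stage_qdr = intact_strobes(stages=2, sub_phases=4)    # 5 strobe edges
three_stage_qdr = intact_strobes(stages=3, sub_phases=4)  # 9 strobe edges

# DDR (two sub-phases per data phase)
two_stage_ddr = intact_strobes(stages=2, sub_phases=2)    # 3 strobe edges
three_stage_ddr = intact_strobes(stages=3, sub_phases=2)  # 5 strobe edges
```

Under these assumptions, each additional stage extends the window by one full data phase of strobe edges, which is consistent with the easier setup timing described above.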
The invention also provides, in another aspect, a circuit arrangement, program product and method that incorporate at least one holding latch intermediate the output of two stage input staging logic and a common clock synchronizing circuit to effectively increase the hold time of a staging latch in one of the latching stages prior to common clock synchronization. The holding latch is clocked concurrently with at least one other staging latch in the input staging logic, namely one that is clocked later in a data phase than the staging latch that feeds the holding latch, so that the data clocked into both such staging latches is available for common clock synchronization at roughly the same point in time.
Consistent with this other aspect of the invention, source synchronous input staging logic is configured to receive a source synchronous data signal. The input staging logic includes first and second latching stages respectively configured to store a plurality of data values from first and second data phases of the source synchronous data signal. The first latching stage includes first and second staging register latches, where the second staging register latch is configured to receive data from a later sub-phase of the first data phase than the first staging register latch. Sequencer logic is coupled to the first and second latching stages and is configured to sequentially route data values from the source synchronous data signal to the first and second latching stages at least in part by selectively gating the first and second staging register latches. A common clock synchronizing circuit is coupled to the input staging logic and is configured to synchronize data from the source synchronous data signal to a common clock signal by sequentially latching data values from the first and second latching stages. Moreover, the input staging logic additionally includes at least one supplemental holding latch coupled intermediate an output of the first staging register latch and the common clock synchronizing circuit. The supplemental holding latch is gated by the sequencer logic to latch the output of the first staging register latch concurrently with gating of the second staging register latch.
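The effect of the supplemental holding latch can be sketched behaviourally as follows. This is a simplified illustration under stated assumptions, not the claimed circuit: only one latching stage with two staging register latches (the DDR case) is modelled, so the first latch is rewritten sooner than it would be in a two stage arrangement, and no electrical timing is simulated. The class and method names (StageWithHoldingLatch, gate_first, gate_second, outputs) are hypothetical.

```python
from typing import Optional, Tuple


class StageWithHoldingLatch:
    """One latching stage (DDR case) with a supplemental holding latch."""

    def __init__(self) -> None:
        self.first: Optional[str] = None    # staging latch for the earlier sub-phase
        self.second: Optional[str] = None   # staging latch for the later sub-phase
        self.holding: Optional[str] = None  # supplemental holding latch

    def gate_first(self, value: str) -> None:
        """Sequencer gates the first staging register latch."""
        self.first = value

    def gate_second(self, value: str) -> None:
        """Sequencer gates the second staging register latch and, concurrently,
        the holding latch, which copies the first latch's current output."""
        self.holding = self.first
        self.second = value

    def outputs(self) -> Tuple[Optional[str], Optional[str]]:
        """Values presented to the common clock synchronizing circuit."""
        return (self.holding, self.second)


stage = StageWithHoldingLatch()
stage.gate_first("d0")
stage.gate_second("d1")
before_overwrite = stage.outputs()  # ("d0", "d1")

# The first staging latch is rewritten with data from a later data phase...
stage.gate_first("d2")
after_overwrite = stage.outputs()   # ...but the held pair is still ("d0", "d1")
```

Because the holding latch and the second staging register latch update on the same gating event, both values of the pair become available together and stay stable until the second latch is gated again.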