According to IBM, “Geographically-Dispersed Parallel Sysplex (GDPS) is an integrated, automated application and data availability solution designed to provide the capability to manage . . . remote copy configuration and storage subsystem(s), automate Parallel Sysplex operational tasks, and perform failure recovery from a single point of control, thereby helping to improve application availability.” (“GDPS: The e-business Availability Solution,” Noshir Dhondy, et al., IBM Executive Summary, March 2005). GDPS supports both synchronous and asynchronous forms of remote copy.
GDPS is a disaster recovery manager for IBM computers. GDPS monitors all the Logical Partitions (LPARs) in the Sysplex, both operating systems (OSs) and Coupling Facilities. It also monitors the direct access storage devices (DASD), whether Peer-to-Peer Remote Copy (PPRC) or Extended Remote Copy (XRC) is being used to mirror the data. A Systems Complex, commonly called a Sysplex, is one or more IBM System/390 processors joined into a single unit. Put another way, a Sysplex is an instance of a computer system running on one or more physical computers. Sysplexes are often isolated to a single system, but Parallel Sysplex technology allows multiple mainframes to act as one. Sysplexes can be broken down into LPARs, each running a different operating system. Components of a Sysplex include: a Sysplex Timer, which synchronizes all member systems' clocks; Global Resource Serialization (GRS), which allows multiple systems to access the same resources concurrently, serializing where necessary to ensure exclusive access; and Cross System Coupling Facility (XCF), which allows systems to communicate peer to peer.
GDPS may be used in, for example, geographically-redundant server applications, especially those involving transaction processing. A typical application includes geographically diverse data centers. In such server applications, it is crucial that both primary and backup servers reflect real time information, especially in synchronous mode. In other words, transactions may be time stamped, and these time stamps must be consistent between primary and backup servers, for a variety of application-specific reasons. The geographically diverse servers are typically connected together through optical fiber, and the sites can be up to 200 km apart. Of particular importance is the consistency of the time information between the primary and backup servers at the geographically diverse locations. The primary and backup servers each include a timer, such as the Sysplex Timer with regard to GDPS, and the timer is configured to synchronize the clocks on both servers to ensure consistency. With regard to geographically diverse locations, a timing reference must be used to provide synchronization.
Previously, with regard to GDPS, an external time reference (ETR) link was used (i.e., a clock distributor/timer box was associated with each server complex) to sync primary and backup servers. Sync accuracy, however, was limited. For example, the IBM 9037 Sysplex Timer is a mandatory component of GDPS/PPRC. The Sysplex Timer provides an ETR to synchronize the time of day (TOD) clocks on attached servers in a GDPS/PPRC environment. The 9037 Sysplex Timer uses two link types: ETR links and Control Link Oscillator (CLO) links. ETR links are connections between the Sysplex Timer and the server ETR ports, providing clock synchronization between multiple servers. CLO links are connections between two Sysplex Timer units in High Availability mode, allowing synchronization of the Sysplex Timer timing signals.
To ensure correct Sysplex Timer and server time synchronization, the end-to-end lengths of the transmit and receive fibers within an individual ETR or CLO link must be equal (within 10 meters). Special care should be taken to meet this requirement when using erbium-doped fiber amplifiers (EDFAs) or dispersion compensation units (DCUs). EDFAs and DCUs contain significant lengths of fiber, which must be included in the total fiber distance calculation. Furthermore, the lengths of fiber they contain may differ between the transmit and receive fibers. For long distances over fiber, these requirements are challenging to meet and result in low accuracy.
More recently, IBM has integrated the time synchronization functions previously provided via the ETR links, which operate at a data rate of 8 Mbps, into the intersystem channel (ISC) link, which operates at 2.125 Gbps and provides other control functions beyond those provided by ETR links. These ISC links execute a proprietary server time protocol (STP), which is similar to the network time protocol (NTP). STP-capable ISC links are expected to replace ETR links over time. Advantageously, such an ISC link is faster and more accurate than the ETR link. However, ISC links are limited to distances below 100 km.
Despite these advances, the determination of transmit/receive path differential delay is still lacking in conventional systems and methods. This determination is very important because IBM specifies less than a 10 microsecond transmit/receive path differential delay requirement. This 10 microsecond transmit/receive path differential delay requirement is apportioned as follows: 5 microseconds for the fiber plant (including any required optical amplifiers and their associated dispersion compensating fiber), 2.5 microseconds for the electronic equipment at either end of the connection, and 2.5 microseconds for margin.
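The apportionment above can be written out as a small check. The following Python sketch is illustrative only: the function name `over_budget` and the idea of comparing measured contributions against each share are assumptions made here, not part of any IBM specification; only the three budget figures and the 10 microsecond total come from the text.

```python
# Differential-delay budget from the text, in microseconds.
BUDGET_US = {
    "fiber_plant": 5.0,   # fiber, amplifiers, dispersion compensating fiber
    "electronics": 2.5,   # equipment at either end of the connection
    "margin": 2.5,        # reserved, not assigned to any component
}
TOTAL_LIMIT_US = 10.0


def over_budget(fiber_us, electronics_us):
    """Return the names of the contributions that exceed their share.

    Hypothetical helper: the inputs are assumed to be measured
    transmit/receive differential delays in microseconds.
    """
    measured = {"fiber_plant": fiber_us, "electronics": electronics_us}
    return [name for name, value in measured.items()
            if value > BUDGET_US[name]]


if __name__ == "__main__":
    # The three shares add up to the overall requirement.
    print(sum(BUDGET_US.values()) == TOTAL_LIMIT_US)
    # Example: the fiber is within its share, the electronics are not.
    print(over_budget(fiber_us=4.0, electronics_us=3.0))
```

Keeping the margin as a separate entry makes it explicit that a contribution can exceed its own share while the link as a whole still falls under the 10 microsecond total.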
Referring to FIG. 1, the transmit/receive path differential delay is determined by first ascertaining the roundtrip delay between Sysplex A 10 and Sysplex Z 12 along both a transmit path 14 and a receive path 16. Time stamp 1 18 is associated with a message (or frame) upon transmission from Sysplex A 10 to Sysplex Z 12. Time stamp 2 20 is associated with the message upon receipt at Sysplex Z 12. Time stamp 3 22 is associated with the message upon transmission from Sysplex Z 12 to Sysplex A 10. Finally, time stamp 4 24 is associated with the message upon receipt at Sysplex A 10. The roundtrip delay is equal to the difference between time stamp 4 24 and time stamp 1 18. The transmit/receive path differential delay is always assumed to be equal to zero, meaning that the transmit and/or receive path delay (the one-way delay) is taken to be equal to the difference between time stamp 4 24 and time stamp 1 18 divided by two. The clock associated with time stamp 2 20 is then adjusted by the one-way delay to sync with the clock associated with time stamp 1 18, or vice versa. As described below, however, this one-way delay is often inaccurate, as the actual one-way delay is rarely, if ever, equal to one-half of the roundtrip delay.
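The effect of the zero-differential assumption can be illustrated with a short Python sketch. It follows the conventional calculation described for FIG. 1 (roundtrip delay taken as time stamp 4 minus time stamp 1, one-way delay taken as half of that). The function names, the example delays, the zero turnaround time at Sysplex Z, and the use of a single ideal reference clock for all four time stamps are simplifying assumptions made here for illustration.

```python
def one_way_estimate(t1_us, t4_us):
    """Conventional estimate: half of the roundtrip delay (t4 - t1)."""
    return (t4_us - t1_us) / 2


def simulate(tx_delay_us, rx_delay_us, t1_us=0.0):
    """Simulate the four time stamps and return (estimate, error).

    Assumptions for illustration: all stamps are read against one ideal
    reference clock, and Sysplex Z replies immediately (t3 == t2).
    The error is measured against the true transmit-path delay.
    """
    t2_us = t1_us + tx_delay_us      # receipt at Sysplex Z
    t3_us = t2_us                    # immediate retransmission
    t4_us = t3_us + rx_delay_us      # receipt back at Sysplex A
    estimate = one_way_estimate(t1_us, t4_us)
    return estimate, estimate - tx_delay_us


if __name__ == "__main__":
    # Symmetric paths: the estimate is exact.
    print(simulate(500.0, 500.0))
    # Receive path 10 microseconds longer than the transmit path:
    # the estimate is off by half of the differential delay.
    print(simulate(500.0, 510.0))
```

Under these assumptions the error in the one-way estimate is always half of the transmit/receive differential delay, so any asymmetry in the paths translates directly into a clock offset between the two sites.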
The problems with the above systems and methods are that: 1) the fiber disposed between Sysplex A 10 and Sysplex Z 12 is likely spliced differently between the transmit path 14 and the receive path 16 and/or the length of the transmit path 14 differs significantly from the length of the receive path 16; 2) one or more amplifiers 26, such as one or more erbium-doped fiber amplifiers (EDFAs) and/or the like, and/or one or more dispersion compensation modules (DCMs) 28, the one or more DCMs likely differing significantly in fiber length, are disposed between Sysplex A 10 and Sysplex Z 12, the one or more amplifiers 26 and/or DCMs 28 selectively affecting the delay between the transmit path 14 and the receive path 16; and 3) multiplexing, such as wavelength-division multiplexing (WDM), dense wavelength-division multiplexing (DWDM), or the like, is likely incorporated between Sysplex A 10 and Sysplex Z 12 (thereby allowing a plurality of protocols to be bundled per wavelength, such as Enterprise System Connection (ESCON) protocol, Fiber Channel (FC) protocol, etc.), the multiplexing scheme incorporated selectively affecting the delay between the transmit path 14 and the receive path 16. Each of these interventions contributes significantly to the transmit/receive path differential delay.
It should be noted that 5 microseconds of differential delay is approximately equivalent to a difference in length between the transmit fiber path and the receive fiber path of 1 km. It should also be noted that the electronic equipment budget of 2.5 microseconds is equivalent to approximately 530 bytes of data at a 2.125 Gbps line rate, the rate at which an STP-capable ISC link operates. In order to deal with clock noise and variations in clock frequencies, as well as to support signal multiplexing, the data streams generally need to be buffered, typically via first-in/first-out (FIFO) registers. Depths (sizes) of these FIFOs are typically in the range of hundreds of bytes, with several FIFOs being present in the end-to-end datapath, all with different fill levels. The difference between the summed fill levels of all FIFOs in the transmit path and the summed fill levels of all FIFOs in the receive path must be maintained below approximately 530 bytes in order for the buffering function itself to avoid introducing differential delay that exceeds the 2.5 microsecond requirement. Control of FIFO depth and its variation is therefore critical to reducing the differential delay in the electronic components of the system.
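The two equivalences above can be checked with a short calculation. The Python sketch below is a rough illustration under stated assumptions: the group index of 1.468 is a typical value for single-mode fiber rather than a figure from the text, and the conversion of 10 line bits per data byte assumes 8b/10b line coding, which is the assumption under which the approximately 530 byte figure works out. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # in vacuum
GROUP_INDEX = 1.468                    # assumption: typical single-mode fiber
LINE_RATE_BPS = 2.125e9                # ISC link rate from the text
LINE_BITS_PER_BYTE = 10                # assumption: 8b/10b line coding


def fiber_length_m(delay_s):
    """Length of fiber that light traverses in the given time."""
    return delay_s * SPEED_OF_LIGHT_M_PER_S / GROUP_INDEX


def buffered_bytes(delay_s):
    """Data bytes that pass through the link in the given time."""
    return delay_s * LINE_RATE_BPS / LINE_BITS_PER_BYTE


if __name__ == "__main__":
    # 5 microseconds of delay corresponds to roughly 1 km of fiber.
    print(round(fiber_length_m(5e-6)))
    # 2.5 microseconds corresponds to roughly 530 bytes at 2.125 Gbps.
    print(round(buffered_bytes(2.5e-6)))
```

With 8 line bits per byte instead of 10, the same 2.5 microseconds would correspond to roughly 664 bytes, so the FIFO fill-level limit depends on the line coding assumed.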
When the above-described transmit/receive path differential delay is improperly determined and/or left uncorrected, time stamps may be inaccurate and may, in some circumstances, be duplicative, resulting in transaction processing overlaps and, in general, inadequate performance of the GDPS integrated, automated application and data availability solution, among other problems. The systems and methods of the present invention simply and effectively address these problems.