1. The Field of the Invention
The present invention relates to the evaluation of data events occurring in a communications system. More specifically, embodiments of the present invention are concerned with systems and methods for establishing and using a common time base in connection with the operation of a multi-link protocol analyzer in a multi-protocol communications system.
2. Related Technology
Many data communications systems use a variety of different data transmission mechanisms to enable communication between and among associated subsystems. In general, the type of data transmission mechanism employed in a given situation is determined with reference to the particular tasks desired to be accomplished in connection with that transmission mechanism and associated systems. Each different transmission mechanism, in turn, is associated with a particular transmission, or communication, protocol that defines various parameters concerning the transmission of data in connection with the transmission mechanism. Such communication protocols commonly specify, for example, the manner in which data is encoded onto a transmission signal, the particular physical transmission media to be used with the transmission mechanism, link layers and other attributes.
As suggested above, a single data communications system may use multiple different transmission mechanisms and protocols. As an example, an enterprise may employ a communications system that uses five different data communications protocols, each adapted for a particular situation, wherein the five protocols may include: a first protocol for a high speed, inexpensive short-haul connection on the computer motherboard; a second high-bandwidth protocol for data center transmissions; a third protocol that is suited for efficiently transmitting information across the enterprise local area network (“LAN”); a fourth protocol adapted for high bandwidth, long haul applications; and, finally, a fifth communications protocol suited for data transmission to high performance disk drive storage systems. Thus, the typical communications system comprises a patchwork of different subsystems and associated communications protocols.
In this way, a communications system can be created that makes maximum and efficient use of the functionalities and capabilities associated with various different communications protocols. Further, advances in communications technology, coupled with declining costs, enable such communications systems to be implemented in a relatively cost effective fashion.
While communications systems that include components, devices and subsystems of varying protocols are able to exploit the respective strengths and useful features associated with each of the protocols, such multiple protocol systems can be problematic in practice. This is especially true where problem identification, analysis and resolution are concerned. In particular, the use of multiple communications protocols within the bounds of a single communications system greatly complicates the performance of such processes.
For example, as network data moves from a point of origin to a destination, by way of communication links, or simply “links,” the data passes through a variety of devices collectively representing multiple protocols. Typically, each such device modifies the network data so that the data can be transmitted by way of a particular link. However, modification of the data in this way often causes errors or other problems with the data. Such errors may occur as the result of various other processes and conditions as well. Thus, the various communication links in a communications system are particularly prone to introduce, or contribute to the introduction of, data errors. Moreover, data errors and other problems present at one location in the data stream may cause errors or other problems to occur at other locations in the data stream and/or at other points in the communications system and associated links.
One approach to problem identification, analysis and resolution in communications systems involves capturing a portion of the network data traffic. The captured data can then be retrieved for review and analysis. In some cases, such data capture is performed in connection with a protocol analyzer that includes various hardware and software elements configured and arranged to capture data from one or more communications links in the communications system, and to present the captured data by way of a graphical user interface.
Generally, such multi-link protocol analyzers, or simply “analyzers,” capture data traffic in the communications system over a defined period of time, or in connection with the occurrence of predefined events. Use of the multi-link protocol analyzer thus allows a network administrator or hardware developer to track the progress of selected data as that data moves across the various links in the communications system. Corrupted or altered data can then be identified and traced to the problem link(s), or other parts of the communications system.
Implementation of this functionality, however, requires that a causal relationship be identified between the data captured by way of the various links. In particular, in order to classify event “A” as a possible cause of event “B,” it must be shown that event “A” occurred prior in time to event “B.” If event “A,” or at least a portion of event “A,” did not occur prior in time to event “B,” then event “A” cannot be the cause of event “B.” Accordingly, identification of a causal relationship cannot be performed without knowledge of the order in which the data of interest arrives at a particular destination, or destinations, in the communications system. That is, causal links or relationships between data events occurring on different links within the communications system cannot be identified until the temporal relationship between those data events is known. As discussed below, typical analyzers present a number of problems in this regard.
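The ordering precondition described above can be expressed as a simple test. The following is a minimal sketch, assuming that every captured event already carries start and end times expressed in seconds on one shared time base; that assumption is precisely what the remainder of this section shows to be difficult to satisfy. The `DataEvent` type and `may_have_caused` function are hypothetical names used only for illustration, and the test is a necessary, not a sufficient, condition for causality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataEvent:
    """A captured data event; times are seconds on a single shared time base (assumed)."""
    name: str
    start: float
    end: float


def may_have_caused(a: DataEvent, b: DataEvent) -> bool:
    """Necessary (not sufficient) condition: at least part of `a` precedes the start of `b`."""
    return a.start < b.start


a = DataEvent("A", start=1.0e-6, end=3.0e-6)
b = DataEvent("B", start=2.0e-6, end=4.0e-6)
print(may_have_caused(a, b))  # True
print(may_have_caused(b, a))  # False
```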
For example, identification of such causal relationships between data events is complicated by the fact that the data is transmitted at different rates over the different links. As noted earlier, multiple data communications protocols are employed within a single communications system, and each protocol has a different associated data rate and transmission frequency. By way of illustration, Fibre Channel systems operate at a frequency of about 2 GHz, Infiniband systems operate at a frequency of about 2.5 GHz per lane across four lanes, and Gigabit Ethernet systems operate at a frequency of about 1 GHz.
Thus, the speed with which a particular portion of data can be transmitted is a function of the frequency of the associated protocol. A comparison of the Gigabit Ethernet (“GigE”) and Infiniband protocols serves to illustrate this point. As noted above, GigE systems operate at a frequency of about 1 GHz, while Infiniband systems operate at a frequency of about 2.5 GHz per lane, so that the same amount of information takes about 2.5 times longer to transmit in a GigE system as in an Infiniband system.
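The arithmetic behind this comparison can be checked directly. The sketch below assumes, as the comparison above implicitly does, that one symbol is transmitted per clock increment and ignores encoding overhead; the dictionary and the `transfer_time_s` function are hypothetical names chosen for illustration. The Infiniband figure is the per-lane rate.

```python
# Nominal clock rates quoted in the text; one transmitted symbol per clock increment is assumed.
NOMINAL_CLOCK_HZ = {
    "Fibre Channel": 2.0e9,
    "InfiniBand (per lane)": 2.5e9,
    "GigE": 1.0e9,
}


def transfer_time_s(symbols: int, clock_hz: float) -> float:
    """Time needed to clock out `symbols` symbols at `clock_hz`, ignoring encoding overhead."""
    return symbols / clock_hz


# Clock period of each protocol, in nanoseconds.
for name, hz in NOMINAL_CLOCK_HZ.items():
    print(f"{name}: {1e9 / hz:.1f} ns per clock increment")

ratio = transfer_time_s(1000, NOMINAL_CLOCK_HZ["GigE"]) / transfer_time_s(
    1000, NOMINAL_CLOCK_HZ["InfiniBand (per lane)"]
)
print(f"GigE takes {ratio:.1f}x as long")
```

The printed periods are 0.5 ns, 0.4 ns and 1.0 ns, and the ratio is 2.5, matching the figure given above.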
In typical data capture operations, the clock of one of the protocols is used as a basis for timestamping of the captured data. The timestamping is performed so that the temporal relationships between captured data events can be determined. However, because each protocol in multi-protocol systems has a different associated clock, the sorting of captured data based upon a timestamp made with reference to a particular protocol clock is frequently inadequate to enable determination of causal relationships between captured data events. This is especially true where it is desired to determine whether an inter-protocol relationship exists between, for example, a data event associated with the Infiniband portion of the system, and a data event associated with the GigE portion of the system.
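One way in which timestamping with a single protocol clock loses ordering information can be sketched as follows. The sketch assumes that every captured event is stamped with the number of completed GigE clock increments (truncation), and it uses two hypothetical arrival times in picoseconds. Although the Infiniband event truly precedes the GigE event by 700 ps, both events receive the same timestamp, so sorting on that timestamp cannot establish which came first. The `gige_timestamp` function is an illustrative name, not part of any analyzer.

```python
GIGE_PERIOD_PS = 1000  # 1 GHz clock: one increment every 1000 ps


def gige_timestamp(true_time_ps: int) -> int:
    """Stamp an event with the count of completed GigE clock increments (truncation assumed)."""
    return true_time_ps // GIGE_PERIOD_PS


# Hypothetical true arrival times, in picoseconds.
events = {"InfiniBand event": 1200, "GigE event": 1900}
stamps = {name: gige_timestamp(t) for name, t in events.items()}
print(stamps)  # {'InfiniBand event': 1, 'GigE event': 1}
```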
In the aforementioned example, the GigE protocol is relatively more “coarse” than the Infiniband protocol in that, for a given time period, a GigE system clock increments fewer times than does the Infiniband system clock. Thus, a particular data event may appear relatively longer, or shorter, than another data event, depending upon which clock is selected as the basis for the timestamps. For example, a 2 clock increment GigE data event would be 5 clock increments long in the Infiniband protocol, so that while the respective data events appear to have different lengths, relative to their corresponding protocols, the data events actually have the same time duration in absolute terms.
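The tick conversion in this example is a ratio of clock frequencies: a duration of n increments at frequency f₁ spans n·f₂/f₁ increments at frequency f₂. The sketch below uses Python's `fractions.Fraction` for exact arithmetic and the nominal rates given above; the function names are hypothetical. It confirms that 2 GigE increments equal 5 Infiniband increments and that both represent the same absolute duration.

```python
from fractions import Fraction

GIGE_HZ = 1_000_000_000
INFINIBAND_HZ = 2_500_000_000  # per-lane rate used in the comparison above


def convert_ticks(ticks: int, from_hz: int, to_hz: int) -> Fraction:
    """Express a duration measured in one clock's increments in another clock's increments."""
    return Fraction(ticks) * to_hz / from_hz


def duration_s(ticks: int, clock_hz: int) -> Fraction:
    """Absolute duration, in seconds, of `ticks` increments of a clock running at `clock_hz`."""
    return Fraction(ticks, clock_hz)


print(convert_ticks(2, GIGE_HZ, INFINIBAND_HZ))  # 5
print(duration_s(2, GIGE_HZ) == duration_s(5, INFINIBAND_HZ))  # True
```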
As the foregoing suggests, the different data rates associated with the communications protocols also compromise the ability to determine start and stop times of particular data events. Of course, this situation is further aggravated where multiple additional communications protocols are employed in a communications system. Thus, in a system that employs multiple communications protocols, the protocol-based timestamping of multiple captured data events makes it difficult, if not impossible, to make accurate and reliable determinations as to absolute and relative data event lengths, and data event start and finish times. As a result, the identification of temporal relationships between data events, such as is required to facilitate time-based sorting and analysis of those data events, is substantially foreclosed.
One possible approach to the determination of temporal relationships and thus, causal relationships, between captured data events is to record a well known timestamp in each data stream. For example, an absolute time reference, such as Coordinated Universal Time (“UTC”) (measured in seconds) could be used to make a well known mark in each data stream that can be used as a reference point later. Since each data stream is transmitted at a well known rate, it would seem to be a relatively simple matter to determine any and all causal relationships for all data at any arbitrary point in the data capture. As discussed below however, this approach is problematic, at least because it is based upon the assumption that there is no drift in clock frequency at any of the links.
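The reference-mark approach amounts to the following reconstruction: the time of any capture equals the time of the mark plus the number of clock increments since the mark divided by the link's nominal frequency. This is a minimal sketch, assuming a hypothetical mark at which tick 0 of a 1 GHz GigE link coincides with a known UTC second. Note the built-in assumption, flagged in the docstring, that the link clock runs at exactly its nominal rate.

```python
from fractions import Fraction


def time_from_mark(ticks: int, mark_ticks: int, mark_utc_s: Fraction, nominal_hz: int) -> Fraction:
    """Estimate the absolute time of a capture from a reference mark.

    Assumes the link clock runs at exactly `nominal_hz`, i.e. zero drift.
    """
    return mark_utc_s + Fraction(ticks - mark_ticks, nominal_hz)


# Hypothetical mark: tick 0 of a 1 GHz GigE link coincides with UTC second 1_000_000.
t = time_from_mark(
    ticks=2_500_000_000,
    mark_ticks=0,
    mark_utc_s=Fraction(1_000_000),
    nominal_hz=1_000_000_000,
)
print(float(t))  # 1000002.5
```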
For example, in each data transmission method, the transmit clock is specified as running at a certain rate, but a certain amount of error or deviation from that standard rate is acceptable. In many specifications, an error of several parts per million is allowed. Further, the transmission error of each link is different, which means that even after a very short period of time, several seconds for example, each link may have a permissible error or deviation of thousands, if not millions, of clocks, or clock increments, from the original time. Moreover, the various clocks are not synchronized with one another. Thus, after the passage of several thousand seconds, there is virtually no way to accurately and reliably determine temporal or causal relationships between and among the data events in the data stream. As an example, one byte that appears to have preceded another byte in the data stream may, in fact, have arrived several milliseconds after it.
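The scale of the problem follows from simple arithmetic. The sketch below assumes a hypothetical tolerance of ±3 parts per million, within the "several parts per million" mentioned above, and clocks that agree exactly at the start of the capture. A 2.5 GHz link 3 ppm off nominal deviates by 75,000 increments within 10 seconds. After about 1000 seconds, if one link runs 3 ppm fast and another 3 ppm slow, an event that truly occurred 4 ms later on the slow link is reported as occurring earlier, so naive sorting reverses the apparent cause and effect. The function names are illustrative only.

```python
from fractions import Fraction


def drift_clocks(clock_hz: int, ppm: int, elapsed_s: int) -> Fraction:
    """Clock increments by which a link running `ppm` parts per million off nominal
    has deviated after `elapsed_s` seconds."""
    return Fraction(clock_hz * ppm * elapsed_s, 1_000_000)


def reported_time(true_s: Fraction, ppm_offset: int) -> Fraction:
    """Time reported by a free-running clock that was exact at t = 0 and runs
    `ppm_offset` ppm fast (positive) or slow (negative)."""
    return true_s * (1 + Fraction(ppm_offset, 1_000_000))


# A 2.5 GHz link that is 3 ppm off nominal, after 10 seconds:
print(drift_clocks(2_500_000_000, 3, 10))  # 75000

# Event A truly occurs 4 ms before event B, about 1000 s into the capture.
a_true = Fraction(1000)
b_true = Fraction(1000004, 1000)
a_seen = reported_time(a_true, +3)  # link carrying A: clock 3 ppm fast
b_seen = reported_time(b_true, -3)  # link carrying B: clock 3 ppm slow
print(b_seen < a_seen)  # True: B now appears to precede A
```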
In view of the foregoing, and other, problems in the art, what is needed are systems and methods for establishing and using a common time base in connection with the operation of a multi-link protocol analyzer, or a group of single link protocol analyzers, in a multi-protocol communications system. Among other things, the common time base should facilitate timestamping and ordering of captured data events in such a way that temporal relationships between and among captured data events representing multiple protocols can be accurately and reliably identified, notwithstanding differing clock rates, and effects such as clock frequency drift.