This invention relates to protocol analysis of signal networks, and more particularly to knowledge based systems for performing such analysis.
As is known, networks represent shared access arrangements in which several network devices, such as computers or workstations (collectively "stations"), are interconnected by a common communications medium to allow users to share computing resources, such as file servers and printers, as well as application software and user work product. The communication medium may be wireline, such as coaxial, twisted pair, or fiber optic cable, or wireless, such as cellular or radio frequency (RF) transmission. The networks may range from bridged segments of local area networks (LANs) located in a department or single floor of a building, to a wide area network (WAN) of LANs which are geographically distributed and interconnected through switching devices, such as routers or bridges.
Depending on performance requirements, the different LANs within a WAN may have different physical connection configurations (or "topologies"), such as Ethernet or Token Ring. They may also have different vendor proprietary LAN hardware and software with different signal protocols that govern the exchange of information between the stations in the LAN. When these different topology and different protocol LANs are interconnected, which is referred to as "internetworking", there must be an exchange of signal protocols. The Open Systems Interconnection (OSI) seven layer reference model developed by the International Organization for Standardization, which is incorporated by reference herein, describes how information is exchanged between software applications on workstations in different networks by passing the information through a hierarchy of protocol layers.
Networks must be managed to ensure their performance. This includes monitoring signal traffic for trends related to signal volume, routing, and transmission speed in order to plan proactively for network growth and to avoid signal congestion and network downtime. It also includes detecting and diagnosing network operational problems which affect performance, both to prevent problems and to restore network operation with minimum downtime following the detection of a problem. These are the responsibilities of a network administrator, whose network duties require both anticipation of performance changes and diagnosis of performance failures. This requires the availability of network statistics related to performance, and network administrators commonly collect an archive of network management statistics that indicate network utilization, growth and reliability, to facilitate near-term problem isolation and longer-term network planning.
The general categories of statistics monitored include those related to: utilization, performance, availability, and stability within a monitoring period.
These may be defined as follows:
Utilization statistics relate to network traffic versus capacity (i.e., efficiency), and include frame count, frames-per-second (FPS), the frequency of occurrence of certain protocols, and certain application level statistics;
Performance statistics relate to quality of service issues, such as traffic delays, the number of packet collisions, and the number of message packets dropped;
Availability statistics gauge the accessibility of different OSI protocol layers within the network, and include line availability as a percentage of uptime, route availability, and application availability; and
Stability statistics describe short term fluctuations in the network which degrade service, including the number of fast line status transitions, the number of fast route changes (route flapping), next hop count stability, and short term ICM behavior.
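Two of the statistics named above reduce to simple arithmetic over a monitoring period. The following is a minimal sketch, assuming the frame count and the line uptime have already been measured; the function names and the example figures are illustrative and do not come from the specification.

```python
def frames_per_second(frame_count: int, period_seconds: float) -> float:
    """Utilization: average frame rate over the monitoring period."""
    if period_seconds <= 0:
        raise ValueError("monitoring period must be positive")
    return frame_count / period_seconds


def line_availability_percent(uptime_seconds: float, period_seconds: float) -> float:
    """Availability: line uptime expressed as a percentage of the monitoring period."""
    if period_seconds <= 0:
        raise ValueError("monitoring period must be positive")
    return 100.0 * uptime_seconds / period_seconds


# Illustrative figures only: 1200 frames in one minute, 57 minutes up in one hour.
fps = frames_per_second(1200, 60)
availability = line_availability_percent(3420, 3600)
```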
Some of these statistics are empirical ("measured statistics") and obtained by counting the occurrence of the selected metric, and others require analysis of actual frame content ("analysis-derived statistics"). Protocol analyzers are the known instruments for providing these measured and analysis-derived statistics.
To be of analytical value, the acquired statistical values must be capable of being correlated in a real time composite which quantitatively measures real time network performance. Measured statistics are readily acquired with hardware counters and time stamped counts, which acquire and report the data in real time. With analysis-derived statistics, however, the network frames are captured in real time but the analysis must necessarily occur in machine time. User selected ("filtered") network frames are captured in real time, time-stamped, serially numbered, and stored in a queue for analysis. The frames are then analyzed in machine time and the analysis-derived statistics are reported with their associated frame time-stamp, thereby allowing them to be correlated with the measured statistics.
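The capture, queue, and analyze sequence described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analyzer's actual design: the class and function names are hypothetical, the "analysis" is a placeholder callback, and correlation is done by grouping both kinds of statistic into fixed time intervals.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CapturedFrame:
    serial: int        # serial number assigned at capture
    timestamp: float   # capture time-stamp, in seconds
    payload: bytes     # frame content retained for later analysis


class CaptureQueue:
    """Holds frames captured in real time until they are analyzed in machine time."""

    def __init__(self):
        self._frames = deque()
        self._next_serial = 0

    def capture(self, timestamp, payload):
        frame = CapturedFrame(self._next_serial, timestamp, payload)
        self._next_serial += 1
        self._frames.append(frame)
        return frame

    def analyze_all(self, analyze):
        """Drain the queue, reporting each result with its frame time-stamp."""
        results = []
        while self._frames:
            frame = self._frames.popleft()
            results.append((frame.timestamp, analyze(frame)))
        return results


def correlate(measured, derived, interval):
    """Group (timestamp, value) pairs of both kinds into shared time intervals."""
    buckets = {}
    for kind, samples in (("measured", measured), ("derived", derived)):
        for timestamp, value in samples:
            key = int(timestamp // interval)
            bucket = buckets.setdefault(key, {"measured": [], "derived": []})
            bucket[kind].append(value)
    return buckets
```

Because each derived result carries the time-stamp of the frame it came from, the delay between capture and analysis does not affect which interval the result is assigned to.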
In the event of "bursty" traffic patterns, the sequenced capture, storage, and analysis is prone to experiencing a back-up resulting from the inability of the process time to keep pace with the rate of frame capture. When this occurs, the capture is halted and network frames are lost until the back-up clears. The lost frames represent lost analytical data. In addition, however, the analyzer has no quantitative measure of the number of frames lost. The result is a loss in data integrity and a corresponding loss in the accuracy of the resulting statistical composite.
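The back-up condition can be illustrated with a small simulation. This is a sketch under assumed figures: a fixed-capacity queue, a fixed number of frames analyzed per tick, and a list of arrivals per tick. The `lost` counter exists only so the simulation can show the effect; as noted above, the analyzer described has no such measure.

```python
def simulate_capture(arrivals_per_tick, analyzed_per_tick, capacity):
    """Return (frames still queued, frames lost) after the given arrival pattern."""
    queued = 0
    lost = 0
    for arrivals in arrivals_per_tick:
        room = capacity - queued
        accepted = min(arrivals, room)
        lost += arrivals - accepted      # frames arriving while the queue is full
        queued += accepted
        queued -= min(queued, analyzed_per_tick)  # analysis drains at a fixed rate
    return queued, lost


# Steady traffic stays within the analysis rate; a single burst overflows the queue.
steady = simulate_capture([2, 2, 2], analyzed_per_tick=3, capacity=5)
bursty = simulate_capture([2, 2, 10, 2], analyzed_per_tick=3, capacity=5)
```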
Even with accurate performance statistics, the ability to diagnose network failures quickly, or at all, relies on the education and practical experience of the network administrator in general, and their experience with a network in particular. So much of a network's cyclic performance is the result of cyclic user demand, or of user custom, or of the manner of doing business, that "institutional memory" is an important asset in diagnosing failures. Similarly, so many network failures are the result of human error that the "familial" experience of the administrator with the user group is also important. Unfortunately, the continued rapid growth in network installations and expansions often requires that less experienced personnel be made responsible for administration. There is a demand, therefore, for network tools in the form of knowledge based systems which may assist in the diagnosis of network performance by less experienced personnel, as well as increase the speed and accuracy of failure diagnosis even by experienced administrators.
The object of the present invention is to provide a knowledge based system capable of assisting users in the diagnosis of network performance. Another object of the present invention is to provide a knowledge based system capable of providing such diagnosis with increased accuracy and speed.
According to the present invention, a knowledge based expert analysis system includes a rules based inference engine comprising a plurality of algorithms, or "inference rules", grouped in one or more categories of defined network performance criteria. The rules in each category are arranged in a hierarchy, with each rule being interdependent in a prioritized arrangement with one or more other rules in the same, or in another category. The rule interdependencies are fixed; however, the priorities are adapted to the objective of the particular analysis, as entered by the user, such that the rules to be used for a given analysis are defined at run time. In further accord with the invention, the threshold value required to satisfy a given rule condition is also programmed at run time based on information entered by the user and, alternatively, in the event of no user entered information, on established default values. In still further accord with the present invention, some or all of the rules may similarly be programmed from their default state to detect alternate network events, as deemed necessary by the system based on user entered information describing the object of the analysis, or the condition of the network.
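The run time selection of threshold values, with fallback to established defaults, might be sketched as below. The threshold names and default figures are hypothetical placeholders chosen for illustration; the specification does not state them here.

```python
# Hypothetical default thresholds; names and values are assumptions for illustration.
DEFAULT_THRESHOLDS = {
    "collisions_per_second": 50.0,
    "dropped_packets": 10.0,
}


def resolve_thresholds(user_values, defaults=DEFAULT_THRESHOLDS):
    """Use the user-entered value where one was given, otherwise the default."""
    resolved = dict(defaults)
    for name, value in user_values.items():
        if name not in defaults:
            raise KeyError(f"unknown threshold: {name}")
        resolved[name] = value
    return resolved


# The user overrides one threshold at run time; the other keeps its default.
thresholds = resolve_thresholds({"dropped_packets": 25.0})
```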
In yet still further accord with the present invention, each rule further includes a unique rule identifier, a variable sampling frequency defining the sampling interval (in seconds), a defined priority in relation to other rules to indicate which rule is evaluated first, and a status field which identifies the state of the rule as being enabled, disabled, or satisfied. In yet still further accord with the present invention, the sampling frequency of each rule is checked at defined intervals to determine if the rule should be evaluated to determine if it is satisfied and, if so, the status is changed to SATISFIED for the consideration of the other dependent rules. If the satisfied rule event is to be logged by the system, the rule identifier is sent to an event queue.
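The rule fields and the periodic evaluation described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the field names, the convention that a lower priority number is evaluated first, and the `condition` callback are all assumptions, and the interdependencies between rules are omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    SATISFIED = "satisfied"


@dataclass
class Rule:
    rule_id: str                       # unique rule identifier
    sampling_interval: int             # seconds between evaluations
    priority: int                      # assumption: lower number is evaluated first
    condition: Callable[[dict], bool]  # hypothetical test against current statistics
    status: Status = Status.ENABLED
    log_event: bool = True             # whether a satisfied rule is sent to the event queue
    last_sampled: int = 0


def tick(rules, now, stats, event_queue):
    """Evaluate every enabled rule whose sampling interval has elapsed."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.status is not Status.ENABLED:
            continue
        if now - rule.last_sampled < rule.sampling_interval:
            continue
        rule.last_sampled = now
        if rule.condition(stats):
            rule.status = Status.SATISFIED
            if rule.log_event:
                event_queue.append(rule.rule_id)
```

A caller would invoke `tick` at the defined checking interval, passing the current time in seconds and the latest statistics; dependent rules could then inspect the `status` of the rules they rely on.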
The rules based expert analysis system of the present invention includes four operational phases, the first two of which allow the user to enter information regarding the nature of the analysis, including specific problem conditions. The default priority is then automatically modified in dependence on the characteristics of a user stated problem. This novel approach provides several benefits. The rules to be enabled and evaluated are determined by the events and symptoms entered at run time instead of at system design time. The events and symptoms can be based on other events occurring or not occurring, so that event correlation can be supported by developing rules based on the events to be correlated. Because the text of a rule can be specified at run time, further instructions can be given to users during troubleshooting to assist them in the problem analysis.
These and other objects, features, and advantages of the present invention will become more apparent in light of the following detailed description of a best mode embodiment thereof, as illustrated in the accompanying Drawing.