Telecommunication equipment is generally divided into two main categories. One category is the telecommunication switch. The traditional switch routes numbers dialed to a particular circuit which, in turn, is routed to another circuit, continuing in an iterative process. Each stage, to some degree, constitutes a switch. Variations on the switch include the POTS (plain old telephone service) system and newer technology systems such as ATM (asynchronous transfer mode) which dynamically determines on which circuit to route the calls.
The other category of telecommunication equipment is transmission equipment. This includes high bandwidth copper wire or fiberoptic cable intended to transmit data over a long distance or to directly interface with other transmission equipment without additional switching. For example, in the U.S. and Canada, modern transmission equipment generally transmits at STS1, OC3, OC12 and OC48 speeds. Older transmission equipment transmits at DS1 and DS3 speeds. The connections of transmission equipment are not dynamic, but rather are provisioned as fixed ahead of time, such as the connections between (or within) telephone companies. In modern telecommunication systems, software is used to establish these permanent connections between two geographical points.
The present invention can be applied to any of these systems including other connectionless networks such as local area networks (LANs) or the Internet, as well as to cable technology.
For most telecommunications equipment that has interfaces, such as switches, the interfaces are composed of copper wires and/or fiberoptic cables. The physical wires or cables (lines) can be defective, suffering from problems such as environmentally induced noise, hardware malfunction or complete severing. Furthermore, the traffic (the one or more paths carried within the line) being transmitted may be defective with respect to bandwidth. For example, problems can occur on a line which carries a particular problematic path that was itself multiplexed together with other paths from their respective lines to form the aggregated traffic contents of a higher bandwidth line.
Performance Monitoring (or Performance Management) ("PM") provides quality assurance to telecommunication system operators by allowing a piece of telecommunications equipment to assess its own health and the health of the traffic that is flowing into and out of it. PM is accomplished by monitoring and measuring certain kinds of quantitative operational data which reveal the quality of the paths and lines. This operational data can generate "PM data" which can be monitored by employees located at the switch or can be transmitted to a central hub of network operations where other employees oversee the entire network. PM data permits an operator, whether in a central office or worldwide, to diagnose a failure as it occurs or predict the onset of failures (nonfatal increases in the occurrence of a problem over a period of time) and execute steps toward preventative or remedial maintenance of the equipment. Sources of PM data include dedicated hardware application cards located at points of entry of traffic across the copper wire or fiberoptic cable and UNIX processes executed as part of the telecommunications process.
There are approximately 200 distinct instances of PM data, including severely errored seconds, defects that have actually occurred, number of failed tests, laser temperature measurement or consumption of pools of resources. The occurrence of any of these instances detected by monitoring operational data creates an event. An event is a defect, failure or anomaly in transmission which causes the occurrence of a severely errored second or a spontaneous condition. The problems could be in hardware (e.g., for POTS), where PM data is focused, or in software (e.g., for new technology switches), where traffic metering and monitoring (TMM) data is focused. While there are several differences between PM data and TMM data, for the purposes of the present invention, PM data is meant to also encompass TMM data unless stated otherwise.
Because modern telecommunications systems exist in multi-vendor and/or multi-product environments, a telecommunications software system must be designed to operate in such an environment. Consequently, PM system activities and requirements are highly standardized due to the required compatibility between diverse products. Standards organizations such as the International Telecommunication Union (ITU), the American National Standards Institute (ANSI) and Bell Communications Research (Bellcore) set forth standards which incorporate PM.
PM data is acquired by using several types of functionality. The vast majority of the PM data is acquired by an accumulator function. An accumulator function's value varies unidirectionally. For example, an additive accumulator function begins with a value of zero and is incremented or increased by a value (including zero) at each instance of an event. Conventionally, at each change of value, the accumulator function checks for whether the value has crossed a threshold, after which certain actions may be taken, including the emission of a threshold crossing alert (TCA). A minority of the PM data is acquired by a gauge function. A gauge function's value varies bidirectionally. The value fluctuates straight up and down depending on what is being measured on the transmission line. For example, gauge functions can reflect a laser temperature meter for measuring the power output of a fiberoptic cable. When the value of a meter becomes too high ("too hot"), an overload problem of some kind may be occurring. When the value becomes too low ("too cold"), there may not be enough power to drive the signal through the cable over the distances required. In either case, when a gauge function's value moves outside of the tolerable range, an onset threshold is crossed. When the value of the gauge function returns to the tolerable range, an abatement threshold is crossed. Thus, onset signifies entering some state. Abatement signifies crossing a threshold that returns the value to a previous state. The gauge function inherently checks for these threshold crossings.
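The behavior of the two function types described above can be illustrated with a minimal sketch. It is not taken from any particular PM product; the class names, threshold values and readings are hypothetical and chosen only for illustration.

```python
class Accumulator:
    """Additive accumulator: the value only grows, starting from zero."""

    def __init__(self, threshold):
        self.value = 0
        self.threshold = threshold  # hypothetical TCA threshold
        self.alerts = []

    def record(self, amount=1):
        if amount < 0:
            raise ValueError("an additive accumulator never decreases")
        before = self.value
        self.value += amount
        # Check for a crossing at each change of value; alert only once.
        if before < self.threshold <= self.value:
            self.alerts.append(("TCA", self.value))


class Gauge:
    """Gauge: the value moves in both directions within or outside a range."""

    def __init__(self, low, high):
        self.low, self.high = low, high  # hypothetical tolerable range
        self.in_range = True
        self.events = []

    def update(self, value):
        ok = self.low <= value <= self.high
        if self.in_range and not ok:
            self.events.append(("onset", value))      # left the range
        elif not self.in_range and ok:
            self.events.append(("abatement", value))  # returned to the range
        self.in_range = ok


errored_seconds = Accumulator(threshold=3)
for amount in (1, 0, 2, 1):
    errored_seconds.record(amount)

laser = Gauge(low=10, high=50)
for reading in (30, 55, 60, 45, 5, 20):
    laser.update(reading)
```

In this sketch the accumulator emits a single TCA when its value first reaches the threshold, while the gauge reports an onset each time a reading leaves the tolerable range and an abatement each time a reading returns to it.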
These and other PM data acquiring functions are packaged together, conventionally, as one product-dependent software system by PM designers. Furthermore, PM data collection and storage functionality is also bundled into the conventional PM system package. While this satisfies the necessity for custom-tailored solutions, it results in much time and expense in designing, writing and debugging PM software products.
One of the standards for fiberoptic telecommunications is synchronous optical network (SONET). The SONET PM standards (GR-253-CORE, SONET Transport Systems: Common Generic Criteria) describe a set of state machines. One of the requirements of the SONET specification, with regard to the accumulator function, is that a network element support performance monitoring which permits the discarding of PM data accumulated in a ten second window of a monitoring interval ("the discard standard").
There are two approaches which satisfy the discard standard. One is more complex and harder to achieve than the other. The first, easier approach actually permits "fudging," or falling short of discarding the unwanted data, by clearing certain registers and counts. In essence, the requirement is not truly fulfilled by this option. Because it is easier to achieve, the majority of PM software systems implement this less complex approach. However, this approach results in PM software systems that are less accurate than systems that achieve the higher standard.
The second, more complex approach is dictated by state machines in SONET that imply an undo operation. There are basically two ways of implementing an undo operation. One method incorporates maintaining the previous state for a period of time as an open transaction and, at some point, closing the transaction. During the period from the transaction's opening to the transaction's closing, the system can discard the new state data and roll back to the previous state. The second method incorporates the theory of providing an inverse operation for every operation. For example, for each increment function, a decrement function can be provided and for each TCA emission, a TCA revocation can be provided.
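The two undo methods can be contrasted with a minimal sketch. The class and method names below are hypothetical, and the sketch ignores the cross-system coordination that a real network element would require.

```python
import copy


class TransactionalCounts:
    """Method one: keep the previous state while a transaction is open."""

    def __init__(self):
        self.state = {"count": 0, "tca_emitted": False}
        self._saved = None

    def open(self):
        self._saved = copy.deepcopy(self.state)  # costs extra memory

    def close(self):
        self._saved = None  # the new state becomes permanent

    def rollback(self):
        if self._saved is None:
            raise RuntimeError("no open transaction")
        self.state = self._saved  # discard the new state data
        self._saved = None


class InvertibleCounts:
    """Method two: every operation has an inverse operation."""

    def __init__(self, threshold):
        self.count = 0
        self.threshold = threshold  # hypothetical TCA threshold
        self.messages = []

    def increment(self, amount):
        before = self.count
        self.count += amount
        if before < self.threshold <= self.count:
            self.messages.append("TCA")

    def decrement(self, amount):
        before = self.count
        self.count -= amount
        # Undoing a crossing also requires revoking the alert.
        if self.count < self.threshold <= before:
            self.messages.append("TCA revoked")


txn = TransactionalCounts()
txn.state["count"] = 5
txn.open()
txn.state["count"] += 4
txn.state["tca_emitted"] = True
txn.rollback()

inv = InvertibleCounts(threshold=8)
inv.increment(5)
inv.increment(4)
inv.decrement(4)
```

After the rollback the transactional object holds its earlier count of 5 with no TCA recorded. The invertible object reaches the same count, but only by emitting a revocation that any recipient of the original TCA would also have to process.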
Either method places a significant burden on the PM designer. The first, transaction-oriented approach would be very unusual and bulky to incorporate into a PM application due to its complex software, slow performance, high RAM storage consumption, high CPU consumption, and overall design complexity. To function correctly, the second inverse operation method implies a cascading wave front of inverse operations rippling through the telecommunication equipment and/or to computer systems outside the telecommunication equipment. This design complexity requires excessive coordination between personnel and systems. Thus, there is a need for a method of more efficiently implementing the more accurate, more complex standard approach.
PM data, after it is acquired during a monitoring interval, is normally collected for analysis and/or transmission to a remote site. Conventionally, this collection incorporates a remote application demarcation of the monitoring interval. The various PM data functionality is duplicated such that there are disparate current and previous accumulator functions, disparate current and previous gauge functions, etc. The application domain determines which function to collect from. This application responsibility creates complexity in the conventional PM system. Also, multiple bulk transfers which include PM data from different sources of PM data are transmitted in a piecemeal fashion. The methods presently used package subsets of PM data based on its source because the recipient of that data cannot process a mixed package of PM data. Also, most systems do not handle heterogeneous mixtures of sizes of integers or categories of PM data or PM data aggregates (e.g., DS1, DS3, OC12). This type of system is wasteful of system resources and tends to be inflexible to the characteristics of PM data.
Performance monitoring data which have been generated at a source of PM data and acquired can be collected and transmitted to various local or remote processors. Generally, two methods of collection have been used in conventional PM systems. The first method used can be termed the "pull" method. The pull method assumes a system architecture in which there is a central authority processor among multiple processors. The central authority demarcates the performance monitoring intervals by transmitting a signal to all the other processors informing them of the demarcation. The problem with this method of demarcation is that the time it takes to transmit to each one of the processors creates a time drift effect. By the time the last transmission is completed, the demarcation time is considerably different from the time of the first transmission. Even in a multicast demarcation setting, drift occurs due to the multiple layer architecture of the system. Because not all processors can communicate with all other processors, a multicast signal becomes a recursive multicast signal and latencies of each architectural level cause drift. The standards generally allow for only about ten seconds of error or "slop." Often, this time drift, along with the inherent inaccuracies in the time of day clocks, results in substandard system performance (contributing to more than 10 seconds of slop). Furthermore, the central authority processor often suffers from poor performance or slower execution speed due to its burdensome demarcating function. As a result, operators of these systems usually choose to set them to demarcate only a subset of the PM data.
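The drift arithmetic of the pull method can be sketched as follows. The processor count and latency figures are hypothetical values picked only to show how drift accumulates; the ten second budget is the approximate allowance mentioned above.

```python
from itertools import accumulate

SLOP_BUDGET_MS = 10_000  # roughly ten seconds of allowed error


def sequential_drift_ms(num_processors, send_time_ms):
    """Gap between the first and last notification when the central
    authority signals the processors one after another."""
    return (num_processors - 1) * send_time_ms


def recursive_multicast_drift_ms(level_latencies_ms):
    """Gap between the first and last architectural level when each
    level re-multicasts the signal to the level beneath it."""
    arrivals = list(accumulate(level_latencies_ms))
    return arrivals[-1] - arrivals[0]


# Hypothetical figures: 200 processors at 60 ms per transmission,
# and three architectural levels with growing latencies.
seq = sequential_drift_ms(200, 60)
multi = recursive_multicast_drift_ms([50, 400, 900])
```

Under these assumed figures, sequential signaling alone exceeds the budget, while the recursive multicast stays within it but still consumes part of the allowance before clock inaccuracies are taken into account.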
The second method of PM data collection utilized can be termed the "push" method. In this method, the demarcation function is completely delegated to the various processors whose clocks are synchronized. At the point of demarcation, each of these processors packages its data and, unaware of any of the others performing the same operation in unison, dumps it to a centralized controller such that a flooding situation occurs. In a real time system, this results in the controller not being able to execute its other activities, such as call processing, while it is processing the flood of data (e.g., for 8 to 10 seconds at a time). The controller may perform sluggishly for up to a minute after the data dump. Thus, the push method actually degrades the performance of a piece of telecommunications equipment when it is intended to monitor and improve its health. Both methods therefore suffer from problems due to their design.
Finally, after collecting PM data for a monitoring interval and transmitting it, if necessary, to a controller, an application must write the data to disk. Therefore, to perform the functions of a PM system, a method of data input/output (I/O) is necessary. Specifically, persistence on disk prevents the loss of data and state information in the event of a power loss or other shut down.
Conventionally, PM designers had three options to choose from in this regard. The first option for PM storage design was using a commercial off-the-shelf database product such as Sybase or Oracle, or even an object-oriented database such as Borland's "Interbase." The problem with using these database products for PM is that their high functionality and many features result in a very slow, sluggish system characterized by low throughput. That is, the writing to disk would be slow. Because PM does not require the extent of functionality supplied by these database products, the cost of lower throughput is not justified.
The second option for PM storage design is to write disk I/O operations from scratch using UNIX files in the application domain without any underlying consistent software. There are several problems with this method. First, it lacks reusability. Every time there is a variation or modification in the telecommunications equipment or system, the designer might need to re-tailor this custom software in multiple places, costing much engineering time. Second, from a maintenance perspective, debugging a system such as this would be difficult due to a lack of consistency in the functionality. Third, this system tends to be sequential. That is, it lacks the ability to access data in a random fashion from multiple threads of execution. Thus, a designer has no ability to deal directly with disk blocks (e.g., flush a single disk block or read one disk block) in an efficient manner. The only control over what data is resident in RAM versus what has been written to disk is through flushing an entire file. Also, a designer could not control concurrent access of multiple threads. In sum, performance cannot be predicted since the operating system controls the disk block layout. Finally, this type of system normally lacks any caching mechanism or control over caching.
The third option for PM storage design is to utilize a non-cached, non-file system in which raw I/O operations are performed to disk. The problem with this option is that it has no caching functionality. The lack of caching means that many more disk operations are necessary. Because disk drives are two to three orders of magnitude slower than the rest of the electronics of a computer, this leads to poor system performance. Thus, none of these options satisfies the needs of a PM system.
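The effect of caching on the number of disk operations can be illustrated with a minimal sketch. The disk here is simulated in memory, and the class names, block number and update count are hypothetical.

```python
class CountingDisk:
    """Simulated raw disk that counts every block read and write."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0
        self.writes = 0

    def read(self, block):
        self.reads += 1
        return self.blocks.get(block, 0)

    def write(self, block, value):
        self.writes += 1
        self.blocks[block] = value


class WriteBackCache:
    """Keeps blocks in RAM and writes a block only when it is flushed."""

    def __init__(self, disk):
        self.disk = disk
        self.cached = {}
        self.dirty = set()

    def read(self, block):
        if block not in self.cached:
            self.cached[block] = self.disk.read(block)  # miss: go to disk
        return self.cached[block]

    def write(self, block, value):
        self.cached[block] = value
        self.dirty.add(block)

    def flush_block(self, block):
        # Flush a single block rather than an entire file.
        if block in self.dirty:
            self.disk.write(block, self.cached[block])
            self.dirty.discard(block)


def bump(store, block, times):
    """Increment a counter stored in one block, one update at a time."""
    for _ in range(times):
        store.write(block, store.read(block) + 1)


raw = CountingDisk()
bump(raw, block=7, times=100)

backing = CountingDisk()
cache = WriteBackCache(backing)
bump(cache, block=7, times=100)
cache.flush_block(7)
```

In this sketch the non-cached store performs 100 reads and 100 writes for 100 updates, whereas the cached store performs one read and one write for the same final contents. The trade-off is that updates held only in the cache are lost if power fails before the flush, so the flush points must be chosen with persistence in mind.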
Finally, as described hereinabove, in designing a conventional PM system to incorporate the above-described functionality which meets the standards, PM system designers have packaged all of the software's functionality together, including the product-specific functionality and product-independent functionality. These systems are designed to be used on and are highly focused toward particular types of telecommunication equipment.
As a result, these conventional PM systems have not been designed to be flexible. That is, they have always been custom written as a single software program for a particular telecommunication equipment product configuration. No functionality was separated out because functionality boundaries were not recognized. Therefore, a PM designer has had to rewrite the PM code each time it was to be applied to another configuration, type or piece of telecommunication equipment. This is very time-consuming and expensive. A more flexible design is needed so that designing, writing and debugging are made easier and less costly.
An object of the present invention is to provide a method of collecting PM data which neither suffers from poor PM performance nor harms the overall performance of the telecommunications system.