A telecommunications network includes a litany of network elements, including switches and switching components. Switches help complete a circuit between two communications entities, such as between two persons to make a phone call or between two modems to communicate data. A switch directs traffic and completes circuits by referencing data stored in tables. These switch tables contain routing information as well as a variety of other data items that will be discussed in greater detail below. Exemplary data items, such as operational measurements, include counts of incoming calls, incoming call attempts, overflows, instances of glare, and much more. This data, however, is not stagnant.
At certain intervals, storage within a switch allocated to transient data is exhausted. Transient data is data that is periodically expunged to make room for new data. Thus, every thirty minutes for example, a rollover of accumulated traffic data occurs. The data to be expunged, however, can be valuable if a network problem occurs. Currently, there is no system or method available to harness this data and make it immediately available at desired intervals (such as every half hour or so).
The aforementioned problem would be somewhat akin to having an envelope that could hold only a certain number of receipts, from which individual receipts could not be removed. A consumer would thus have to empty the envelope after storing a threshold number of receipts, thereby losing old receipts and their corresponding value should a respective item break and need to be returned. Although the current state of the art allows for data archival, such data is unavailable until the next day.
Unlike the retail world where waiting an extra day may not translate to a significant problem, in the telecommunications industry, seconds count. When a person wants to place a telephone call, but does not receive a dial tone, the problem is expected to be remedied with a handset-button press. Imagine the frustration of having to wait until the next day to place your call. Such a scheme is unacceptable and even dangerous—“911” and other emergency calls must be able to be processed quickly and reliably. Communications carriers have strived to maintain a high level of service at great expense and by expending considerable resources. But as data-communication demands rapidly expand, historical methods of dealing with the problems associated with iteratively losing temporarily stored data will not be adequate.
Absent the present invention, analysts were often made aware of problems by their own customers, who were experiencing an interruption in service: no dial tone, busy signals, misrouted calls, unclear service, loss of data packets, and more. To be first made aware of a problem by justly complaining customers is the bane of a company, especially a communications company, which is expected to deliver substantially uninterrupted service. After realizing a problem exists, analysts might attempt to gather information from the device, such as a switch, servicing the customers. As previously stated, historical data is not available until the next day.
But even in situations where analysts try to work rapidly to gather information from transient data before that data is expunged, what little information is retrieved is in a format that is difficult to understand. Several time-consuming processes must be followed to return even a little data. These processes limit the amount of information that can be gathered before the data is lost.
In some, even many, situations, analysts must wait until the next day to resolve the problem. During the interim, efforts are made to reroute traffic. But even rerouting traffic is difficult because to reroute traffic, one must identify the problem device from which traffic should be rerouted. If temporary bandwidth is available, overkill techniques can be temporarily employed to reroute as much data as possible. Such a technique is risky, however, because if still another fault occurs, limited bandwidth remains with which to help resolve the additional problem. Even when the data is made available after a lengthy waiting period (such as the next day), it is in a confusing format that does not lend itself to quick analysis.
To illustrate an exemplary hard-to-read format, consider the data returned from an OmShow command issued to an illustrative switch, such as the DMS100 offered by the Nortel Networks Corporation of Brampton, Ontario, in Canada. An OmShow command returns certain data parameters of a switch. Table I provides an exemplary format of data returned to a screen or printer incident to issuing an OmShow command (“omshow trk active DMS20064K”) to a switch for the trunk group “DMS20064K.”
TABLE I
Prior-Art Data Format

>omshow trk active DMS20064K
TRK
CLASS: ACTIVE
START: 2003/10/10 10:00:00 FRI; STOP: 2003/10/10 10:11:40 FRI;
SLOWSAMPLES:     7 ; FASTSAMPLES:     70 ;
KEY (COMMON_LANGUAGE_NAME)
INFO (OM2TRKINFO)
    INCATOT   PRERTEAB  INFAIL    NATTMPT
    NOVFLATB  GLARE     OUTFAIL   DEFLDCA
    DREU      PREU      TRU       SBU
    MBU       OUTMTCHF  CONNECT   TANDEM
    AOF       ANF       TOTU      ANSWER
    ACCCONG   NOANSWER  INANSWER  OUTANSU
    INANSU
448 DMS20064K
    2W   48   48
    0         0         0         0
    0         0         0         0
    0         0         8         0
    0         0         0         0
    0         0         8         0
    0         0         0         0
    0
The data of Table I is relatively difficult to read. A person must be trained to read the data. The trained person must then contend with the technical constraints of a monitor, which may lack a scroll function for viewing all the data. Table I begins with the actual OmShow command (“omshow trk active DMS20064K”) and ends with a matrix of numerals that correspond to a matrix of labels. For instance, “INCATOT” (incoming attempts total) corresponds to the upper-left “0.” To determine what value corresponds to “CONNECT,” an analyst would need to determine the label's position in the upper matrix and then locate the data value in the same position of the lower matrix, which would also be “0,” vertically sandwiched between two “8's.” The tedium of this approach is compounded as fatigue sets in from analyzing display after display. Moreover, the data in Table I corresponds to merely a single OmShow request for only a single trunk group.
In practice, several tens and maybe even hundreds of OmShow requests may need to be initiated to troubleshoot a problem element. All of the data from each OmShow request must then be tediously analyzed. Opportunities for human error abound, which could plague the troubleshooting process with a garbage-in/garbage-out analysis.
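The positional lookup that an analyst performs by eye on Table I can be illustrated with a minimal Python sketch. It is not part of any switch software or of the prior-art tooling; the function name `pair_registers` and the representation of the two matrices as lists of whitespace-separated rows are assumptions made purely for illustration. The labels and values are those shown in Table I.

```python
# Illustrative sketch only: pair each OmShow register label with the value
# that occupies the same position in the lower matrix of Table I.
# The function name and the row-list input format are hypothetical.

LABEL_ROWS = [
    "INCATOT PRERTEAB INFAIL NATTMPT",
    "NOVFLATB GLARE OUTFAIL DEFLDCA",
    "DREU PREU TRU SBU",
    "MBU OUTMTCHF CONNECT TANDEM",
    "AOF ANF TOTU ANSWER",
    "ACCCONG NOANSWER INANSWER OUTANSU",
    "INANSU",
]

VALUE_ROWS = [
    "0 0 0 0",
    "0 0 0 0",
    "0 0 8 0",
    "0 0 0 0",
    "0 0 8 0",
    "0 0 0 0",
    "0",
]


def pair_registers(label_rows, value_rows):
    """Flatten both matrices row by row and match entries by position."""
    labels = [name for row in label_rows for name in row.split()]
    values = [int(tok) for row in value_rows for tok in row.split()]
    if len(labels) != len(values):
        # A mismatch means the two matrices cannot be aligned reliably.
        raise ValueError(
            f"{len(labels)} labels but {len(values)} values; cannot align"
        )
    return dict(zip(labels, values))


registers = pair_registers(LABEL_ROWS, VALUE_ROWS)
print(registers["CONNECT"])  # the "0" sandwiched between the two 8's
print(registers["TRU"], registers["TOTU"])  # the two 8's above and below it
```

Even this small sketch suggests why the manual process is error-prone: the only link between a label and its value is position, so a single miscounted row or column silently yields the wrong register.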
To summarize, the current state of the art suffers from a variety of shortcomings, not limited to the following. Prior-art techniques do not permit a carrier to effectively troubleshoot problems in a communications network where transient data is needed. No notification of problems is available until at least the next day. Problems that arise could include call blocking, dropped calls, and more, and are often discovered only in response to customer complaints, which makes problem resolution reactive instead of proactive. A carrier cannot see a build-up of traffic and cannot effectively reroute affected data traffic. Trying to identify the source of a problem without access to operational-measurement data is akin to looking for the proverbial needle in a haystack.
The state of the art could be improved by providing a system and method of identifying problems in a communications network by preserving transient data, such as operational-measurement data, and providing a way to automatically retrieve the data, store it, and format the data so it can be easily analyzed.