1. Field of the Invention
The present invention relates to a method of determining whether or not information within a character string is valid, and, more particularly, to a technique for validating that a string within a data record matches a pre-stored valid identifier to permit association of the data record with other data records having common identifiers.
2. Description of the Related Art
Both communication network service providers and their customers recognize an increasing need to accurately measure operational performance of data communications networks. Communications networks are currently utilized in various applications for transmission and reception of data between parties at different locations. A typical data transmission system includes a plurality of customers linked by one or more data packet switching networks. Ordinarily, when a party needs to send and receive data over distances, the party (customer) enters into a service contract with a service provider to provide access to a data communications network.
Depending on an individual customer's needs, the service contract may include provisions that guarantee certain minimum performance requirements that the service provider must meet. Among the performance metrics that need to be monitored to comply with such requirements may be those that reflect system performance from the perspective of the end user. For example, a service contract may specify a minimum access speed or a maximum allowable percentage of time that a user gets a busy signal when dialing into the network (e.g., a specified access attempt success/failure rate or a dial up success/failure rate). Further, if the customer expects to send and receive a particular amount of data on a regular basis, the customer may want the service provider to guarantee that a specified minimum bandwidth or throughput rate will be available to the customer at all times. The service provider may be required to ensure that the amount of time the network is unavailable to the customer is less than a specified percentage. Certain customer applications are sensitive to transmission delays and/or the loss of data within the network. Thus, the customer may want the service provider to guarantee that the average or minimum ratio of data units delivered by the network to data units offered to the network at the far-end is above a certain percentage (e.g., a maximum packet loss rate) and/or that the average or maximum transmission delays or a maximum variation in delays (jitter) will not exceed a certain duration.
From a service provider's perspective, it would be competitively advantageous to be able to demonstrate to potential and existing customers that the service provider is capable of meeting and does meet such network performance metrics. Thus, the capability to provide analysis of network system performance at the service level, particularly in the context of network systems that share bandwidth between end-points or sites, would be advantageous from both a customer and service provider standpoint.
Internet service providers (ISPs), who provide Internet connectivity to many customers, are an example of service providers that may want to monitor their networks to ensure acceptable operation. In a typical configuration, an ISP provides several Points of Presence for user access. A Point of Presence (POP) is a local exchange that users dial into via a modem and which connects the users to a wide area or global communication network, such as the Internet. To connect to the network, a customer configures his computer to dial a telephone number associated with a local POP. Once the hardware at the POP answers, the POP initiates data communications with the client. The POP is coupled to the network via well-known systems that need not be described in detail herein.
Performance metrics that reflect end-user experience are of particular interest to ISPs and their customers. A network monitoring system capable of accurately assessing and determining network service performance from the perspective of an end user is described in U.S. Pat. No. 09/256,647, pending to Chu et al., the disclosure of which is incorporated herein by reference in its entirety. In the network performance monitoring system described by Chu et al., user modules within the machines of end users upload monitoring data to a data collector of a backend data processing system. Data records from several such data collectors are aggregated and organized in a backend aggregator module. The aggregator is responsible for directing data to relational databases and information reporting engines to produce information useful for assessing operational performance, system troubleshooting and system planning.
In order to generate meaningful information that can be used to analyze system performance and troubleshoot system problems, the aggregator must organize data by associating like data with like data. For example, it is more informative to group failed connections by POP than to group all failed connections together. If one POP is having more login failures per attempt than other POPs, the service provider may want to focus on troubleshooting the equipment at that POP. Thus, knowing the POP associated with monitoring data from various end-users allows the database/reporting engine to group data on a POP basis, thereby allowing the service provider to glean information about performance of equipment at individual POPs.
One of the pieces of information within data records uploaded to the aggregator is the phone string dialed by the modem. This string indicates which POP or phone number the caller may have dialed. The string containing the POP may contain one or more of the following tokens: escape characters to reach an outside line (e.g., "9," from a typical U.S. hotel); pause characters (e.g., ","); a country code; a code to indicate calls to a foreign country (e.g., "011" in the U.S.); a code to indicate calls across area codes/regions (e.g., "1" in the U.S.); an area code; a local number; calling card information; ISDN information; extraneous characters; and other miscellaneous characters.
In a global environment, where the escape codes are different in different countries and the length of codes is variable, many of the tokens are optional, and many dialed numbers are incorrect, parsing a raw string of characters and deriving a corresponding POP number is a challenging exercise. The quality of this derivation process directly impacts the value of the data for the operators of the POP. It is important that as much of the valuable data as possible is extracted; however, incorrectly identifying the POP number can waste considerable operational investigation resources. Some service providers' dialers guide the users in selecting a phone number from a phonebook (downloaded to users' PCs); however, most of these dialers still allow the end users to enter any number they desire.
The difficulty of correctly identifying the POP associated with end-user monitoring data is a unique problem faced by backend data processors of network monitoring systems. By comparison, conventional telephone switching devices deal with this problem more easily. First, the telephone number dialed is rigidly structured, and the switching devices have a context in which to parse the information. Second, if a telephone switching device does not understand the format of the dialed number (e.g., if the user mis-dialed), the switching device can reject the number with an appropriate error message that then forces the user to dial a number that the switching device can parse. In a backend data aggregating system, there is no standard context in which to parse the data string containing the POP (e.g., the kind of logic used in a hotel private branch exchange (PBX) does not exist).
Conventional switching devices inherently ensure accurate string parsing in the course of connecting users that provide valid, parsable information and simply reject any unparsable information at the time of transmission by failing to connect the user. In contrast, a backend data processor of a network monitoring system possesses no inherent mechanism to force the user to pass the system something that the system can understand, since all the processing is done post-facto. Further, to monitor end-user experience and to provide useful information for troubleshooting, it is preferable to collect monitoring data on both valid and invalid information (e.g., the reason that a particular user has a low connection rate may result from the user repeatedly entering an invalid POP number rather than from a problem with the POP itself). Thus, the backend data processing system may be required to process valid number formats that can actually be parsed by intermediate devices, as well as invalid formats that cannot be parsed.
In the system disclosed in the aforementioned Chu et al. patent application, a basic parsing technique is described in which the POP number is extracted by stripping off all other characters in the uploaded string containing the POP number and essentially assuming that the remaining digits represent the POP. Specifically, the aggregator uses a pattern-matching algorithm to "clean" the raw POP number. This algorithm maintains a list of known patterns for prefixes, access codes, credit card numbers, and individual countries' dialing patterns. While this approach is generally successful in identifying the correct POP number, it is only as accurate as the pattern knowledge base is comprehensive. In practice, the technique fails to correctly identify valid POP numbers of a significant percentage of data records uploaded to the aggregator. Consequently, some of the monitoring data is not successfully associated and aggregated with other data from the same POP, thereby reducing the overall accuracy and value of the information contained in the resulting performance reports.
Information other than the POP string may be available to identify the POP associated with a data record uploaded to a backend data processor. For example, the data record may contain user-configured information stored in the end-user's computer identifying the user's country, area code, service provider, etc. However, in many instances the configuration information on the user's computer is incorrect, making reliance on this information problematic. This may be so despite the fact that the computer has successfully connected to the network. A typical example is a person traveling on business with a portable computer (e.g., a laptop). Within the computer, the area code and country code are typically configured to reflect the location in which the person lives. As the person travels from city to city and country to country and connects to networks in various locations, he may not change the client configuration to match his geographic location. If the person dials an appropriate local number at a given location to connect to a network, the call attempt may succeed even though the user-configured area code and/or country code information stored in the computer is wrong (this is because a correct local number appropriate to the location was supplied, and the user-configured information is not used to make the call/connection itself).
User-configured country and area code information is also commonly incorrect where the user attempts to connect to a network by dialing a number in a different area code or in another country. Again, the call attempt may succeed in connecting the user to the network even though the user's computer had the incorrect configuration information. Likewise, if the area code for a particular location changes, users may or may not update their computer configuration to match the new area code. Network connection call attempts may continue to succeed if the calls are made using only the local number (which is presently still permitted in some area codes in the U.S.). In any of the foregoing instances, if a backend database/reporting engine of a network monitoring system were to assume that user-configured information regarding area code and country code was accurate, the system might identify a nonexistent or invalid POP number, thereby preventing data records from such users from being properly aggregated with those from users connecting to the network via the same POP.
Accordingly, there remains a need in network monitoring systems for an improved technique for determining the identity of POPs associated with data records uploaded by end users to a backend processing system of a network monitoring system before using the POP number to associate and aggregate network monitoring data and generating performance information on a POP basis.
Therefore, in light of the above, and for other reasons that become apparent when the invention is fully described, an object of the present invention is to generate more useful network monitoring information and reports by improving the accuracy of methods for associating data records having common attributes, such as a common point of presence (POP), thereby permitting more accurate and meaningful association and aggregation of network monitoring data.
A further object of the present invention is to provide a more reliable and accurate method of determining the identities of POPs to which data records containing network monitoring data correspond.
Yet a further object of the present invention is to validate that POP identification information contained within a data record matches a known valid POP number.
A still further object of the present invention is to account for a variety of different POP number formats in a convenient manner when attempting to compare and match POP identification information contained within a data record to known valid POP numbers.
Another object of the present invention is to employ multiple search strategies to increase the likelihood of positively matching POP identification information contained within a data record to one of a number of pre-stored valid POP numbers.
Yet another object of the present invention is to use independently obtained auxiliary information within a POP search process to improve the likelihood of correctly matching POP identification information contained within a data record to a known valid POP number.
The aforesaid objects are achieved individually and in combination, and it is not intended that the present invention be construed as requiring two or more of the objects to be combined unless expressly required by the claims attached hereto.
In accordance with the present invention, in order to more reliably determine the identity of the POP to which data records containing network monitoring data correspond, a backend data processor of a network monitoring system employs a lookup "phone-book" of known valid POP numbers to determine whether or not the POP identification information contained in the uploaded record matches one of the pre-stored valid POP numbers. Each POP can be uniquely identified by the telephone number (area code and local number) used to connect to the POP (i.e., its "POP number"). The POP identification technique of the present invention attempts to positively identify the POP associated with each network monitoring data record uploaded to the backend data processing system by comparing the POP number contained in the data record with the pre-stored POP numbers stored in the lookup phone book.
More specifically, one of the data fields within each uploaded data record contains the phone string dialed by the end-user's modem. At least some of the extraneous characters in the modem-dialed string, such as non-digit characters and characters before commas and after ampersands, are stripped off in order to extract a raw POP string from the modem-dialed string. A series of different lookup searches are then performed by comparing a certain number of the digits of the raw POP string with corresponding digits of the POP numbers stored in the lookup phone book until an exact, unique match is found.
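The stripping step described above can be illustrated with a minimal Python sketch. The function name `extract_raw_pop` is hypothetical, and the sketch assumes that everything up to the last comma is prefix material and that everything from the first ampersand onward is trailing material; the description does not fix these details.

```python
import re


def extract_raw_pop(dialed: str) -> str:
    """Return the digits that remain after stripping prefix and suffix noise.

    Illustrative only: the exact stripping rules are assumptions.
    """
    # Assumption: characters before the last comma are outside-line or
    # pause prefixes (e.g. "9,"), so keep only what follows that comma.
    tail = dialed.rsplit(",", 1)[-1]
    # Assumption: characters after the first ampersand are trailing data
    # (e.g. calling card information), so keep only what precedes it.
    core = tail.split("&", 1)[0]
    # Remove every remaining non-digit character (spaces, dashes, parentheses).
    return re.sub(r"\D", "", core)


if __name__ == "__main__":
    print(extract_raw_pop("9,1-(703) 555-0100&1234"))
```

The result is a digits-only raw POP string that may still carry a long-distance or country prefix, which is why the subsequent searches compare only the rightmost digits.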
An initial "pessimistic" lookup search compares the last or rightmost N digits of the raw POP string with the rightmost N digits of each POP number in the lookup phone book for all countries. The search is first performed with the rightmost nine digits and, if unsuccessful in finding an exact, unique match, the search is repeated with the rightmost eight digits. The approach taken in the initial pessimistic lookup search avoids the need to take into consideration the various different POP telephone number formats (e.g., different length area codes and local numbers) that exist throughout the world and, consequently, the different POP telephone number formats of the POP numbers contained in the lookup phone book.
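A minimal sketch of the initial pessimistic search follows, assuming the lookup phone book is simply a collection of digit-only POP numbers covering all countries. The function name and the phone book representation are hypothetical; only the nine-then-eight digit order comes from the description above.

```python
from typing import Iterable, Optional


def initial_pessimistic_lookup(raw: str, phone_book: Iterable[str]) -> Optional[str]:
    """Return the single POP number whose rightmost N digits match, else None."""
    numbers = set(phone_book)  # distinct POP numbers, all countries
    for n in (9, 8):  # nine digits first, then eight
        if len(raw) < n:
            continue  # assumption: too-short strings cannot match at this N
        tail = raw[-n:]
        # A phone book number shorter than N digits can never end with `tail`.
        hits = {number for number in numbers if number.endswith(tail)}
        if len(hits) == 1:  # exact, unique match
            return next(iter(hits))
    return None


if __name__ == "__main__":
    book = ["7035550100", "2025550100", "442079460000"]
    print(initial_pessimistic_lookup("17035550100", book))
```

Because only suffixes are compared, the sketch needs no knowledge of area code or local number lengths, which is the property the passage highlights.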
If the initial pessimistic lookup search is unsuccessful in finding a unique match between the raw POP string and any of the POP numbers in the lookup phone book, a three-stage "optimistic" lookup search is conducted in which independent information indicating the user's country code and area code (e.g., user-configured information uploaded in the data record along with the raw POP string) is relied upon to match a portion of the raw POP string to a POP number in the lookup phone book. Specifically, each uploaded data record contains data fields, other than the field containing the modem-dialed string, that indicate the user's country code and area code. The optimistic lookup search is limited to pre-stored POP numbers of the country indicated by the user's country code. The length of the raw POP string is determined by the POP rules of the user's country. For example, in the U.S., the POP rule requires a three-digit area code (AC=3) and seven-digit local number (LN=7), resulting in a ten-digit POP number. Other countries may have multiple, different POP rules. For each POP rule in the user's country, the rightmost AC+LN digits of the raw POP string are selected and compared with the rightmost AC+LN digits of the POP numbers in the lookup phone book corresponding to the user's country. If this first stage fails to produce an exact, unique match, a second-stage lookup search is conducted by concatenating the user's area code with the rightmost LN digits of the raw POP string (i.e., the portion of the raw POP string that represents the local number), and the concatenated digits are compared with the rightmost AC+LN digits of the POP numbers in the lookup phone book corresponding to the user's country. If the second stage is also unsuccessful in finding a match, a third-stage lookup search is undertaken in which only the local number digits of the raw POP string and pre-stored POP numbers are compared to find a unique match.
If multiple matches are found and they all correspond to the same service provider, the POP of the data record is identified only by the matching local number.
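The three-stage optimistic search can be sketched as follows. The `PopEntry` record, the `rules` mapping of country to (AC, LN) pairs, and the `("pop", ...)` / `("local", ...)` result tags are all hypothetical. The sketch also assumes that each stage is tried across every POP rule before moving to the next stage, and that the same-provider rule for multiple matches applies to the third stage; the description leaves both points open.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PopEntry:
    number: str    # digits only: area code followed by local number
    country: str   # hypothetical country key, e.g. "1"
    provider: str  # hypothetical service-provider label


def optimistic_lookup(
    raw: str,
    user_country: str,
    user_area_code: str,
    phone_book: List[PopEntry],
    rules: Dict[str, List[Tuple[int, int]]],
) -> Optional[Tuple[str, str]]:
    """Return ("pop", number), ("local", local_digits), or None."""
    # Restrict the search to POP numbers of the user's configured country.
    entries = [e for e in phone_book if e.country == user_country]
    country_rules = rules.get(user_country, [])

    def tail_hits(key: str) -> List[PopEntry]:
        return [e for e in entries if e.number.endswith(key)]

    # Stage 1: rightmost AC+LN digits of the raw POP string.
    for ac, ln in country_rules:
        if len(raw) >= ac + ln:
            hits = tail_hits(raw[-(ac + ln):])
            if len(hits) == 1:
                return ("pop", hits[0].number)

    # Stage 2: user's area code concatenated with the rightmost LN digits.
    for ac, ln in country_rules:
        if len(user_area_code) == ac and len(raw) >= ln:
            hits = tail_hits(user_area_code + raw[-ln:])
            if len(hits) == 1:
                return ("pop", hits[0].number)

    # Stage 3: local number digits only.
    for _ac, ln in country_rules:
        if len(raw) >= ln:
            local = raw[-ln:]
            hits = tail_hits(local)
            if len(hits) == 1:
                return ("pop", hits[0].number)
            if len(hits) > 1 and len({e.provider for e in hits}) == 1:
                # Several POPs of one provider: identify by local number only.
                return ("local", local)
    return None
```

For example, with a U.S. rule of `(3, 7)`, a raw string of `5550123` and a configured area code of `202` would be resolved in the second stage if the phone book holds `2025550123` for that country.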
If the optimistic lookup search is unsuccessful in finding a match between the raw POP string and any of the POP numbers in the lookup phone book of the user's country, a final "pessimistic" lookup search is conducted by again comparing the last (rightmost) N digits of the raw POP string with the rightmost N digits of each POP number in the lookup phone book for all countries. In this case, the search begins with the rightmost nine digits and is repeated with successively fewer digits down to six, until an exact, unique match is found or multiple matches are found in the same country with the same service provider. If any of the pessimistic or optimistic lookup searches finds an exact, unique match, it is determined that the data record corresponds to the POP identified by the valid POP number in the lookup phone book whose digits matched those of the raw POP string. If the final pessimistic lookup search finds two or more matches between the rightmost N digits of the raw POP string and POP numbers in the lookup phone book that are in the same country and with the same service provider, the POP is, in effect, only partially identified, and the POP of the data record is "identified" by only the N matching digits rather than by a complete, valid POP number. If the final pessimistic lookup search fails to identify a match, the raw POP string is declared unparsable. Optionally, a conventional parsing algorithm can subsequently be applied to the modem-dialed string to attempt to extract a valid POP number.
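The final pessimistic search, including its partial-identification outcome, might look like the following sketch. The `PopEntry` record and the `("pop", ...)` / `("partial", ...)` result tags are hypothetical, and a return value of `None` stands in for declaring the raw POP string unparsable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PopEntry:
    number: str    # digits only
    country: str   # hypothetical country key
    provider: str  # hypothetical service-provider label


def final_pessimistic_lookup(
    raw: str, phone_book: List[PopEntry]
) -> Optional[Tuple[str, str]]:
    """Return ("pop", number), ("partial", n_digits), or None if unparsable."""
    for n in range(9, 5, -1):  # N = 9, 8, 7, 6
        if len(raw) < n:
            continue  # assumption: too-short strings cannot match at this N
        tail = raw[-n:]
        # Search POP numbers of all countries by their rightmost N digits.
        hits = [e for e in phone_book if e.number.endswith(tail)]
        if len(hits) == 1:
            return ("pop", hits[0].number)  # exact, unique match
        if len(hits) > 1 and len({(e.country, e.provider) for e in hits}) == 1:
            # Same country and same provider: identified by N digits only.
            return ("partial", tail)
    return None
```

Returning a tagged tuple keeps fully identified and partially identified records distinguishable downstream, so that a reporting engine can decide how to aggregate each kind.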
The system of the present invention determines the identity of POPs to which data records containing network monitoring data correspond more reliably than the aforementioned conventional parsing algorithms, resulting in a higher percentage of data records being associated and aggregated with data records corresponding to the same POP. This, in turn, makes the network monitoring reports generated from the network monitoring data more meaningful and more useful in troubleshooting network problems, planning future network resources and demonstrating compliance with service agreements.
The above and still further objects, features and advantages of the present invention will become apparent upon consideration of the following definitions, descriptions and descriptive figures of specific embodiments thereof wherein like reference numerals in the various figures are utilized to designate like components. While these descriptions go into specific details of the invention, it should be understood that variations may and do exist and would be apparent to those skilled in the art based on the descriptions herein.