1. Field of the Invention
The present disclosure relates generally to an improved data processing system and in particular to a method and apparatus for processing data. Still more particularly, the present disclosure relates to a computer implemented method, apparatus, and computer usable program code for isolating network faults.
2. Description of the Related Art
The Internet is a global network of computers and networks joined together by gateways that handle data transfer and the conversion of messages from a protocol of the sending network to a protocol of the receiving network. On the Internet, any computer may communicate with any other computer, with information traveling over the Internet using a variety of languages, also referred to as protocols. The Internet has revolutionized communications and commerce, as well as being a source of information and entertainment. Many organizations such as businesses, universities, and governments use the Internet to transact business, as well as to communicate with each other.
Various enterprise and service provider networks may be connected to the Internet to provide services to clients both inside and outside the organizations providing the services. Many different metrics may be used to gauge the level or quality of service provided by these types of networks. One metric is, for example, the speed at which data may be transferred between different points within a network. When a slowdown in data transfer happens, it is often difficult to isolate the particular cause of the problem. Potential causes include, for example, applications, middleware servers, the network, storage, clients, configuration, and third-party attacks.
Currently, the fastest way to isolate a fault or cause of a slowdown to a particular component or path involves actively testing some part of the component set with active probes. These tests may perform ad hoc data retrieval and/or historical data retrieval. The testing may include identifying port error rates over the last five minutes and searching for threshold crossing alarms. These alarms may indicate abnormal behaviors. This information and other types of information may be used to isolate the cause or fault resulting in the slowdown in the transfer of data or the providing of services.
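For illustration only, the kind of test described above may be sketched as follows. This is a minimal sketch rather than a description of any particular monitoring product. The `PortSample` record, the `threshold_crossing_alarms` function, and the one percent error-rate threshold are all assumptions introduced for this example. Only the five-minute window is taken from the description above.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 300          # the "last five minutes" named in the text
ERROR_RATE_THRESHOLD = 0.01   # hypothetical threshold (1% of frames in error)


@dataclass
class PortSample:
    """One hypothetical counter reading for a port."""
    port: str
    timestamp: float  # seconds since an arbitrary epoch
    frames: int       # frames seen since the previous reading
    errors: int       # errored frames since the previous reading


def threshold_crossing_alarms(samples, now,
                              window=WINDOW_SECONDS,
                              threshold=ERROR_RATE_THRESHOLD):
    """Return {port: error_rate} for ports whose error rate over the
    window ending at `now` exceeds `threshold`."""
    totals = {}
    for s in samples:
        # Keep only readings that fall inside the window.
        if now - window <= s.timestamp <= now:
            frames, errors = totals.get(s.port, (0, 0))
            totals[s.port] = (frames + s.frames, errors + s.errors)

    alarms = {}
    for port, (frames, errors) in totals.items():
        if frames > 0:
            rate = errors / frames
            if rate > threshold:
                alarms[port] = rate
    return alarms
```

A caller would pass in recent readings and the current time. Any port returned is a candidate for further investigation, not a confirmed fault.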
Difficulties exist with these current techniques. Placing probes effectively typically requires prior knowledge of the components and their relation to each other. If the relationship between different components changes how the probes perform or act, the location of these probes may have to be replanned. Further, these types of changes in relationships also may require replanning of ad hoc data retrieval and analysis of historical data. Further, this type of process can be expensive to learn, administer, and use.
Another alternative involves receiving reports from customers or users about the degradation of service. Handling these types of calls and verifying the actual presence of a degradation in service also is expensive. Once a degradation of service has been verified, the issue may be resolved by assigning the issue to a particular group of specialists. In some cases, multiple groups may be assigned to the issue, depending on the types of components handled by the different groups. For example, one group may be an application group while a second group is a hardware or network group. Both of these groups may review the issue in an attempt to isolate the problem. As a result, much time and effort is required to handle slowdowns or service degradation problems.
Thus, it would be advantageous to have an improved computer implemented method, apparatus, and computer usable program product for solving the problems discussed above.