Modern computers receive and transmit an increasingly large number of packets. These packets represent communications with other computers, such as requests for services and data. Because of the large volume of packets sent and received, it is difficult to diagnose problems and performance issues with the various services running on the computer or available on the network.
Existing solutions to this problem generally rely on processing some type of alarm or error data to determine the cause of a particular problem. Other solutions repeatedly probe existing services to determine whether they are available and working correctly. Processing alarm data fails to solve the problem because it may require substantial infrastructure to be added to the system or network, and may require dedicated hosts or applications to collect, aggregate, and analyze the data. This alarm information is typically routed to or collected at a central location, and may require data from various interdependent services. Often, the data is not particularly detailed, and it can therefore be difficult to determine the underlying causes of the alarms.
The repeated probing of services is also flawed in that it introduces a delay before a problem can be detected. For example, if a system is set to probe each service every five minutes, then a problem may go undetected for up to five minutes. Probing may also fail to detect intermittent errors, where a service works only some of the time but happens to work whenever it is probed.
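The two shortcomings of probing described above can be illustrated with a minimal sketch. The function name `first_detection`, the two failure scenarios, and the one-hour horizon are hypothetical and chosen only for illustration; the five-minute interval is taken from the example above.

```python
from typing import Callable, Optional

PROBE_INTERVAL_S = 300  # five minutes, as in the example above
HORIZON_S = 3600        # assumed observation window of one hour


def first_detection(is_healthy: Callable[[int], bool],
                    interval_s: int,
                    horizon_s: int) -> Optional[int]:
    """Return the time in seconds of the first probe that observes a
    failure, or None if no probe within the horizon observes one."""
    for t in range(0, horizon_s + 1, interval_s):
        if not is_healthy(t):
            return t
    return None


# Hypothetical scenario 1: the service fails for good one second after
# the probe at t = 0 has succeeded.
FAILURE_START_S = 1


def hard_failure(t: int) -> bool:
    return t < FAILURE_START_S


detected_at = first_detection(hard_failure, PROBE_INTERVAL_S, HORIZON_S)
detection_delay = detected_at - FAILURE_START_S

# Hypothetical scenario 2: the service is down from second 60 to second
# 239 of every 300-second cycle, but is up at each probe instant.
def intermittent_failure(t: int) -> bool:
    return not (60 <= t % 300 < 240)


missed = first_detection(intermittent_failure, PROBE_INTERVAL_S, HORIZON_S)

print(detection_delay)  # seconds the hard failure went unnoticed
print(missed)           # None means no probe ever observed the outage
```

In the first scenario the failure is noticed only at the next probe, almost a full interval later. In the second, the service is unavailable for more than half of every cycle, yet every probe succeeds because the probe instants never coincide with the outage.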
Furthermore, the perceived behavior and performance of services may differ depending on the viewpoint from which they are observed. For example, mailboxes may work for email users in Philadelphia, but may not work correctly for users in Europe. This problem may be the result of an error in the active directory service in Europe, and have nothing to do with the mailboxes themselves. However, conventional methods of error detection would initially attribute the error to the mail server rather than to the active directory service running in Europe. The resulting delay in identifying the true cause could lead to increased expenses or losses.