The present invention relates to a management apparatus that manages a network connecting a group of computers, and to a management method and a recording medium.
The use of data centers that provide cloud services has been growing as corporations and organizations seek to utilize computer resources and applications in an inexpensive and flexible manner. If a communication fault occurs in a data center, the manager of the data center needs to respond quickly to the customers and applications that have been affected by the communication fault. Conventionally, customers that have possibly been affected by a communication fault (hereinafter referred to as “potentially affected customers”) are identified using static configuration information of the data center (for example, connection and setting information for servers and communication apparatuses).
However, static configuration information indicates only whether the communication of a customer could pass through the location where the fault has occurred. When a fault occurs, it therefore remains unclear whether the customer was actually engaging in communication, and thus whether the customer was actually affected by the fault (such customers are referred to as “affected customers”) or not (such customers are referred to as “unaffected customers”). Thus, if there are many potentially affected customers when a fault occurs, the manager of the data center is unable to distinguish between affected customers and unaffected customers, and in some cases responds to unaffected customers before affected customers.
Therefore, in order to distinguish affected customers from unaffected customers, it is necessary to determine, on the basis of whether the customer was engaging in communication, whether the customer was using the network at the fault location at the time the fault occurred. JP 2011-188422 A and the specification of US 2009/0180393 A1 disclose such techniques.
JP 2011-188422 A proposes a method in which session information is managed using a resource management apparatus, and when a fault occurs, the customer affected by the fault (corresponding to the above-mentioned affected customer) is identified by comparing the fault information (fault occurrence location, fault occurrence time, and time of recovery from the fault) with the session information. The session information disclosed in JP 2011-188422 A is a combination of the start time and end time of communication and the service endpoints (source and destination IP (Internet Protocol) addresses).
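By way of illustration only, the comparison of fault information with session information described above can be sketched as a time-interval overlap test combined with an endpoint check. The following Python sketch is not taken from JP 2011-188422 A; the names Session, Fault, overlaps, and affected_customers are hypothetical, and the mapping from the fault occurrence location to a set of endpoint IP addresses (endpoints_behind_fault) is an assumption introduced here for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Session:
    """Hypothetical session record: communication period plus service endpoints."""
    customer: str
    src_ip: str
    dst_ip: str
    start: float            # start time of communication
    end: Optional[float]    # end time of communication; None if still ongoing


@dataclass(frozen=True)
class Fault:
    """Hypothetical fault record: location, occurrence time, recovery time."""
    location: str
    occurred: float
    recovered: Optional[float]  # None if the fault has not yet been recovered


def overlaps(session: Session, fault: Fault) -> bool:
    """Return True if the session period intersects the fault period."""
    session_end = float("inf") if session.end is None else session.end
    fault_end = float("inf") if fault.recovered is None else fault.recovered
    return session.start <= fault_end and fault.occurred <= session_end


def affected_customers(
    sessions: Iterable[Session],
    fault: Fault,
    endpoints_behind_fault: Set[str],
) -> Set[str]:
    """Customers with a session that overlaps the fault in time and that
    has an endpoint assumed to be reached through the fault location."""
    return {
        s.customer
        for s in sessions
        if overlaps(s, fault)
        and (s.src_ip in endpoints_behind_fault or s.dst_ip in endpoints_behind_fault)
    }
```

In this sketch, a customer whose session ended before the fault occurred, or whose endpoints are not reached through the fault location, is not reported as affected. Note that the sketch presupposes that a start time and an end time are already available for every session.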
The specification of US 2009/0180393 A1 proposes a packet sampling method by which the topology of a network is estimated on the basis of communication packets flowing in the network, and anomalies in quality in other networks and the affected range thereof can be determined using the topology information and packet sampling information.
However, even with the techniques of JP 2011-188422 A and the specification of US 2009/0180393 A1, it is not possible to quickly identify customers that have been affected by communication faults in the data center, that is, affected customers. For example, JP 2011-188422 A assumes that the resource management apparatus uses session information, but it is difficult for a data center used for cloud services to manage session information. Specifically, in order to know the start time and end time of communication in a session, it would be necessary to constantly gather and analyze all communications in the data center and to determine the start and end of each session. However, it is difficult to analyze in real time all communications in a data center where vast amounts of communication occur, and it would therefore not be possible to quickly identify customers who were affected by the fault. Also, in the specification of US 2009/0180393 A1, only communications with a large amount of traffic are sampled, and thus it is not possible to ascertain the communication usage of users with a small amount of traffic.