1. Technical Field
The present invention relates to distributed discovery in a computer environment, and more particularly to a parallel discovery system and method that reduces network traffic and improves efficiency.
2. Description of the Related Art
Most, if not all, information technology (IT) optimization tasks require that IT assets, configurations, and operational characteristics be discovered before the actual IT optimization can be planned and executed. A common strategy is to perform discovery in stages: first discover basic information and then discover more detailed information. For example, one usually starts by scanning networks to list the hosts and their IP addresses. After that, a per-host discovery may be performed to get basic system information. Next, it is usually desirable to discover per-middleware information.
Each of these stages frequently requires several sub-stages because certain information cannot be discovered without first discovering some other information. Two practical examples follow. 1) Each Distributed Computing Environment (DCE) or Andrew File System (AFS) cell member possesses information about the overall cell configuration, including very detailed information at the level of per-directory data locations on each file server. Unfortunately, gathering this information takes time, and large volumes of collected information add network and storage overhead and inconvenience those who perform the scans. It is therefore desirable to first discover cells and lists of their members, and only after that gather cell-specific information from a single member node of each cell. 2) Network topology or security zone discovery may require sending out probing packets. If all servers being discovered send out such probing requests, they will generate noticeable network traffic and may easily trigger network detection systems, leading to network outages across whole data centers. Therefore, subnetworks should first be discovered, and only one subnetwork member at a time should be asked to send out the probing packets.
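The two-stage pattern described in the examples above can be illustrated with a minimal sketch. It assumes that a first stage has already produced a mapping from each host to the cell or subnetwork it belongs to. The function name pick_representatives, the host and cell names, and the rule of choosing the lexicographically smallest host name are all hypothetical and serve only as illustration; they are not part of any described system.

```python
from collections import defaultdict


def pick_representatives(memberships):
    """Choose one host per group (cell or subnetwork) for detailed discovery.

    memberships maps a host name to the group identifier found for it
    in the first discovery stage (hypothetical data layout).
    """
    groups = defaultdict(list)
    for host, group in memberships.items():
        groups[group].append(host)
    # Assumption: any member can answer for its group, so a deterministic
    # choice (the smallest host name) is as good as any other.
    return {group: min(hosts) for group, hosts in groups.items()}


# Hypothetical first-stage result: two cells, three hosts.
stage_one = {
    "host-a": "cell-1",
    "host-b": "cell-1",
    "host-c": "cell-2",
}
representatives = pick_representatives(stage_one)
```

Only the hosts selected here would run the expensive second-stage scan or send probing packets, which is what limits the collected volume and the generated network traffic.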
Unfortunately, each discovery stage costs money and takes time. For example, if system administrators are involved in the discovery process, they need to run scripts twice: first to locate cells or subnetworks, and then to run another script on a smaller set of servers. The problem is exacerbated by the fact that it is usually impossible to establish any information exchange between the servers being scanned, so the discovery scripts cannot make a collective decision during a first run.