Field
The present invention relates to computer processing, and in particular to a method and system for monitoring a service object.
Related Art
With the rapid development of Internet technology, application services and Web services such as instant messaging and search are advancing quickly. To monitor these application services and network services, one generally needs to sort all processes in the kernel of an operating system (e.g., using the top command) according to factors such as resource usage. This may involve, for example, collecting statistics on connections in the system (e.g., using the netstat command) and on memory usage (e.g., using the slabtop command).
On a high-performance computer, it is becoming more common to collect statistics for a million or more objects. For example, with high-traffic services such as instant messaging and search, and cloud computing-based services (e.g., Infrastructure as a Service (IaaS)), it is very common for a single computer to reach a million or even more connections. The system typically needs to obtain statistics in real time for millions of objects. For example, out of millions of connections, the system may collect statistics for the 100 connections with the greatest traffic.
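The selection described above, picking the 100 busiest connections out of a very large set, can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the method of the invention: the function name `top_connections`, the dictionary of per-connection byte counts, and the randomly generated traffic figures are all hypothetical.

```python
import heapq
import random


def top_connections(traffic_by_conn, n=100):
    """Return the n (conn_id, byte_count) pairs with the greatest traffic,
    ordered from largest to smallest.

    traffic_by_conn is a hypothetical mapping of connection id to the
    number of bytes observed on that connection.
    """
    # heapq.nlargest keeps only n candidates at a time, so the full set
    # of connections never has to be sorted.
    return heapq.nlargest(n, traffic_by_conn.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic data, scaled down from "millions" to keep the demo quick.
    rng = random.Random(0)
    traffic = {conn_id: rng.randrange(1_000_000) for conn_id in range(200_000)}
    busiest = top_connections(traffic, n=100)
    print(len(busiest), "connections selected")
```

A bounded selection of this kind avoids a full sort, but it still has to visit every connection each time the statistics are refreshed.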
At present there are many approaches to collecting statistics, and the collection of statistics over a unit of time is generally based on a timer. Using the collection of traffic statistics as an example, to trace the traffic conditions within one second for each connection, the system may need to start a timer when establishing each connection and set its timeout duration to 1 second. This way, each time the timer times out, the system calculates the traffic within the previous second, and performs sorting.
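The timer-based approach described above can be sketched as a small simulation. It is a minimal sketch under several assumptions: time is simulated rather than driven by real kernel timers, all connections are established at time 0, sorting is performed once at the end rather than on every timeout, and the names `Connection` and `run_timer_based_stats` are hypothetical.

```python
import heapq


class Connection:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.bytes_in_window = 0  # traffic accumulated since the last timeout
        self.last_rate = 0        # traffic measured over the previous window


def run_timer_based_stats(connections, events, duration, timeout=1.0):
    """Simulate one timer per connection.

    events: list of (time, conn_id, nbytes) tuples sorted by time.
    Returns (ranking, timeouts_fired), where ranking lists
    (conn_id, last_rate) pairs from busiest to least busy.
    """
    by_id = {c.conn_id: c for c in connections}
    # One timer per connection, armed when the connection is established.
    timers = [(timeout, c.conn_id) for c in connections]
    heapq.heapify(timers)
    fired = 0
    next_event = 0
    while timers and timers[0][0] <= duration:
        deadline, conn_id = heapq.heappop(timers)
        # Account for all traffic that arrived before this timer expired.
        while next_event < len(events) and events[next_event][0] < deadline:
            _, cid, nbytes = events[next_event]
            by_id[cid].bytes_in_window += nbytes
            next_event += 1
        # Timeout handler: record the previous window and start a new one.
        conn = by_id[conn_id]
        conn.last_rate = conn.bytes_in_window
        conn.bytes_in_window = 0
        fired += 1
        heapq.heappush(timers, (deadline + timeout, conn_id))  # re-arm
    ranking = sorted(connections, key=lambda c: c.last_rate, reverse=True)
    return [(c.conn_id, c.last_rate) for c in ranking], fired
```

In this sketch the number of timeout operations equals the number of connections multiplied by the number of elapsed timeout periods, so the work grows directly with the connection count.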
Such timer-based approaches to collecting statistics are acceptable in cases where the number of objects is relatively small. However, since timer-based approaches need to interrupt the current context and perform a traversal, in cases of a million or more objects the system needs to perform as many timeout operations. Since the timeout duration is generally very short, these timing operations may be very frequent, and it is likely that multiple connection timers will time out simultaneously. In this case, the timer processing may consume enormous amounts of the computer's resources, even to the point that the computer is no longer capable of operating and may crash.
In addition, current statistical approaches may collect statistics at different parts of the kernel. When the system load increases, a statistics module's resource consumption may also increase, and the module may use too many resources in an uncontrolled manner. This may reduce the resources available for the normal operation of a service module, thereby degrading the performance of the service module.
Further, current statistical approaches are typically implemented independently for specific scenarios (e.g., traffic, storage, memory), and cannot be applied to other service scenarios. In addition, with current statistical approaches, previous statistical information may be lost during a hot upgrade of the service modules. Re-initialization of the logic for collecting statistics is also time-consuming.