Software instrumentation generally includes software entities used for collecting, storing and retrieving performance metrics of a computer system. These software entities typically include writer entities for collecting the performance information and storing that information in a designated area of main memory, called a repository. The repository preferably holds the performance information for retrieval by reader entities, which may include application programs executing on the computer.
Namespace services impose a uniform structure on the information stored in repositories. A namespace is a collection of information managed by an operating system of the computer; a namespace service, or registry, is the entity that stores and organizes that information. The registry generally provides an application programming interface (API), which is a mechanism for an application program to obtain services from the operating system. APIs typically include a collection of system calls to the operating system requesting, for example, establishment of network connections on behalf of an application. Specifically, the application may, via system calls to the registry API, create, modify, request, add and delete information in the registry.
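The registry operations named above (create, modify, request and delete) can be illustrated with a minimal in-memory sketch. The class name `Registry`, its method names and the path-style entry names are hypothetical and chosen only for illustration; an actual registry would be reached through operating system calls rather than a Python object.

```python
class Registry:
    """Toy namespace service: maps hierarchical names to stored values."""

    def __init__(self):
        self._entries = {}

    def create(self, name, value):
        # Add a new entry; refuse to overwrite an existing one.
        if name in self._entries:
            raise KeyError(f"{name!r} already exists")
        self._entries[name] = value

    def modify(self, name, value):
        # Change an entry that must already exist.
        if name not in self._entries:
            raise KeyError(f"{name!r} does not exist")
        self._entries[name] = value

    def request(self, name):
        # Retrieve the value stored under a name.
        return self._entries[name]

    def delete(self, name):
        # Remove an entry from the namespace.
        del self._entries[name]

    def names(self, prefix=""):
        # List entry names under a prefix, giving a uniform view of the namespace.
        return sorted(n for n in self._entries if n.startswith(prefix))


registry = Registry()
registry.create("/perf/cpu/load", 0.42)
registry.create("/perf/io/latency_ms", 7.5)
registry.modify("/perf/cpu/load", 0.57)
```

Here a writer entity would call `create` and `modify`, while a reader entity would call `request` or `names` to retrieve the stored metrics.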
The performance information collected at the registry typically includes metrics relating to components of the computer system, such as a central processing unit (CPU), main memory, the operating system and input/output (I/O) system. Examples of the collected information include loading metrics of the CPU and bandwidth parameters of the memory, along with timing latencies for execution of a particular request involving the I/O system, e.g., how long it takes to complete the entire request or portions of it.
By collecting and retrieving performance information, the software instrumentation provides access to the internal state and behavior, i.e., "views", of the operating system and application software executing on the computer. Operating systems are complex pieces of software configured to, e.g., handle asynchronous events within a computer (such as interrupts from I/O devices), provide interprocess communication capabilities and implement complex network protocols. Operating systems also control execution of application programs; instances of those programs in execution are called processes.
Knowledge of the internal characteristics of an operating system and application processes is useful for debugging, optimization and design verification of a computer. The internal views provided by software instrumentation fall into two general categories: (i) tracing, which provides a view into the behavior of a software program by recording the time and details of its state at interesting points in a running system, and (ii) statistics, which record when and how resources, such as device drivers of the operating system, are used.
Tracing is a broadcast form of interprocess communication with many source processes (e.g., writer entities) and sink processes (e.g., reader entities) capable of observing each other's execution; a trace, therefore, consists of a display that chronicles the actions and results of program execution. Specifically, the trace provides a detailed record of the program's execution path by, e.g., taking an application program and placing it under observation by a special routine that monitors the progress of the program.
Performance information obtained by writer entities engaged in tracing operations is typically provided to the registry as trace messages. Each trace message has a variable length and is stored in portions of the registry configured to accommodate such variable-length messages. Specifically, each message is stored in a trace buffer having a circular structure consisting of a fixed number of equally-sized entries; each entry generally has a capacity sufficient to store a portion or fragment of the variable-length trace message.
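As a rough sketch of the storage scheme just described, the following code splits a variable-length message into fragments and places them in a circular buffer of equally-sized entries. The entry size, the number of entries, the "last fragment" flag and all names are assumptions made for illustration and are not taken from the text.

```python
ENTRY_SIZE = 8    # bytes per entry (assumed value)
NUM_ENTRIES = 4   # fixed number of entries in the ring (assumed value)


def fragment(message: bytes, entry_size: int = ENTRY_SIZE):
    """Split a variable-length message into pieces that each fit one entry."""
    return [message[i:i + entry_size] for i in range(0, len(message), entry_size)]


class TraceBuffer:
    """Circular buffer holding one message fragment per entry."""

    def __init__(self, num_entries: int = NUM_ENTRIES):
        self.entries = [None] * num_entries
        self.next_slot = 0  # grows without bound; the index is taken modulo the size

    def post(self, message: bytes):
        """Store every fragment of the message and return the entry indices used."""
        fragments = fragment(message)
        used = []
        for position, piece in enumerate(fragments):
            index = self.next_slot % len(self.entries)
            is_last = position == len(fragments) - 1
            self.entries[index] = (piece, is_last)  # flag marks the final fragment
            used.append(index)
            self.next_slot += 1
        return used


buffer = TraceBuffer()
first = buffer.post(b"open /dev/sda ok")   # 16 bytes, two fragments
second = buffer.post(b"x" * 20)            # 20 bytes, three fragments, wraps around
```

In this sketch the second message wraps past the end of the ring and overwrites the oldest entry, which is the behavior that makes a circular structure suitable for a bounded trace history.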
A plurality of messages and corresponding fragments are generally interleaved within the trace buffer, which is typically shared among the software entities. As a result, the trace buffer may be accessed by multiple writer entities attempting to load messages into the buffer and multiple reader entities attempting to retrieve those messages. Because tracing is a broadcast mechanism, every reader entity may retrieve each trace message. Typically, the reader entities determine whether there are any new messages to retrieve by constantly polling the trace buffer; depending on the number of readers continually accessing the buffer, such activity could completely exhaust system resources.
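The cost of constant polling can be made concrete with a small simulation. This is a simplified model under stated assumptions: the shared buffer is an append-only list rather than a ring, time advances in discrete ticks, and the names `SharedTraceBuffer`, `PollingReader` and `simulate` are hypothetical.

```python
class SharedTraceBuffer:
    """Hypothetical stand-in for the shared trace buffer: an append-only list."""

    def __init__(self):
        self.messages = []

    def post(self, message):
        self.messages.append(message)


class PollingReader:
    """Reader that checks the buffer on every tick, whether or not data arrived."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.cursor = 0        # index of the next message this reader has not seen
        self.seen = []
        self.polls = 0
        self.empty_polls = 0

    def poll(self):
        self.polls += 1
        new_messages = self.buffer.messages[self.cursor:]
        if not new_messages:
            self.empty_polls += 1   # work spent finding nothing
        self.cursor += len(new_messages)
        self.seen.extend(new_messages)
        return new_messages


def simulate(num_readers, ticks, post_ticks):
    """Run the writer and all readers for a number of ticks; return the readers."""
    buffer = SharedTraceBuffer()
    readers = [PollingReader(buffer) for _ in range(num_readers)]
    for tick in range(ticks):
        if tick in post_ticks:
            buffer.post(f"msg@{tick}")
        for reader in readers:
            reader.poll()
    return readers


readers = simulate(num_readers=3, ticks=10, post_ticks={0, 5})
total_polls = sum(r.polls for r in readers)
wasted_polls = sum(r.empty_polls for r in readers)
```

Each reader sees both messages, as broadcast delivery requires, yet most polls find nothing new, and the wasted work grows with the number of readers.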
Each writer entity, on the other hand, merely "posts" messages in the buffer independent of reader activity. Prior to posting those messages, the writer must typically process the messages to determine their lengths so that it can allocate a sufficient number of entries of the buffer. However, coordinating and managing such allocation may be difficult, particularly when attempting to recover from a writer entity that "dies", i.e., stops functioning, in the midst of processing a message. If such recovery is not swift and efficient, other writers may be "blocked" from placing messages in the buffer, thereby disrupting operation of the system.
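The allocation step described above can be sketched as follows: the writer computes how many entries a message needs, reserves them, and later marks them as filled. The reserve-then-commit split, the `pending` bookkeeping and all names are assumptions introduced to illustrate the recovery problem; the text does not prescribe this design.

```python
import math


class EntryAllocator:
    """Hands out runs of entries in a ring to writers posting messages."""

    def __init__(self, num_entries, entry_size):
        self.num_entries = num_entries
        self.entry_size = entry_size
        self.next_slot = 0
        self.pending = {}   # writer id -> entries reserved but not yet filled

    def entries_needed(self, message):
        # The writer must know the message length before it can allocate.
        return max(1, math.ceil(len(message) / self.entry_size))

    def reserve(self, writer_id, message):
        count = self.entries_needed(message)
        if count > self.num_entries:
            raise ValueError("message does not fit in the buffer")
        slots = [(self.next_slot + i) % self.num_entries for i in range(count)]
        self.next_slot += count
        self.pending[writer_id] = slots
        return slots

    def commit(self, writer_id):
        # Called once the writer has filled all of its reserved entries.
        self.pending.pop(writer_id)


allocator = EntryAllocator(num_entries=8, entry_size=16)
allocator.reserve("writer-A", b"a" * 40)   # needs three entries
allocator.reserve("writer-B", b"b" * 10)   # needs one entry
allocator.commit("writer-B")
# writer-A "dies" here and never commits, so its entries stay pending.
```

In this sketch the entries reserved by the dead writer remain in `pending` indefinitely; any scheme that waits for them to be filled would stall other writers, which is the blocking hazard the paragraph describes.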
Therefore, it is among the objects of the invention to provide a mechanism for limiting the quantity of reader entities constantly accessing a trace buffer of a registry so as to conserve system resources.
Another object of the present invention is to provide a mechanism for ensuring that a writer entity does not block other writer entities from posting messages in a trace buffer of a registry.