1. The Field of the Invention
The present invention relates to collecting metric data. More specifically, the present invention relates to configurable collection of computer related metric data.
2. Background and Related Art
The continued popularity of a product often demands that product manufacturers conduct ongoing product improvement. Central to effective product improvement is data on how consumers actually use the product. Various methods exist for attempting to obtain this information. For many products, it is common to employ a group of people, known as a “focus group,” whose members are asked to use the product and provide specific comments to the manufacturer either verbally or in writing. Focus group studies are helpful because they can often be conducted before a product, or an improved version thereof, is released to the general public. The manufacturer can thus consider pre-release refinements to the product. Following a product's release to the public, a manufacturer may also obtain information concerning a product's usage by, for example, monitoring calls to the manufacturer's customer service department. Similarly, the manufacturer can monitor consumer comments from various other sources in an attempt to address such comments in a future version of the product.
Effective product improvement has become particularly important for computer software products to remain competitive. The past twenty years have witnessed an exponential growth in the use of personal computers. Driving this popularity to a large extent has been the availability of computer software that users find appealing. At an early point, software for personal computers was largely character-based and employed a limited number of commands whose use could be generally predicted. Thereafter, personal computer software evolved to the now-familiar graphical user interface, such as that exemplified by the Microsoft Windows operating system products.
The shift to a graphical user interface provided many advantages for the user, such as simplifying the knowledge required to effectively use certain computer software. Graphical user interfaces also offered increased user flexibility regarding use and configuration of the computer. As a result, the permutations of individualized usage of personal computers multiplied. Software manufacturers have an increased need to predict and understand how users actually use a personal computer and the software thereon in order to make product improvements that are meaningful for a broad segment of a user population.
To address this need, computer software manufacturers have employed traditional product usage analysis techniques. For example, a preliminary, or “beta,” version of the software is often made available to groups of users who use the software and provide comments to the manufacturer. As with products generally, this approach requires a software manufacturer to rely on users' descriptions of software usage. Information can also sometimes be obtained from customer support incidents relating to the software.
While this methodology is helpful in the software area for identifying some pre-release product problems, it does not always provide comprehensive feedback to the manufacturer about how consumers use the software. For example, if a user experiences difficulties with the software and does not communicate these to the manufacturer, the manufacturer can lose potential insights for product improvement. Moreover, if the software contains features that are not used by a significant user population, the manufacturer may have difficulty in learning of such potentially unnecessary features. In addition, it is often difficult for a manufacturer to precisely gauge the spectrum of hardware and telecommunication environments in which the software is actually used. Product capability could be enhanced by better targeting the software to the actual computing environments in which it is used.
In short, the feedback provided to a software manufacturer by traditional product analysis methods has often become too generalized. Particularly with respect to modern computer software, the feedback often fails to provide a comprehensive picture of hardware and software usage and hinders the quick improvement of software to meet users' demands.
As computer hardware and software usage grows, it is becoming increasingly important to obtain up-to-date performance and usage data from a statistically significant population of users. Traditional techniques are becoming less workable, particularly as users of a given software product can now number in the tens of millions. Moreover, the current approach leaves many informational gaps in communicating how users actually use a product. These limitations are likely to become more significant, particularly as Internet-enabled, embedded computerized devices proliferate, such as microprocessor-equipped home appliances and other common devices.
To address these and other needs, some mechanisms have been developed for enabling a software manufacturer to record a set of data points about a computer while it is executing an application. The data points contain measurements concerning a status, condition, action, event or other measurable property about the computer. The data point information is thereafter transmitted to a central computer for analysis so that the manufacturer can obtain timely and precise feedback about how an application is being used.
For example, an application can be adapted to measure predetermined parameters about the usage, performance or status of a local computer on which the application is running. Applications that have been adapted to measure predetermined parameters can be referred to as “instrumented applications.” The parameters to be measured are determined by the software manufacturer and can include information such as the processor speed of the computer system, the amount of its random access memory or the speed of the computer's Internet access. Upon execution, the instrumented application initiates an instrumentation session and measures the predetermined parameters to obtain values (potentially one or more values for each parameter). The instrumented application then represents the parameters and the corresponding values as data points. A single value data point can record a numeric or alphanumeric value, such as the amount of the computer's random access memory (RAM). A multiple value data point contains a series of numeric or alphanumeric values whereby the order of the values within the series indicates the order in which the events or other parameters occurred, such as a list of clickable links the user selected.
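By way of illustration only, the two kinds of data points described above might be represented as in the following Python sketch. The class names, field names and sample values are hypothetical and are not taken from any particular instrumented application.

```python
from dataclasses import dataclass, field
from typing import List, Union

# A value may be numeric or alphanumeric.
Value = Union[int, float, str]


@dataclass
class SingleValueDataPoint:
    """One measured parameter with exactly one value (hypothetical layout)."""
    name: str
    value: Value


@dataclass
class MultiValueDataPoint:
    """One measured parameter with an ordered series of values."""
    name: str
    values: List[Value] = field(default_factory=list)

    def record(self, value: Value) -> None:
        # Appending keeps the values in the order the events occurred.
        self.values.append(value)


# Single value: the amount of RAM (sample figure, not a real measurement).
ram = SingleValueDataPoint("total_ram_mb", 512)

# Multiple value: the links a user selected, in selection order.
links = MultiValueDataPoint("clicked_links")
for link in ["home", "search", "help"]:
    links.record(link)
```

In this sketch the ordering guarantee of the multiple value data point comes simply from appending to a list.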
An instrumentation session can end when a user exits from the instrumented application or when no other parameters are to be measured (even if the instrumented application is still active). When an instrumentation session ends, maintained data points are saved in a session file at the local computer. The local computer system then attempts to transmit the session file to an upload server computer for further processing.
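As a rough sketch of the end-of-session behavior just described, the following Python fragment saves the maintained data points to a session file and then attempts an upload. The JSON file format, the file name, the `end_session` function and the `upload` callable are all assumptions made for illustration; the description above does not specify them.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Tuple


def end_session(data_points: Dict[str, object],
                session_dir: str,
                upload: Callable[[str], None]) -> Tuple[str, bool]:
    """Save data points to a session file, then try to transmit it."""
    path = os.path.join(session_dir, "session.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data_points, f)
    try:
        upload(path)          # hypothetical transport to the upload server
        return path, True
    except OSError:
        # The file stays on the local computer if transmission fails.
        return path, False


def failing_upload(path: str) -> None:
    # Stand-in for an unreachable upload server.
    raise ConnectionError("upload server unreachable")


session_dir = tempfile.mkdtemp()
path, sent = end_session({"total_ram_mb": 512}, session_dir, failing_upload)
```

Saving before transmitting means a failed upload does not lose the session's data points in this sketch.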
Due to the potential volume of data from multiple instrumentation sessions on multiple computers, session files from various instrumentation sessions can be processed in a distributed server computing environment using queues. As session files are received at an upload server, the upload server examines each session file to determine whether it should be retained based on predetermined criteria. Retained session files are written to a transfer file that is stored in a transfer file queue for transmission to a processing server. The processing server receives the transfer file, parses it to extract a predetermined subset of data points and loads the subset into a raw data database table. The raw data database table information is then summarized according to predetermined criteria and stored in a data warehouse for on-line analytical processing (OLAP) and reporting concerning the measured parameters.
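The queue-based flow described above can be sketched in a few lines of Python. This is a single-process simplification offered under stated assumptions: the retention criterion, the retained field names and the summary statistic are hypothetical, and in-memory structures stand in for the transfer file queue, the raw data table and the data warehouse.

```python
from collections import deque
from statistics import mean

# Hypothetical predetermined subset of data points to extract.
KEPT_FIELDS = ("total_ram_mb", "cpu_mhz")


def should_retain(session: dict) -> bool:
    # Hypothetical criterion: keep sessions that report an application version.
    return "app_version" in session


def upload_server(sessions: list, transfer_queue: deque) -> None:
    """Examine session files and queue the retained ones as a transfer file."""
    retained = [s for s in sessions if should_retain(s)]
    if retained:
        transfer_queue.append(retained)


def processing_server(transfer_queue: deque, raw_table: list) -> None:
    """Parse each transfer file and load the chosen subset into a raw table."""
    while transfer_queue:
        for session in transfer_queue.popleft():
            raw_table.append({k: session[k] for k in KEPT_FIELDS if k in session})


def summarize(raw_table: list) -> dict:
    """Summarize the raw rows for later reporting."""
    ram = [row["total_ram_mb"] for row in raw_table if "total_ram_mb" in row]
    return {"sessions": len(raw_table),
            "avg_total_ram_mb": mean(ram) if ram else None}


sessions = [
    {"app_version": "1.0", "total_ram_mb": 256, "cpu_mhz": 800, "clicked_links": ["a"]},
    {"total_ram_mb": 128},  # no app_version, so not retained
    {"app_version": "1.0", "total_ram_mb": 512, "cpu_mhz": 1200},
]
queue: deque = deque()
raw_table: list = []
upload_server(sessions, queue)
processing_server(queue, raw_table)
summary = summarize(raw_table)
```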
To adapt an application for instrumented functionality, a software manufacturer would insert additional source code statements into the application's source code to measure parameter values at the point during execution when measurement of the selected parameter is desired. For example, a source code statement could be inserted at the logical beginning of the application's source code to measure the local computer's total random access memory and to obtain a value thereof shortly after execution begins. During generation of a corresponding executable file, these additional source code statements are compiled into the executable file. The executable file is subsequently delivered to an end-user. During execution, parameter values are measured in accordance with the additional source code statements.
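A minimal sketch of such an inserted statement follows. Python is used for brevity even though it is not compiled into an executable in the manner described above, and `platform.machine()` and `os.cpu_count()` stand in for the total random access memory measurement because the Python standard library offers no portable call for total RAM. The `data_points` dictionary and the `main` function are hypothetical.

```python
import os
import platform

# Data points maintained for the current instrumentation session.
data_points = {}


def main() -> str:
    # Instrumentation statements inserted at the logical beginning of the
    # application, so the values are obtained shortly after execution begins.
    data_points["machine_type"] = platform.machine()
    data_points["logical_cpus"] = os.cpu_count()

    # ... the application's ordinary logic would follow here ...
    return "done"


result = main()
```

Because the measurement statements sit in the application's own source, changing what is measured means editing and redelivering the application.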
Unfortunately, since instructions for implementing application instrumentation are compiled into a corresponding executable, the development and financial resources needed to modify which predetermined parameters are measured, or when they are measured, can be quite large. For example, many existing commercially available applications (e.g., word processors, electronic mail clients, etc.) include thousands or even millions of lines of source code statements. Due in part to the sheer number of source code statements, altering or adding even a small portion of source code statements to an existing application may require skilled developers and quality assurance personnel to spend hundreds of hours before some level of reliability in the altered or added source statements is achieved.
However, due to varied computer and network environments or simply to the desire of an administrator, the parameters that are to be measured and when measurements are to occur can change (or vary in importance) with much greater frequency than applications can be instrumented. The inability to efficiently instrument applications based on changing needs can result in the generation of extraneous data points, insufficient data points, and/or data points that have limited usefulness. For example, measuring data points characteristic of a 64-bit computer may only be relevant when an instrumented application is executed on a 64-bit computer. However, when the instrumented application is run on a 32-bit computer, measurements of the characteristic 64-bit data points may nonetheless be attempted.
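The 64-bit example can be made concrete with the following sketch, in which a hard-coded set of measurements is attempted regardless of the machine it runs on. The parameter names and the `feature_64bit` measurement are hypothetical.

```python
import sys
from typing import Optional


def measure_pointer_bits() -> int:
    # Reports whether the running interpreter is a 32-bit or 64-bit build.
    return 64 if sys.maxsize > 2**32 else 32


def measure_64bit_feature(pointer_bits: int) -> Optional[str]:
    # Hypothetical parameter that is meaningful only on a 64-bit computer.
    return "large-address-space" if pointer_bits == 64 else None


def run_hard_coded_instrumentation(pointer_bits: int) -> dict:
    # Every built-in measurement is attempted; nothing checks relevance first.
    return {
        "pointer_bits": pointer_bits,
        "feature_64bit": measure_64bit_feature(pointer_bits),
    }


# Simulating a 32-bit computer: the 64-bit data point is still generated,
# but it carries no useful value.
on_32_bit = run_hard_coded_instrumentation(32)
on_this_machine = run_hard_coded_instrumentation(measure_pointer_bits())
```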
The generation of extraneous data points can unnecessarily consume network bandwidth. That is, an instrumented application may generate a number of data points corresponding to a number of different measurable parameters. All the data points may be transferred from the local computer over a network to a data warehouse. However, analytical processing on the data points may be directed to only a subset of the generated data points. Thus, the remaining generated data points are not used (and may in fact not even be needed) but are still transferred over the network.
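To give a sense of the bandwidth cost, the sketch below compares the serialized size of all generated data points with the size of only the subset that analysis actually uses. The sample session, the `ANALYZED` subset and the JSON encoding are assumptions chosen for illustration.

```python
import json

# Hypothetical session containing every generated data point.
session = {
    "total_ram_mb": 512,
    "cpu_mhz": 1200,
    "clicked_links": ["home", "search", "help", "home", "settings"],
    "installed_fonts": ["Arial", "Courier New", "Times New Roman", "Verdana"],
}

# Hypothetical subset that analytical processing is directed to.
ANALYZED = {"total_ram_mb", "cpu_mhz"}


def payload_size(points: dict) -> int:
    """Bytes needed to send the data points as UTF-8 encoded JSON."""
    return len(json.dumps(points).encode("utf-8"))


full_size = payload_size(session)
needed_size = payload_size({k: v for k, v in session.items() if k in ANALYZED})
wasted_bytes = full_size - needed_size
```

Multiplied across many sessions and many computers, the difference between `full_size` and `needed_size` is bandwidth spent on data points that are never analyzed.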
On the other hand, failure to generate desirable data points can make determining how an application is being used more difficult. For example, failure to measure both wired and wireless data packets transferred by an application may make it more difficult to identify heavily utilized operating environments. Further, since an instrumented application is often a binary (and thus data collection is hard coded into the instrumented application), collecting different sets of data from different users is relatively difficult. Additionally, instrumented applications provide little, if any, ability to turn on a reduced set of instrumentation at run-time and then turn on additional instrumentation for all or a subset of users without having to replace the user's binaries. Therefore, what would be advantageous are mechanisms that facilitate more flexible collection of computer related metric data.