Nearly all software providers distribute, in some form, updates for the software products and services they have sold to their clients. Before the Internet became prominent, software providers distributed software updates on some form of computer-readable media, such as a floppy disk, a magnetic tape, or an optical disk, and offered those updates infrequently, such as on an annual or semi-annual basis. Now, however, with the ubiquity of the Internet, nearly all software providers distribute software updates online: a client simply downloads an available software update to the client computer over the Internet. Furthermore, software updates are now released much more frequently, such as on a weekly, bi-weekly, or monthly basis.
While software providers take various quality control measures, sometimes extensive ones, to ensure that their software updates download and install on a client device without problems, they also realize that a software update may still encounter problems. These include failing to download to a client computer or failing to install, as well as more worrisome problems, such as the updated software application, or another application, failing to operate after the update has been installed. Thus, for a variety of reasons, including ensuring the quality of the distribution and installation of software updates, as well as simply collecting data to provide accurate statistics on the reach of an update and to forecast growth areas, software providers frequently monitor or gather information regarding the distribution, downloading, and installation of software updates.
FIG. 1 is a pictorial diagram illustrating a typical prior-art event collection system 100 for collecting software update distribution and installation information from client systems. Typically, after each phase of software update distribution (determining whether an update is available, downloading the update from a distribution location, and installing the update on the client computer), client computers report update event information to the software provider detailing the success or failure of that phase.
As shown in FIG. 1, client computers 102-110 send update event information through a communication network, such as the Internet 112, to a collection service 114 provided by or associated with a software provider. Upon receiving update event information, the collection service 114 will typically store the update event information (at least temporarily) in update event store 116. As indicated above, the update event information will typically include some indication as to the success or failure of the particular reported event (download, install, execution, etc.). Furthermore, update event information may include a success indicator that supports a range of values corresponding to the event, including success, failure, client cancellation, success upon reboot of the client computer, and the like.
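To make the reported data concrete, the following Python sketch models an update event that carries the distribution phase and a multi-valued success indicator, together with a collection service that stores each event as it arrives. All class, field, and value names (UpdatePhase, SuccessIndicator, UpdateEvent, CollectionService) are hypothetical; the text does not specify a wire format or a storage schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class UpdatePhase(Enum):
    """Phases after which a client reports an update event (hypothetical names)."""
    DETECTION = "detection"        # determining whether an update is available
    DOWNLOAD = "download"          # downloading from a distribution location
    INSTALLATION = "installation"  # installing on the client computer


class SuccessIndicator(Enum):
    """Range of result values a single event may carry (hypothetical names)."""
    SUCCESS = 0
    FAILURE = 1
    CLIENT_CANCELLED = 2
    SUCCESS_ON_REBOOT = 3


@dataclass(frozen=True)
class UpdateEvent:
    client_id: str
    update_id: str
    phase: UpdatePhase
    result: SuccessIndicator


class CollectionService:
    """Minimal stand-in for collection service 114 and update event store 116."""

    def __init__(self) -> None:
        self.event_store: List[UpdateEvent] = []

    def receive(self, event: UpdateEvent) -> None:
        # Store each reported event (at least temporarily) for later sampling.
        self.event_store.append(event)


service = CollectionService()
service.receive(UpdateEvent("client-102", "update-A", UpdatePhase.DOWNLOAD,
                            SuccessIndicator.SUCCESS))
service.receive(UpdateEvent("client-102", "update-A", UpdatePhase.INSTALLATION,
                            SuccessIndicator.SUCCESS_ON_REBOOT))
```

A single client thus contributes one stored record per reported phase, which is the source of the event volume discussed next.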
One of the problems of collecting software update event information, at least until now, is the large amount of information that must be collected, organized, and evaluated. As mentioned above, numerous update events may be transmitted to the collection service 114 for each update. For software providers having a large client base and/or numerous products, this problem can be especially acute. For example, Microsoft Corporation's family of Windows® operating systems boasts a customer base of nearly 200 million installations. Suppose that a software update is released to address a critical security issue, such as a weakness exploited by a computer virus circulating and infecting computers worldwide; nearly all of the installed base will want to download and install the software update as quickly as possible. Further assuming that each client computer sends at least two update events to the collection service 114 (a very conservative assumption), one after downloading the update and one after installing it, 400 million events will be sent to the collection service for that one software update! This number of events, by itself, can overwhelm many computer systems. Of course, as those skilled in the art will appreciate, any given software update may require that one or more other software updates be previously or concurrently installed, and each such prerequisite update will also generate at least two software update events. Considering that updates are frequently released on a periodic basis, sometimes as many as ten updates per month, it is easy to see how the 400 million events can grow into billions of events (10 updates times 2 update events times 200 million installations yields 4 billion events), and how these update events are concentrated into a very small timeframe.
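The event-volume estimate above is simple multiplication, reproduced in the Python sketch below. The function name estimated_events is hypothetical, and, like the estimate in the text, it ignores prerequisite updates, so it yields a lower bound.

```python
def estimated_events(installations: int, updates: int, events_per_update: int = 2) -> int:
    """Lower bound on reported events: every installation reports
    events_per_update events for each released update."""
    return installations * updates * events_per_update


single_update = estimated_events(200_000_000, updates=1)
ten_updates = estimated_events(200_000_000, updates=10)
print(f"{single_update:,}")  # 400,000,000
print(f"{ten_updates:,}")    # 4,000,000,000
```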
In addition to planning for and accommodating very large numbers of update events over the Internet 112, a software provider must also have some plan for processing the reported information into useable data. When such large numbers of update events are reported, processing each and every event is almost always impractical. As such, software providers usually rely upon a statistical sampling of the reported events, one that yields accurate results on a relatively small sample of the entire event population. Thus, as shown in FIG. 1, the collection service 114 selects an update event sample 118 from the update event store 116 (based on selection principles that support statistical sampling), and uses that sample to generate an update event report 120 for the software provider to ensure quality control, determine the reach of any software update, predict growth areas, and the like.
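The text leaves the selection principles open. One standard technique for drawing a uniform random sample from a large store whose size may not be known in advance is reservoir sampling (Algorithm R). The Python sketch below uses it to select an update event sample and estimate a failure rate for the report. The choice of reservoir sampling, the sample size, and the string-valued events are assumptions for illustration, not the collection service's specified method.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(events: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform random sample of k events from a stream of unknown length (Algorithm R)."""
    sample: List[T] = []
    for i, event in enumerate(events):
        if i < k:
            # Fill the reservoir with the first k events.
            sample.append(event)
        else:
            # Keep event i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = event
    return sample


def failure_rate(sample: List[str]) -> float:
    """Fraction of sampled events whose result is 'failure'."""
    return sum(1 for result in sample if result == "failure") / len(sample)


# Hypothetical event store: 100,000 results, exactly 5% of which are failures.
event_store = ["failure" if i % 20 == 0 else "success" for i in range(100_000)]

sample = reservoir_sample(event_store, k=2_000, rng=random.Random(42))
print(f"estimated failure rate: {failure_rate(sample):.3f}")  # true rate in the store is 0.050
```

Because every event ends up in the reservoir with equal probability k/n, a sample of a few thousand events can estimate population rates without processing the whole store.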
Clearly, receiving update events from each client computer for each software update, while viewed as necessary by many software providers, also poses numerous and significant challenges, including handling the potential network traffic, temporarily storing the received events, and processing the received events into meaningful data. In light of these issues, what is needed is an efficient network collection system for sampling information prior to submitting it to a collection service, i.e., client-side event sampling. The present invention addresses these and other issues found in the prior art.
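As a rough illustration of why sampling on the client side reduces load, the following Python sketch has each simulated client independently decide, with a fixed probability, whether to send its event at all; the service then scales the received count back up. This simple Bernoulli scheme, the 1% rate, and the function names should_report and simulate_clients are hypothetical, and are not necessarily the mechanism of the present invention.

```python
import random
from typing import Tuple


def should_report(rng: random.Random, sampling_rate: float) -> bool:
    """Client-side decision: send this event with probability sampling_rate."""
    return rng.random() < sampling_rate


def simulate_clients(num_clients: int, sampling_rate: float, seed: int) -> Tuple[int, float]:
    """Return (events actually sent, estimated total events), one event per client."""
    rng = random.Random(seed)
    sent = sum(1 for _ in range(num_clients) if should_report(rng, sampling_rate))
    # Scale the observed count by 1 / sampling_rate to estimate the full population.
    estimated_total = sent / sampling_rate
    return sent, estimated_total


sent, estimated = simulate_clients(num_clients=100_000, sampling_rate=0.01, seed=7)
print(sent, round(estimated))
```

With a 1% rate, roughly 1,000 of 100,000 events cross the network, while the scaled count still estimates the size of the full event population.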