Software analysts increasingly rely on “data warehousing” to collect and interpret massive amounts of external data. As one scenario in which data warehousing represents an advance over previous techniques, consider a manufacturer of a consumer product that must continually improve its product offering to remain competitive in the marketplace. To do this, the manufacturer would like to know how its customers actually use the product. In the past, the manufacturer could employ a “focus group,” in which people use the product in a simulated setting and then provide specific comments to the manufacturer. The information collected from the focus group could be combined with data gleaned from calls to the manufacturer's product support center. Together, this information provided valuable insight into the strengths and shortcomings of the product.
While still useful, these techniques fail to capture the full range of user interactions with the product. First, because they rely on customers' own descriptions of their product usage, these techniques present usage patterns that are always incomplete and often simply incorrect. Second, and more seriously, these techniques cannot reflect the large proportion of usage that is never reported. For example, rather than calling for product support, many customers simply give up when they cannot understand how to use a feature, or they develop an alternative method to achieve their desired result, one that the manufacturer did not consider during development. Third, seldom-used features of the product may not produce enough usage data to be meaningful, even though those features may be vital to an important segment of the user population. Fourth, if the product is usable in many different computing environments, these traditional techniques usually miss the nuances of how those environments affect the product's patterns of use. In short, the feedback that traditional methods of product analysis provide to the software manufacturer is often too generalized. By failing to capture comprehensive usage patterns, these techniques do not adequately support the manufacturer's desire to improve its products quickly to meet ever-changing user demands.
In order to obtain up-to-date performance and usage data from a statistically significant population of users, software products can now constantly monitor themselves while they run. These products produce datapoints containing measurements of a status, condition, action, event, or other property of the product or of its working environment. The datapoints are sent to a data warehouse at a central computing facility for processing and analysis. The manufacturer then queries the data warehouse to obtain timely and precise feedback about how its products are used in the real world.
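To make the self-monitoring step more concrete, the following Python sketch shows one possible shape for a datapoint and for a component that buffers datapoints and sends them in batches. The field names, the `Datapoint` and `DatapointReporter` classes, the product name, and the `send` callback (which stands in for whatever upload mechanism a real product would use) are all hypothetical illustrations, not part of any particular product or library.

```python
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Datapoint:
    """One measurement reported by a running product (hypothetical schema)."""
    product: str   # product name and version
    name: str      # what was measured: a feature use, event, timing, etc.
    value: float   # the measurement itself
    environment: Dict[str, str] = field(default_factory=dict)  # OS, locale, ...
    machine_id: str = ""  # identifies the installation; personal data
    timestamp: float = field(default_factory=time.time)


class DatapointReporter:
    """Buffers datapoints and passes them to `send` in batches.

    `send` is a hypothetical hook standing in for the upload to the
    central computing facility; it is not a real library API.
    """

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 100):
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def record(self, dp: Datapoint) -> None:
        """Queue one datapoint; upload automatically when the batch is full."""
        self._buffer.append(asdict(dp))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Upload any buffered datapoints and start a new batch."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []


# Usage: collect "uploads" in a list instead of sending them anywhere.
sent: List[List[dict]] = []
reporter = DatapointReporter(send=sent.append, batch_size=2)
reporter.record(Datapoint("Editor 2.0", "feature.spellcheck.used", 1.0,
                          {"os": "Windows"}, machine_id="abc123"))
reporter.record(Datapoint("Editor 2.0", "startup.seconds", 1.8,
                          {"os": "Windows"}, machine_id="abc123"))
# The second record fills the batch, so exactly one batch of two is sent.
```

Batching is one plausible design choice: it spreads the cost of each upload over many datapoints. Note that each record still carries a `machine_id`, which is the kind of identifying information the warehouse is expected to remove during processing.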
The above description intentionally simplifies the intricate, but well-known, process by which raw datapoints are converted into useful information in the data warehouse. This processing of raw datapoints as they enter the warehouse provides many of the strengths of data warehousing. By processing the raw datapoints in predefined ways, the data warehouse continuously creates the information needed to answer queries posed to it. This processing makes the data warehouse much more responsive to queries than a traditional database of raw datapoints, which would have to process its entire contents to develop an answer to each query, a task that is clearly infeasible when confronted with massive amounts of usage data. As another benefit, one that is very important when the raw datapoints are provided by a population of consumers, the processing removes any personally identifying information found in the raw datapoints. This prevents the data warehouse from tracking individual usage while still allowing it to uncover important usage trends.
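A minimal sketch of this kind of ingest-time processing follows, assuming a single predefined aggregation: the count and mean of each datapoint's value, grouped by datapoint name and operating system. The `Warehouse` class, the `PII_FIELDS` set, and the grouping key are hypothetical; real data warehouses perform far richer processing. The sketch illustrates only the two benefits named above: identifying fields are dropped on arrival and raw datapoints are not retained, and queries read precomputed aggregates instead of scanning raw data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Fields assumed (hypothetically) to identify an individual user.
PII_FIELDS = {"machine_id", "user_name", "ip_address"}


class Warehouse:
    """Toy warehouse that aggregates datapoints as they arrive."""

    def __init__(self) -> None:
        # Predefined aggregation: (datapoint name, OS) -> [count, sum of values]
        self._aggregates: Dict[Tuple[str, str], List[float]] = defaultdict(
            lambda: [0, 0.0])

    def ingest(self, raw: Iterable[dict]) -> None:
        """Strip identifying fields, then fold each datapoint into the aggregates.

        The raw datapoints themselves are not stored.
        """
        for dp in raw:
            clean = {k: v for k, v in dp.items() if k not in PII_FIELDS}
            os_name = clean.get("environment", {}).get("os", "unknown")
            agg = self._aggregates[(clean["name"], os_name)]
            agg[0] += 1
            agg[1] += clean["value"]

    def count(self, name: str, os_name: str) -> int:
        """Answered from the aggregates; no scan of raw data is needed."""
        return int(self._aggregates.get((name, os_name), [0, 0.0])[0])

    def mean(self, name: str, os_name: str) -> float:
        count, total = self._aggregates.get((name, os_name), [0, 0.0])
        return total / count if count else 0.0


wh = Warehouse()
wh.ingest([
    {"name": "startup.seconds", "value": 2.0,
     "environment": {"os": "Windows"}, "machine_id": "abc"},
    {"name": "startup.seconds", "value": 4.0,
     "environment": {"os": "Windows"}, "machine_id": "def"},
    {"name": "startup.seconds", "value": 1.0,
     "environment": {"os": "macOS"}, "machine_id": "ghi"},
])
print(wh.count("startup.seconds", "Windows"))  # 2
print(wh.mean("startup.seconds", "Windows"))   # 3.0
```

Because the grouping key is fixed before any data arrives, a query that the key does not support (for example, grouping by locale) cannot be answered without changing the processing itself.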
However, the benefits of processing the raw datapoints come at a cost. To set up the processing, the data warehouse must be carefully configured before it can begin to accept incoming datapoints. This configuration, and any reconfiguration needed to support a new set of queries, has traditionally required the services of a select group of data warehousing experts. Depending upon this group both increases the cost of configuring the warehouse and limits the speed with which the data warehouse can be reconfigured.