There are many practical situations when a large data set needs to be analyzed and summarized. However, examining the entire data set would involve a prohibitive computational cost.
As one familiar example, consider the shopping-basket data collected by a large on-line grocery retailer. The management would like a report of the sales volume for most popular items and of the number of customers which purchased each of these items.
As another example, from the area of data management, more specifically from the area of statistics collection for an XML (Extensible Markup Language) data management system, consider a collection of XML documents stored in a database. A query optimizer component needs access to histograms of the most frequent paths in the document collection and the number of documents which contain each of these paths.