The low cost of data storage hardware has led to the collection of large volumes of data. Merchants, for example, generate and collect large volumes of data during the course of their business. This data could include shop floor sales and, where the merchant operates a website, data on the use that is made of that website. To compete effectively, it is necessary for a merchant to be able to identify and use information hidden in the collected data. The task of identifying this hidden information has proved very difficult for merchants.
It is also important for other individuals and organisations to analyse stored data. Each time a game of sport is played, there is generally a large volume of data collected. For example, a game of rugby union generates statistics such as the total number of points scored, the number of tries scored and the number of those tries which are then converted. There is an increasing trend toward using collected data to analyse opponent strategies and as a coaching aid in assessing the strengths and weaknesses of a particular team. It is also especially desirable with televised sports to present the collected data to spectators in a form which is easily interpreted.
Traditionally, analysis of data has been achieved by running a query on a set of data records stored in a database. The merchant or other party first creates a hypothesis, converts this hypothesis to a query, runs the query on the database, and interprets the results obtained with respect to the original hypothesis.
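By way of illustration only, the four steps of this traditional approach can be sketched in a few lines of Python using the standard sqlite3 module. The table name sales, its columns day_type and amount, the sample figures and the hypothesis itself (that average weekend sales exceed average weekday sales) are all assumptions made for this sketch and are not taken from any particular merchant's system.

```python
import sqlite3

def check_weekend_hypothesis(rows):
    """Test the hypothetical claim 'average weekend sales exceed average weekday sales'.

    rows is an iterable of (day_type, amount) pairs; this layout is an
    assumption made for the sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day_type TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Step 2: the hypothesis converted to a query.
    query = "SELECT day_type, AVG(amount) FROM sales GROUP BY day_type"

    # Step 3: the query is run on the database.
    averages = dict(conn.execute(query).fetchall())
    conn.close()

    # Step 4: the result is interpreted against the original hypothesis.
    supported = averages.get("weekend", 0.0) > averages.get("weekday", 0.0)
    return supported, averages

# Illustrative sample data only.
sample_rows = [
    ("weekday", 100.0),
    ("weekday", 120.0),
    ("weekend", 150.0),
    ("weekend", 170.0),
]
supported, averages = check_weekend_hypothesis(sample_rows)
print(supported, averages)
```

The query can only answer the question that was put to it. Any pattern in the data that does not bear on the stated hypothesis goes unreported.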
One disadvantage of this verification-driven hypothesis approach is that the merchant must form the desired hypothesis in advance. The approach merely confirms or refutes what the merchant already suspects and does not provide the merchant with information which may be unexpected. Another disadvantage is that the merchant needs to have available the technical knowledge to formulate the appropriate queries.