With the growing importance of managing massive repositories of structured, semi-structured, and unstructured data, an important task for database research is the development of frameworks that support a rich set of analytical operations at such a large scale. A primary goal for these systems is to achieve a high degree of abstraction in defining the underlying concepts (schema, objects, queries), so that the framework gives rise to a coherent and principled architecture for efficient information management rather than an assortment of ad hoc solutions.
Consider, for example, a large collection of media articles that can be segmented along a number of “dimensions”:
By time: articles from April 2004, or from 1978;
By content type: articles from newspapers, or from magazines, and within magazines, from business magazines, or from entertainment magazines;
By geography: articles from the U.S., or from Europe, and within Europe, from France or from Germany;
By topic: articles about a war, a hurricane, or an election, and within these topics, by subtopic.
These different dimensions may be correlated; for instance, knowing the content type may give information about the topics. The first dimension in this list (time) can be viewed as numerical and, as the example shows, a user may be interested in intervals of time at various granularities. The other dimensions in this list (content type, geography, and topic) can be viewed as hierarchical.
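The distinction between a numerical dimension and a hierarchical dimension can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular system: the `Article` class, the `GEOGRAPHY` parent map (including its "World" root), and the helper functions are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical hierarchy for illustration: each node maps to its parent
# (None marks the root). "World" is an assumed root, not from the text.
GEOGRAPHY = {
    "World": None,
    "U.S.": "World",
    "Europe": "World",
    "France": "Europe",
    "Germany": "Europe",
}


def is_within(node, ancestor, hierarchy):
    """Return True if `node` equals `ancestor` or lies beneath it."""
    while node is not None:
        if node == ancestor:
            return True
        node = hierarchy[node]
    return False


@dataclass(frozen=True)
class Article:
    published: date  # numerical dimension (time)
    geography: str   # hierarchical dimension (a node of GEOGRAPHY)


def in_interval(article, start, end):
    """Numerical dimension: membership in a closed time interval."""
    return start <= article.published <= end


articles = [
    Article(date(2004, 4, 12), "France"),
    Article(date(1978, 6, 1), "U.S."),
    Article(date(2004, 4, 30), "Germany"),
]

# Segment by an interval of time and by an internal node of the hierarchy.
european_april_2004 = [
    a for a in articles
    if in_interval(a, date(2004, 4, 1), date(2004, 4, 30))
    and is_within(a.geography, "Europe", GEOGRAPHY)
]
```

A time restriction is an interval test, whereas a geography restriction is an ancestor test, which is why a restriction such as "Europe" also selects articles tagged with France or Germany.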
Because of the rich nature of the data (in this case, documents), a user may want to probe the data with queries that are much richer than those in a typical database system. For example, a user may be interested in the following types of queries:
What are the ten most common topics in the collection?
From 1990 to 1995, how did articles break down across geography and content type?
Which subtopics caused the sudden increase in discussion of the war? Are these subtopics different among European newspapers?
Break the early 1990s into ten sub-periods that are most topically cohesive, and explain which topics were hot during each period.
Break documents that mention the President by topic so that they are most closely aligned with different content types, to capture how different types of media with different audiences focused on particular topics.
These queries are rather “fuzzy”: there is not necessarily a clear “right answer.” They also differ from standard database queries in that they are resolved not by, for example, searching indices, but by an optimization procedure. Furthermore, they seek to provide the user with the highlights or salient characteristics of the data set.
An ability to use “fuzzy” queries is especially useful in business intelligence systems and systems using multi-faceted search. Multi-faceted search is used, for example, on product websites. A user searches a site for products that match selected criteria, restricting results to a type of product, a product cost, etc. Conventional multi-faceted search technologies require that all search restrictions be applied concurrently. However, this approach can return an empty result set. The user then has no clear idea of which restrictions to change to obtain results for the query.
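The empty-result problem can be illustrated with a minimal sketch of conjunctive facet filtering. The product records, the `facet_search` function, and the restriction names are hypothetical and serve only to show how restrictions that each match something can jointly match nothing.

```python
# Hypothetical product catalog for illustration.
products = [
    {"name": "A", "type": "laptop", "price": 1200, "brand": "X"},
    {"name": "B", "type": "laptop", "price": 800, "brand": "Y"},
    {"name": "C", "type": "tablet", "price": 300, "brand": "X"},
]


def facet_search(items, restrictions):
    """Return the items that satisfy every restriction simultaneously."""
    return [
        item for item in items
        if all(pred(item) for pred in restrictions.values())
    ]


restrictions = {
    "type": lambda p: p["type"] == "laptop",
    "price": lambda p: p["price"] < 500,
    "brand": lambda p: p["brand"] == "X",
}

# All restrictions applied concurrently: nothing matches.
results = facet_search(products, restrictions)

# Each restriction on its own does match some products, but the empty
# combined result gives the user no hint about which one to relax.
individual = {
    name: len(facet_search(products, {name: pred}))
    for name, pred in restrictions.items()
}
```

Here `results` is empty even though every entry in `individual` is nonzero, which is the situation described above.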
Business intelligence systems and systems using multi-faceted search both comprise objects. For a business intelligence system, the object can be a transaction; for a system using multi-faceted search, the object can be a product. Each object comprises different dimensions of ancillary data. A product can have price, manufacturer, features, etc., as dimensions. A transaction can have the geography of the transaction, the amount of the transaction, the product type involved in the transaction, etc., as dimensions.
Several conventional query systems or approaches have proposed a variety of techniques for formulating queries over repositories of structured, semi-structured, and unstructured data. One conventional formulation for queries distinguishes between measures, which are numerical and are the target of aggregations, and dimensions, which are often hierarchical. However, this distinction is not required, and there are formulations in which both are treated uniformly.
Another conventional approach describes a formulation in which each dimension is a lattice, and a joint lattice is defined over multiple dimensions. Although this technology has proven to be useful, it would be desirable to present additional improvements. It would be desirable to perform optimizations over a resulting multi-dimensional lattice rather than to characterize the set of possible queries and hence the candidate materializations that may be considered.
One conventional approach describes, in the context of a cube operator, aggregation functions that may be distributive, algebraic, or holistic. Another conventional approach considers the “BIKM” problem of unifying business intelligence and knowledge management, by creating data cubes that have been augmented by additional information extracted through text analysis.
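The three classes of aggregation function can be illustrated with a short sketch. Under the usual definitions, a distributive function (such as sum) can be combined directly from per-partition results, an algebraic function (such as average) can be derived from a fixed-size summary of each partition, and a holistic function (such as median) has no constant-size summary. The partitioned data below is made up for the example.

```python
from statistics import median

# Made-up values split across three partitions (e.g. three cube cells).
partitions = [[1, 2, 3], [10, 20], [100]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of per-partition sums equals the overall sum.
partial_sums = [sum(part) for part in partitions]
total = sum(partial_sums)

# Algebraic: the average is recoverable from a fixed-size summary
# (sum, count) kept for each partition.
partial_pairs = [(sum(part), len(part)) for part in partitions]
average = sum(s for s, _ in partial_pairs) / sum(c for _, c in partial_pairs)

# Holistic: the median cannot be recovered from per-partition medians;
# combining them gives a different value from the true median.
true_median = median(all_values)
median_of_medians = median([median(part) for part in partitions])
```

In this data the true median is 6.5 while the median of the partition medians is 15, so the holistic case requires access to the underlying values rather than to per-partition summaries.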
Yet another conventional approach proposes that online analytical processing systems may support multiple query frameworks, possibly at different levels of granularity. There have been a number of such approaches suggesting frameworks that move beyond traditional hypothesis-driven query models into discovery-driven models. Some of these conventional approaches have utilized data mining to extract association rules on cubes. Another conventional approach identifies areas of the data space that are likely to be surprising to the user. The mechanisms of this approach allow the user to navigate a product lattice augmented with indicators suggesting which cells at a particular location are surprising, and which paths lead to surprising cells. This conventional approach produces exceptions at all levels of the cube, rather than at the leaves. One conventional approach considers partitions of the cube to summarize the “semantics” of the cube; all elements of a partition belong to the same element of a particular type of equivalence relation.
In general, conventional approaches to querying multi-dimensional data specify certain nodes of a multi-dimensional lattice for which results may be computed; these nodes are, for example, all the months of 2003 crossed with all the products in a certain category. Further characteristics of conventional online analytical processes comprise a single entry for each cell of a cube in a fact table and dimensions that are required to be leveled or fixed-depth. In terms of business intelligence systems and systems using multi-faceted search, these conventional systems require well-defined queries.
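As a rough sketch of this style of query, the requested nodes can be enumerated as a cross product of two dimension selections and looked up in a fact table holding one entry per cell. The product names, the fact-table values, and the choice to report 0.0 for cells with no entry are assumptions made for the example.

```python
from itertools import product

# All the months of 2003, as "YYYY-MM" labels.
months_2003 = [f"2003-{month:02d}" for month in range(1, 13)]

# Hypothetical products in one category.
products_in_category = ["widget", "gadget", "gizmo"]

# Hypothetical fact table: a single entry per (month, product) cell.
fact_table = {
    ("2003-01", "widget"): 120.0,
    ("2003-01", "gadget"): 75.5,
    ("2003-02", "widget"): 98.0,
}

# The query names its nodes explicitly: every month crossed with every
# product in the category.
requested_nodes = list(product(months_2003, products_in_category))

# Results are computed for exactly those nodes; cells with no entry are
# reported as 0.0 here (an assumption of this sketch).
results = {node: fact_table.get(node, 0.0) for node in requested_nodes}
```

The query fixes in advance which 36 cells are returned; nothing in it asks the system to choose which cells are worth showing.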
Although conventional approaches have proven useful, it would be desirable to present additional improvements. What is needed is a search technology that allows a user to focus on discovery and exploration of data, allowing a user to formulate higher-level, less structured queries to search for trends or discover previously unknown correlations in data. Thus, there is a need for a system, a computer program product, and an associated method for performing a high-level multi-dimensional query on a multi-structural database. The need for such a solution has heretofore remained unsatisfied.