U.S. Pat. No. 5,555,403, which is incorporated by reference herein, discloses techniques for associating user-friendly business names called “business objects” to database entities. The user-friendly business names offer an abstraction of the underlying database schema. The business names are then used to create a symbolic query. The symbolic query is then converted to a database query, typically in Structured Query Language (SQL). The decision of which SQL query to generate determines the semantics of the query.
Today, this decision is in large part based on additional metadata that describe which tables in a logical database schema are related to objects in the symbolic query, and how these tables are related through links called “joins”. Determining a set of SQL statements associated with a user query is equivalent to identifying one or several trees in the graph whose vertices are tables and edges are joins. A graph is a pair (N. A), where N is a set of vertices (e.g., tables, entities) and A is a set of edges (e.g., joins, relations). A tree is a graph without loops where exactly one list of joins relates any two tables. A loop is a non-empty minimal path relating a vertex to itself. The resultant set of trees is called a query path. For the purposes of this description, the technique for finding such trees is sometimes referred to as “pathfinder”.
Existing techniques aim to create physical queries against one or several data sources based on a higher-level query specification that only contains business objects. In order to do so, object-level information, such as the qualification of objects (measures vs. dimensions or details) is relied upon. An entity-relationship diagram that provides a logical model for tables and joins exposed by the data sources or created by the user may also be used.
The tables in this model are created beforehand and they may include logical entities that enrich the expressive power of queries (virtual tables) or remove possible ambiguities upfront (aliases). Likewise, joins are created across these tables either automatically through strategies or manually; they may include shortcut joins aimed at further optimizing query generation. Last, the user creates groups of joins called contexts that are aimed at resolving ambiguous queries. Thus, a context is a list of joins. By specifying a context, one specifies that all the joins used to relate the tables' underlying objects in a query must belong to this context.
When generating physical queries from business objects, several phases are invoked. Initially, the tables associated with specified objects and filters of a query are determined. Next, a path containing joins connecting all these tables is identified. A path is a collection of linked edges and vertices.
The path determines a large part of the query semantics. Quite often this path is not unique and a user is solicited for input. The user can pick a context and exclude some joins, which will restrict the number of possible paths until only one is retained. This iterative process of generating paths and getting user input in case of ambiguity is referred to as pathfinder. The pathfinder technique relies on a higher-level abstraction than the entity relationship model itself, it sees tables as vertices and joins as edges in a graph, and relies on classical graph algorithms to generate candidate paths or detect ambiguities due to cycles in the graph.
Once a unique path connecting all tables in the query has been identified, a large part of the query semantics is determined. Additional decisions can be made: splitting the query if the chosen path is not a tree, applying shortcut joins to enhance query performance, and/or warning users against Cartesian products. Then the chosen path (or each sub-tree of it in the case of a split query) is handed over to the physical query generation subsystem which will generate a physical query (typically in SQL). Whenever multiple flows are generated, the path also contains the information necessary to determine how these flows should be further synchronized.
Existing pathfinder techniques do not rely on any assumption about the cardinality of joins. That is, the pathfinder techniques do not change behavior depending on whether a join has “many to one”, “one to one” or “many to many” semantics. As a result, many of the paths currently generated do not make sense. This results because the existing techniques work on an undirected graph and generate trees that relate all vertices of the query. The techniques do not take into account the orientation of joins—or edges—to propose intuitive query semantics. This is complemented by sophisticated context and alias generation mechanisms, but whether these are properly applied is the designer's responsibility. These mechanisms are tricky and their maintenance is expensive. Any mistake in this area may lead to counter-intuitive query results. Also, too often the user is left with a choice between a sensible and a not so sensible interpretation of a query. This raises a number of problems, which can be partly addressed by a specific technique, but at the cost of additional settings and metadata that increase the cost of design and of maintenance of the semantic layer. For instance, a query may look ambiguous if the orientation of joins is not taken into account, whereas the ambiguity could be resolved automatically using this information. Another problem is that in certain cases wrong calculations could be performed, resulting in a case of “inflated results”.
In view of the foregoing, it would be desirable to identify techniques to reduce the number of query paths considered during the analysis of a database query.