1. Field of the Invention
The present invention relates generally to database queries and more specifically to expediting database query results by dynamically querying a combination of multi-dimensional databases, fractional databases, or data warehouses for fast data analysis.
2. Introduction
A common method of obtaining information from a database is to perform a database search. Google.com is an example of a well-known website that enables users to perform searches across large databases and retrieve results. Searching databases may be performed in a number of ways. For example, a business intelligence company may need to query a specific set of databases that contain large amounts of data for a particular search. In some cases, the amount of information is quite large and search results are desired quickly. While querying large databases or warehouses can offer accurate results, performing the query can be time-consuming and may not be useful when immediate query results are needed. A full search of a database or group of databases that produces results of a certain level of accuracy may take longer than the user desires and may provide results that are more accurate than what is actually needed.
Several approaches have been used to provide quicker query results. Multi-dimensional databases (MDDBs) and sampled databases are common solutions for expediting query results. In addition, software-driven and/or hardware-driven parallel computing technologies are used to improve database performance.
MDDBs may be referred to as summary cubes. MDDBs are created by capturing common statistics or common business metrics (also referred to as measures) across common business groupings (also referred to as dimensions). Common dimensions or grouping patterns may be identified so that derived member databases (data cubes) are constructed to optimize at least one parameter, such as query response time. A measure is a business metric or measure attribute. In the case of an MDDB, common statistics are applied to the selected measures. A dimension is a grouping attribute. In the case of an MDDB, common measures are aggregated across frequently used dimensions. A fractional, sampled, or approximate database (FxDB) (also known as a data mart) is built by taking a representative sample of a larger database. As the name suggests, the results from sampled databases will necessarily be approximate. A data warehouse (DW) is a full database (FullDB) containing all the available information in a database or group of databases.
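As a minimal sketch of the summary cube idea described above, the following pre-aggregates one measure across two dimensions so that a later query reads a single pre-computed cell instead of scanning detail rows. The row layout, the names `ROWS`, `build_cube`, and `query_cube`, and the sample values are hypothetical and chosen only for illustration; they are not taken from any particular MDDB product.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, sales).
# "region" and "product" act as dimensions; "sales" is the measure.
ROWS = [
    ("East", "Card", 100.0),
    ("East", "Loan", 250.0),
    ("West", "Card", 50.0),
    ("West", "Card", 150.0),
]


def build_cube(rows):
    """Pre-aggregate the count and sum of the measure for each (region, product) cell."""
    cube = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for region, product, sales in rows:
        cell = cube[(region, product)]
        cell["count"] += 1
        cell["sum"] += sales
    return dict(cube)


def query_cube(cube, region, product):
    """Answer a query from the pre-computed cell; empty cells report zero."""
    return cube.get((region, product), {"count": 0, "sum": 0.0})


CUBE = build_cube(ROWS)
```

A query such as `query_cube(CUBE, "West", "Card")` is answered from the cube alone. A query grouped by an attribute that was not chosen as a dimension cannot be answered this way and would have to go back to the detail rows.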
MDDBs address the challenge of immediate access to data by pre-computing a variety of measure attribute statistics across a number of common grouping attributes. In the context of MDDBs, the grouping attributes are referred to as dimensions and the measure attributes are referred to simply as measures. However, MDDBs are typically most effective for data analysis only within the predefined dimensions and their member groupings. Users often revert to time-consuming querying of large databases whenever results are required across categories different from the predefined member groupings available in MDDBs. Users sometimes resort to adding more dimensions to address part of these challenges, which in turn results in very large or unmanageable MDDBs and could defeat the purpose of immediate data access.
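The growth in MDDB size as dimensions are added can be illustrated with simple arithmetic: an upper bound on the number of cells in a fully populated cube is the product of the number of distinct members in each dimension. The sketch below assumes hypothetical member counts (10 regions, 50 products, 365 days, 1,000 customer segments); real cubes are often sparse, so this is a bound rather than an actual size.

```python
from math import prod


def cube_cells(cardinalities):
    """Upper bound on cube cells: product of distinct member counts per dimension."""
    return prod(cardinalities)


# Two dimensions: 10 regions x 50 products.
SMALL = cube_cells([10, 50])

# Adding two more dimensions: 365 days and 1,000 customer segments.
LARGE = cube_cells([10, 50, 365, 1000])
```

Under these assumed member counts, the bound grows from 500 cells to 182,500,000 cells after adding only two dimensions.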
Approximate or sampled databases are another alternative for expediting data access. Sampled databases address the challenge of predefined groupings because the detailed data within them can be grouped exactly like the data in their original databases. Sampled databases can be pre-built to offer approximate results within a desired level of accuracy for most queries, but may not be effective for detailed queries that return few rows, because few or none of the matching rows may appear in the sample.
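The behavior of a pre-built sampled database can be sketched as follows: a simple random sample is drawn once, and an aggregate over the full data is estimated by scaling the sample aggregate by the inverse of the sampling fraction. The function names, the fixed seed, the 1% fraction, and the synthetic rows are all assumptions made for illustration; other sampling schemes (for example, stratified sampling) are equally possible.

```python
import random


def build_sample(rows, fraction, seed=0):
    """Draw a simple random sample without replacement.

    Returns the sample and the scale-up factor (full size / sample size).
    """
    n = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # fixed seed so the pre-built sample is repeatable
    sample = rng.sample(rows, n)
    return sample, len(rows) / n


def estimate_sum(sample, scale, predicate, measure):
    """Estimate a filtered SUM on the full data from the sample.

    Returns the scaled estimate and the number of sampled rows that matched.
    """
    matched = [measure(r) for r in sample if predicate(r)]
    return scale * sum(matched), len(matched)


# Hypothetical full database: 10,000 rows, of which only 10 are in the "rare" segment.
FULL = [
    {"id": i, "segment": "rare" if i % 1000 == 0 else "common", "balance": 100.0}
    for i in range(10000)
]

SAMPLE, SCALE = build_sample(FULL, 0.01)

# Broad query: many sampled rows contribute, so the estimate is stable.
broad_est, broad_n = estimate_sum(
    SAMPLE, SCALE, lambda r: True, lambda r: r["balance"]
)

# Selective query: very few (possibly zero) sampled rows match, so the estimate is unreliable.
rare_est, rare_n = estimate_sum(
    SAMPLE, SCALE, lambda r: r["segment"] == "rare", lambda r: r["balance"]
)
```

With only 10 matching rows in the full data and a 1% sample, the expected number of matching sampled rows is 0.1, which is why a selective query of this kind is poorly served by the sample.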
Often, multiple sampled databases are built to address different functions in a business. Each of these sampled databases could be built with a focus on the common groupings within that function. For example, marketing may focus on program or campaign tracking while risk management may focus on portfolio-wide risk metrics, with different groupings or dimensions in each. One of skill in the art should appreciate that building one large sampled database with all the possible groupings could result in a sample size that is too large, defeating the purpose of a sampled database. Multiple sampled databases built to focus on frequent sets of grouping attributes can result in duplication of database records across sampled databases, which may lead to wasted storage space or to data inconsistencies.
Sourcing samples dynamically to meet desired accuracy levels can address the volatility of approximate answers from pre-built sampled databases, but can be more time-consuming than querying pre-built sampled databases. Accordingly, what is needed in the art is an improved method for database searching such that expedited results are provided at an appropriate level of accuracy.