The present invention relates to data management systems. More particularly, the present invention pertains to a method and system for optimizing ranking queries in analytic applications.
Due to the increased amounts of data being generated, stored, and processed today, operational databases are constructed, categorized, and formatted for operational efficiency (e.g., throughput, processing speed, and storage capacity). The raw data found in these operational databases often exist as rows and columns of numbers and code that appear bewildering and incomprehensible to business analysts and decision makers. Furthermore, the scope and vastness of the raw data stored in modern databases render it harder locate usable information. Hence, xe2x80x9canalytic applicationsxe2x80x9d have been developed in an effort to help interpret, analyze, and compile the data so that it may be more readily understood by a business analyst. These applications map, sort, categorize, and summarize the raw data before it is presented for display, so that individuals can interpret data and use it as the basis for making decisions.
Given the many different types of source data available, and the many ways that the source data can be transformed and combined, enormous amounts of data are available that can be difficult for a user to navigate. This is especially true for users who are executing multiple rankings in a single query. An example is a user who needs to know the top 10 products of the top 10 customers. Currently, this query is conducted by combining the tables of sales, customers, and products. Then, the needed data are extracted and presented to the user. A search of this sort generates huge amounts of data as the tables multiply each other as they are joined.
When multiple data tables are joined, the typical analytic application will yield a Cartesian product. This means the application joins the tables in all possible ways and then discards data it does not need. For example, if a company has 100,000 customers and 1000 products, a table with 100,000,000 entries can be generated. This is very resource expensive in terms of the processing power needed and the time spent generating this table. For every additional ranking that occurs in the query, the multiplication of entries continues. This means that huge tables of data are generated before usable data are extracted. This is a problem for today""s analytic applications because typical queries involve accessing a large number of tables in a single query. The current trend is toward increasing levels of detail in reports, which will involve accessing even greater numbers of tables in query searches.
Hence, it is desirable to facilitate access to relevant information in the databases defined by analytic applications, especially when conducting queries which are performing multi-level ranking operations. It is also desirable for the user to be able to conduct such a query without generating excess data which must be sorted through. The present invention provides a method and system that meet the above needs.
Accordingly, the present invention will facilitate the access of information in databases defined by analytic applications, especially when conducting multi-level ranking queries. Users can conduct multi-level ranking queries without generating excess data and wasting valuable resources. These and other objects and advantages of the present invention and others not specifically recited above will be described in more detail herein.
Embodiments of the invention are directed to a method of optimizing ranking queries in analytic applications. In one embodiment, a query ranking command constructed by a user is divided into a series of sub-commands which are executed sequentially as a batch command. As data are extracted from each sub-command, it is stored and used to filter out unnecessary data for the next query in the batch. By dividing the query into smaller components, the invention prevents the analytic application from generating excess data, thereby speeding the query process. The extracted data are joined after the last sub-command is completed. This keeps the size of the tables to a minimum while the query is in progress which also speeds the query process.