The present invention relates to a method and apparatus for storing, retrieving, and processing customer behavior data, account information and the like from a multi-dimensional viewpoint.
Two types of systems for handling data representative of customers, accounts or the like may be utilized, that is, On-Line Analytic Processing ("OLAP") systems (both proprietary and relational database versions) and systems specifically designed for database marketing.
OLAP systems can be viewed as an extension of a spreadsheet paradigm in which a spreadsheet is a two-dimensional view of a data set. For example, product identification (ID) may be arranged on one axis, time on the other axis, and sales as the entry in the data cells. Multi-dimensional database systems may generalize such arrangement to allow more than two dimensions. For instance, in the previous example, in addition to product ID and time, geographical location may also be arranged as a third dimension.
There are a number of products which may present users with a multi-dimensional view of their data. Such products may fall into two groups or systems: those that actually store the data using multi-dimensional data structures (arrays and generalizations of arrays) and those that store the data in a relational database system. The former class of system may be referred to as "MOLAP" (for Multi-Dimensional OLAP), while the latter may be referred to as "ROLAP" (for Relational OLAP). Both systems may answer queries about the contents of cells in a logical multi-dimensional space, which is similar to asking for the contents of a given cell in a spreadsheet. Additionally, they may answer questions about entire rows and columns by performing mathematical computations over them. For example, it may be desirable to obtain sales by product over all time periods, or sales of all products on a particular date. Further, in both systems, each cell may store a single number or a small set of numbers.
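The cell and row/column queries described above can be illustrated with a minimal sketch. It assumes a logical cube held as a Python dictionary keyed by (product ID, date, location); the sample data and the function names `cell`, `sales_by_product`, and `sales_on_date` are hypothetical and are not taken from any particular MOLAP or ROLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: (product_id, date, location) -> sales amount.
cube = {
    ("P1", "2024-01-01", "NY"): 10.0,
    ("P1", "2024-01-02", "NY"): 5.0,
    ("P2", "2024-01-01", "NY"): 7.0,
    ("P2", "2024-01-01", "LA"): 3.0,
}

def cell(product, date, location):
    """Contents of one cell in the logical space (0.0 if the cell is empty)."""
    return cube.get((product, date, location), 0.0)

def sales_by_product():
    """Sales of each product summed over all time periods and locations."""
    totals = defaultdict(float)
    for (product, _date, _location), amount in cube.items():
        totals[product] += amount
    return dict(totals)

def sales_on_date(date):
    """Sales of all products, at all locations, on one particular date."""
    return sum(amount for (_p, d, _l), amount in cube.items() if d == date)
```

Here `cell` corresponds to asking for a single spreadsheet cell, while the other two functions correspond to computations over whole rows or columns of the logical space.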
In these systems, "hierarchies" may be employed for dimensions. As an example, consider the "time" dimension. In such dimension, days may be the lowest level in the hierarchy, followed by weeks as the second level of the hierarchy, followed by months as the third hierarchical level, and a fiscal year as the highest hierarchical level of the time dimension. As another example, consider geography. Here, stores may form the first hierarchical level, followed by districts, regions, and countries. Such use of hierarchies for dimensions may make the system easier to use, as the data is organized in a logical, user-oriented manner. Additionally, such hierarchies may provide structural information to the system itself that can be used to answer queries efficiently. For example, if the sales of a given product by month are known, the sales of the product for a given year may be computed by summing the sales over the corresponding 12 months. Without the use of hierarchical information, it would be necessary to revert to the lowest level of detailed data to compute the sales for a year which, as is to be appreciated, may be considerably slower.
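The roll-up from months to a year may be sketched as follows. The daily figures (one unit of sales per day in 2023), the calendar-year grouping, and the helper name `roll_up` are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical lowest-level data: 1.0 unit of sales on every day of 2023.
daily = {}
day = date(2023, 1, 1)
while day.year == 2023:
    daily[day] = 1.0
    day += timedelta(days=1)

def roll_up(values, parent_of):
    """Sum values into the next hierarchy level; parent_of maps a key to its parent."""
    totals = defaultdict(float)
    for key, amount in values.items():
        totals[parent_of(key)] += amount
    return dict(totals)

# Day -> month, computed once from the detailed data.
monthly = roll_up(daily, lambda d: (d.year, d.month))

# Month -> year: only 12 values are summed.
yearly_from_months = roll_up(monthly, lambda ym: ym[0])

# Day -> year: all 365 detailed values are summed to reach the same total.
yearly_from_days = roll_up(daily, lambda d: d.year)
```

Both routes yield the same yearly total, but the route through the monthly level touches 12 values rather than 365, which is the efficiency that the hierarchy makes available.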
While OLAP systems may be acceptable for certain types of applications (as, for example, in analyzing the financial performance of a business), they may not be acceptable for other applications, such as those involving customer-oriented data sets. For example, in customer-oriented data sets, information pertaining to individual customers should be retained. If such individual customer information is omitted, it may be very difficult, if not impossible, to analyze the data set at the level of individual customers. As a result, database marketing and other applications may not be effectively performed or may even be impossible to perform. To store individual customer information in current systems normally requires that "customer" be one of the dimensions. However, as is to be appreciated, for any reasonably sized data set, such "customer" dimension may be extremely large.
Multi-dimensional database systems typically assume that dimension sizes will be reasonable. As such, "extremely large" dimensions may present a serious problem. More specifically, in current multi-dimensional database systems, a dimension size of several tens or hundreds of elements may be typical, and a dimension with 10,000 elements may be considered very large. By contrast, the customer dimension of a medium sized retailer or financial institution can easily reach 10,000,000 or more elements. If one uses a multi-dimensional database system on such a data set, several problems may arise. First, since the techniques used for good performance in these multi-dimensional systems (heavy pre-summarization and sophisticated indexing) are not effective at such sizes, the performance may degrade such that interactive use is very difficult or impossible. Second, the query paradigm may not fit with the analyst's goals. This mismatch arises because a standard way of displaying a multi-dimensional query result (a table or graph) may be of limited value when one of the axes has a million or more elements.
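The effect of a very large customer dimension on the size of the logical space may be illustrated with simple arithmetic. Only the 10,000,000 customer figure comes from the text above; the product, day, and store counts are hypothetical values chosen for the example.

```python
import math

# Hypothetical dimension sizes for a cube without a customer dimension.
dimensions = {"product": 1_000, "day": 365, "store": 100}

# Number of logical cells is the product of the dimension sizes.
cells_without_customer = math.prod(dimensions.values())

# Adding a customer dimension of the size mentioned in the text.
dimensions["customer"] = 10_000_000
cells_with_customer = math.prod(dimensions.values())

print(f"{cells_without_customer:,}")  # 36,500,000
print(f"{cells_with_customer:,}")     # 365,000,000,000,000
```

Most of these logical cells would be empty in practice, so the figures indicate only the scale of the space over which pre-summarization and indexing would have to operate, not actual storage.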
Due to the above-described limitations, current multi-dimensional database systems may handle customer-oriented data sets having a relatively large customer dimension using one of two techniques. In a first technique, individual customer information may be omitted, in which case such system is really a merchandise sales analysis tool rather than a customer analysis tool. In a second technique, the large number of customers (which may be 10,000,000 or more) is statically segmented or arranged into a small number of groups, and all future analysis is based on those segments rather than on the individual customers comprising the segments. Thus, both of these techniques may lose or obliterate individual customer information, which is a substantial portion of the economically critical information that true customer-centered data sets may contain. For this reason, a typical multi-dimensional database tool may not be effectively utilized for customer-oriented data processing.
Relational database systems, on the other hand, may store and process relatively large data sets. However, the models embodied in relational database systems are typically very simple and generic. For instance, all data may be represented in two-dimensional tables. Further, such models may be insufficient for many business intelligence applications. At best, the relational model of a relational database system may be used as a lower-level substrate upon which to build more sophisticated and useful models. (Relational multi-dimensional data analysis tools are examples.)
Therefore, neither relational database systems nor multi-dimensional OLAP tools may be effectively used for customer-oriented data analysis. In an attempt to handle such analysis, a special purpose system tailored explicitly to process a large list of records may be used. Such system may be used in database marketing applications. There is no multi-dimensional paradigm in these systems; typically the data is represented in a very primitive structure, usually a so-called flat file, or a flat file and a collection of so-called inverted files based upon that flat file. (A "flat file" is a file of records without any extra structure imposed thereon. That is, a flat file may have no index structures to speed access to records within the file. An inverted file, on the other hand, is an auxiliary file derived from a main file but sorted on another attribute. For example, consider a situation wherein a base file has records of customer ID, store ID, and purchase amount which are sorted in order of increasing customer ID number. In such situation, an associated inverted file may be defined and populated that again has customer ID, store ID, and purchase amount, but whose records are sorted by store ID. This inverted file may facilitate queries such as "find all purchases in store 27", since the records for purchases in store 27 will be co-located in the inverted file.)
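The base file and inverted file of the foregoing example may be sketched as in-memory lists. The record values and the function name `purchases_in_store` are hypothetical, and the binary search is merely one way, assumed here, of exploiting the co-location of records in the inverted file.

```python
from bisect import bisect_left, bisect_right

# Hypothetical base file: (customer_id, store_id, purchase_amount),
# sorted in order of increasing customer ID.
base_file = [
    (1, 27, 19.99),
    (2, 5, 4.50),
    (3, 27, 7.25),
    (4, 12, 30.00),
    (5, 27, 2.00),
]

# Inverted file: the same records, sorted by store ID instead.
inverted_file = sorted(base_file, key=lambda record: record[1])

# Store IDs in inverted-file order, used to locate a store's run of records.
store_keys = [record[1] for record in inverted_file]

def purchases_in_store(store_id):
    """Return the contiguous run of inverted-file records for one store."""
    start = bisect_left(store_keys, store_id)
    end = bisect_right(store_keys, store_id)
    return inverted_file[start:end]
```

A call such as `purchases_in_store(27)` reads one contiguous slice of the inverted file, whereas answering the same query from the base file alone would require scanning every record.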
Unlike multi-dimensional database systems, the above-mentioned special purpose systems may, to some extent, enable operations involving large detailed lists of customer behavior. However, these systems also have a number of serious defects or disadvantages.
One disadvantage is that, unlike multi-dimensional database systems, the model embodied by these special purpose or "list processing" systems may not be rich or complete enough to allow a large class of optimizations that improve performance, nor may it be rich enough to facilitate a structured analysis of a data set. As such, only a collection of ad-hoc queries and result sets may be provided, with no clear relationship among them, no opportunities for system-based optimal re-use of information, and no opportunities for judicious pre-computation as a run-time query accelerator. That is, in most if not all decision support applications, query-time performance may be improved by computing answers to common queries or sub-queries ahead of time. However, determining which pre-computed results may be used to assist in answering which queries may be difficult unless the system provides a formal structure which facilitates such process.
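How a formal structure may help decide which pre-computed results can answer which queries is sketched below. It assumes, for illustration only, that every query is a sum of purchase amounts grouped by a set of named dimensions, so that a stored result can serve any query whose dimensions are a subset of its own. The data, the `FIELDS` layout, and the functions `aggregate` and `answer` are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail records: (customer_id, store_id, month, amount).
records = [
    (1, 27, 1, 10.0),
    (2, 27, 1, 5.0),
    (1, 27, 2, 2.0),
    (3, 5, 1, 4.0),
]
FIELDS = {"customer": 0, "store": 1, "month": 2}

def aggregate(rows, group_by):
    """Sum amounts over the detail rows, grouped by the named dimensions."""
    positions = [FIELDS[name] for name in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)

# Result computed ahead of time for a commonly requested grouping.
precomputed = {("store", "month"): aggregate(records, ("store", "month"))}

def answer(group_by):
    """Answer from a pre-computed result when its dimensions cover the query."""
    for stored_dims, table in precomputed.items():
        if set(group_by) <= set(stored_dims):
            keep = [stored_dims.index(name) for name in group_by]
            totals = defaultdict(float)
            for key, amount in table.items():
                totals[tuple(key[i] for i in keep)] += amount
            return dict(totals), "precomputed"
    # No stored result covers the query, so fall back to the detail records.
    return aggregate(records, group_by), "scan"
```

In this sketch a query grouped by store is served from the stored (store, month) result, while a query grouped by customer falls back to scanning the detail records. A list processing system lacking any such dimensional structure would have no comparable rule for matching queries to stored results.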
Another disadvantage is that these models or tools may not provide a multi-dimensional view of data and/or may not be capable of being integrated or used with multi-dimensional data analysis tools, which are becoming the tools of choice for business data set analysis.