Because of the growing volume of data stored and processed today, operational databases are constructed, categorized, and formatted in a manner conducive to maximum throughput, fast access time, and high storage capacity. Unfortunately, the raw data found in these operational databases often exists as rows and columns of numbers and codes that appear bewildering and incomprehensible to business analysts and decision makers. Furthermore, the scope and vastness of the raw data stored in modern databases make it difficult to analyze. Hence, applications were developed to help interpret, analyze, and compile the data so that it may be readily understood by a business analyst. This is accomplished by mapping, sorting, and summarizing the raw data before it is presented for display. Individuals can thereby interpret the data and make key decisions based thereon.
Extracting raw data from one or more operational databases and transforming it into useful information is the function of data "warehouses." In a data warehouse, the data is structured to satisfy decision support roles rather than operational needs. Before the data is loaded into the data warehouse, the corresponding source data from an operational database is filtered to remove extraneous and erroneous records; cryptic and conflicting codes are resolved; raw data is translated into something more meaningful; and summary data that is useful for decision support, trend analysis, or other end-user needs is precalculated. In the end, the data warehouse comprises an analytical database containing information useful for decision support. With data warehouses, the transformed, understandable information is kept at the disposal of key decision makers.
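The transformation steps described above (filtering erroneous records, resolving conflicting codes, translating raw values, and precalculating summary data) can be illustrated with a minimal Python sketch. The field names "rgn" and "amt", the REGION_NAMES lookup table, and the functions transform and summarize are hypothetical and chosen purely for illustration; they are not taken from any particular data warehouse product.

```python
from collections import defaultdict

# Hypothetical lookup that resolves cryptic or conflicting region codes
# used by different operational systems into one meaningful name.
REGION_NAMES = {"01": "North", "N": "North", "02": "South", "S": "South"}


def transform(source_rows):
    """Filter, decode, and translate raw operational rows."""
    cleaned = []
    for row in source_rows:
        amount = row.get("amt")
        region = REGION_NAMES.get(row.get("rgn"))
        # Remove erroneous records: missing or negative amounts,
        # or region codes that cannot be resolved.
        if amount is None or amount < 0 or region is None:
            continue
        # Translate the raw record into a more meaningful form.
        cleaned.append({"region": region, "amount": float(amount)})
    return cleaned


def summarize(cleaned_rows):
    """Precalculate summary data (total amount per region)."""
    totals = defaultdict(float)
    for row in cleaned_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Illustrative raw data as it might appear in an operational database.
raw = [
    {"rgn": "01", "amt": 100},
    {"rgn": "N", "amt": 50},     # conflicting code for the same region
    {"rgn": "S", "amt": 75},
    {"rgn": "??", "amt": 20},    # unresolvable code, filtered out
    {"rgn": "02", "amt": -5},    # erroneous amount, filtered out
    {"rgn": "02", "amt": None},  # missing amount, filtered out
]

warehouse_rows = transform(raw)
summary = summarize(warehouse_rows)
print(summary)
```

In this sketch the two codes "01" and "N" are resolved to the same region before the totals are computed, so the summary reflects one consistent view of the data rather than the conflicting encodings of the source systems.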
In the past, data warehouses were relatively small and easily managed. However, as operational databases grew to meet increased business demands, their data warehouses grew correspondingly. Contributing further to the growth of individual data warehouses was the fact that many diverse functions and departments of a business, such as finance, payroll, marketing, sales, and inventory control, all sought the benefits conferred by a data warehouse. Eventually, implementing a single, universal data warehouse to service all the needs of a corporation became too unwieldy and cumbersome. Maintaining, updating, and accessing data in such a grand and centralized data warehousing scheme became overly complex, time-consuming, and expensive.
In an effort to ameliorate this problem, data "marts" were created. Data marts are similar to data warehouses, except that data marts usually contain only a subset of corporate data that is directed toward a single aspect of the business (e.g., a separate finance data mart, sales data mart, human resources data mart, etc.). The substitution of numerous smaller, distributed data marts in place of much larger data warehouses provides increased autonomy and flexibility. Furthermore, individual data marts can be tailored to suit the needs of a particular department.
However, the users of different data marts within a business often have, or develop over time, the need to share useful data or metadata (i.e., data describing the content and structure of other data) across their departmental boundaries. If the data marts deployed by these various departments are completely disjoint, each group will be forced to recreate the metadata that it needs from another group's data mart. In turn, this will lead to duplication of effort and problems coordinating the usage of shared metadata. For example, the marketing department of a retail store may have developed a series of relational tables for capturing certain customer profiles for direct mail marketing. The sales department of the same store may also want to use similar profiles for analyzing buying trends based on various customer data. Furthermore, the sales department may have created a number of algorithms for forecasting revenues based on specific promotional advertisements. The store's marketing department may want to use these same algorithms for enhancing its advertising strategies. From this example, it is clear that sharing metadata between different departments of a business would save time and effort in addition to promoting better coordination in the creation and usage of reusable metadata. The sharing of metadata becomes even more advantageous for global organizations with dispersed teams trying to solve similar or related data analysis problems using an integrated computing approach. In such organizations, coordination of efforts relies heavily on network computing and effective use of knowledge and resources developed by different departments, groups, or teams. Indeed, the ability to share and reuse metadata within and across data marts becomes extremely important as the data marts become more interdependent and various departments and groups attempt to collaborate more closely and effectively.
Thus, there is a need for a method and apparatus that provides the flexibility and autonomy of the multiple data mart approach, yet also has the capability of sharing and reusing metadata. The present invention provides a novel solution which preserves the autonomy, flexibility, and ease of management associated with multiple data marts while also providing the ability to share metadata so that duplication is minimized and changes are captured and propagated efficiently, seamlessly, and transparently to the users. The present invention incorporates the best features of both data warehouse and data mart applications, in terms of independence and sharing of metadata, into one integrated solution.