1. Field of the Invention
This invention relates to database, data warehouse, and data mart technology and, more particularly, to an improved system and method for exploring information relationships in data.
2. Discussion of Related Art
Modern computing databases have extremely large quantities of data. Businesses often desire to discover information relationships in this data to make better-informed business decisions. In this regard, “data warehousing” is used to describe computing technologies used to discover relationships within a database, and “data mart” is used to describe technologies for a subject-specific data warehouse.
To date, data warehousing and data mart tools have been undesirable because of their high cost, both in infrastructure and human capital. Modern systems are effectively customized database applications. Consequently, exploring relationships usually involves the creation of new, custom queries and typically requires a management information systems (MIS) professional, or other programming personnel, to implement the query. If a user in a marketing department, for example, wishes to investigate a potential new information relationship, he or she is often forced to cross department boundaries and almost invariably experiences undesirable delays. As a result, much of the data is underutilized: many relationships are never explored because the delay outweighs the benefit.
Moreover, because modern data warehouse systems are effectively customized database applications, they often inherit inefficiencies from the underlying database. These inefficiencies may be information related (e.g., inherently precluding certain lines of questioning because the application is tightly coupled to the database's schema) or performance related (e.g., the system may be optimized for a certain type of transactional access that does not perform well for the accesses involved in data warehousing queries).
More specifically, concerning performance-related issues, most systems rely on the relational data model (RDM). The performance of an RDM implementation is typically limited by its “access method.” Commercially available systems, for example, have their software logic rely on an access method (e.g., “B+tree”) that requires multiple accesses to storage (e.g., memory or disk) to obtain a given record. Some of the accesses are to reference structures that are used to effectively “point to” the data of interest (e.g., indices or hierarchies of linked lists). Sometimes, these reference structures can get so large that portions of the structure must reside on disk. Thus a given request for a database record may involve multiple disk storage requests. Moreover, the database operation algorithms are tightly bound to the access method. That is, the algorithm itself has been optimized to the access method and is thus dependent on the existence of the access method. Much of the literature on database performance explicitly or implicitly assumes the existence of such access methods.
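The cost of a multi-level access method can be illustrated with a minimal sketch. It assumes a simplified static index (no insertion, splitting, or rebalancing, so it is not a full B+tree and not any commercial system's implementation) built over a hypothetical block store that counts reads. The names `CountingStore`, `build_index`, and `lookup` are illustrative assumptions, not taken from the systems discussed above.

```python
from bisect import bisect_right


class CountingStore:
    """Hypothetical block store that counts every read (memory or disk)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0

    def write(self, block_id, payload):
        self.blocks[block_id] = payload

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


def build_index(store, records, fanout=4):
    """Build a static multi-level index over (key, value) records.

    Returns the block id of the root. Simplified: the index is built once
    from sorted data and never updated.
    """
    records = sorted(records)
    level = []  # (smallest key in block, block id) for the level just built
    for i in range(0, len(records), fanout):
        chunk = records[i:i + fanout]
        block_id = ("leaf", i // fanout)
        store.write(block_id, {"leaf": True, "entries": chunk})
        level.append((chunk[0][0], block_id))
    depth = 0
    while len(level) > 1:
        depth += 1
        next_level = []
        for i in range(0, len(level), fanout):
            chunk = level[i:i + fanout]
            block_id = ("inner", depth, i // fanout)
            store.write(block_id, {
                "leaf": False,
                "keys": [key for key, _ in chunk],
                "children": [child for _, child in chunk],
            })
            next_level.append((chunk[0][0], block_id))
        level = next_level
    return level[0][1]


def lookup(store, root_id, key):
    """Follow reference blocks from the root down to the leaf holding key."""
    node = store.read(root_id)
    while not node["leaf"]:
        # Pick the last child whose smallest key is <= the search key.
        pos = max(bisect_right(node["keys"], key) - 1, 0)
        node = store.read(node["children"][pos])
    for entry_key, value in node["entries"]:
        if entry_key == key:
            return value
    return None


store = CountingStore()
root = build_index(store, [(i, f"row{i}") for i in range(64)], fanout=4)
value = lookup(store, root, 37)
reads_for_one_record = store.reads
```

In this sketch, every level of reference blocks between the root and the record adds one storage read per lookup, which is the overhead the passage above attributes to such access methods.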
Aside from the above limitations, most commercial systems are limited to the actual data within the database. The systems cannot query other important data elements such as the schema, the metadata, or the data dictionary without significant custom programming. Consequently, significant knowledge, e.g., useful queries, is not reported or available for use within such systems.
The above difficulties are exacerbated in the context of data residing on disparate databases.
Alternative approaches have been attempted. Childs, for example, discusses set-theoretic approaches in “Feasibility of a Set-Theoretic Data Structure: a General Structure Based on Reconstituted Definition of Relation,” Information Processing 68, Edinburgh, 1968; “Description of a Set-Theoretic Data Structure,” Fall Joint Computer Conference, San Francisco, 1968; and “Extended Set Theory: a General Model for Very Large, Distributed, Backend Information Systems.” He is believed to have developed a system (STDS and XTDS) in which a user may express queries directly from a small set of set operators.
Preferred embodiments of the invention provide a system for, and method of, exploring relationships in data stored in a computer readable medium. A query is received having at least one operator chosen from a set of operators that includes relational operators and having at least one input and output associated with the operator and defined as a table having at least one domain having a type associated therewith. The query is transformed into a set program having at least one operation structure, corresponding to the operator and having logic for type-independently performing an operation, corresponding to the operator, and having a data relation structure, cooperating with the operation structure, for handling all data access and storage associated with the operation.
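The division of labor described above can be sketched as follows. This is only an illustrative reading of the summary, not the patented implementation: the names `DataRelation`, `Operation`, `select`, `project`, and `run_set_program` are hypothetical, and the in-memory row list stands in for whatever storage a real data relation structure would manage. The operator logic never inspects domain types and touches data only through the data relation structure.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataRelation:
    """Hypothetical data relation structure: owns all data access and storage."""

    domains: list  # list of (name, type) pairs, one per domain of the table
    rows: list = field(default_factory=list)

    def scan(self):
        return iter(self.rows)

    def append(self, row):
        self.rows.append(tuple(row))

    def position(self, name):
        return [domain_name for domain_name, _ in self.domains].index(name)


@dataclass
class Operation:
    """Hypothetical operation structure: operator logic, independent of types."""

    name: str
    run: Callable[..., DataRelation]


def select(source, domain, predicate):
    """Keep the rows whose value in the given domain satisfies predicate."""
    out = DataRelation(list(source.domains))
    pos = source.position(domain)
    for row in source.scan():
        if predicate(row[pos]):
            out.append(row)
    return out


def project(source, names):
    """Keep only the named domains, dropping duplicate rows (set semantics)."""
    positions = [source.position(name) for name in names]
    out = DataRelation([source.domains[p] for p in positions])
    seen = set()
    for row in source.scan():
        picked = tuple(row[p] for p in positions)
        if picked not in seen:
            seen.add(picked)
            out.append(picked)
    return out


def run_set_program(program, table):
    """Apply each (operation, arguments) step to the output of the previous one."""
    for operation, args in program:
        table = operation.run(table, *args)
    return table


# Illustrative data: a table with a string domain and an integer domain.
orders = DataRelation([("customer", str), ("amount", int)])
for row in [("ann", 120), ("bob", 40), ("ann", 75)]:
    orders.append(row)

program = [
    (Operation("select", select), ("amount", lambda amount: amount > 50)),
    (Operation("project", project), (["customer"],)),
]
result = run_set_program(program, orders)
```

Because `select` and `project` work only with positions and opaque values, the same operation logic applies unchanged to tables whose domains have other types.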