1. The Field of the Invention
This invention relates to databases and, more particularly, to novel systems and methods for structuring queries and indices and for executing queries against databases.
2. The Background Art
Database methodologies, developing for decades, now include relational, object-oriented, and heterogeneous types. Heterogeneous databases have arbitrarily structured records.
Ubiquitous personal computing, collaborative group computing, and highly integrated, distributed environments create new demands on databases. Databases store information that is most useful when retrieved completely, reliably, and quickly.
A record (or a combination of records) in a database generally represents an object in the real world (a product, a customer, an employee, a business division, etc.). As such, a record typically consists of a collection of fields that represent attributes of the object. This collection of fields is not necessarily "complete," but has been deemed sufficiently useful to describe the object and distinguish it from any other object represented in the database. Ultimately, the contents of these fields are the information that distinguishes one object from another object.
By way of example, databases traditionally use a schema to define record "types" or object classes. In such databases, a record type (or object class) is an abstraction or generalization about the collection of records in the database that represent the same "kind" of real world object. As such, a record "type" may be thought of as "meta-data," or "data about data." A record type typically defines certain relevant attributes and/or behaviors that are to be found in instances of that record type. For example, the record type "person" may specify that a "person" record contains attributes of height, weight, hair color, phone number, etc. The set of "person" records in the database is homogeneous in that each record contains exactly the same set of attributes (those that are defined in the "person" record type).
The rigid structures, incompatibility with modern programming languages and methodologies, and the inability to represent and manage complex data masses have contributed to increasing dissatisfaction of users with the performance of relational databases. Meanwhile, the need exists to extend relational database systems with some kind of support for large and internally complex data as well as object-oriented data. Although object-oriented databases have not displaced relational databases in the software market to any major extent, neither relational nor object-oriented databases solve all of the problems that need to be addressed for users.
An additional difficulty is the need to build data into databases as the data becomes available. Thus, imposing structure on a database at the time of its creation is not always practical or useful. Data may arrive from many sources at a central collecting point. The data may tend to be somewhat amorphous. Context may be known based upon information within or without a data mass. A database has been created to store arbitrarily structured, persistent data, along with any content and context associated with the data. What is needed is an apparatus and method to efficiently query a database containing arbitrarily structured records.
For example, in an arbitrarily structured record, repeating fields, missing fields, null-valued fields, and sub-record entities may exist. A database containing arbitrarily structured records presents numerous difficulties for a query engine designed to locate records within the database.
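By way of illustration only, one possible in-memory representation of such a record is sketched below. The names `Field`, `Record`, and `values` are hypothetical and chosen for this sketch; they are not drawn from any particular database. A record is modeled as an ordered list of named fields, so that a name may repeat, may be absent, may carry a null value, or may introduce a sub-record.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """One named field; hypothetical structure used only for illustration."""
    name: str
    value: Any = None  # None models a null-valued field
    children: List["Field"] = field(default_factory=list)  # sub-record entities


@dataclass
class Record:
    """An arbitrarily structured record: an ordered list of fields, no schema."""
    fields: List[Field] = field(default_factory=list)

    def values(self, name: str) -> List[Any]:
        """Return every top-level value stored under `name` (zero, one, or many)."""
        return [f.value for f in self.fields if f.name == name]


person = Record([
    Field("name", "Ada"),
    Field("phone", "555-0100"),
    Field("phone", "555-0199"),          # repeating field
    Field("fax", None),                  # null-valued field
    Field("address", children=[          # sub-record entity
        Field("city", "Provo"),
    ]),
])

# A second record in the same collection with a different set of fields.
division = Record([Field("name", "Research"), Field("phone", "555-0142")])
```

In this sketch, a lookup by field name returns a list rather than a single value, which makes the difficulty for a conventional query engine apparent: `person.values("phone")` yields two values, `person.values("email")` yields none, and `person.values("fax")` yields a single null.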
Arbitrarily structured records are internally self-describing, since no overriding schema need exist, in contrast with relational databases. Thus, data may not always be cleanly divided into homogeneous tables, each having a single schema (record template), as required by the relational database model.
For example, a business organization may have some substantial structuring. Nevertheless, an address book might regard every company entity (e.g. company, division, department, unit, individual, etc.) as a contact, customer, client, or the like. Such a universal address book may need to accommodate all entities possessing an address and a phone number regardless of other attributes. Such a heterogeneous collection of arbitrarily structured records needs a query mechanism that can search and evaluate the records.
An arbitrarily structured record might include more than a single field having the same field identification or field name. A need exists to provide a meaningful query and a meaningful result for a search across repeating fields of the same name.
Also needed is an ability to provide logic to support multivalued results and to support unknown results. With repeating fields (same name, identifier) within a record, query results may be ambiguous, at least by conventional methods of inquiry. For example, an answer to a query directed to a field name might have a true result for one field of the designated name, and a false result for another field in the same record having the same field name. Thus, a true result and a false result may exist for a query directed to a repeating field. Also, certain operations may result in undefined or otherwise unknown results. Some mechanism is needed to deal with such ambiguities without resulting in a failure of a query engine. Thus, a query structure and a query engine are needed to support multivalued results and unknown results.
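The ambiguity described above may be illustrated with a minimal sketch. It is an assumption of this sketch, not a statement of any particular engine's method, that a predicate applied to a field name returns the set of all truth values observed across the fields of that name, and that a null-valued or missing field contributes an unknown result. The names `Truth` and `evaluate` are hypothetical.

```python
from enum import Enum
from typing import Any, Callable, List, Set, Tuple


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# A record is modeled here as an ordered list of (field name, value) pairs,
# so that the same field name may appear more than once.
Record = List[Tuple[str, Any]]


def evaluate(record: Record, name: str,
             predicate: Callable[[Any], bool]) -> Set[Truth]:
    """Apply `predicate` to every field called `name`; collect all outcomes."""
    results: Set[Truth] = set()
    for field_name, value in record:
        if field_name != name:
            continue
        if value is None:
            results.add(Truth.UNKNOWN)      # null value: outcome is unknown
        else:
            results.add(Truth.TRUE if predicate(value) else Truth.FALSE)
    if not results:
        results.add(Truth.UNKNOWN)          # assumption: missing field is unknown
    return results


record: Record = [
    ("phone", "801-555-0100"),
    ("phone", "212-555-0199"),
    ("fax", None),
]


def in_area_801(value: Any) -> bool:
    return str(value).startswith("801")


phone_result = evaluate(record, "phone", in_area_801)
fax_result = evaluate(record, "fax", in_area_801)
email_result = evaluate(record, "email", in_area_801)
```

Here `phone_result` contains both `Truth.TRUE` and `Truth.FALSE`, while `fax_result` and `email_result` contain only `Truth.UNKNOWN`. Returning a set rather than a single Boolean lets the caller decide how to treat a mixed or unknown outcome instead of failing.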
Another need is for a hybrid query. An arbitrarily structured record might contain textual contents in some fields while having non-text content in other fields. A search engine is needed to handle full text search operations and non-full text search operations combined in a single query.
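A hybrid query of this kind may be sketched as follows, purely by way of example. The sketch assumes a record modeled as a mapping from field name to a list of values, a simple whole-word match standing in for a full text search, and a numeric comparison standing in for the non-text condition. The function names `contains_words` and `hybrid_match`, and the field names `notes` and `employees`, are hypothetical.

```python
import re
from typing import Any, Dict, List

# Assumed record model: field name -> list of values (fields may repeat).
Record = Dict[str, List[Any]]


def contains_words(text: str, words: List[str]) -> bool:
    """Simplified full text test: every word must occur as a whole token."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(word.lower() in tokens for word in words)


def hybrid_match(record: Record, text_field: str, words: List[str],
                 numeric_field: str, minimum: float) -> bool:
    """True when some text value matches AND some numeric value qualifies."""
    text_ok = any(
        isinstance(value, str) and contains_words(value, words)
        for value in record.get(text_field, [])
    )
    number_ok = any(
        isinstance(value, (int, float)) and value >= minimum
        for value in record.get(numeric_field, [])
    )
    return text_ok and number_ok


records: List[Record] = [
    {"notes": ["Prefers email contact about database upgrades"], "employees": [120]},
    {"notes": ["Database vendor"], "employees": [12]},
    {"employees": [500]},               # no textual field at all
]

matches = [r for r in records
           if hybrid_match(r, "notes", ["database"], "employees", 100)]
```

Only the first record satisfies both conditions: the second fails the numeric comparison, and the third has no text field to search. A practical engine would presumably consult a text index and a value index rather than scan each record, which this sketch does not attempt.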
The term heterogeneous database is often used to refer to databases provided by different database vendors (e.g. Oracle, Sybase, Informix, etc.). Heterogeneous, herein, by contrast, indicates that an individual database supports collections (or sets) of arbitrarily structured records within itself. That is, one record is arbitrarily structured with respect to another record within the same collection (or set) in a single database.
Some method of indexing and querying a database of such arbitrarily structured records is needed. Methods are also needed to optimize such searching to provide timely results. Accordingly, a query apparatus and method are needed for efficient construction and execution of queries directed to a heterogeneous database.