It is frequently desirable to retrieve information elements stored in an information base on the basis of queries: for example, a search for all information elements in the information base that have certain values of certain fields or attributes. Data processing systems typically require the query specification to employ exact values in order to retrieve the desired information from the information base. Thus, mathematically exact values of particular attributes (fields) are input and then compared with the corresponding attribute (field) values of the information elements in the information base, so as to select those elements with exactly equivalent values. The same is true of data manipulations such as sorting, where it is desired to output information elements according to an ordering rule on one or more attributes (first the record with the highest attribute value, then the next highest, and so on). Such selective access permits the system to abstract the information base and deal only with the elements that are pertinent to the specifications of the query.
Methods currently used to handle such selective query specifications fall into two broad classes. The first is an exhaustive iterative examination of each element of the information base to find those meeting the specifications of the query. The second is to store, for all elements, duplicate values of selected attributes (each associated with a corresponding element address) in a specialized data structure (an index) designed for rapid access to the values and to the corresponding information elements meeting the specification. Examples of such specialized data structures include ordered lists, trees, hashed indexes and a number of other variations, of which only a few are commercially viable.
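To make the two classes concrete, the following is a minimal Python sketch, assuming an information base held in memory as a list of records (dicts of attribute values), with each record's list position serving as its element address. The names `scan`, `build_index` and `lookup` are hypothetical and used only for illustration; the index shown is a simple hashed index, one of the structures named above.

```python
from collections import defaultdict
from typing import Any

# Assumption: an information element is a dict of attribute values,
# and its position in the list serves as its element address.
Record = dict[str, Any]


def scan(records: list[Record], field: str, value: Any) -> list[int]:
    """Exhaustive iterative examination: test every element's field value."""
    return [addr for addr, rec in enumerate(records) if rec.get(field) == value]


def build_index(records: list[Record], field: str) -> dict[Any, list[int]]:
    """Hashed index: duplicate each element's field value, paired with its address."""
    index: defaultdict[Any, list[int]] = defaultdict(list)
    for addr, rec in enumerate(records):
        index[rec.get(field)].append(addr)
    return dict(index)


def lookup(index: dict[Any, list[int]], value: Any) -> list[int]:
    """Exact-value probe of the index; no information elements are examined."""
    return index.get(value, [])


records = [
    {"name": "Smith", "city": "Boston"},
    {"name": "Jones", "city": "Denver"},
    {"name": "Smithers", "city": "Boston"},
]
city_index = build_index(records, "city")
print(scan(records, "city", "Boston"))  # [0, 2]
print(lookup(city_index, "Boston"))     # [0, 2]
```

Both approaches return the same addresses, but the index answers only exact-value queries on the attribute it was built for; a query on `name` in this sketch would still require a scan or a second index holding another copy of the data.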
Where applicable, such data structures or indexes provide much faster access than iterative search methods but are subject to the following limitations:
(1) The index files needed for reference to the attribute or attributes of the information base may be of substantial size, especially when the information elements contain a large number of attributes that are indexed for subsequent retrieval. In some instances the storage requirements for the index files may equal or exceed those of the information base itself.
(2) Indexes provide efficient access only for the specific attribute or combination of attributes for which the index is designed. They are inefficient or inapplicable for the flexible inquiries encountered in commercial practice, which involve a broad range of logical relations among varied combinations of numerous attributes, often on the basis of partial or inexact specifications.
Thus, while these methods satisfy the minimal requirements of data processing systems, they are far from adequate for the increasingly critical need for a general approach to efficient processing of complex multi-attribute specifications.
It should be noted that there are specialized examples of current methods which superficially deal with inexact specifications, such as partial key access, occasional use of explicit ranges and some recent systems which purport to permit the use of "plain English" specifications. However, such systems still depend on the ability of the system's logic to translate or cross-relate such input to an exact key structure. Thus, partial keys will locate a record in a tree index (after operator inspection of a number of incorrect records) only if the initial characters of the partial input exactly match the initial characters of the complete key. Equivalent limitations apply to all other such methods, and performance becomes less efficient and less accurate as the specifications become less precise. The same is true of recent developments in "artificial intelligence" systems, which employ very complex (and thus computationally bound) analytical logic, rule logic, classifier logic and so on to translate incomplete and imprecise input into the most specific and highest-probability output possible, generally incorporating prompts for additional input to clarify ambiguities. Conversely, there is no general approach in the prior art which purposefully utilizes less precise representations of data to enhance the efficiency and validity of manipulating exact data values. It is an object of this invention to provide such generalized systems.
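The prefix dependence of partial-key access can be illustrated with a minimal Python sketch, assuming the tree index is stood in for by a sorted list of keys searched with the standard `bisect` module (a simplification of a real tree index, not a description of any particular system). The function `prefix_search` and the sample keys are hypothetical.

```python
import bisect


def prefix_search(sorted_keys: list[str], partial: str) -> list[str]:
    """Return every key that begins with `partial` in a sorted key list."""
    # Keys sharing a prefix are contiguous in sorted order, starting at the
    # first key >= partial, so binary search finds the start of the run.
    start = bisect.bisect_left(sorted_keys, partial)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(partial):
        end += 1
    return sorted_keys[start:end]


keys = sorted(["SMITH", "JONES", "SMYTHE", "JOHNSON", "JONSON"])
print(prefix_search(keys, "JON"))     # ['JONES', 'JONSON']
print(prefix_search(keys, "ONES"))    # [] : interior fragment, not a prefix
print(prefix_search(keys, "SMITHE"))  # [] : misspelling, although SMITH and SMYTHE are close
```

A partial key finds candidates only when it matches the leading characters exactly; an interior fragment or a slight misspelling returns nothing, even when near-matching keys exist in the index.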