Information Retrieval/Data Bank systems (IR systems) and Data Base systems (DB systems) are basic applications of digital computers. Whereas IR systems are generally available only on mainframes via networks, DB systems are available on the majority of computers, from mainframes to personal computers, and consist of software packages and sometimes of specialized hardware. The query process involves a selection of stored records. The user can request the execution of final operations on the selected records (their printing or display on the screen of the personal computer) or, for DB users, also of intermediate operations (their manipulation and storage). The records are selected on the basis of the query condition that the user expresses on the data by means of a query language. The language provided by the system developer, however, is not always semantically adequate to support all user needs.
The aim of the system developer is to implement methods for the fastest possible selection of records from those available. The selection execution time depends on the data structures utilized, the performance of the CPU and input/output devices, the data transmission speed between main memory and mass storage, and, finally, the query complexity. In fact, the greater the number of atomic conditions expressed by the user, the longer the mean CPU time needed to evaluate them on the data. On the other hand, there may be inherent difficulties directly concerning the decision as to whether particular records do or do not satisfy the query condition.
As an example, a common use of IR systems, namely, bibliographical reference systems, is first considered. In this context, the user query concerns the retrieval of bibliographical references on a specified subject, and it is formulated as a logical expression of keywords, each one signifying a search topic. A problem arises in the presence of keywords which are generic or not included in the thesaurus on which the bibliographical source classification is based. The current query processing method consists of assigning every bibliographic source a "probability of relevance" to the specified topics, then evaluating the probability of relevance of the source to the global query and, finally, furnishing the bibliographic sources in decreasing order of probability (exceeding a given threshold).
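The ranking scheme just described can be sketched as follows. This is a minimal illustration, not the method of any particular IR system: the per-keyword relevance scores, the source names, and the threshold are all invented for the example, and the combination rules (min for and, max for or, 1 - p for not) follow one common fuzzy-logic convention.

```python
# Sketch of ranking bibliographic sources by probability of relevance.
# Scores, names, and threshold are illustrative assumptions.

def and_(p, q): return min(p, q)   # fuzzy conjunction
def or_(p, q): return max(p, q)    # fuzzy disjunction
def not_(p): return 1.0 - p        # fuzzy negation

# probability of relevance of each source to each keyword (assumed data)
sources = {
    "paper A": {"database": 0.9, "logic": 0.4},
    "paper B": {"database": 0.3, "logic": 0.8},
    "paper C": {"database": 0.7, "logic": 0.7},
}

def relevance(scores):
    # global query: database AND logic
    return and_(scores["database"], scores["logic"])

threshold = 0.5
ranked = sorted(((name, relevance(s)) for name, s in sources.items()),
                key=lambda t: t[1], reverse=True)
result = [(name, r) for name, r in ranked if r >= threshold]
print(result)  # only "paper C" (relevance 0.7) exceeds the threshold
```

Only sources whose combined probability exceeds the threshold are furnished, in decreasing order of relevance.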
As regards DB systems, the main difficulty is due to the possible presence of null values. The American National Standards Institute report (ref. [1] of bibliography) lists 14 kinds of null values, that is, 14 cases when a data value is considered null. For example, the value is null whenever it is inapplicable to some entities (the maiden name of male employees), applicable but presently non-existent (the profession of a child), inconsistent, or unknown (because it is protected, unavailable, missing, being updated or validated, etc.). Lastly, the value is null when it depends on values which are null themselves. The decision process is also problematic when the data value is a placeholder for one in a given set of (real and/or null) values. This "special value" case differs from the "null value" in that the actual value is neither null nor protected but, in any case, cannot be specified with precision. Details can be found in Lefons' paper (ref. [5] of bibliography).
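The distinction between null and special values can be made concrete with a small sketch. The representation below (a Python set for a special value, None for a null) and the function name are illustrative assumptions, not the formalism of the cited papers: a condition applied to a special value is decided only when all candidate values agree.

```python
# Sketch: a "special value" modeled as a finite set of candidate values,
# a null modeled as None. Representation and names are assumptions.

UNKNOWN = "unknown"

def eval_pred(pred, value):
    if isinstance(value, set):
        # special value: the actual value is one of several candidates
        results = {pred(v) for v in value}
        if results == {True}:
            return True      # every candidate satisfies the condition
        if results == {False}:
            return False     # no candidate satisfies it
        return UNKNOWN       # the candidates disagree
    if value is None:
        return UNKNOWN       # null value: no information at all
    return pred(value)       # ordinary value: plain two-valued evaluation

age_over_18 = lambda a: a > 18
print(eval_pred(age_over_18, 25))        # True
print(eval_pred(age_over_18, {20, 30}))  # True: all candidates exceed 18
print(eval_pred(age_over_18, {15, 30}))  # unknown
print(eval_pred(age_over_18, None))      # unknown
```

Note that the special value {20, 30} yields a definite answer where a null cannot, which is precisely why the two cases deserve separate treatment.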
The usual query processing typically consists of the following schematic steps.
Step I: query compilation and translation into object code. In particular, the query condition is decomposed into atomic conditions directly applicable to the data. The object code contains the instructions to evaluate the atomic conditions and to appropriately assemble the partial results.
Step II: serial or indexed read-in of a data record from peripheral memory.
Step III: evaluation of the atomic conditions on the data. For each atomic condition, this evaluation assigns the truth value true or false according to whether the data record does or does not satisfy that atomic condition.
Step IV: assembly of the results obtained in step III by means of the Boolean operations and, or, and not. On the basis of the result (true or false) of this composition, the record is selected or rejected. Steps II, III, and IV are repeated for all the records available.
Step V: execution of the intermediate/final operations on the selected records.
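Steps II through IV can be sketched in a few lines. The record layout, the sample query (dept = "R&D" and salary > 40000), and the variable names are illustrative assumptions; the point is only the shape of the scan-evaluate-assemble loop.

```python
# Sketch of steps II-IV: scan each record, evaluate the atomic conditions,
# assemble the partial truth values, and select or reject the record.
# Records and query are illustrative assumptions.

records = [
    {"name": "Ada", "dept": "R&D", "salary": 52000},
    {"name": "Bob", "dept": "Sales", "salary": 38000},
    {"name": "Eve", "dept": "R&D", "salary": 35000},
]

# atomic conditions produced by step I (query compilation)
atomics = [
    lambda r: r["dept"] == "R&D",    # A1
    lambda r: r["salary"] > 40000,   # A2
]

def assemble(truths):
    # Boolean composition of the query condition: A1 AND A2
    a1, a2 = truths
    return a1 and a2

selected = []
for rec in records:                      # step II: read each record in turn
    truths = [a(rec) for a in atomics]   # step III: evaluate atomic conditions
    if assemble(truths):                 # step IV: assemble, select or reject
        selected.append(rec["name"])

print(selected)  # only "Ada" satisfies both atomic conditions
```

Step V would then apply the intermediate or final operations (printing, storage, manipulation) to the records collected in `selected`.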
There are two main inconveniences which compromise the flexibility and functionality of the system. One is the fact that the time required to process the selection condition depends directly on its complexity. The other derives from the presence, often unavoidable, of null, special, or probability values as possible data values and, consequently, as results of the atomic condition evaluation. In fact, under the ordinary two-valued logic {true, false}, conditions applied to null, special, or probability values cannot be assigned a proper truth value. The problem of deciding if and how to support many-valued logics is, at present, a difficult task for the system developer. In state-of-the-art systems, the trend is to adopt fuzzy logics for IR systems. As for Data Base Management Systems (DBMSs), the proposed solutions generally disagree on both the number and meaning of the truth values to be considered and on the logic semantics (the truth tables for and, or, and not). In his paper (ref. [2] of bibliography), E. F. Codd suggests the use of the standard ternary logic to support the null value meaning "property unknown" and the application of the so-called null substitution principle to evaluate three-valued logical expressions. One problem with the null substitution principle is that it is computationally hard to apply: its complexity is exponential in the number of nulls which occur in the logical expression to be evaluated.
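The contrast between the ternary truth tables and the null substitution principle can be sketched as follows. This is a minimal illustration, assuming the standard Kleene-style tables for the unknown value; the function names are invented for the example. Note the exponential cost: the substitution principle tries every true/false assignment to the unknowns.

```python
# Sketch: ternary logic with an "unknown" value, and the null substitution
# principle (try all two-valued substitutions for the unknowns).

from itertools import product

U = "unknown"

def and3(p, q):
    if p is False or q is False:
        return False
    if p is True and q is True:
        return True
    return U

def or3(p, q):
    if p is True or q is True:
        return True
    if p is False and q is False:
        return False
    return U

def not3(p):
    return U if p is U else (not p)

def null_substitution(expr, n_unknowns):
    # expr is a function of n_unknowns two-valued arguments.
    # All 2**n substitutions are tried: exponential in the number of nulls.
    outcomes = {expr(*subst)
                for subst in product([True, False], repeat=n_unknowns)}
    return outcomes.pop() if len(outcomes) == 1 else U

# "x OR NOT x" is true under every substitution, so the substitution
# principle decides it even when x is unknown...
print(null_substitution(lambda x: x or (not x), 1))  # True
# ...whereas the direct ternary tables give only the weaker answer
print(or3(U, not3(U)))                               # unknown
```

The tautology above is exactly the kind of expression where the two evaluation methods disagree, which is why the (costly) substitution principle is attractive despite its exponential complexity.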