1. Field of the Invention
The invention relates to the field of hardware systems for the rapid processing of database search operations in the broad sense, including assertional updates, in large-scale databases.
In a more general sense, the process and processing structure of the invention are applicable to all automated operations that may be expressed in the form of expressions to be evaluated. For example, applications to arithmetic calculations or to associative memories with variable length records may be cited.
As a specific example, in one of the versions described below, the object of the invention is to evaluate Boolean expressions of the type (A AND B AND C) OR ((D OR E) AND (F OR G)). This expression may, for example, represent a selection query of the tuples of a relation in a database, wherein the letters A, B, C, D, E, F, G represent one of the selection conditions (for example, an equal comparison of a given attribute of the tuple with a constant determined by the operator).
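As an illustration only, the example expression can be evaluated in software for each tuple of a relation. The attribute names and constants below are hypothetical placeholders for the conditions A to G; the sketch shows the logic being evaluated, not the hardware structure of the invention.

```python
# Hypothetical selection conditions: name -> (attribute, constant chosen by the operator).
# Each condition is an equal comparison of one attribute of the tuple with a constant.
CONDITIONS = {
    "A": ("city", "Paris"),
    "B": ("status", "active"),
    "C": ("year", 1988),
    "D": ("dept", "sales"),
    "E": ("dept", "support"),
    "F": ("grade", 1),
    "G": ("grade", 2),
}


def truth_values(row):
    """Map each condition letter to TRUE/FALSE for the given tuple."""
    return {name: row.get(attr) == const for name, (attr, const) in CONDITIONS.items()}


def qualifies(row):
    """Evaluate (A AND B AND C) OR ((D OR E) AND (F OR G)) for one tuple."""
    c = truth_values(row)
    return bool(
        (c["A"] and c["B"] and c["C"])
        or ((c["D"] or c["E"]) and (c["F"] or c["G"]))
    )


def select(relation):
    """Return the tuples of the relation that satisfy the selection query."""
    return [row for row in relation if qualifies(row)]
```

A tuple is kept when either branch of the disjunction holds, so a tuple that fails A may still be selected through the second branch.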
Because the invention was initially designed for the purpose of developing a high performance database processor, it is noted that this invention provides two options in relation to the state of the art:
the option of designing a specialized system with the ability to resolve the stages of database operations (multiple selection, sort, text retrieval, join, projection operations, etc.) which may be expressed in assertional form;
the option of developing a structure and a process which operate in a set-oriented manner, that is, which perform a large number of elementary operations in parallel in a network of interconnected processing modules that may be individually controlled.
2. Description of the Related Art
Several major classes of systems are known for resolving certain database operations. These systems develop algorithms by index construction, sorting, automatic chaining, hashing, cartesian product, or extended selection.
Prior to making the actual calculations, index construction algorithms create an order or a preliminary order on one of the concerned relations or on the operands of the selection predicates. This principle provides for efficient but suboptimal relational selection and join evaluation and for sorting operations. On the other hand, it does not provide for evaluation of text searches and particularly for searching nonanchored sequences, or for detecting rules activated by facts.
Algorithms developed by construction of automatons provide for faster scanning of a tree corresponding to an index and possibly more complex graphs in text retrieval operations. They may therefore be used for processing most database operations. However, their execution implies very long compilation times, which makes their performance fall far below the theoretical optimum, particularly for joins.
The theoretical optimum is defined as executing query resolution in a processing time which is on the same order as memory access times for accessing useful data.
Hash algorithm systems generally display very high performance for searches using the equal operator but are poorly adapted for selections with operators other than equal and do not provide for management of either sorting or text retrieval operations.
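The contrast between equal and non-equal operators can be sketched in software as follows. This is a minimal illustration using an ordinary in-memory hash table; the function names are hypothetical and do not describe any particular prior-art system.

```python
from collections import defaultdict


def build_hash_index(relation, attr):
    """Group the tuples of a relation by the value of one attribute."""
    index = defaultdict(list)
    for row in relation:
        index[row[attr]].append(row)
    return index


def select_equal(index, value):
    """Equal operator: a single hash lookup, independent of the relation size."""
    return index.get(value, [])


def select_less_than(relation, attr, bound):
    """Non-equal operator: hashing does not preserve the order of values,
    so the index is of no help and every tuple has to be examined."""
    return [row for row in relation if row[attr] < bound]
```

For the same reason, a hash table gives no direct way to produce tuples in sorted order or to locate a substring inside an attribute value.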
Cartesian product algorithms display a performance inferior or comparable to that of sort algorithms, and therefore fall several stages short of the optimum. Among cartesian product algorithms, for example, systolic algorithms are known which rely on the use of processing elements (PEs) that transfer data to adjacent PEs during each cycle. These algorithms require configurations of all of the PEs of the processor which are specific and different for each operation, or a reconfiguration of the processor through an interface matrix which is an expensive component. In addition, cartesian product algorithms do not provide for text retrieval.
The principle of systematic expression of database operations in assertional form places the system of the invention in a class of database consultation processes which the inventor proposes to call "search by extended selection."
This class of algorithms is partially implemented by associative memories, the use of which has until now been relatively limited, particularly for technological reasons. The operating principle of an associative memory is to search for information according to its content. Operands are stored in a memory, together with an operator (<, ≤, >, ≥, =, <>) or a function (min, max, sort in ascending or descending order) and a comparison mask. These operands, which may be strings of characters or numbers, are compared to an operand of the same type stored in a special operand register, in this case bit by bit for the min, max and sort operations. The comparison is done in parallel for each operand. The result is a string of the operands which verify the comparison, or of references to or addresses of these operands.
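The search-by-content principle can be modeled in a few lines of software. The sketch below is a simplified model under stated assumptions: operands are integers, the comparison is read as "stored operand OP register operand" (the text does not fix the direction), and the loop stands in for comparisons that the hardware performs in parallel. The names `Word` and `search` are hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators named in the text, mapped to Python functions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


@dataclass
class Word:
    operand: int  # stored operand
    op: str       # operator stored with the operand
    mask: int     # comparison mask: only bits set to 1 take part


def search(memory, register):
    """Return the addresses of the words that verify their comparison.

    Assumed direction: (stored operand & mask) OP (register operand & mask).
    Hardware compares every word at once; this model simply loops.
    """
    return [
        addr
        for addr, w in enumerate(memory)
        if OPS[w.op](w.operand & w.mask, register & w.mask)
    ]
```

With the mask 0xF0, for instance, the stored operand 0x1F matches the register operand 0x15 under the equal operator, because only the high-order four bits are compared.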
Despite its interesting functionalities, this type of processor does not provide for selections using several operators concurrently, for multiple joins, for handling attributes whose length is variable or longer than that of the associative memory word, for text selections, or for inter-attribute comparisons. In addition, for applications using operators other than the equal operator, performance is mediocre since the associative memory evaluates one bit during each clock cycle; even with a 20 MHz internal clock, which is very fast, it would not be possible to achieve throughputs comparable to memory access rates.
The invention nevertheless maintains the principle of using extended selection algorithms for database operations as being the most attractive solution for implementing a high-performance system. In fact, it obviates the need to resort to prior compilation and preprocessing of data, such as indexing, sorting or hashing operations.
In general, the extended selection algorithms developed in designing the invention are based on a principle of "multiple comparisons" within a Boolean qualification expression. Depending on the type of operation performed on the database, Boolean expression resolutions supply one of the following results:
the value "TRUE" or "FALSE" for the Boolean qualification expression (example: selection); or
the identifiers of verified subexpressions in the said qualification expression (example: join of two relations, for which the qualification expression is in the normal disjunctive form, and the subexpressions each correspond to one (multi-) comparison of the join attributes of a given tuple of the source relation and, in succession--for each new resolution of the qualification expression--each current tuple of the other relation); or
the number of verified expressions in the said qualification expression (example: the sort operation within a relation, for which the said qualification expression is in normal disjunctive form, and each subexpression corresponds to a comparison of the sort attributes of a given tuple of the relation, and in succession--during each new resolution of the qualification expression--each tuple of the same relation); or
a combination of the three foregoing results.
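The three kinds of result listed above can be illustrated with a small software model. It is a sketch only: each subexpression is represented as a list of already-evaluated predicate values, the join uses the equal operator, the sort example assumes distinct keys and a "stored < current" comparison, and all function names are hypothetical.

```python
def resolve(subexpressions):
    """Resolve a qualification expression in normal disjunctive form.

    `subexpressions` is a list of conjunctions, each a list of booleans.
    Returns the overall value, the identifiers (positions) of the verified
    subexpressions, and their number.
    """
    verified = [i for i, conj in enumerate(subexpressions) if all(conj)]
    return {"value": bool(verified), "identifiers": verified, "count": len(verified)}


def join_matches(source_keys, current_key):
    """Join: one subexpression per tuple of the source relation; the
    identifiers of the verified subexpressions designate the matching tuples."""
    return resolve([[k == current_key] for k in source_keys])["identifiers"]


def rank_by_counting(keys):
    """Sort: for each current tuple, the number of verified subexpressions
    'stored key < current key' is its position in ascending order
    (assuming distinct keys)."""
    return [
        resolve([[stored < current] for stored in keys])["count"]
        for current in keys
    ]
```

For the keys 30, 10, 20, the counts are 2, 0, 1, which are the positions of those keys in ascending order.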
As an example, a selection operation may be translated into a qualification expression in the form:
$Q = (p_{11} \wedge p_{12} \wedge \dots \wedge p_{1i}) \vee \dots \vee (p_{n1} \wedge p_{n2} \wedge \dots \wedge p_{nj})$, if Q is in the normal disjunctive form or in the normal isomorphic conjunctive form (priority of or). The selection predicates $p_i$ are in the form $\mathrm{Att}_i\ \mathrm{op}\ \mathrm{cte}_i$, or $\mathrm{Att}_i\ \mathrm{op}\ \mathrm{Att}_j$, where $\mathrm{Att}_i$ and $\mathrm{Att}_j$ designate the value of the attribute i or j in the current tuple (or of the field i or j in the record) of the relation (file) on which the selection is performed, and $\mathrm{cte}_i$ is a constant selected by the user. $\mathrm{op} \in \{<, \leq, >, \geq, =, <>\}$.
In the case of a join of two relations, the selection predicates $p_i$ will be in the form $\mathrm{Att}_i\ \mathrm{op}\ \mathrm{Att}_j$, where $\mathrm{Att}_i$ designates the value of the join attribute i in the current tuple of one of the relations and $\mathrm{Att}_j$ designates the value of a join attribute j of a given tuple of the second relation.
In other words, each of the predicates is a logic variable which may assume two states (TRUE/FALSE) depending on whether the comparison it represents is verified or not. Evaluation of the qualification expression then consists of determining whether it is overall in the TRUE or FALSE state, that is, whether the query is verified.
In the example cited, the query result is determined in succession for each tuple of the relation, wherein the value TRUE or FALSE of the qualification expression of the selection determines the way the current tuple will be processed subsequently.
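The tuple-by-tuple resolution described above can be sketched as follows. This is a minimal software model, assuming a qualification expression in normal disjunctive form held as a list of conjunctions; the class `Predicate` and its field names are hypothetical, and the sequential loops say nothing about how the invention evaluates predicates in hardware.

```python
import operator
from dataclasses import dataclass
from typing import Any, Optional

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


@dataclass(frozen=True)
class Predicate:
    """A predicate of the form 'Att op cte' or 'Att op Att'."""
    att: str                         # attribute on the left-hand side
    op: str                          # one of the keys of OPS
    cte: Any = None                  # constant, used when other_att is None
    other_att: Optional[str] = None  # second attribute for 'Att op Att'

    def holds(self, row):
        right = row[self.other_att] if self.other_att is not None else self.cte
        return OPS[self.op](row[self.att], right)


def evaluate(q, row):
    """TRUE if at least one conjunction of Q is verified by the tuple."""
    return any(all(p.holds(row) for p in conj) for conj in q)


def select(relation, q):
    """Resolve Q in succession for each tuple and keep those evaluated TRUE."""
    return [row for row in relation if evaluate(q, row)]
```

Each predicate behaves as a logic variable whose state depends only on the current tuple, so Q is resolved anew for every tuple of the relation.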
With respect to the architecture of existing systems that provide for extended selection, the closest are those of associative memories and, under certain conditions, the tree circuits for parallel evaluation of non-standard expressions, or the sequential evaluation architecture described in the article "Design and Analysis of a Direct Filter Using Parallel Comparators," Proc. 4th Int. Workshop on Data Engineering, Grand Bahama Island, Springer ed., March 1985.
Existing associative memories 700 are useful for query resolutions consisting of searching for a stored term 701 which is identical to an operand 702, whether it is masked or not masked 703, as shown in FIG. 1. Conversely, as mentioned earlier, systems providing for the use of operators other than the equal operator, or the use of the min, max and sort functions, are substantially suboptimal. Finally, no known application allows for the simultaneous use of several distinct operators in a single expression to be evaluated.
In the direct evaluation of an expression in a tree as illustrated in FIG. 2, it is necessary to calculate the connector for each node 704 of the tree 718 and to individually transmit it to this node. The addressing of individual nodes substantially increases the complexity of the structure and the number of transistors needed by the node. In addition, up to 50 percent of the comparators 722 may remain idle.
The third known solution, as shown in FIG. 3, is to break down a comparator vector into subvectors 714 with typically a maximum of 30 comparators per subvector. The evaluation is performed sequentially in each of the subvectors. In order to obtain a final evaluation of the expression by extracting the value of each subvector, the maximum size of the subvectors is fixed in the silicon. The result is that when the size of the subexpression to be evaluated exceeds approximately ten predicates, the proportion of comparators used may decrease to approximately 50 percent. Finally, the address of each predicate, that is, the address of the comparator on which it is evaluated, must be calculated at the time of initialization because of the structure's rigidity.
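The text does not spell out why the proportion of comparators in use falls. One plausible reading, assumed here purely for illustration, is that a subexpression cannot straddle two subvectors, so that each subvector holds only as many whole subexpressions as fit within its fixed size. Under that assumption, and taking the figure of 30 comparators per subvector from the text, the utilization can be computed as follows; the function name is hypothetical.

```python
def utilization(subexpr_size, capacity=30):
    """Fraction of comparators in use in one subvector.

    Assumption (not stated in the source): subexpressions all have the same
    size and each one must fit entirely within a single subvector.
    """
    if not 1 <= subexpr_size <= capacity:
        raise ValueError("subexpression must fit within one subvector")
    per_subvector = capacity // subexpr_size  # whole subexpressions that fit
    return per_subvector * subexpr_size / capacity
```

Under this reading, subexpressions of 10 predicates fill a subvector completely, while subexpressions of 16 predicates leave room for only one per subvector, that is, 16 of 30 comparators in use, which is close to the 50 percent figure cited.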