A database is defined as a collection of data items, organized according to a data model, and accessed via queries. The present invention applies to any data model. The invention is illustrated using a relational database model.
In a relational database or relation, data values are organized into columns or fields wherein each column comprises one attribute of the relation. Each column or attribute of the relation has a domain which comprises the data values of that attribute. Each row of a relation, which includes one value from each attribute, is known as a record or tuple.
FIG. 1 shows an exemplary relational database. The relation 1 of FIG. 1 contains data pertaining to a population group. The relation 1 has six attributes or columns 2-1, 2-2, 2-3, . . . , 2-6 for storing, respectively, name, age, weight, height, social security number and telephone extension data values of the population. The database also has twelve records or tuples 3-1, 3-2, 3-3, . . . , 3-12. Each tuple 3-1, 3-2, 3-3, . . . , 3-12 has one data value from each attribute. For instance, the tuple 3-10 has the name attribute value "Lee," the age attribute value 40, the weight attribute value 171, the height attribute value 180, the social security number attribute value 999-98-7654 and the telephone extension attribute value 0123.
Often, it is desirable to identify and/or retrieve tuples of interest or tuples which meet criteria of interest. Queries are used to retrieve tuples of interest from a relation using selection operations. Queries may be predefined, i.e., accessed from libraries, or dynamic, i.e., defined and translated into selection operations at run time. Predefined queries restrict flexibility as only predefined queries stored in the library may be evaluated. Dynamic queries, on the other hand, may be freely defined and executed at run time. Dynamic queries are also referred to as ad hoc queries.
Selection operations incorporate precise and/or imprecise predicates. Predicates are logical or mathematical expressions for specifying the criteria which must be satisfied by the tuples in order to be selected. For instance, it may be desired to select all tuples of a relation R having an attribute A value which is the same as some constant c. Such a selection operation is denoted R·A=c or S_{R·A=c}. The selection operation is specified by the precise predicate "A=c". The precise predicate, in turn, incorporates the precise selection operator "equals" for specifying the desired criteria that the selected tuples must satisfy. Other precise selection operators include "greater than," "less than," etc. Additionally, individual precise predicates may be combined with logical operators "AND," "OR," "NOT," etc.
Precise predicates are predicates which return one of two values, i.e., "1" or "0". For example, "A=c" is a precise predicate which is logic "1" for all values of the attribute A which equal the constant c and which is logic "0" otherwise. In the evaluation of a query comprising only a precise predicate (called a precise query), the precise predicate of the query is applied to the corresponding attribute value of each tuple in the relation. This query identifies the set containing only the tuples for which the application of the precise predicate returns a logic "1" value. Such a set is referred to as the read set. The tuples contained in the read set are said to satisfy the precise predicate of the query.
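The evaluation of a precise query described above can be sketched in a few lines of Python. This is only an illustration: the `Person` type, the `select` helper and the sample rows are assumptions made for the example (only the "Lee" values are taken from the description of FIG. 1), not part of any disclosed implementation.

```python
from typing import Callable, List, NamedTuple


class Person(NamedTuple):
    name: str
    age: int
    weight: int
    height: int


# Hypothetical sample relation; only the "Lee" row follows the FIG. 1 description.
relation = [
    Person("Lee", 40, 171, 180),
    Person("Kim", 17, 120, 165),
    Person("Ray", 40, 200, 175),
]


def select(rel: List[Person], predicate: Callable[[Person], bool]) -> List[Person]:
    """Return the read set: the tuples for which the precise predicate is true."""
    return [t for t in rel if predicate(t)]


# Precise predicate "age = 40".
read_set = select(relation, lambda t: t.age == 40)
print([t.name for t in read_set])  # ['Lee', 'Ray']

# Two precise predicates combined with the logical operator AND.
combined = select(relation, lambda t: t.age == 40 and t.weight < 180)
print([t.name for t in combined])  # ['Lee']
```

Each tuple is either in the read set or not; there is no intermediate degree of membership.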
An imprecise predicate, on the other hand, cannot always identify with certainty the tuples which satisfy and which do not satisfy the imprecise criteria comprised therein. Rather, an imprecise predicate depends on criteria which by their nature are ambiguous or difficult to quantify exactly. For example, consider the relation pertaining to a population group illustrated in FIG. 1. An imprecise predicate which requires identification of tuples having young age attribute values (denoted young(age)) cannot discriminate between all tuples with complete certainty. This is because there is no consensus opinion as to which age values are young and which are not. Furthermore, the imprecise predicate young(age) may depend on the domain of the age attribute. For instance, the age value 65 may not be young in the domain of ages (1,70), but may be young in the domain of ages (60,90).
Queries comprising an imprecise predicate (called imprecise queries) may be implemented using fuzzy set theory. In fuzzy set theory, a membership function is defined which determines the degree to which a particular object belongs to a group or set based on numerical criteria. Such sets, called fuzzy sets, comprise certain objects with absolute certainty and other objects with varying degrees of membership. See L. Zadeh, "Fuzzy Sets," Information and Control, vol. 8 (1965). Zadeh proposes that imprecise predicates, also called fuzzy predicates, may be evaluated by defining a membership function for each imprecise predicate.
In evaluating an imprecise query, the membership functions corresponding to the imprecise predicates of the query are applied to appropriate attribute values of each tuple in the relation. The application of each membership function returns a possibility value in the continuous range [0,1], i.e., from 0 to 1 inclusive (rather than only one of two values, "0" or "1"). These possibility values indicate the possibility that the tuple satisfies the criteria of the imprecise predicate. A fuzzy set is thereby formed comprising tuples with varying degrees of membership depending on the possibility values returned by the membership function. Tuples may be selected or identified if their respective possibility values exceed some threshold. Thus, the read set of an imprecise predicate may be defined as the fuzzy set of tuples (or subset thereof) derived using the appropriate membership function. The read set would comprise the tuples which may possibly satisfy the imprecise predicate and their associated possibility values.
An example of such a membership function is depicted in FIG. 2. In FIG. 2, a membership function f(x) corresponding to the imprecise predicate young(age) is depicted. As shown in FIG. 2, the domain of the age attribute of the relation is plotted along the abscissa and the possibility values are plotted along the ordinate. As defined by the membership function f(x), tuples having age values 0 to 15 definitely satisfy young(age) as the application of f(x) to these age values returns the possibility value of 1. Further, tuples having age values 20 or greater definitely do not satisfy the imprecise predicate young(age) as the corresponding membership function f(x) returns a possibility value of 0 for these age values. Finally, it is not certain whether tuples having age values between 15 and 20 satisfy young(age). However, the membership function f(x) defines a possibility between 0 and 1 that these tuples satisfy the imprecise predicate young(age). As depicted, tuples having age values closer to 15 have a greater possibility than tuples having age values closer to 20 of satisfying young(age).
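A membership function of the kind described for FIG. 2 might be sketched as follows. The text fixes only the end points (possibility 1 up to age 15, possibility 0 from age 20 onward, decreasing in between); the linear ramp between 15 and 20, the sample ages and the threshold of 0.5 are assumptions made for this illustration.

```python
def young(age: float) -> float:
    """Possibility that `age` satisfies young(age).

    1.0 for ages up to 15, 0.0 for ages 20 and above,
    and (by assumption) a linear ramp in between.
    """
    if age <= 15:
        return 1.0
    if age >= 20:
        return 0.0
    return (20 - age) / 5


ages = [10, 16, 19, 40]  # hypothetical age attribute values

# Fuzzy set: every value paired with its possibility.
fuzzy_set = [(a, young(a)) for a in ages]

# Read set: values whose possibility exceeds an assumed threshold of 0.5.
read_set = [(a, p) for a, p in fuzzy_set if p > 0.5]
print(read_set)  # [(10, 1.0), (16, 0.8)]
```

An age of 16 lies closer to 15 than to 20 and so receives a higher possibility (0.8) than an age of 19 (0.2), consistent with the description above.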
Membership functions such as f(x) depicted in FIG. 2 map each attribute value to a particular possibility value. Furthermore, if such membership functions are applied to a particular attribute value, the membership function will always return the same possibility value. The returned possibility value is the same even if the domain of attributes, over which the membership function is defined, is narrowed. For example, if f(x) of FIG. 2 is defined over the domain of age values (1,90), f(65) always equals 0 whether the domain is narrowed to (1,70) or (60,90). Membership functions which map attribute values to particular possibility values in this manner are called "static membership functions."
In fuzzy set theory, membership functions may also be defined which depend on more than one attribute. Such membership functions are referred to as multi-dimensional membership functions. Multi-dimensional membership functions may be defined for evaluating imprecise predicates which depend on more than one attribute. For example, an imprecise predicate could be defined to identify tuples in the relation pertaining to a population group illustrated in FIG. 1 which are "healthy" depending on the height and weight attribute values of the tuples.
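As a purely illustrative sketch of a multi-dimensional membership function, the following Python function maps a (weight, height) pair to a single possibility value. The rule used here (an ideal weight-to-height ratio of 0.9 with a tolerance of 0.3) is a made-up assumption chosen only to show the shape of such a function; it has no medical meaning and is not taken from the source, which does not state units or a formula.

```python
def healthy(weight: float, height: float) -> float:
    """Possibility that a (weight, height) pair is "healthy".

    Hypothetical rule: the possibility is 1.0 when weight/height equals
    an assumed ideal ratio and falls off linearly to 0.0 as the ratio
    departs from it by the assumed tolerance.
    """
    ideal_ratio = 0.9  # assumption for illustration only
    tolerance = 0.3    # assumption for illustration only
    ratio = weight / height
    return max(0.0, 1.0 - abs(ratio - ideal_ratio) / tolerance)


# The function depends on both attributes jointly, not on either one alone.
print(healthy(90, 100))       # 1.0
print(healthy(200, 100))      # 0.0
print(0.0 < healthy(171, 180) < 1.0)  # True
```

The point of the sketch is only that the possibility value is a function of two attribute values of the same tuple taken together.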
Fuzzy set theory also defines fuzzy logic operators for logically combining individual membership functions. Suppose g(A_1) denotes a first membership function (corresponding to a first imprecise predicate) which is applied to a first attribute A_1. Similarly, suppose h(A_2) denotes a second membership function (corresponding to a second imprecise predicate) which is applied to a second attribute A_2. Each membership function returns a possibility value in the range [0,1]. A union operation or logical OR is defined as the maximum of the results of these two membership functions. Such an operation is denoted max(g(A_1),h(A_2)). Similarly, an intersection operation or logical AND is defined as the minimum of the results of these two membership functions. The logical AND operation is denoted min(g(A_1),h(A_2)). Finally, the complementation operation or logical NOT is defined as one minus the result of either membership function. The logical NOT operation is denoted 1-g(A_1) (or 1-h(A_2)).
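The three fuzzy logic operators defined above translate directly into code. In the sketch below the function names `fuzzy_or`, `fuzzy_and` and `fuzzy_not` and the sample possibility values are assumptions made for illustration.

```python
def fuzzy_or(p: float, q: float) -> float:
    """Union (logical OR): the maximum of the two possibility values."""
    return max(p, q)


def fuzzy_and(p: float, q: float) -> float:
    """Intersection (logical AND): the minimum of the two possibility values."""
    return min(p, q)


def fuzzy_not(p: float) -> float:
    """Complementation (logical NOT): one minus the possibility value."""
    return 1.0 - p


# Hypothetical results of g(A_1) and h(A_2) for one tuple.
g_value = 0.75
h_value = 0.25

print(fuzzy_or(g_value, h_value))   # 0.75
print(fuzzy_and(g_value, h_value))  # 0.25
print(fuzzy_not(g_value))           # 0.25
```

When the inputs are restricted to 0 and 1, these operators reduce to the ordinary logical OR, AND and NOT used with precise predicates.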
The above discussion has been presented to introduce basic principles necessary for understanding prior art methods for evaluating imprecise database queries. Some prior art proposals have disclosed implementations for evaluating imprecise queries using fuzzy set theory. See V. Tahani, A Conceptual Framework for Fuzzy Query Processing--A Step Towards Very Intelligent Database Systems, 1977; Buckles and Petry, "A Fuzzy Representation of Data for Relational Databases," 31 Fuzzy Sets & Systems (1982). The prior art has proposed imprecise query evaluation implementations using single-dimensional, static membership functions. See J. Kacprzyk, S. Zadrozny, A. Ziolkowski, "FQUERY III+: A Human Consistent Database Querying System Based on Fuzzy Logic with Linguistic Quantifiers," Information Systems, vol. 14, no. 6, 443-53 (1989). These prior art implementations propose that membership functions for particular imprecise queries be defined prior to the creation of the relation. Data, e.g., tuples, are then stored in the relation. As tuples are stored in the relation, the predefined membership functions are applied to appropriate attributes of each tuple. Possibility values are thereby produced for each tuple as the tuple is stored in the relation. Additionally, a pointer is generated for each possibility value which points to the location in the relation of the one or more tuples corresponding to this possibility value. The pointers and the possibility values are then stored together in a separate storage structure.
The evaluation of an imprecise query requires the retrieval of each possibility value and its associated pointers from the separate storage structure corresponding to each imprecise predicate of the query. The pointers are then used to point to the location in the relation of the tuples corresponding to each possibility value. The tuples pointed to by the pointers may then be retrieved from the relation. If necessary, possibility values are combined using fuzzy logic operators. Finally, the tuples which may possibly satisfy the imprecise query, or a subset thereof, are placed into the read set.
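The prior art scheme described above (possibility values computed as tuples are stored, then kept with pointers in a separate structure) can be sketched roughly as follows. The names `store`, `query_young` and `young_index`, the use of list positions as pointers, the linear ramp in `young` and the sample rows are all assumptions made for this illustration and do not reproduce any particular prior art system.

```python
from collections import defaultdict


def young(age: float) -> float:
    """Assumed static membership function, defined before any tuple is stored."""
    if age <= 15:
        return 1.0
    if age >= 20:
        return 0.0
    return (20 - age) / 5


relation = []                    # the stored tuples
young_index = defaultdict(list)  # separate structure: possibility value -> pointers


def store(row: dict) -> None:
    """Store a tuple and record its precomputed possibility value and pointer."""
    relation.append(row)
    pointer = len(relation) - 1  # position of the tuple in the relation
    young_index[young(row["age"])].append(pointer)


def query_young(threshold: float) -> list:
    """Evaluate young(age) by reading the separate structure and following pointers."""
    read_set = []
    for possibility, pointers in young_index.items():
        if possibility > threshold:
            read_set.extend((relation[p]["name"], possibility) for p in pointers)
    return read_set


for name, age in [("Ann", 10), ("Bo", 16), ("Lee", 40), ("Cy", 10)]:
    store({"name": name, "age": age})

result = query_young(0.5)
print(result)  # [('Ann', 1.0), ('Cy', 1.0), ('Bo', 0.8)]
```

In this sketch the membership function is applied only at storage time; a query for a predicate with no such precomputed structure could not be answered, and every additional membership function would need its own index.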
In addition, the prior art does not disclose the support of multi-dimensional imprecise predicates. For example, the prior art does not disclose that a query comprising an imprecise predicate "healthy" which depends on two attributes, e.g., healthy(weight,height), could be evaluated.
The prior art paradigms present a number of limitations and disadvantages for the evaluation of imprecise queries including:
1. Since all membership functions are determined before queries are evaluated, the user cannot define new membership functions or alter existing membership functions dynamically. Thus, the set of imprecise queries which may be evaluated is limited to those for which membership functions were defined prior to storing the tuples in the relation. For example, suppose "young(age)" was the only imprecise predicate for which a membership function was defined before tuples were stored in a relation. A query could not be evaluated for retrieving all tuples having a "tall" height attribute value.
2. The mapping of attribute values to possibility values is fixed by the membership function. This prevents an imprecise query from being evaluated relative to a narrowed domain. For example, suppose the membership function f(x) of FIG. 2 was defined for the imprecise predicate young(age) over the domain (1,90). A query to retrieve all young nursing home patients, i.e., over the domain of ages narrowed to (65,90), would not retrieve any tuples, because f(x) returns a possibility value of 0 for every age in that domain.
3. A space constraint is imposed in the paradigms disclosed by the prior art. Many possibility values and corresponding pointers must be stored for each membership function. In order to support a reasonable number of membership functions for certain applications, a large amount of storage capacity could be required.
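The second limitation in the list above can be made concrete with a short sketch. The patient names and ages are hypothetical, and the linear ramp between ages 15 and 20 is an assumption; only the end points of the membership function come from the description of FIG. 2.

```python
def young(age: float) -> float:
    """Static membership function for young(age); linear ramp assumed."""
    if age <= 15:
        return 1.0
    if age >= 20:
        return 0.0
    return (20 - age) / 5


# Hypothetical nursing home patients (name, age).
patients = [("Ada", 66), ("Ben", 72), ("Cal", 85)]

# Narrow the domain of ages to 65 through 90, then apply the static function.
in_domain = [(name, age) for name, age in patients if 65 <= age <= 90]
read_set = [(name, young(age)) for name, age in in_domain if young(age) > 0]

print(read_set)  # []
```

Because the static function assigns the same possibility value to an age regardless of the domain being queried, every patient receives a possibility of 0 and the read set is empty, even though 66 is the youngest age within the narrowed domain.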
Finally, no imprecise database query implementations have been disclosed which use adaptive feedback to modify membership functions. Adaptive feedback has been proposed for use with membership functions in control systems. Control systems use membership functions to control mechanical and electrical devices by varying control parameters based on the current state of the device. In such systems, a membership function translates information regarding the current state of a device into control parameter outputs. A critical aspect of such systems is the ability to adjust the membership function in real time to account for changes in the device or to deal with variations in the state information. See S. Isaka, A. Sebald and A. Karimi, "On the Design and Performance Evaluation of Adaptive Fuzzy Controllers," Proceedings of the 27th Conference on Decision and Control, Austin, Tex., 1068-69 (1988); H. Chunyu, K. Toguchi, S. Shenol, L. Fran, "A Technique for Designing and Implementing Fuzzy Logic Control," Proceedings of the 1989 American Control Conference, Vol. 3, 2754-55 (1989). Accordingly, the prior art has proposed using the results of interactions between the control system and the controlled device as feedback to modify the membership functions. However, no implementations for imprecise database queries which use adaptive feedback have been disclosed.
It is therefore an object of the present invention to provide a system and method for evaluating imprecise predicates which permits the definition and application of membership functions at run time. It is also an object of the present invention to support membership functions which return possibility values relative to a domain selectively adjusted by the query processing system. Additionally, it is an object of the present invention to efficiently evaluate imprecise queries without additional storage structures for storing predetermined possibility values and pointers. Furthermore, it is an object of the present invention to support imprecise queries which require multi-dimensional membership functions. Finally, it is an object of the present invention to support imprecise database queries which use feedback to modify membership functions in real time.