1. Field of the Invention
This invention relates in general to computer-implemented database management systems, and more particularly, to aggregate predicates and search in database management systems.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. For example, a Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into physical tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many physical tables, and each physical table will typically have multiple tuples and multiple columns. The physical tables are typically stored on random access storage devices (RASD) such as magnetic or optical disk drives for semi-permanent storage.
Additionally, logical tables or “views” can be generated based on the physical tables and provide a particular way of looking at the database. A view arranges rows in some order, without affecting the physical organization of the database.
In existing database systems, aggregate predicate support is not available. Many applications, however, need a search capability using aggregate predicates. For example, aggregate predicates are needed for the following situations:
        Similarity search on images and documents.
        Nearest neighbors search on spatial objects.
In existing database systems, users can limit the results of queries by using standard relational operators (<, <=, =, <>, >, >=) and logical operators (and, or, not). In addition to these relational and logical operators, object-relational databases, such as DB2® from International Business Machines Corporation, also allow users to define predicates that can be used in queries and be exploited by a query optimizer. See W. Chen, J.-H. Chow, Y.-C. You, J. Grandbois, M. Jou, N. Mattos, B. Tran, Y. Wang, “High Level Indexing of User-Defined Types,” Proceedings of the 25th International Conference on Very Large Data Bases, Edinburgh, September 1999, pages 554–564.
These predicates are scalar predicates that are true or false for individual values, independent of other individual values. For example, consider a condition such as distance(customer.location, store.location) < 5. Given any pair of customer and store locations, the condition tests whether the distance between the two locations is less than 5 (i.e., the condition evaluates to either true or false), independent of any other customer or store locations.
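The scalar nature of the distance condition can be illustrated with a minimal Python sketch. The function names, the use of planar Euclidean distance, and the sample coordinates are assumptions made for illustration only; they are not taken from any particular database system.

```python
import math

def distance(a, b):
    # Hypothetical helper: planar Euclidean distance between two (x, y) points.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def within_five(customer_location, store_location):
    # Scalar predicate: the result depends only on this one pair of values,
    # never on any other customer or store in the database.
    return distance(customer_location, store_location) < 5

# Illustrative data (assumed, not from the source).
customers = {"alice": (0.0, 0.0), "bob": (10.0, 10.0)}
stores = {"downtown": (3.0, 3.0), "uptown": (9.0, 12.0)}

# Each (customer, store) pair is tested in isolation.
matches = [
    (c, s)
    for c, c_loc in customers.items()
    for s, s_loc in stores.items()
    if within_five(c_loc, s_loc)
]
```

Adding or removing a store changes which pairs exist, but it never changes the truth value of the predicate for a pair that is already present.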
Existing database systems support both scalar functions and a few aggregate functions. For example, some scalar functions are: abs (which returns an absolute value) and sqrt (which returns the square root of an argument). The aggregate functions operate on a collection of values (i.e., a column) and return a scalar value. Examples of aggregate functions include: max (which returns a maximum non-null value in a column), min (which returns a minimum non-null value in a column), and avg (which returns an average of the non-null values in a column). The main difference between scalar functions and aggregate functions is that aggregate functions work over a set of values, while scalar functions take only individual values as arguments. Recently, aggregate functions have been generalized to on-line analytical processing (OLAP) functions. F. Zemke, K. Kulkarni, A. Witkowski, B. Lyle, “Introduction to OLAP Functions,” ISO/IEC JTC1/SC32 WG3-YGJ-nnn, ANSI NCITS H2-99-154, Apr. 12, 1999. OLAP functions are aggregate functions. Each invocation of an OLAP function has an associated window that specifies the set of values over which the OLAP function applies.
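The distinction between scalar functions, aggregate functions, and windowed (OLAP-style) functions can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names are hypothetical, None stands in for a null value, and the window is assumed to be "the current row plus a fixed number of preceding rows", which is only one of the window shapes an OLAP function may use.

```python
import math

def max_non_null(column):
    # Aggregate: maximum of the non-null values in a column (None if empty).
    return max((v for v in column if v is not None), default=None)

def avg_non_null(column):
    # Aggregate: average of the non-null values in a column (None if empty).
    values = [v for v in column if v is not None]
    return sum(values) / len(values) if values else None

def moving_avg(column, preceding=1):
    # OLAP-style function: each invocation aggregates over its own window,
    # assumed here to be the current row and `preceding` rows before it.
    return [
        avg_non_null(column[max(0, i - preceding): i + 1])
        for i in range(len(column))
    ]

column = [4, None, 16]

# Scalar functions take one value at a time.
roots = [math.sqrt(v) for v in column if v is not None]

# Aggregate functions take the whole column and return one scalar.
largest = max_non_null(column)
average = avg_non_null(column)

# A windowed function returns one value per row, each over its own window.
sales = [10, 20, 60]
smoothed = moving_avg(sales, preceding=1)
```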
Although existing database systems support aggregate functions, they do not support aggregate predicates, that is, predicates that are true or false of individual values with respect to a given set of values. Many real-world applications require aggregate predicates. The following are just some of the common examples:
        1. Find the top ten images that are similar to a given image.
        2. What are the top five fault lines that are nearest to a house?
        3. What is the closest hospital to a given location?
        4. For each store, find the top ten selling products in the last month.
These examples share some common aspects. First, each example involves an aggregate predicate. For example, one cannot determine whether a hospital is closest to a given location without comparing it with other hospital locations relative to that given location. Second, all of the examples require a search based upon an aggregate predicate. In other words, the goal is not to check whether a given hospital is closest to a location. Instead, the goal is to search for a hospital (from, possibly, a group of many hospitals) that is closest to a given location.
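The two aspects above, testing an aggregate predicate and searching by one, can be contrasted in a short Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, distance is planar Euclidean, and the search is a simple linear scan rather than an index-assisted method.

```python
import heapq
import math

def distance(a, b):
    # Hypothetical helper: planar Euclidean distance between two (x, y) points.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_closest(candidate, location, hospitals):
    # Aggregate predicate: the truth value for one hospital depends on
    # every other hospital in the given set.
    d = distance(hospitals[candidate], location)
    return all(d <= distance(p, location) for p in hospitals.values())

def nearest(location, hospitals, k=1):
    # Search based on the aggregate predicate: return the names of the
    # k hospitals closest to the location, nearest first.
    return heapq.nsmallest(
        k, hospitals, key=lambda name: distance(hospitals[name], location)
    )

# Illustrative data (assumed, not from the source).
hospitals = {"General": (2.0, 1.0), "Mercy": (6.0, 8.0), "St. Luke": (0.5, 0.5)}
home = (0.0, 0.0)

closest = nearest(home, hospitals)           # search for the closest hospital
top_two = nearest(home, hospitals, k=2)      # "top k" variant of the search

# The same hospital can satisfy or fail the predicate depending on the set.
without_st_luke = {n: p for n, p in hospitals.items() if n != "St. Luke"}
```

Unlike the scalar distance condition, the result of is_closest for "General" changes when "St. Luke" is removed from the set, which is what makes the predicate aggregate.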
Thus, there is a need in the art for introducing aggregate predicates into existing database systems and for enabling search based upon aggregate predicates.