Databases are ubiquitous: most people interact with databases in their daily lives, sometimes unwittingly. Generally, corporations, research laboratories, educational institutions, small businesses and government organizations employ at least one database, and some entities utilize several. Common examples include human resource databases, machine parts databases, product inventory databases, production schedule databases, pay disbursement databases, insurance claim management databases, medical treatment and patient history databases, and vehicle licensing databases. It is thus safe to state that there currently exists a multitude of database types and implementations, but despite this variety most if not all extant databases operate on similar principles.
The first databases emerged in the 1960s and were applied to large enterprise-wide problems, such as airline reservation systems. At that time computers were monumental in scale and expensive to operate and maintain. Typically, only large corporations, research and development establishments, educational institutions and government agencies could afford to own and operate computers, let alone run a single database system. However, as computing power increased and the costs of operating and maintaining computers decreased, many smaller organizations have been able to justify the expenditure necessary to acquire computers, and have commensurately gained the ability to acquire data and support larger collections of databases. Thus, advances in computer technology (e.g., microprocessor speed, memory capacity, data transfer bandwidth, software functionality, and the like) have generally contributed to increased utilization of computer applications in various industries. Ever more powerful server systems, often configured as an array of servers, are commonly provided to service requests originating from external sources such as the World Wide Web.
Nevertheless, to truly appreciate what a database is, it may be constructive to comprehend what data is, the types of information that can constitute data, and the uses to which data can be put. It is commonly understood that data can be a collection of information that can be utilized as a basis for reasoning, calculation, processing, etc. Further, data can be acquired in a plethora of manners and can be put to a multitude of uses. Common examples of data that can be acquired include names of people, places and things, descriptions of people, places and things, dates and times of events, descriptions and prices of things, business information, images, video, audio, production specifications, inventory information, meteorological information, historical stock market quotes, pharmacological information, banking information, vehicle registration information, personal medical information, and the like. Some uses to which this acquired data can be put include meteorological forecasting (e.g., tracking winter storms, hurricanes, typhoons, tornadoes, . . . ) and generation of economic forecasts (e.g., consumer confidence indexes, stock market prognostications, . . . ), to name but a few.
Consequently, as the amount of available electronic data grows, it becomes more important to store such data in a manageable manner that facilitates user-friendly and quick data searches and retrieval. Today, a common approach is to store electronic data in one or more databases. In general, a typical database can be referred to as an organized collection of information with data structured such that a computer program can quickly search and select desired items of data.
Usually, data within a database is organized via one or more tables where these tables are arranged as an array of rows and columns. Further, such tables can comprise a set of records, wherein each record includes a set of fields. Records are commonly indexed as rows within a table and the record fields are typically indexed as columns, such that a row/column pair of indices can reference a particular datum within a table. For example, a row may store a complete data record relating to a sales transaction, a person, a vehicle, or a project. Similarly, the columns of a table can define discrete portions of the rows that have the same general data format, wherein the columns can define fields of the records.
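By way of illustration only, the row/column organization described above can be sketched in Python as follows. The table contents and the helper names `datum` and `record` are hypothetical and chosen purely for this example; actual database engines use considerably more elaborate storage structures.

```python
# Hypothetical sales-transaction table; column names and values are illustrative only.
columns = ["id", "customer", "amount"]
rows = [
    [1, "Alice", 120.0],
    [2, "Bob", 75.5],
    [3, "Carol", 310.25],
]


def datum(row_index, column_name):
    """Return the single value addressed by a row/column pair of indices."""
    return rows[row_index][columns.index(column_name)]


def record(row_index):
    """Return one complete record (row) as a mapping from field name to value."""
    return dict(zip(columns, rows[row_index]))


print(datum(1, "amount"))
print(record(0))
```

Here each inner list plays the role of a record, each position within it plays the role of a field, and a (row, column) pair selects exactly one datum.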
Each individual piece of data, standing alone, is generally not very informative or useful. Consequently, database applications are utilized to make data more useful by helping users organize and process data stored within databases. Database applications allow users to compare, search, sort, order, merge, separate and integrate disparate items of data, so that meaningful information can be generated from the data. However, despite the manifold benefits of such database applications, it has been found that database searches, for example, can often return empty answers when no record fulfills all of the search requirements. This is particularly evident where a user wishes to search a database for a particular item based on a set of preferences, for example, the best flight, the most suitable house, the most ideal used car, or the best located hotel, given a set of criteria. Until recently, traditional database engines and query languages did not support searches based on preferences, or, where such support existed, the searches were rudimentary and/or cumbersome to implement. Thus, to satisfy the need for searches that include preferences, a skyline operator has been introduced as an extension to SQL (Structured Query Language) to implement queries with associated preferences. In effect, the skyline operator takes a set of preferences as input and returns only those results for which there is no other result that is better with respect to all of the input preferences.
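As a minimal sketch of the behavior just described, the following Python fragment contrasts a strict search, which returns an empty answer, with a skyline over the same data. It assumes the usual formulation of dominance (one result dominates another if it is at least as good on every preference and strictly better on at least one) and assumes that lower values are preferred on every attribute. The hotel data and the names `dominates`, `skyline` and `strict_matches` are hypothetical, and the quadratic comparison loop is chosen for clarity rather than efficiency.

```python
def dominates(a, b):
    """True if a is at least as good as b on every attribute and strictly
    better on at least one (assumption: lower values are better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price per night, distance to the beach in km).
hotels = [(50, 8.0), (80, 2.0), (120, 0.5), (90, 3.0), (130, 1.0)]

# A strict search with hard requirements finds nothing at all.
strict_matches = [h for h in hotels if h[0] <= 60 and h[1] <= 1.0]
print(strict_matches)

# The skyline instead returns every hotel that represents a defensible trade-off.
print(skyline(hotels))
```

In this sketch the hotel at (90, 3.0) is excluded because (80, 2.0) is both cheaper and closer, and (130, 1.0) is excluded because (120, 0.5) is both cheaper and closer; the remaining three hotels each represent a different trade-off between price and distance.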
While the skyline operator does not necessarily add to the expressive power of SQL, implementing it without regard to its properties can be inordinately expensive. Now that the skyline operator has been introduced into SQL, it has been noted that it is not sufficient merely to place the skyline operator as the top-most operator in an operator tree; in some instances, the interaction between the skyline operator and other operators can yield significant performance benefits. Additionally, it has been observed that the skyline operator has properties that distinguish it markedly from traditional operators such as the selection operator. One such distinction is that, whereas adding new selections can only decrease cardinality, adding new preferences can increase the cardinality of the skyline.
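The cardinality distinction noted above can be illustrated with a small Python sketch. The data set, the attribute meanings and the helper names are hypothetical, lower values are assumed to be preferred on every attribute, and dominance is assumed to take its usual form (at least as good on every considered attribute and strictly better on at least one).

```python
def dominates(a, b, dims):
    """True if a dominates b on the attribute positions listed in dims
    (assumption: lower values are better)."""
    return (all(a[d] <= b[d] for d in dims)
            and any(a[d] < b[d] for d in dims))


def skyline(points, dims):
    """Return the points not dominated by any other point on the given attributes."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


# Hypothetical hotels as (price, distance in km, noise level).
hotels = [(50, 8.0, 3), (80, 2.0, 2), (120, 0.5, 5), (90, 3.0, 1), (130, 1.0, 4)]

# Adding selection predicates can only shrink (or preserve) the result.
one_selection = [h for h in hotels if h[0] <= 100]
two_selections = [h for h in one_selection if h[1] <= 3.0]
print(len(hotels), len(one_selection), len(two_selections))

# Adding preferences can enlarge the skyline.
sizes = [len(skyline(hotels, dims)) for dims in ([0], [0, 1], [0, 1, 2])]
print(sizes)
```

With this particular data the selections reduce the result from five rows to three and then to two, whereas the skyline grows from one row to three and then to five as preferences are added. This is one reason why cardinality estimates designed for selections cannot simply be reused for the skyline operator.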
Thus, while the skyline operator can be expressed and implemented in SQL, it has nevertheless been widely recognized that the most efficient implementation requires incorporating the skyline operator inside the database engine itself. However, introducing such an operator into the database engine requires, among other factors, that the cardinality and the cost associated with utilization of the skyline operator be optimized prior to execution of preference queries.