A database system may be described as a computerized record keeping system whose overall purpose is to maintain information and to make that information available on demand. Many databases in use today are based on a "relational model" in which the database data is perceived by its users as a collection of tables. The tables in a relational database include a row of column names specifying one or more column fields, and zero or more data rows containing one scalar value for each of the column fields. Each column in a database stores data regarding a particular concept or object using a particular data type, such as character strings, numeric data, and dates.
One feature that distinguishes relational from nonrelational databases is the ability to "join" two or more tables. In general, a join is described as a query in which data is retrieved from the fields of more than one table (although data may also be retrieved by joining a table with itself). Typically, the tables within the same database are joined. However, with current database technology, the tables to be joined need not be physically present in the same database.
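By way of illustration only, a join of two tables may be sketched as follows. The sketch uses Python's built-in sqlite3 module with an in-memory database; the table names (authors, books), column names, and sample rows are hypothetical and are not taken from any particular system described herein.

```python
import sqlite3


def join_example():
    """Join two hypothetical tables on a shared column and return the rows."""
    conn = sqlite3.connect(":memory:")
    # Two illustrative tables: each has a row of column names and
    # one scalar value per column in every data row.
    conn.executescript(
        """
        CREATE TABLE authors (author_id INTEGER, name TEXT);
        CREATE TABLE books (title TEXT, author_id INTEGER);
        INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO books VALUES ('First Book', 1), ('Second Book', 2);
        """
    )
    # The join retrieves data from the fields of more than one table.
    rows = conn.execute(
        "SELECT books.title, authors.name "
        "FROM books JOIN authors ON books.author_id = authors.author_id "
        "ORDER BY books.title"
    ).fetchall()
    conn.close()
    return rows
```

Here both tables reside in the same database; in the heterogeneous case, the joined tables may reside in different databases.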
Products such as SQLConnect™ from Oracle and SQLNet™ from Microsoft enable the use of a heterogeneous database, in which a collection of database tables on different hardware platforms operating under different database management systems all appear to a user to be on one machine operating under one database management system. In the Internet environment, some Internet search services enable users to search for information from data sources that are both implemented on different platforms and distributed throughout the world.
In these types of database environments, a user's query may produce an extremely high number of "hits", which are passed back to the user in the form of query results. Although the results returned from any one data source may not contain duplicates, there are often many duplicates in the overall set of returned results. Current methods for removing duplicates involve storing all of the results returned from the data sources, and then searching through the stored list to remove duplicates. Storing the entire set of returned results increases the memory requirements of the database system and may decrease the speed at which the results are provided to the user.
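The store-then-search approach described above may be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the function name collect_then_deduplicate is hypothetical, each data source is modeled as a simple list of hashable results, and the use of a set to detect duplicates is an assumption, since the manner in which the stored list is searched is not specified.

```python
def collect_then_deduplicate(sources):
    """Buffer every result from every source, then remove duplicates.

    Returns the de-duplicated results (first occurrence kept, in arrival
    order) and the number of results that had to be held in memory.
    """
    # Step 1: store all results returned from the data sources.
    all_results = []
    for source in sources:
        all_results.extend(source)

    # Step 2: search through the stored list to remove duplicates.
    # (Assumption: a set is used to remember results already seen.)
    seen = set()
    unique = []
    for result in all_results:
        if result not in seen:
            seen.add(result)
            unique.append(result)

    return unique, len(all_results)
```

For example, calling the function on three sources whose results overlap, such as ["a", "b"], ["b", "c"], and ["a", "c", "d"], buffers all seven results before any duplicate is discarded, even though only four distinct results are ultimately returned. No result can be provided to the user until every source has been read, which reflects the memory and speed costs noted above.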
Accordingly, what is needed is an improved method and system for removing duplicate query results in a database system. The present invention addresses such a need.