Computing and network technologies have transformed many aspects of everyday life. Computers have become household staples rather than luxuries, serving as educational tools and/or entertainment centers, and they provide individuals and corporations with tools to manage and forecast finances, control operations such as heating, cooling, lighting and security, and store records and images in a permanent and reliable medium. Networking technologies like the Internet provide individuals virtually unlimited access to remote systems, information and associated applications.
As computing and network technologies have evolved and have become more robust, secure and reliable, more consumers, wholesalers, retailers, entrepreneurs, educational institutions and the like are shifting paradigms and are employing the Internet to perform business rather than using traditional means. For example, today consumers can access their bank accounts on-line (e.g., via the Internet) and can perform an ever-growing number of banking transactions such as balance inquiries, fund transfers, bill payments, and the like.
Typically, an on-line session can include individuals interfacing with client applications (e.g., web services) to interact with a database server that stores information in a database accessible to the client applications. For instance, a stock market web site can provide users with tools to retrieve stock quotes and purchase stock. Users can enter stock symbols and request stock quotes by clicking to activate a query. Client applications can then query databases containing stock information and return the appropriate stock quotes. Based on the returned stock quote information, users can thereafter purchase or sell stocks by supplying suitable information, wherein submitting a buy or sell order initiates database queries that return current pricing information and order status.
Given two collections of sets, a set-similarity join (SSJoin) identifies all pairs of sets, one from each collection, that have high overlap. SSJoins are a useful primitive that can be employed to implement similarity joins involving other types of data, and they have numerous applications. However, to date, implementations of set-similarity joins have been deficient in that they have not always produced exact answers and/or have not provided precise performance guarantees. Rather, previous implementations have been either probabilistically approximate with performance guarantees, or exact but without performance guarantees. As such, probabilistically approximate approaches can miss some output pairs with a small probability, while exact techniques always produce the correct answer but without appropriate performance guarantees.
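To make the SSJoin primitive concrete, the following is a minimal sketch of one exact technique, not the method described in this document. It assumes "high overlap" means an absolute overlap threshold t (|r ∩ s| ≥ t) and uses prefix filtering, a standard exact approach: under any fixed global ordering of elements, two sets overlapping in at least t elements must share an element within their first |r| − t + 1 and |s| − t + 1 elements, respectively. The function names `ssjoin_overlap` and `_prefix` are hypothetical. The sketch always returns the correct answer, but in the worst case every pair becomes a candidate, so it carries no performance guarantee beyond quadratic time, which illustrates the exact-but-unguaranteed category discussed above.

```python
from collections import Counter, defaultdict


def _prefix(s, t, rank):
    # Hypothetical helper: order the set by the global rank and keep
    # its first |s| - t + 1 elements (the "prefix").
    ordered = sorted(s, key=rank.__getitem__)
    return ordered[: len(s) - t + 1]


def ssjoin_overlap(R, S, t):
    """Exact SSJoin sketch: all (i, j) with |R[i] & S[j]| >= t.

    Assumes elements are hashable and mutually comparable (used only
    to break frequency ties in the global ordering).
    """
    if t < 1:
        raise ValueError("overlap threshold t must be >= 1")
    R = [set(r) for r in R]
    S = [set(s) for s in S]

    # Global ordering: rarest elements first, so prefixes tend to be selective.
    freq = Counter(e for coll in (R, S) for st in coll for e in st)
    rank = {e: k for k, e in enumerate(sorted(freq, key=lambda e: (freq[e], e)))}

    # Inverted index over the prefixes of S: element -> ids of S-sets.
    index = defaultdict(list)
    for j, s in enumerate(S):
        if len(s) >= t:  # smaller sets can never reach overlap t
            for e in _prefix(s, t, rank):
                index[e].append(j)

    result = []
    for i, r in enumerate(R):
        if len(r) < t:
            continue
        # Candidates share at least one prefix element with r.
        candidates = set()
        for e in _prefix(r, t, rank):
            candidates.update(index.get(e, ()))
        # Verify each candidate exactly, so no false positives survive.
        for j in sorted(candidates):
            if len(r & S[j]) >= t:
                result.append((i, j))
    return result


if __name__ == "__main__":
    R = [{"a", "b", "c"}, {"x", "y"}]
    S = [{"a", "b", "d"}, {"c", "x", "y"}, {"q"}]
    print(ssjoin_overlap(R, S, 2))  # [(0, 0), (1, 1)]
```

Because the filter never discards a qualifying pair and every candidate is verified, the output matches a brute-force nested-loop join; only the number of candidates examined differs.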