Data mining is typically the extraction of information from data to gain a new, insightful perspective. Data mining can employ machine learning, statistical, and/or visualization techniques to discover and present knowledge in a form that is easily comprehensible to humans. Over the last few years, however, databases have grown exponentially in size as the ability to gather data efficiently has improved. This has produced enormous databases that take immense amounts of time to analyze. This holds true despite the ever-increasing speeds gained in computer processing technology and data storage access methods.
Pairing up items for selling is often known as “associative selling.” An effort is made to correlate various items/products based upon a particular buyer's past buying habits and/or the past buying habits of other buyers who purchased similar items. This associative process can also be expanded beyond direct product sales. It can be utilized indirectly to enhance sales, for example by analyzing television viewing habits. A television company can predict that most viewers of show X are men who prefer rugged sports such as football, extreme mountaineering, and rugby. This would give the television company a good idea that programming an opera or ballet in this time slot would probably reduce its viewer ratings. Even the existing show could be “enhanced” with more rugged content to increase the size of show X's audience. A successful show with a large audience naturally draws advertisers who want to reach more of their market. Thus, the viewing habits can even be used to provide appropriate commercials that have a high audience acceptance rate for a particular genre of viewers.
Techniques that attempt to determine the preferences of a user are known as collaborative filtering. A collaborative filtering system can produce recommendations by determining similarities between one user and other users. The value of this type of information to society increases daily as we move towards an electronically oriented environment where content matching our preferences can be easily disseminated to us by any number of means such as computers, televisions, satellite radios, and other devices that lend themselves to interactivity with a user.
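The similarity-based recommendation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not name a similarity measure, so Jaccard similarity over sets of liked items is assumed here, and the function names `jaccard` and `recommend` and the sample users are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed measure, not from the text)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, others):
    """Suggest items liked by the most similar other user that the target lacks."""
    best = max(others, key=lambda name: jaccard(target, others[name]))
    return sorted(others[best] - target)


# Hypothetical users and the items each one likes.
likes = {
    "ann": {"football", "rugby", "mountaineering"},
    "bob": {"opera", "ballet"},
}
target = {"football", "rugby"}
suggestions = recommend(target, likes)
```

In this toy data the target user overlaps only with "ann", so the suggestions are drawn from her remaining items. A practical system would likely weigh many similar users rather than a single best match.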
The basis of most measures of association, such as those utilized with collaborative filtering, is knowledge of the association between two or more items and some measure of its frequency. For example, baseball is associated with some people and basketball is associated with some people. In this example, baseball and basketball are features or attributes of a person. Thus, there can be a co-occurrence of both baseball and basketball for some portion of the people. Tracking these co-occurrence events is often done with “counts” that increment whenever a co-occurrence is found. Determining these “counts” in huge databases, however, is not without its problems. Often, the amount of data is so vast that there are computational limits on the systems trying to extract co-occurrence counts from the data. In addition, there can be many attributes associated with an object, which compounds the computational challenges. The large computational overhead also limits when the co-occurrences can be determined. A user typically will not wait hours, or possibly days, for a database to reveal the information. Thus, the information is typically not mined at all, or it is mined at a significant cost. Users have a strong desire to be able to determine co-occurrence counts without the time and expense that exist with current technology.
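A minimal sketch of the co-occurrence “counts” described above, assuming each record is simply the set of attributes associated with one person. The function name `cooccurrence_counts` and the sample data are hypothetical and serve only to illustrate the counting idea, not any particular system.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how often each unordered pair of attributes appears together."""
    counts = Counter()
    for attributes in records:
        # Sort the unique attributes so (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(attributes)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: each record lists the attributes of one person.
people = [
    {"baseball", "basketball"},
    {"baseball"},
    {"baseball", "basketball", "football"},
    {"basketball"},
]
counts = cooccurrence_counts(people)
```

Here baseball and basketball co-occur for two of the four people. The inner loop visits every pair of attributes in a record, so the work grows roughly quadratically with the number of attributes per object, which is one way to see why many attributes compound the computational cost.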