1. Field of the Invention
The present invention relates to the field of database querying, and, in particular, to a system and method for analyzing database records using sampling and probability.
2. Description of the Related Art
Marketers aim to build a list of known recipients with information about those recipients and the purchase transactions in which they engage. For example, an online marketer may track a known recipient (e.g., an existing customer) from a promotional email, sent to the recipient and containing a hyperlink to the marketer's website, through to the actual purchase of a product using, for example, an online shopping cart. When the recipient purchases a product, the marketer obtains valuable information about the recipient's purchasing habits. The marketer may continue to track information about the recipient's purchases over time (such information referred to herein as “transaction data”) to enhance its overall understanding of the recipient. The combination of such transaction data with personal attributes of the recipient such as gender, age, address and interests (such information referred to herein as “recipient data”) allows the marketer to more accurately target recipients with relevant, personalized marketing promotions or other marketing content.
Typically, personal attributes for a given recipient may be updated from time to time, but the quantity of such data associated with a given recipient stays relatively constant over time. Thus, for a given marketer, the size of recipient data is proportional to the number of recipients who are known to the marketer, and grows only when the marketer identifies new recipients. In contrast, the size of the transaction data can grow quickly as recipients make purchases over time. Typically, recipient data is stored in a separate database or database table from transaction data.
Marketers often desire to send marketing content to recipients deemed likely to identify with the product or service being advertised in the content. Effective targeting relies on using both recipient data and transaction data. For example, a cycling merchant located in San Francisco may desire to send a particular promotion to a subset of its known customers, e.g., men who are between the ages of 18 and 40, reside in certain zip codes in the San Francisco Bay Area, and who have purchased one or more bicycle-related products in the past six months.
Marketers also often desire to know the approximate number of potential recipients of a proposed promotion before deciding whether to launch or modify the promotion. For example, if the number is low, the criteria may need to be loosened, and vice versa. Traditionally, in order to determine such a number, the marketer would execute separate queries against the recipient database and the transaction database, join the results using an intersection or union set operation, and then either count or query the resulting data set. However, due to the large amounts of recipient data, and even larger amounts of transaction data, typically managed by marketers, running such database queries simply to obtain a count of potential recipients is prohibitive with respect to cost, time and computing resources. As a result, the number of queries that the marketer is able to effectively execute, and the marketer's ability to fine tune and effectively target marketing communications, are significantly reduced.
Accordingly, there remains a need in the art for a technique that addresses the drawbacks and limitations discussed above with respect to analyzing database records.