The invention relates generally to techniques for processing information in a database system, and more particularly to techniques for updating customer signatures or other types of records in a database system.
It is desirable in many business applications of database systems to track separately the transaction behavior of each customer. This tracking may be implemented using a customer-specific record referred to herein as a customer signature. For example, a customer signature for buying behavior may contain information on the likely place of purchase, value of goods purchased, type of goods purchased, and timing of purchases for the given customer. The signature may be updated whenever the customer makes a transaction, and, because of storage limitations, the updating may be able to use only the new transaction and the summarized information in the current customer signature.
The above-described customer signatures may be provided using relative frequency distributions. Such distributions are also commonly known as histograms. Updating histograms sequentially is not difficult when observations are randomly sampled. If $\hat{\pi}_n$ is a vector of current histogram probabilities and $X_{n+1}$ is a characteristic of a current transaction, represented as a vector of 0's except for a 1 in a cell containing an observed value, then the sequentially updated vector of histogram probabilities is
$$\hat{\pi}_{n+1} = (1 - w_{n+1})\,\hat{\pi}_n + w_{n+1} X_{n+1}, \qquad (1)$$
where $w_{n+1} = 1/(n+1)$ and $\hat{\pi}_0 = 0$. Updating thus requires only the most recent transaction, the number of transactions made so far and the current summary. This is known as an unweighted average, because $\hat{\pi}_n$ weights each observed transaction $X_1, \ldots, X_n$ equally.
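By way of illustration only, the unweighted update of equation (1) may be sketched in Python as follows. The function name update_histogram, the three-cell histogram and the observed cells are assumptions introduced for this example and do not appear in the description above. Exact fractions are used so that the result can be compared directly with the relative frequencies.

```python
from fractions import Fraction


def update_histogram(pi, n, cell):
    """Apply equation (1) with weight 1/(n+1).

    pi   -- current cell probabilities after n transactions
    n    -- number of transactions already absorbed into pi
    cell -- index of the cell observed in transaction n+1
    """
    w = Fraction(1, n + 1)
    # X_{n+1} is 1 in the observed cell and 0 elsewhere.
    return [(1 - w) * p + (w if k == cell else 0) for k, p in enumerate(pi)]


# Hypothetical data: four transactions falling in cells 0, 2, 0, 1.
pi = [Fraction(0)] * 3  # initial estimate of zero in every cell
for n, cell in enumerate([0, 2, 0, 1]):
    pi = update_histogram(pi, n, cell)

print(pi)  # prints [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
```

In this example the result equals the relative frequencies of the four observations (2/4, 1/4, 1/4), which is the sense in which each transaction is weighted equally.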
In cases of time-dependent customer behavior, a histogram updated using an unweighted average is generally inappropriate because recent transactions have no more influence on the histogram than old transactions do. Such time-dependent behavior is tracked better by an exponentially weighted moving average (EWMA). An updated EWMA vector $\hat{\pi}_{n+1}$ is given by equation (1) with $w_{n+1} = w$ for a fixed weight $w$, $0 < w < 1$, that controls the extent to which $\hat{\pi}_{n+1}$ is affected by a new transaction and the speed with which a previous transaction is "aged out." The initial probability estimate $\hat{\pi}_0$ must be specified, and can be determined, e.g., from historical data on other customers. Under some conditions, an EWMA approximates the corresponding posterior mean under a Bayesian dynamic model, as described in M. West et al., "Bayesian Forecasting and Dynamic Models," Springer-Verlag, 1989. Additional details regarding EWMA can be found in B. Abraham et al., "Statistical Methods for Forecasting," John Wiley and Sons, New York, N.Y., 1983.
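The EWMA form of equation (1) may be sketched in the same way. The function name ewma_update, the two-cell histogram, the starting estimate of 0.5 per cell and the weight of 0.2 are all assumptions chosen for illustration and are not values taken from the description above.

```python
def ewma_update(pi, cell, w):
    """Apply equation (1) with a fixed weight w, where 0 < w < 1."""
    if not 0 < w < 1:
        raise ValueError("w must lie strictly between 0 and 1")
    # The observed cell receives weight w; every cell is shrunk by (1 - w).
    return [(1 - w) * p + (w if k == cell else 0.0) for k, p in enumerate(pi)]


# Hypothetical initial estimate and weight.
pi = [0.5, 0.5]
w = 0.2
for cell in [1, 1, 1]:  # three consecutive transactions in cell 1
    pi = ewma_update(pi, cell, w)

print(pi)  # cell 0 has decayed to about 0.5 * 0.8**3 = 0.256
```

After m further updates, an earlier observation carries weight w(1 - w)^m, so a larger w ages out old transactions more quickly.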
It is important to note that the unweighted averages and EWMAs are generally appropriate only when variables are randomly sampled. However, timing variables like day-of-week are typically not randomly sampled. As a result, standard sequential estimates of their distributions can be badly biased. For example, if the transaction rate on Monday is high and the most recent transaction occurred early Monday morning, then the next transaction is likely to occur on Monday and unlikely to occur on Tuesday or any day of the week other than Monday. Because unweighted averages and EWMAs increase the estimated probability for a histogram cell every time that cell is observed, the estimated probability for Monday first rises with every transaction made on Monday and then falls with every transaction made before the following Monday.
As is apparent from the foregoing, a need exists for techniques for updating customer signatures or other records of time-dependent behavior in the presence of timing variables that are not randomly sampled.
The present invention provides improved techniques for updating customer signatures or other types of records in a database system.
In accordance with one aspect of the invention, a customer signature or other type of record in a database system is updated using an event-driven estimator based on a model of time-dependent behavior. A current version of a record affected by a given transaction is retrieved from a memory of the system and an event-driven estimator of at least one of a transaction rate and a period probability for the record is determined based at least in part on a dynamic Poisson timing model. The dynamic model has a number of periods and corresponding period-based transaction rates associated therewith. The event-driven estimator may be configured so as to generate an estimated transaction rate $\hat{\lambda}_{j,n}$ for period $j$ and a given transaction $n$, and then to generate an estimated period probability $\hat{\pi}_{j,n}$ for period $j$ as $\hat{\lambda}_{j,n} / \sum_k \hat{\lambda}_{k,n}$. An updated version of the record may then be generated based on the event-driven estimator.
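As a minimal sketch of the normalization step only, the period probabilities may be computed from the estimated rates as follows. The function name period_probabilities and the seven day-of-week rates are hypothetical and serve only to illustrate the ratio of each rate to the sum of all rates.

```python
def period_probabilities(rates):
    """Return rate_j / sum_k rate_k for each period j."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one estimated rate must be positive")
    return [r / total for r in rates]


# Hypothetical estimated rates for seven periods (Monday first).
rates = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
probs = period_probabilities(rates)

print(probs[0])  # Monday's estimated period probability, 4/10
```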
In accordance with another aspect of the invention, a number of different techniques may be used to update the event-driven estimator. As one example, the event-driven estimator may be updated for every transaction in a specified period and at the end of each of the periods. As another example, the event-driven estimator may be updated for every transaction, regardless of which of the periods any particular transaction falls in, but not at the end of each of the periods. As yet another example, the event-driven estimator may be updated for a given one of the periods only when there is a transaction occurring within that period.
In accordance with a further aspect of the invention, the estimated transaction rate $\hat{\lambda}_{j,n}$ provided by the event-driven estimator under the dynamic Poisson timing model satisfies
$$\hat{\lambda}_{j,n}^{-1} = \begin{cases} (1 - w)\,\hat{\lambda}_{j,n-1}^{-1} + w Z_{j,n}, & \text{if transaction } n \text{ falls in period } j \\ \hat{\lambda}_{j,n-1}^{-1} + \dfrac{w}{1 - w}\, Z_{j,n}, & \text{if transaction } n \text{ does not fall in period } j, \end{cases}$$
wherein $w$ denotes a specified updating weight, $Z_{j,n}$ is an elapsed time in period $j$ since a previous transaction $n-1$, and $\hat{\lambda}_{j,n}^{-1}$ is a reciprocal rate that estimates the average time between transactions in period $j$.
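The two-case update of the reciprocal rates may be sketched as follows. This is an illustrative reading rather than a definitive implementation: it assumes that the second case adds the elapsed time scaled by w/(1 - w), and the function name update_reciprocal_rates, the two-period setup and all numeric values are hypothetical.

```python
def update_reciprocal_rates(inv_rates, elapsed, period, w):
    """Update the reciprocal rate estimates for one transaction n.

    inv_rates -- current reciprocal rate estimate for each period j
    elapsed   -- time elapsed in each period j since transaction n-1
    period    -- index of the period in which transaction n falls
    w         -- updating weight, 0 < w < 1
    """
    if not 0 < w < 1:
        raise ValueError("w must lie strictly between 0 and 1")
    updated = []
    for j, (inv, z) in enumerate(zip(inv_rates, elapsed)):
        if j == period:
            # Transaction n falls in period j.
            updated.append((1 - w) * inv + w * z)
        else:
            # Transaction n does not fall in period j (assumed w/(1-w) scaling).
            updated.append(inv + w / (1 - w) * z)
    return updated


# Hypothetical two-period example: the transaction falls in period 0.
inv_rates = update_reciprocal_rates([2.0, 4.0], [1.0, 3.0], period=0, w=0.5)
rates = [1.0 / v for v in inv_rates]
probs = [r / sum(rates) for r in rates]

print(inv_rates)  # prints [1.5, 7.0]
```

Only the current reciprocal rates and the elapsed times since the previous transaction are needed, so the storage per record is one number per period.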
Advantageously, the event-driven estimator of the present invention is both computationally efficient and memory efficient. It provides better sequential estimates of timing distributions used for customer signatures or other records than the conventional unweighted average and exponentially weighted moving average (EWMA) described previously, and yet is nearly as simple to compute as an EWMA.