The present invention relates to a method and apparatus for processing entries stored in an electronic database where each entry comprises a succession of data values of correlated variables. The entries are processed in order to provide predictions of future data values of the correlated variables. The invention has particular application to processing data entries that relate to customers of a bank or retail business so as to predict future data values of attributes of the customers. The future profitability of each customer may then be calculated by reference to the predicted values for the customer attributes.
Businesses need to know the value to them of individual customers who conduct business transactions. Such transactions take place over time and as a result a pattern of transactions can be observed in relation to each customer. The products and facilities that are offered by a business may cover a wide spectrum and each product may have its individual overhead costs of provision and profitability. For example, in a bank, the products and facilities on offer will include current and savings accounts, financial equity investment products, stock market investment management, insurance products and lending products such as home purchase and business loans. In addition, customers will interact with the bank through personal contact at branches of the bank, telephone instructions, and automated teller machine transactions.
An individual facility such as a current account has an overhead cost to a bank that depends upon the number and type of transactions that pass through the account and the balance within the account. The bank will be able to determine a cost to be assigned to each account transaction, whether it is the processing of a check or a cash withdrawal from an automated teller machine. On the other hand, the bank can also determine any profit from charging interest when an account is in deficit or from the use of funds when the account is in credit. An individual customer of the bank may have accepted other financial products such as a loan or an insurance product and the profitability of that customer will depend on the number, the type and the value of those products.
Some customers will be seen as more profitable than others because of the pattern of their financial relationship with the bank. A bank will have been able to develop a profitability algorithm which relates the profit v(t) for a customer at any one time t to a number of variables relating to that customer. The variables may include continuous variables X such as a bank account balance and discrete variables Y such as a number of credit card transactions. Personal attributes that are explanatory of the customer, such as age, sex, occupation or address may be included in the data relating to each customer although they do not contribute directly to the algorithm of profitability used by the bank.
In general terms the algorithm may be expressed as
v(t) = G(X(t), Y(t))    (1)
where v(t) represents the profitability of the customer at time t, G is a mathematical function defined by the bank, X represents a continuous variable such as an account balance and Y represents a discrete variable such as a number of credit card transactions. It will be understood that for simplicity of explanation it has been assumed that only two variables have been taken into account in equation (1). In practice, the bank will take more than two variables into account.
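By way of illustration only, equation (1) might be sketched as follows. The form of G is defined by the bank and is not given here, so the linear form, the function name `profitability` and every coefficient below are hypothetical assumptions chosen purely to show how a continuous variable X and a discrete variable Y could be combined into a single profit figure.

```python
def profitability(balance: float, card_transactions: int,
                  margin_rate: float = 0.002,
                  fee_per_transaction: float = 0.15,
                  fixed_cost: float = 1.0) -> float:
    """Hypothetical G(X(t), Y(t)); not the bank's actual algorithm.

    balance           -- continuous variable X(t), e.g. an account balance
    card_transactions -- discrete variable Y(t), e.g. a transaction count
    All three coefficients are assumed values for illustration.
    """
    income_from_funds = margin_rate * balance            # use of funds in credit
    income_from_cards = fee_per_transaction * card_transactions
    return income_from_funds + income_from_cards - fixed_cost


# v(t) for one customer at one time t
v_t = profitability(balance=1000.0, card_transactions=10)
```

A real function G would take more than two variables into account, as noted above.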
It would be beneficial to a bank to be able to make a projection of the future profitability of a customer using the bank's algorithm. Such a projection to a future time (t+1) would however require predictions of the future values of the variables X and Y at the future time (t+1). It is an aim of the present invention to provide a method and apparatus to provide such predictions.
According to the present invention, there is now provided a method of operating a data processing apparatus to process variables stored in an electronic database so as to provide predictions of future values of the variables, the method comprising the steps of:
accessing a prior distribution over the parameters of a first of the variables,
accessing data values of the first variable to derive a posterior distribution of the parameters of the first variable,
accessing a prior distribution over the parameters of a second of the variables, the second of the variables being correlated with the first,
accessing data values of the second variable to derive a posterior distribution of the parameters of the second variable,
taking statistical samples of the parameters of each posterior distribution to provide estimates of the parameters of the posterior distributions,
computing a predictive distribution of the first variable from the said estimates,
computing a predictive distribution of the second variable from the said estimates, and
formulating predicted data values for the first and second correlated variables from the said predictive distributions and known values for the variables.
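The steps above can be sketched in code under some simplifying assumptions. The models used here are not specified in the text and are chosen only because their posteriors have closed forms: the period-to-period change in the first variable (a balance) is assumed Normal with known noise variance and a Normal prior on its mean, and the second variable (a transaction count) is assumed Poisson with a rate proportional to the balance in thousands, with a Gamma prior on that rate. All function names, prior settings and the noise variance are hypothetical.

```python
import random
from statistics import fmean


def posterior_normal_mean(prior_mean, prior_var, data, noise_var):
    """Normal prior on a mean + Normal data (known variance) -> Normal posterior."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


def posterior_gamma_rate(prior_shape, prior_rate, counts, exposures):
    """Gamma prior on a Poisson rate + counts with exposures -> Gamma posterior."""
    return prior_shape + sum(counts), prior_rate + sum(exposures)


def predict_next(balances, card_counts, n_samples=5000, seed=0):
    rng = random.Random(seed)

    # First variable X: prior and data give a posterior for the mean change.
    changes = [b - a for a, b in zip(balances, balances[1:])]
    mu_mean, mu_var = posterior_normal_mean(0.0, 1e6, changes, noise_var=100.0 ** 2)

    # Second variable Y, correlated with X through its exposure (assumed link).
    exposures = [b / 1000.0 for b in balances]
    shape, rate = posterior_gamma_rate(1.0, 1.0, card_counts, exposures)

    # Statistical samples of each posterior provide the parameter estimates.
    mu_hat = fmean([rng.gauss(mu_mean, mu_var ** 0.5) for _ in range(n_samples)])
    # random.gammavariate takes (shape, scale); scale is 1 / rate.
    lam_hat = fmean([rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)])

    # Predictive means, combined with the last known value of X.
    x_next = balances[-1] + mu_hat          # mean of Normal predictive for X(t+1)
    y_next = lam_hat * x_next / 1000.0      # mean of Poisson predictive for Y(t+1)
    return x_next, y_next


x_pred, y_pred = predict_next([1000.0, 1100.0, 1200.0, 1300.0], [5, 6, 6, 7])
```

The predicted pair could then be passed to the bank's function G to project profitability at time (t+1). A fuller treatment would sample from the predictive distributions themselves rather than report only their means.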
Further according to the present invention, there is provided data processing apparatus to process variables stored in an electronic database so as to provide predictions of future values of the variables, the apparatus comprising:
means to access a prior distribution over the parameters of a first of the variables,
means to access data values of the first variable to derive a posterior distribution of the parameters of the first variable,
means to access a prior distribution over the parameters of a second of the variables, the second of the variables being correlated with the first,
means to access data values of the second variable to derive a posterior distribution of the parameters of the second variable,
a statistical sampler of the parameters of each posterior distribution to provide estimates of the parameters of the posterior distributions,
computing means to compute a predictive distribution of the first variable and a predictive distribution of the second variable from the said estimates, and
a predictor to predict data values for the first and second correlated variables from the said predictive distributions and known values for the variables.