The last ten years have seen the development and rapid expansion of a technology sector known as Customer Relationship Management (CRM). This technology relates to the hardware, software and business practices designed to facilitate all aspects of the acquisition, servicing and retention of customers by a business.
One aspect of this technology involves using and applying business intelligence to develop software solutions for automating some of the processes involved in managing customer relationships. The resultant software solution can be applied wherever there is a vendor and a purchaser, i.e. to both business-to-consumer and business-to-business relationships. Moreover, these solutions can be deployed in particular configurations to support CRM activities in different types of customer channel. For example, CRM technology can be used to control and manage the interactions with customers through telephone call-centres (inbound and outbound), Internet web sites, electronic kiosks, email and direct mail.
One of the principal functions of a CRM software solution is to maximize the efficiency of exchanges with customers. The first requirement for maximizing the efficiency of any particular business interface is to define a specific efficiency metric, success metric, or objective function, which is to be optimized. Typically this objective function relates to the monetary gains achieved by the interface, but is not limited thereto. It could for example relate to the minimization of customer attrition from the entry page of a web-site, or the maximisation of policy renewals for an insurance company using call centre support activities. In addition, the metric could be a binary response/non-response measurement or some other ordinal measure. The term objective function will be employed herein to encompass all such metrics.
For the sake of clarity only, the remainder of this specification will be based on systems which are designed to maximize either the number of purchase responses or the monetary responses from customers.
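As a rough illustration of the two objective functions named above, the following sketch computes a binary response rate and a mean monetary response over a log of interactions. It assumes, purely for illustration, that each logged entry is the monetary value of one customer's response, with 0.0 meaning no purchase; the function names are hypothetical.

```python
from typing import Sequence

# Hypothetical layout: each entry is the monetary value of one customer's
# response to a presented proposition; 0.0 means the customer did not purchase.

def response_rate(responses: Sequence[float]) -> float:
    """Binary objective: fraction of interactions that produced a purchase."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if r > 0) / len(responses)


def monetary_response(responses: Sequence[float]) -> float:
    """Monetary objective: mean revenue per interaction."""
    if not responses:
        return 0.0
    return sum(responses) / len(responses)


log = [0.0, 25.0, 0.0, 0.0, 75.0]
print(response_rate(log))      # 0.4
print(monetary_response(log))  # 20.0
```

The two metrics can rank propositions differently: a proposition with a low response rate but a high value per purchase may still win on monetary response.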
As an example, consider a web site that retails fifty different products. There is therefore a plurality of candidate propositions available for presentation to a visiting customer. The content of those propositions can be predetermined, and the selection of the proposition to be presented is controlled by a campaign controller. Each candidate proposition is in effect a marketing proposition for the product in question.
When a customer visits the web site, an interaction event occurs in that a candidate proposition (marketing proposition) is presented to the customer (for example by display) according to the particular interaction scenario occurring between the customer, the web site and the proposition. The response behaviour of the customer to the marketing proposition, and hence the response performance of the proposition, will vary according to a variety of factors.
FIG. 1 illustrates the principal data vectors that may influence the response behaviour of a customer to a particular candidate proposition or marketing proposition during an interaction event. In each case, examples of the field types that might characterise the vector are given.
A Product/Service Data Vector may contain fields which describe characteristics of the product which is the subject of the marketing proposition, such as size, colour, class, and a unique product reference number, although others may clearly be employed.
A Positioning Data Vector may contain information about the way in which the marketing proposition was delivered, for example, the message target age group, price point used and so on.
A Customer Data Vector may contain a number of explicit data fields which have been captured directly from the customer, such as the method of payment, address and gender, together with a number of summarized or composite fields which are thought to discriminate this customer from others. The vector may also contain data fields which represent inferred characteristics based upon previously observed behaviour of the customer. The summarized or composite fields can include fields such as the total value of purchases to date, the frequency of visits of the customer, and the date of last visit. Collectively this Customer Data Vector is sometimes known as a customer profile.
An Environment Vector may contain descriptors of the context of the marketing proposition, for example, the marketing channel used, the time of day, the subject context in which the proposition was placed, although others may be used.
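One way to hold the four data vectors described above, together with the observed response value, is a single record type. The sketch below is an assumption for illustration only: the class name `InteractionRecord`, the dictionary representation and the example field names are hypothetical, not a format defined by this specification.

```python
from dataclasses import dataclass
from typing import Dict, Union

Value = Union[str, float, int]


@dataclass
class InteractionRecord:
    """One logged interaction event (hypothetical field names)."""
    product: Dict[str, Value]      # Product/Service Data Vector: size, colour, class, reference no.
    positioning: Dict[str, Value]  # Positioning Data Vector: target age group, price point
    customer: Dict[str, Value]     # Customer Data Vector (customer profile)
    environment: Dict[str, Value]  # Environment Vector: channel, time of day, context
    response: float = 0.0          # response value stimulated by the proposition

    def features(self) -> Dict[str, Value]:
        """Flatten the four vectors into one set of independent variables."""
        flat: Dict[str, Value] = {}
        for prefix, vec in (("product", self.product),
                            ("positioning", self.positioning),
                            ("customer", self.customer),
                            ("environment", self.environment)):
            for key, val in vec.items():
                flat[f"{prefix}.{key}"] = val
        return flat


rec = InteractionRecord(
    product={"ref": 17, "colour": "red"},
    positioning={"price_point": 19.99},
    customer={"gender": "F", "visits": 12},
    environment={"channel": "web", "hour": 14},
    response=19.99,
)
print(rec.features()["environment.hour"])  # 14
```

Prefixing each key with its vector name keeps fields from different vectors distinct once they are flattened into a single set of independent variables for modelling.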
The objective of the campaign controller is to select the candidate proposition which is predicted to optimise the objective function for the interaction event, that is to say to produce the response performance or response value that achieves the most success according to the selected metric, typically maximising the monetary response from the customer. This is the optimal solution. If one knew everything that could ever be known, this optimal solution would be provided by the true best candidate proposition. In reality, the objective can be met to a degree by evaluating what the most likely next purchase may be for each customer visiting the site, based on everything that customer has done up to the present moment.
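A minimal sketch of the selection step, assuming that some model has already supplied, for the current context, an estimated purchase probability and a monetary value for each candidate proposition. The expected monetary response is then their product, and the controller picks the largest. The function name and the example numbers are hypothetical.

```python
from typing import Dict, Tuple


def select_proposition(estimates: Dict[str, Tuple[float, float]]) -> str:
    """Return the proposition with the highest expected monetary response.

    estimates maps proposition id -> (estimated purchase probability,
    monetary value if purchased). Both numbers are assumed to come from
    a model of historical interactions for the current context.
    """
    return max(estimates, key=lambda p: estimates[p][0] * estimates[p][1])


estimates = {
    "product_1": (0.10, 20.0),   # expected value about 2.0
    "product_2": (0.02, 150.0),  # expected value about 3.0
    "product_3": (0.25, 5.0),    # expected value 1.25
}
print(select_proposition(estimates))  # product_2
```

Because this choice is a pure argmax, identical inputs always yield the same proposition; the choice is only as good as the estimates fed into it.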
For the campaign controller to have the opportunity of exploiting relationships observed in historical interactions, data which characterizes the interaction event must be logged for each customer interaction. Each interaction event produces an interaction record containing a set of independent variable descriptors of the interaction event plus the response value which was stimulated by the marketing proposition presented. After a number of customers have visited the web site, a data set of such interaction records is produced and it then becomes possible to identify the relationships between specific conditions of the interaction event and the probability of a specific response value or outcome.
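The step from a data set of interaction records to "the probability of a specific response value" can be sketched, in its simplest form, as counting. The version below assumes each record is reduced to a single hypothetical condition label, the proposition presented and the response value, and estimates the purchase rate for every observed (condition, proposition) pair.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical minimal record: (condition, proposition, response value).
Record = Tuple[str, str, float]


def response_rates(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Estimate P(response > 0) for each (condition, proposition) pair."""
    shown: Dict[Tuple[str, str], int] = defaultdict(int)
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    for condition, proposition, value in records:
        key = (condition, proposition)
        shown[key] += 1          # number of times presented under this condition
        if value > 0:
            hits[key] += 1       # number of positive responses
    return {key: hits[key] / shown[key] for key in shown}


log = [
    ("evening", "product_1", 0.0),
    ("evening", "product_1", 30.0),
    ("evening", "product_2", 0.0),
    ("morning", "product_1", 0.0),
]
print(response_rates(log))
# {('evening', 'product_1'): 0.5, ('evening', 'product_2'): 0.0, ('morning', 'product_1'): 0.0}
```

Pairs that never occur in the log get no estimate at all, which is the root of the problems with unpresented and new products discussed later in this section.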
The identification and mapping of these significant relationships, as shown in FIG. 2, is sometimes performed within a mathematical or statistical framework (Data Mining, Mathematical Modeling, Statistical Modeling, Regression Modeling, Decision Tree Modeling and Neural Network Training are terms that are applied to this type of activity). Sometimes no explicit mapping takes place; instead the data records are arranged in a special format (usually a matrix) and are stored as exemplar "cases" (terms used to describe this approach are often Collaborative Filtering, Case Based Reasoning and Value Difference Metric, though there are many other names given to specific variants of this approach). Clustering could also be placed in this group, as it is a method of storing aggregations of exemplars. These exemplar cases are then used as references for future expected outcomes.
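A k-nearest-neighbour lookup is one simple instance of the exemplar approach: no model is fitted, and a prediction is the average outcome of the most similar stored cases. The sketch assumes numeric, already scaled feature vectors and Euclidean distance; both are illustrative choices, not requirements of the approach.

```python
import math
from typing import List, Sequence, Tuple

# Each exemplar case: (numeric feature vector, observed response value).
Case = Tuple[Sequence[float], float]


def knn_predict(cases: List[Case], query: Sequence[float], k: int = 3) -> float:
    """Predict a response as the mean outcome of the k most similar stored cases."""
    if not cases:
        raise ValueError("no exemplar cases stored")
    # Every stored case is compared with the query at prediction time.
    ranked = sorted(cases, key=lambda c: math.dist(c[0], query))
    nearest = ranked[:k]
    return sum(outcome for _, outcome in nearest) / len(nearest)


cases = [
    ([1.0, 0.0], 1.0),
    ([0.9, 0.1], 1.0),
    ([0.0, 1.0], 0.0),
    ([0.1, 0.9], 0.0),
]
print(knn_predict(cases, [0.8, 0.2], k=2))  # 1.0
```

Incorporating a new observation is just appending a case, while each prediction scans all stored cases; this matches the fast-update, slow-prediction profile of collaborative filters listed below.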
The general purpose of all approaches is to use observations of previous interaction events to discriminate the likely outcome of new interaction events such that marketing propositions with a high expected outcome of success can be preferentially presented to customers. Over a period of time, the consistent preferential presenting of marketing propositions with higher expectation response values delivers a cumulative commercial benefit.
The choice of the modelling method typically depends on such things as:
The number of different types of response values that need to be modelled;
The computer processing time available for building the model;
The computer processing time available for making predictions based upon the model;
The importance of robustness versus accuracy;
The need for temporal stability in an on line application;
The simplicity of adaptation of the method for the problem at hand.
The two general approaches of learning from historical observations of interaction events are described briefly below with their principal strengths and weaknesses:
Collaborative Filtering
Advantages:
New observations of events can be formatted and incorporated into the collaborative filter model quickly, and in real time for on-line applications;
A single model can predict expected outcomes for many different response types (i.e. many different dependent variables may be accommodated by one model);
Very robust model.
Weaknesses:
The predictive outcomes are not generally as accurate as those derived from a mathematical regression model which has been built to maximize its discriminatory power with respect to a single dependent variable;
Generally slow when making a prediction for a new interaction event;
The predictions cannot easily be expressed as probabilities or expectation values with any specific statistical confidence.
Regression Modelling, Statistical Modelling, Neural Networks and Related
Advantages:
Generally regarded as the most accurate way to map the relationship between a number of independent variables and a dependent variable, given a set of exemplars;
Generally faster when making a prediction for a new interaction event than collaborative filters (dependent upon the precise model type);
Can provide expectation response values with specific statistical confidences, and in the case of binary response variables can provide the probability of a positive response (only some model types);
Work best when there is only one dependent variable per model.
Weaknesses:
Can be slow in model build mode relative to collaborative filter models;
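To make the regression column concrete, the sketch below fits a logistic regression for a single binary dependent variable (purchase or no purchase) by batch gradient descent, and returns the probability of a positive response. It is written in plain Python for self-containment; the learning rate, epoch count and toy data are assumptions, and a production system would normally use an established statistics library.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # derivative of log-loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of a positive (purchase) response for one interaction."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


X = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]  # e.g. one scaled customer attribute
y = [0, 0, 0, 1, 1, 1]                          # 1 = purchased
w = fit_logistic(X, y)
print(round(predict_proba(w, [0.9]), 3))
```

The iterative fit over the whole data set is what makes model building slower than adding a case to a collaborative filter, and one such model is built per dependent variable.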
Both methods also suffer from two disadvantages for on-line applications:
1. They replicate instances of previously observed history and therefore have no way of accommodating new propositions/offers in their decision process (as such propositions/offers are not present in the historical data).
2. Because they reproduce history, they are capable only of passive learning.
There are other notable weaknesses which arise from the way in which mathematical models are used in known CRM campaign controllers:
1. Given a particular set of input conditions (a particular set of interaction data descriptors), the systems will always present the same candidate proposition. This can make the content of the marketing propositions presented appear rather dull and lifeless to customers.
2. The erosion of the predictive relevance of historical observations resulting from temporal changes in market conditions is not controlled in an optimal manner (i.e. observations which were made at earlier times are likely to be less indicative of the prevailing market conditions than more recent observations). This temporal erosion of relevance would ideally be a managed feature of an automated CRM system.
3. Current systems do not explicitly measure their commercial benefit in terms of easily understood marketing metrics.
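One common way to manage the temporal erosion of relevance described in item 2 is to weight each observation by an exponential decay of its age. This is a swapped-in illustrative technique, not a method stated in this specification; the `half_life` parameter and the function name are hypothetical.

```python
from typing import Iterable, Tuple


def decayed_response_rate(observations: Iterable[Tuple[float, int]],
                          now: float, half_life: float) -> float:
    """Response rate in which an observation's weight halves every `half_life` time units.

    observations: (timestamp, responded) pairs, responded being 0 or 1.
    """
    weight_sum = 0.0
    hit_sum = 0.0
    for t, responded in observations:
        w = 0.5 ** ((now - t) / half_life)  # older observations count for less
        weight_sum += w
        hit_sum += w * responded
    return hit_sum / weight_sum if weight_sum > 0 else 0.0


obs = [(0.0, 1), (10.0, 0)]  # an old success and a recent failure
print(decayed_response_rate(obs, now=10.0, half_life=10.0))
```

The unweighted rate for these two observations is 0.5, while the decayed rate is about 0.333, because the older success carries only half the weight of the recent failure. Choosing the half-life is itself a judgement about how quickly the market changes.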
Considering again the example of the web site retailing fifty different products, a preliminary analysis of a data set of historical interaction records reveals a product sales distribution like that shown in FIG. 6. This distribution is a function of two main influences, firstly the true product demand and secondly the relative prominence or promotional effort that has been made for each specific product.
For example, products 48, 49 and 50 exhibited zero sales during the period. If these product transactions were used as the basis for building predictive models then products 48, 49 and 50 would never be recommended for presenting to customers as they have exhibited zero sales in the past. However, the zero sales may in fact be a very poor representation of the relative potential of each product. For example, it may be that products 48, 49 and 50 were never presented at any time to the customers visiting the site whilst products 1, 2 and 3 were very heavily promoted. It may also be that the prominence of the promotions, and general representation of products 48, 49 and 50 had been historically much lower than that of the leading sales products.
If behavioural models are based on this set of data and then used as a basis for controlling the presentation of the web page marketing propositions, then two things would happen:
1. Products 48, 49 and 50 would never be presented to customers (never be selected for promotion).
2. Those products to which customers have historically responded least favourably would become even less likely to be selected for presentation in the future.
This would be a highly non-optimal solution. For example, it may be that products 48, 49 and 50 are the products in true highest demand but because they have been presented so few times then it is by statistical chance that they have exhibited zero purchases. In current CRM systems, products which are observed to have the highest response rates under a particular set of input conditions are always presented in the future under those same conditions. This prevents the system from being able to improve its estimates of the true product demand, or to adapt to changes in the market conditions. The web site also becomes dull and lacks variation in the content for a particular user, and the available statistics from which conclusions may be drawn about weaker performing products become even fewer, further reducing the confidence in what may already be weak hypotheses based on sparse historical observations.
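The point about statistical chance can be checked with a short calculation. If a product has a true purchase rate p and is presented n times independently, the probability of observing zero purchases is (1 - p)^n. The 5% rate and the presentation counts below are hypothetical.

```python
def prob_zero_purchases(true_rate: float, presentations: int) -> float:
    """Chance of observing no purchases in `presentations` independent showings."""
    return (1.0 - true_rate) ** presentations


# A product with a genuinely high 5% purchase rate, shown only 10 times:
print(round(prob_zero_purchases(0.05, 10), 3))   # 0.599
# The same product shown 100 times:
print(round(prob_zero_purchases(0.05, 100), 3))  # about 0.006
```

With only ten presentations, a strong product shows zero sales roughly 60% of the time, so a zero count on a rarely presented product says very little about its true demand.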
Where the site has a large number of potential products to present, the efficiency with which each product is tested and presented becomes of significant commercial importance. The requirement for high testing efficiency is further heightened in markets which exhibit temporal variations in preferences, since the response rates with respect to specific marketing propositions will need constant reappraisal. Markets can change as a result of seasonal effects, the “ageing” of content, actions and reactions from competitors offering similar products in the market place, and other reasons.
The CRM implementations described also do not efficiently manage the introduction of new products or marketing propositions. Since new products do not appear in the historical data set, these systems cannot naturally accommodate them. Marketers may force testing of new products by requiring a minimum number of presentations, but this is non-optimal and can be expensive. It can also be labour-intensive to manage where the product/offer portfolio is dynamic.
In the case of regression models, the same effect of tending to reinforce and propagate historical uncertainties manifests itself with respect to independent variables. Consider an example, illustrated in FIG. 7, where a particular product offer is found to be most effective at a certain time of day. Suppose also that other products are found to exhibit higher response rates outside the window shown between lines A and B.
In the situation described by FIG. 7, a regression modelling system using historical observations as the basis for optimizing the presentation of future marketing propositions will exclusively present propositions relating to this specific product inside the time window between lines A and B. This means that in the future little or no data about the response behaviour to marketing propositions for this product will be available outside the time window. In the short term, as a method of increasing the average response rate by presenting customers with the right marketing proposition at the right time, the system is successful. However, in the absence of a control mechanism which ensures adequate ongoing exploration, the ability of this system to maintain confidence and track possible changes in the locations of the optimum operating points will be compromised; that is to say, the system does not operate with a sustainably optimal solution.
One known method of enhancing sustainability is to seed the activities of the system with a certain level of randomness by forcing the system, from time to time, to make a random choice, so that there is a specific low level of ongoing exploratory activity. If the level of exploratory activity could be set correctly, this method would permit temporal stability. The difficulty lies in determining what the right level of ongoing exploration is, such that the system remains confident that it is tracking the optimum solution whilst minimizing the cost of the sub-optimal exploratory activities.
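The randomness-seeding method described above is commonly known as epsilon-greedy selection: with a fixed probability epsilon a random proposition is presented, otherwise the one with the best current estimate. The sketch below also computes the expected cost of that exploration per interaction, which makes the trade-off explicit. The estimates and the epsilon value are hypothetical, and the sketch deliberately leaves open how epsilon should be chosen.

```python
import random
from typing import Dict


def epsilon_greedy(estimates: Dict[str, float], epsilon: float,
                   rng: random.Random) -> str:
    """With probability epsilon pick a random proposition, else the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(sorted(estimates))       # exploratory choice
    return max(estimates, key=estimates.get)       # exploitative choice


def exploration_cost(estimates: Dict[str, float], epsilon: float) -> float:
    """Expected loss per interaction versus always presenting the current best."""
    best = max(estimates.values())
    mean = sum(estimates.values()) / len(estimates)
    return epsilon * (best - mean)


rates = {"product_1": 0.06, "product_2": 0.04, "product_48": 0.0}
rng = random.Random(0)
picks = [epsilon_greedy(rates, 0.1, rng) for _ in range(1000)]
print(picks.count("product_48"))     # roughly 1000 * 0.1 / 3 showings
print(exploration_cost(rates, 0.1))  # about 0.0027 fewer purchases per interaction
```

A larger epsilon gathers evidence on weak or new products faster but costs more per interaction; a fixed epsilon also keeps exploring at the same rate no matter how confident the estimates have become.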