1. Field of the Invention
The present invention relates in general to the field of task management support through analytics, and more particularly to a system and method for automated population splitting to assist task management through analytics.
2. Description of the Related Art
Analytics are often used to aid management of a variety of tasks. Generally, analytics use historical information to generate models that predict future outcomes. For example, linear and logistic regression analytical techniques apply selected independent variables to generate models that predict dependent variable outcomes based on the historical relationship of the independent and dependent variables. Such models are commonly used to predict outcomes for the extension of credit, such as mortgages, credit cards and auto loans. One common measure of an individual's creditworthiness is a credit score that compares the individual's financial characteristics with the characteristics of a modeled population. Generally, the more characteristics that the individual shares with the modeled population, the greater the predictive quality of a score for the individual that is generated from a model of the population. In order to obtain credit scores with adequate predictive quality, financial institutions typically generate models from large populations. The predictive quality of models is usually estimated by a variety of techniques that consider population size and other factors.
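By way of illustration only, the relationship between historical data and a predictive model can be sketched as a single-variable logistic regression fitted by gradient descent. The data values, the choice of a debt-to-income ratio as the independent variable, and the names `fit_logistic` and `predict_repayment` are hypothetical and are not drawn from any particular scoring system.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus outcome
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical history: independent variable is a debt-to-income ratio,
# dependent variable is 1 if the credit was repaid and 0 otherwise.
history_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
history_y = [1, 1, 1, 0, 1, 0, 0, 0]

w, b = fit_logistic(history_x, history_y)


def predict_repayment(ratio):
    """Predicted probability of repayment for a new individual."""
    return sigmoid(w * ratio + b)
```

In this sketch, an individual with a low ratio receives a higher predicted probability of repayment than an individual with a high ratio, reflecting the historical relationship in the hypothetical data.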
The availability of powerful computers to process large sets of data has expanded the use of analytics for task management not only in the types of tasks that are managed, but also in the frequency with which analytics are performed to obtain models that are current and accurate. One example is the use of analytics to predict whether a desired person will answer a telephone call to a telephone number, as described in U.S. Pat. No. 5,802,161, "System and Method for Optimized Scheduling." In essence, a model predicts the likelihood that the right person will answer a telephone call during defined periods of a day, and the predicted likelihoods are optimized so that the number of right person contacts is increased compared with random call placement. In some instances, models are applied based on hourly calling periods defined through a day. Dependent variables that may be optimized include, among others, the number of right party contacts, the amount of revenue generated by the calls and the number of cured accounts, each of which is maximized, and the number of canceled accounts, which is minimized. As contacts are made and accounts are handled, updated results allow additional modeling so that future contact attempts benefit from the results obtained to date.
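As a minimal sketch of how predicted likelihoods might be used to place calls, the following example assigns each account to a calling period with a simple greedy heuristic under a per-period capacity limit. The heuristic is an assumption chosen for brevity and is not the optimization described in the cited patent; the account names, periods, probabilities and function names are hypothetical.

```python
def schedule_calls(probabilities, capacity_per_period):
    """Greedy heuristic: take (account, period) pairs in order of decreasing
    predicted contact probability, skipping accounts already scheduled and
    periods that are already full."""
    candidates = sorted(
        (
            (p, account, period)
            for account, by_period in probabilities.items()
            for period, p in by_period.items()
        ),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    remaining = dict(capacity_per_period)
    schedule = {}
    for p, account, period in candidates:
        if account in schedule or remaining.get(period, 0) <= 0:
            continue
        schedule[account] = period
        remaining[period] -= 1
    return schedule


def expected_contacts(probabilities, schedule):
    """Expected number of right person contacts for a given schedule."""
    return sum(probabilities[account][period] for account, period in schedule.items())


# Hypothetical model output: probability that the right person answers,
# by account and calling period.
predicted = {
    "acct_A": {"09:00": 0.10, "13:00": 0.30, "18:00": 0.60},
    "acct_B": {"09:00": 0.20, "13:00": 0.25, "18:00": 0.55},
    "acct_C": {"09:00": 0.40, "13:00": 0.15, "18:00": 0.35},
}
capacity = {"09:00": 1, "13:00": 1, "18:00": 1}

schedule = schedule_calls(predicted, capacity)
scheduled_total = expected_contacts(predicted, schedule)

# Baseline: expected contacts if each account were called in a period chosen
# uniformly at random (capacity limits ignored for simplicity).
random_total = sum(sum(by_period.values()) / len(by_period) for by_period in predicted.values())
```

With the hypothetical values above, the scheduled placement yields a higher expected number of contacts than the random baseline. A greedy assignment is not guaranteed to be optimal in general.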
One technique that helps to develop more accurate statistical predictive models through linear regression, logistic regression and other modeling techniques is to split the modeling population into relatively homogeneous subpopulations. Developing separate models for each of these subpopulations helps to maximize both the total predictive power of the overall scoring system and the goodness of fit and validation of the subpopulation models. As a simple example, a subpopulation that has high-value mortgages often has a higher correlation for a selected dependent variable, such as likelihood of repaying credit or likelihood of answering a call at a defined time, with a given set of independent variables than does a population that includes individuals with sub-prime mortgages. During preparation of custom models, the number of subpopulations considered is typically limited due to the labor costs associated with developing and validating each model. For example, where a large number of variables are available, a skilled statistician might select certain variables on which to split the population into subpopulations in a recursive manner, such as with the Classification and Regression Tree (CART) method. A binary tree of subpopulations is defined recursively, with each node split into two sub-nodes and the splitting variable selected to establish subpopulations that are as homogeneous as possible in terms of the behavior of the regression response variable. Such selection of splitting variables and validation of models based on subpopulations is a labor-intensive, manual process typically performed by a skilled statistician that generally cannot be reliably automated. As one example, a natural hierarchy sometimes exists for the order in which splitting variables are used that is not amenable to automation. For instance, splitting populations based on types of loans before the amount of each loan naturally provides more homogeneous populations since different types of loans have different typical balance levels. As another example, splits into more than two nodes are sometimes desired or required.
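For illustration, a single step of a CART-style binary split can be sketched as follows. The example evaluates each candidate variable and threshold and keeps the split whose two sub-nodes have the smallest combined sum of squared deviations of the response variable, which is one common homogeneity criterion for regression trees. The records, the field names `balance`, `age` and `rate`, and the function names are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_binary_split(rows, features, response):
    """Return (score, feature, threshold) for the binary split that makes the
    two sub-nodes most homogeneous in the response variable."""
    best = None
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [row[response] for row in rows if row[feature] <= threshold]
            right = [row[response] for row in rows if row[feature] > threshold]
            score = sum_squared_error(left) + sum_squared_error(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical accounts: loan balance (thousands), account age (months) and
# an observed contact rate used as the response variable.
accounts = [
    {"balance": 50, "age": 12, "rate": 0.20},
    {"balance": 60, "age": 48, "rate": 0.22},
    {"balance": 70, "age": 24, "rate": 0.18},
    {"balance": 300, "age": 36, "rate": 0.60},
    {"balance": 320, "age": 10, "rate": 0.62},
    {"balance": 350, "age": 50, "rate": 0.58},
]

score, split_feature, split_threshold = best_binary_split(
    accounts, ["balance", "age"], "rate"
)
```

Applying the same function recursively to each resulting sub-node would produce a binary tree of subpopulations. The sketch selects splits purely on the homogeneity score, so it does not capture a natural ordering of splitting variables or splits into more than two nodes.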