Our work considers a widely applicable method of constructing segmentation-based predictive models from data. The method permits limits to be placed on the statistical estimation errors that can be tolerated in various aspects of the constructed models. We have found that the ability to limit estimation errors during model construction is valuable in industries that use predictive models to help make financial decisions, and that it is of critical importance to the insurance industry.
Insurers develop price structures for insurance policies based on actuarial risk models. These models predict the expected claims that will be filed by policyholders as a function of the policyholders' assessed levels of risk. A traditional method used by actuaries to construct risk models involves first segmenting an overall population of policyholders into a collection of risk groups based on a set of factors, such as age, gender, and driving distance to place of employment. The risk parameters of each group (i.e., segment) are then estimated from historical policy and claims data.
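The traditional two-step procedure described above (segment, then estimate) can be illustrated with a minimal sketch. The record layout, the field names (`age_band`, `gender`, `exposure_years`, `claims`), and the choice of claim frequency, claim severity, and pure premium as the risk parameters are assumptions made for illustration only; the text does not say which risk parameters a particular insurer would estimate.

```python
from collections import defaultdict

def estimate_risk_groups(policies, factors):
    """Segment policies by the given factors, then estimate per-group risk parameters.

    Assumes every group has positive total exposure (hypothetical record layout).
    """
    # Step 1: segment the population into risk groups keyed by factor values.
    groups = defaultdict(list)
    for policy in policies:
        key = tuple(policy[f] for f in factors)
        groups[key].append(policy)

    # Step 2: estimate risk parameters for each group from historical claims.
    estimates = {}
    for key, members in groups.items():
        exposure = sum(m["exposure_years"] for m in members)
        claim_count = sum(len(m["claims"]) for m in members)
        total_loss = sum(sum(m["claims"]) for m in members)
        estimates[key] = {
            "frequency": claim_count / exposure,       # claims per exposure year
            "severity": total_loss / claim_count if claim_count else 0.0,  # mean claim size
            "pure_premium": total_loss / exposure,     # expected loss per exposure year
        }
    return estimates

# Illustrative data only, not drawn from any real portfolio.
policies = [
    {"age_band": "18-25", "gender": "M", "exposure_years": 1.0, "claims": [1200.0]},
    {"age_band": "18-25", "gender": "M", "exposure_years": 1.0, "claims": []},
    {"age_band": "26-40", "gender": "F", "exposure_years": 2.0, "claims": [800.0, 400.0]},
]
risk = estimate_risk_groups(policies, ["age_band", "gender"])
```

Changing the `factors` list changes the segmentation, which is the part of the procedure that actuaries traditionally choose by hand.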
Ideally, the resulting risk groups should be homogeneous with respect to risk; that is, further subdividing the risk groups by introducing additional factors should yield substantially the same risk parameters. In addition, the risk groups should be actuarially credible; that is, the statistical errors in the estimates of the risk parameters of each group should be sufficiently small so that fair and accurate premiums can be charged to the members of each risk group.
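One simple way to make the credibility requirement concrete is to bound the relative standard error of a group's estimated mean loss. The sketch below is only an illustration under that assumption: the function names and the 10% tolerance are hypothetical, and actuarial practice uses a range of credibility criteria that the text does not specify.

```python
import math
import statistics

def relative_standard_error(losses):
    """Standard error of the mean loss, expressed as a fraction of the mean."""
    n = len(losses)
    if n < 2:
        return math.inf  # too few observations to estimate the error
    mean = statistics.fmean(losses)
    if mean == 0:
        return math.inf  # relative error is undefined for a zero mean
    return statistics.stdev(losses) / math.sqrt(n) / mean

def is_credible(losses, max_rel_error=0.10):
    """Treat a group as credible when its relative standard error is within tolerance.

    The 10% default is an illustrative assumption, not an industry standard.
    """
    return relative_standard_error(losses) <= max_rel_error

# Illustrative per-policy losses for two hypothetical risk groups.
stable_group = [100.0, 110.0, 90.0, 105.0, 95.0]
volatile_group = [0.0, 0.0, 0.0, 1000.0]
```

Under this criterion the first group would pass and the second would not, because a single large claim dominates its estimate. Subdividing a group by additional factors shrinks each subgroup's sample, which tends to raise this error and is why homogeneity and credibility pull in opposite directions.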
However, identifying homogeneous risk groups that are also actuarially credible is not a simple matter. Actuaries typically employ a combination of intuition, guesswork, and trial-and-error hypothesis testing to identify suitable risk factors. For each combination of risk factors that is explored, actuaries must estimate both the risk parameters of the resulting risk groups and the actuarial credibility of those parameter estimates. The human effort involved is often quite high, and good risk models can take several years to develop and refine.