Actuaries develop risk models by segmenting large populations of policies into predictable risk groups, each with its own risk characteristics. Actuarial pricing methods often use predictive variables derived from various internal insurance company and external data sources to compute expected loss and loss ratio at the individual policy level.
The traditional method used by actuaries to construct risk models involves first segmenting the overall population of policyholders into a collection of risk groups based on a set of factors, such as age, gender, and driving distance to place of employment. The risk parameters of each group are then estimated from historical policy and claims data. Actuaries may employ a combination of intuition and trial-and-error hypothesis testing to identify suitable factors.
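The segment-then-estimate procedure can be illustrated with a minimal sketch. The record layout, the field names (`exposure_years`, `claim_amounts`), the age bands, and the sample figures are all hypothetical and chosen only for illustration; an actual rating plan would use factors and cut points selected by the actuary.

```python
from collections import defaultdict


def age_band(age):
    """Hypothetical banding; real cut points would be chosen by the actuary."""
    if age < 25:
        return "under_25"
    if age < 65:
        return "25_to_64"
    return "65_plus"


def estimate_group_parameters(policies):
    """Segment policies by (age band, gender) and estimate each group's
    claim frequency and average claim severity from historical data."""
    totals = defaultdict(lambda: {"exposure": 0.0, "claims": 0, "losses": 0.0})
    for p in policies:
        key = (age_band(p["age"]), p["gender"])
        t = totals[key]
        t["exposure"] += p["exposure_years"]
        t["claims"] += len(p["claim_amounts"])
        t["losses"] += sum(p["claim_amounts"])

    params = {}
    for key, t in totals.items():
        # Frequency: claims per policy-year of exposure.
        frequency = t["claims"] / t["exposure"] if t["exposure"] else 0.0
        # Severity: average loss per claim.
        severity = t["losses"] / t["claims"] if t["claims"] else 0.0
        params[key] = {"frequency": frequency, "severity": severity}
    return params


# Made-up historical records, for illustration only.
policies = [
    {"age": 22, "gender": "M", "exposure_years": 1.0, "claim_amounts": [3000.0]},
    {"age": 23, "gender": "M", "exposure_years": 1.0, "claim_amounts": []},
    {"age": 40, "gender": "F", "exposure_years": 2.0, "claim_amounts": [1000.0, 500.0]},
    {"age": 45, "gender": "F", "exposure_years": 2.0, "claim_amounts": []},
]
group_params = estimate_group_parameters(policies)
```

With so few records per group the estimates would not be credible in practice; the sketch only shows the mechanics of grouping and averaging.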
However, some partly automated approaches use a predictive modeling class library to discover risk characterization rules by analyzing large sets of insurance data. The values of particular risk factors that apply to a particular individual or business are given weight in such analyses based on predictive modeling. Risk groups and their associated risk characteristics can be expressed in the form of actuarial rules, each rule defining a distinct risk group and its level of risk, such as: male drivers under age 25 who drive sports cars have a claim frequency of 25% and an average claim amount of $3200. To be able to discover such rules from historical claims and policy data, predictive modeling based on rule induction can be utilized.
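One possible way to represent such an actuarial rule in code is sketched below. The `ActuarialRule` class, the `match_rule` helper, and the driver field names are hypothetical and are not taken from any particular predictive modeling library; the sketch shows only how a discovered rule might be stored and applied, not how rule induction discovers it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ActuarialRule:
    """A risk group (given by a predicate) and its level of risk."""
    name: str
    applies: Callable[[dict], bool]
    claim_frequency: float   # e.g. 0.25 for a 25% claim frequency
    avg_claim_amount: float  # average claim amount in dollars


# The example rule quoted in the text above.
RULES = [
    ActuarialRule(
        name="male, under 25, sports car",
        applies=lambda d: (
            d["gender"] == "M" and d["age"] < 25 and d["vehicle_type"] == "sports"
        ),
        claim_frequency=0.25,
        avg_claim_amount=3200.0,
    ),
]


def match_rule(driver: dict, rules=RULES) -> Optional[ActuarialRule]:
    """Return the first rule whose risk group contains the driver, if any."""
    for rule in rules:
        if rule.applies(driver):
            return rule
    return None


driver = {"gender": "M", "age": 22, "vehicle_type": "sports"}
rule = match_rule(driver)
```

Because each rule is meant to define a distinct risk group, returning the first match is sufficient here; overlapping rules would require an explicit precedence policy.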
Examples of such external data sources include the C.L.U.E. database of historical homeowners' claims; the MVR (Motor Vehicle Records) database of historical motor claims; and various databases of both personal and commercial financial stability (or “credit”) information.
As described above, the key variables to be predicted with predictive modeling are claim frequency and claim severity, and thereby pure premium. Approaches to predictive modeling are described in C. Apte, et al., “Research Report: Insurance Risk Modeling Using Data Mining Technology,” IBM Research Division (31 Mar. 1998); U.S. Pat. No. 4,975,840 issued Dec. 4, 1990, to DeTore et al.; and U.S. Patent Publication 20060136273, Jun. 22, 2006, each of which is incorporated herein by reference in its entirety. Apte, et al., note that property and casualty (P&C) companies continually refine both the delineations they make among risk groups and the premiums they charge.
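The relationship among these three quantities can be written out under the usual actuarial convention, assumed here, that claim frequency is the expected number of claims per unit of exposure and claim severity is the average amount per claim. The numerical example reuses the illustrative rule quoted earlier (a claim frequency of 25% and an average claim amount of $3200) and assumes that the frequency is stated per policy period.

```latex
\text{pure premium} = \text{claim frequency} \times \text{claim severity},
\qquad \text{e.g.} \quad 0.25 \times \$3200 = \$800 \text{ per policy period}
```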
Accordingly, insurance companies generally seek to improve the accuracy of their risk analysis. They collect many data fields for each policy they underwrite. From time to time, the use of certain consumer data for underwriting (for example, credit scores or neighborhood boundaries) comes under attack for philosophical or political reasons. There is therefore a need to identify factors for use in classifying prospective insureds.