The present invention is directed to a method and system for evaluating the results of a predictive statistical scoring model and more particularly to a system and method that determines the contribution of each of the variables that comprise the predictive scoring model to the overall score generated by the model.
Insurance companies provide coverage for many different types of exposures. These include several major lines of coverage, e.g., property, general liability, automobile, and workers compensation, which include many more types of sub-coverage. There are also many other types of specialty coverages. Each of these types of coverage must be priced, i.e., a premium selected that accurately reflects the risk associated with issuing the coverage or policy. Ideally, an insurance company would price the coverage based on a policyholder's actual future losses. Since a policyholder's future losses can only be estimated, an element of uncertainty or imprecision is introduced in the pricing of a particular type of coverage such that certain policies are priced correctly, while others are under-priced or over-priced.
In the insurance industry, a common approach to pricing a policy is to develop or create complex scoring models or algorithms that generate a value or score that is indicative of the expected future losses associated with a policy. The predictive scoring models are used to price coverage for a new policyholder or an existing policyholder. As is known, multivariate analysis techniques such as linear regression, nonlinear regression, and neural networks are commonly used to model insurance policy profitability. A typical insurance profitability application will contain many predictive variables, and may comprise thirty to sixty different variables contributing to the analysis.
The potential target variables in such models can include frequency (number of claims per unit of premium or exposure), severity (average loss amount per claim), or loss ratio (loss divided by premium). The algorithm or formula will directly predict the target variable in the model. The scoring formula contains a series of parameters that are mathematically combined with the predictive variables for a given policyholder to determine the predicted profitability or final score. Various mathematical functions and operations can be used to produce the final score. For example, linear regression combines the variables using only multiplication by the parameters followed by addition and subtraction, while neural networks involve functions and operations that are more complex, such as sigmoid or hyperbolic functions and exponential operations.
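By way of illustration only, the target variables and the two kinds of scoring formula mentioned above may be sketched as follows. The function names, argument names, and the single-layer sigmoid form are assumptions made for this example and are not taken from any particular scoring model.

```python
import math


def target_variables(claim_count, total_loss, premium, exposure):
    """Compute the three example target variables for one policy.

    All argument names are hypothetical; exposure is in whatever unit the
    insurer uses (e.g. vehicle-years).
    """
    return {
        "frequency": claim_count / exposure,  # claims per unit of exposure
        "severity": total_loss / claim_count if claim_count else 0.0,  # average loss per claim
        "loss_ratio": total_loss / premium,  # loss divided by premium
    }


def linear_score(weights, intercept, values):
    """Linear-regression style score: a weighted sum of the predictive variables."""
    return intercept + sum(w * x for w, x in zip(weights, values))


def sigmoid_score(weights, intercept, values):
    """A single sigmoid unit, the simplest neural-network style score (assumed form)."""
    return 1.0 / (1.0 + math.exp(-linear_score(weights, intercept, values)))
```

In the linear case each variable enters the score through one multiplication and one addition. In the sigmoid case the same weighted sum is passed through a nonlinear function, so the effect of any single variable on the final score depends on the values of all the others.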
In creating the predictive model, often the predictive variables that comprise the scoring formula or algorithm are selected from a larger pool of variables for their statistical significance to the likelihood that a particular policyholder will have future losses. Once selected from the larger pool of variables, each of the variables in this subset of variables is assigned a weight in the scoring formula or algorithm based on complex statistical and actuarial transformations. The result is a scoring model that may be used by insurers to determine in a more precise manner the risk associated with a particular policyholder. This risk is represented as a score that is the result of the algorithm or model. Based on this score, an insurer can price the particular coverage or decline coverage, as appropriate.
As noted, the problem of how to adequately price insurance coverage is challenging, often requiring the application of complex and highly technical actuarial transformations. These technical difficulties with pricing coverages are compounded by real world marketplace pressures such as the need to maintain an “ease-of-business-use” process with policyholders and insurers, and the underpricing of coverages by competitors attempting to buy market share. Notwithstanding the recognized value of these pricing models and their simplicity of use, known models provide insurers with little information as to why a particular policyholder received his or her score. Consequently, insurers are unable to advise policyholders with any precision as to the reason a policyholder has been quoted a high premium, a low premium, or why, in some instances, coverage has been denied. This leaves both insurers and policyholders alike with a feeling of frustration and almost helpless reliance on the model that is used to determine pricing.
While predictive scoring models are available in the insurance industry to assist insurers in pricing insurance coverage, there is still a need for a method and system that overcome the foregoing shortcomings in the prior art. Accordingly, there exists a need for a system and method to interpret the results of any scoring model used in the insurance industry to price coverage. Indeed, the system and method may be used to interpret the results of any complex formula. There is especially a need for a system and a method that allow an insurer to determine and rank the contribution of each of the individual predictive variables to the overall score generated by the scoring model. In this manner, insurers and policyholders alike may know with certainty the factors or variables that most influenced the premium paid or price of an insurance policy.
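To make concrete what determining and ranking the contribution of each variable could mean, the following minimal sketch treats the simplest case, a purely linear score. It assumes that a variable's contribution is its weight multiplied by its value. That assumption, the function name, and the variable names are illustrative only and should not be read as the method of the invention, which is not described in this section.

```python
def rank_contributions(weights, values):
    """Rank variables by the size of their term in a linear score.

    `weights` and `values` are dicts keyed by (hypothetical) variable name.
    Assumption: contribution = weight * value, which only makes sense
    for a linear model.
    """
    contributions = {name: weights[name] * values[name] for name in weights}
    # Largest absolute contribution first, whether it raises or lowers the score.
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical policyholder with three predictive variables.
weights = {"prior_claims": 0.8, "vehicle_age": -0.1, "territory": 0.3}
values = {"prior_claims": 2.0, "vehicle_age": 5.0, "territory": 1.0}
ranking = rank_contributions(weights, values)
```

For a nonlinear model such as a neural network, the score does not break down into one term per variable, so this simple product would not apply directly.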