1. Field of Invention
The present invention relates generally to methods and apparatus for use in financial data analysis. More particularly, the present invention relates to methods and apparatus for efficiently deriving characteristic variables from financial transaction data using precedence relationships such that the characteristic variables may be used by risk prediction models.
2. Background
As the use of bankcards is becoming more prevalent, issuers of bankcards are finding that their credit and fraud charge-offs, including bankruptcy losses, are increasing. When a bankcard account holder is forced to "default" on payments for transactions, e.g., financial transactions, performed using his or her bankcard, it is the issuers of the bankcards who are most often forced to absorb the associated losses. As such, to protect themselves financially, issuers of bankcards are developing so-called "risk prediction" models which they use to assess risks, e.g., bankruptcy risk, fraud risk and non-bankrupt risk, associated with a bankcard account holder. Risk prediction models for the detection of frauds are typically based upon the analysis of patterns exhibited in series of transactions performed by the bankcard holder in a single account.
On the other hand, models for evaluating bankruptcy and credit risks are typically based on historical payment data and account performance data. To elaborate, risk prediction models for the evaluation of bankruptcy and credit risk typically use historical account performance data associated with a bankcard account or, more generally, the holder of a bankcard account, to identify a pattern of payment and to correlate the pattern of payment to known patterns of payment. In other words, the payment pattern of the account holder is compared against payment patterns which are considered to be indicative of a relatively high risk of future financial problems, as for example bankruptcy or credit loss.
With respect to fraud detection systems, for example, transaction data, e.g., data in the format of a string of data containing a series of different data fields, typically is not used directly by the fraud detection models. In general, the transaction data, which includes such data as an account number, a transaction amount, a transaction time, and a merchant zip code, as well as various other data, must be transformed into characteristic variables which may be used as direct inputs to the risk prediction models. These characteristic variables include, for example, a variable which holds the risk associated with a transaction occurring in a particular geographic area, a time-weighted sum of the total number of consummated financial purchases, and a running sum of the total amount of consummated purchases.
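By way of illustration only, the transformation described above may be sketched as follows. The sketch is not taken from any particular prior art system. The record layout (`Transaction`), the per-zip-code risk table (`ZIP_RISK`), the default risk value, and the use of exponential decay with a one-day half-life for the time-weighted count are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    """Hypothetical parsed transaction record (field names are assumed)."""
    account_number: str
    amount: float        # transaction amount
    time: float          # transaction time, in seconds (assumed unit)
    merchant_zip: str    # merchant zip code

# Hypothetical risk values per geographic area; real values would come
# from historical data, which is not available here.
ZIP_RISK = {"94105": 0.02, "10001": 0.07}
DEFAULT_ZIP_RISK = 0.05

def derive_characteristic_variables(transactions, half_life=86400.0):
    """Derive three example characteristic variables for one account.

    Assumes `transactions` is ordered by time. The time-weighted count
    uses exponential decay, which is one possible weighting scheme.
    """
    weighted_count = 0.0   # time-weighted number of purchases
    running_amount = 0.0   # running sum of purchase amounts
    zip_risk = DEFAULT_ZIP_RISK
    last_time = None

    for txn in transactions:
        if last_time is not None:
            # Older purchases count for less as time elapses.
            elapsed = txn.time - last_time
            weighted_count *= 0.5 ** (elapsed / half_life)
        weighted_count += 1.0
        running_amount += txn.amount
        # Risk associated with the area of the most recent transaction.
        zip_risk = ZIP_RISK.get(txn.merchant_zip, DEFAULT_ZIP_RISK)
        last_time = txn.time

    return {
        "zip_risk": zip_risk,
        "time_weighted_count": weighted_count,
        "running_amount": running_amount,
    }

example = derive_characteristic_variables([
    Transaction("1234", 10.0, 0.0, "94105"),
    Transaction("1234", 20.0, 86400.0, "10001"),
])
```

In this sketch each variable is fixed in the code, so adding a new variable means editing and redeploying the function, which is the limitation of hard-coded derivation discussed in this section.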
It should be appreciated that the characteristic variables which may be used by fraud risk detection models are numerous, as well as dynamic, in that desired characteristic variables may change. By way of example, new characteristic variables may be created for use in risk prediction models as needed.
In conventional fraud risk detection models, characteristic variables are derived from transaction data using hard-coded computer programs written in a suitable language, as for example computer programs written in the C computer language. Hard-coded computer programs are used for their ability to handle high volume streams of data. The transaction data is provided as inputs to the hard-coded computer program which then generates characteristic variables. Due to the volume of characteristic variables which may potentially be used, as well as the size constraints associated with most computer programs, creating a computer program which is arranged to generate substantially any possible characteristic variable would be impractical, if not virtually impossible.
Characteristic variables which are not already provided for in hard-coded computer programs, therefore, are not easily obtained. Hence, the use of hard-coded computer programs to generate characteristic variables often proves to be unsatisfactory, as required characteristic variables often change, for instance, as fraud detection models become more advanced.
Although, theoretically, substantially any characteristic variable may be generated using hard-coded computer programs, when a new, previously unavailable characteristic variable is desired, the hard-coded computer programs must generally be rewritten and recompiled. Therefore, such an approach to generating characteristic variables is often complicated, and, hence, inefficient, as rewriting and recompiling code is not a trivial task. Further, it would be virtually impossible to anticipate which characteristic variables may eventually be needed. As such, writing a hard-coded computer program that is intended to produce only those characteristic variables whose use is anticipated would be an extremely difficult task.
To address the flexibility problem, non-hardcoded computer programs or analytical systems may be used to generate characteristic variables. Once the characteristic variables are found using the non-hardcoded approach, the mathematical descriptions of these characteristic variables are typically handed off to the production system programmers, who may then code the mathematical descriptions into a transaction processing system using, e.g., C, C++, COBOL, or any other suitable programming language that can achieve the necessary transaction processing rates. However, such non-hardcoded computer programs or analytical systems also have disadvantages, e.g., they typically do not have the capability to handle high volume streams of data.
Although the preceding discussion has been made with reference primarily to fraud risk detection systems, similar issues exist in the design and implementation of bankruptcy prediction systems. As mentioned, the transaction data for prior art bankruptcy prediction systems differ from those for prior art fraud detection systems in that they typically represent historical payment data and account performance data. Nevertheless, the task of generating characteristic variables for prior art bankruptcy prediction systems using hardcoded computer programs or non-hardcoded approaches also involves the aforementioned flexibility and/or data handling penalties.
An efficient method and apparatus for transforming raw transaction data into characteristic variables, without requiring the reconfiguration of significant portions of hard-coded computer programs, while enabling high volume streams of data to be handled, is therefore desired. In other words, what is needed is a method and apparatus which enables substantially any characteristic variable to be readily created from raw transaction data. It would also be desirable if such a method and apparatus were capable of processing high volumes of data in real-time.