1. Field of Invention
The present invention relates generally to methods and apparatus for use in financial data analysis. More particularly, the present invention relates to methods and apparatus for efficiently deriving characteristic variables from financial transaction data using precedence relationships such that the characteristic variables may be used by risk prediction models.
2. Background
As the use of bankcards is becoming more prevalent, issuers of bankcards are finding that their credit and fraud charge-offs, including bankruptcy losses, are increasing. When a bankcard account holder is forced to “default” on payments for transactions, e.g., financial transactions, performed using his or her bankcard, it is the issuers of the bankcards who are most often forced to absorb the associated losses. As such, to protect themselves financially, issuers of bankcards are developing so-called “risk prediction” models which they use to assess risks, e.g., bankruptcy risk, fraud risk and non-bankrupt risk, associated with a bankcard account holder. Risk prediction models for the detection of frauds are typically based upon the analysis of patterns exhibited in series of transactions performed by the bankcard holder in a single account.
On the other hand, models for evaluating bankruptcy and credit risks are typically based on historical payment data and account performance data. To elaborate, risk prediction models for the evaluation of bankruptcy and credit risk typically use historical account performance data associated with a bankcard account or, more generally, the holder of a bankcard account, to identify a pattern of payment and to correlate the pattern of payment to known patterns of payment. In other words, the payment pattern of the account holder is compared against payment patterns which are considered as being indicative of a relatively high risk of future financial problems, as for example bankruptcy or credit loss.
With respect to fraud detection systems, for example, transaction data, e.g., data in the format of a string of data containing a series of different data fields, typically is not used directly by the fraud detection models. In general, the transaction data, which includes such data as an account number, a transaction amount, a transaction time, and a merchant zip code, as well as various other data, must be transformed into characteristic variables which may be used as direct inputs to the risk prediction models. These characteristic variables include, for example, a variable which holds the risk associated with a transaction occurring in a particular geographic area, a time-weighted sum of the total number of consummated financial purchases, and a running sum of the total amount of consummated purchases.
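By way of illustration only, the following Python sketch shows how two of the characteristic variables mentioned above, a running sum of purchase amounts and a time-weighted count of purchases, might be derived from a series of transactions. The `Transaction` record, the field names, the function name `derive_characteristic_variables`, and the exponential decay with a seven-day half-life are all assumptions made for this example; the text does not specify a particular weighting scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical transaction record; field names are illustrative."""
    account_number: str
    amount: float
    time_days: float          # transaction time, in days since an arbitrary origin
    merchant_zip: str


def derive_characteristic_variables(transactions, half_life_days=7.0):
    """Return (running_sum, time_weighted_count) after each transaction.

    The time-weighted count decays exponentially, so that a purchase made
    `half_life_days` ago contributes half as much as one made just now.
    The decay scheme is an assumption, not taken from the text.
    """
    decay_rate = math.log(2.0) / half_life_days
    running_sum = 0.0
    weighted_count = 0.0
    previous_time = None
    results = []
    for txn in sorted(transactions, key=lambda t: t.time_days):
        if previous_time is not None:
            # Age the existing count by the time elapsed since the last purchase.
            elapsed = txn.time_days - previous_time
            weighted_count *= math.exp(-decay_rate * elapsed)
        weighted_count += 1.0
        running_sum += txn.amount
        previous_time = txn.time_days
        results.append((running_sum, weighted_count))
    return results


example_transactions = [
    Transaction("0001", 10.0, 0.0, "94105"),
    Transaction("0001", 20.0, 7.0, "94105"),
]
example_variables = derive_characteristic_variables(example_transactions)
```

In this sketch the second transaction occurs exactly one half-life after the first, so the earlier purchase contributes roughly half a count to the time-weighted total while contributing its full amount to the running sum.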
It should be appreciated that the number of characteristic variables which may be used by fraud risk detection models is large, and that the set of characteristic variables is dynamic, in that the desired characteristic variables may change. By way of example, new characteristic variables may be created for use in risk prediction models as needed.
In conventional fraud risk detection models, characteristic variables are derived from transaction data using hard-coded computer programs written in a suitable language, as for example computer programs written in the C computer language. Hard-coded computer programs are used for their ability to handle high volume streams of data. The transaction data is provided as inputs to the hard-coded computer program which then generates characteristic variables. Due to the volume of characteristic variables which may potentially be used, as well as the size constraints associated with most computer programs, creating a computer program which is arranged to generate substantially any possible characteristic variable would be impractical, if not virtually impossible.
Characteristic variables which are not already provided for in hard-coded computer programs are, therefore, not easily obtained when requested. Hence, the use of hard-coded computer programs to generate characteristic variables often proves to be unsatisfactory, as required characteristic variables often change, for instance, as fraud detection models become more advanced.
Although, theoretically, substantially any characteristic variable may be generated using hard-coded computer programs, when a new, previously unavailable characteristic variable is desired, the hard-coded computer programs must generally be rewritten and recompiled. Therefore, such an approach to generating characteristic variables is often complicated, and, hence, inefficient, as rewriting and recompiling code is not a trivial task. Further, it would be virtually impossible to anticipate which characteristic variables may eventually be needed. As such, writing a hard-coded computer program that is intended to produce only those characteristic variables whose use is anticipated would be an extremely difficult task.
To address the flexibility problem, non-hardcoded computer programs or analytical systems may be used to generate characteristic variables. Once the characteristic variables are found using the non-hardcoded approach, the mathematical descriptions of these characteristic variables are typically handed off to the production system programmers, who may then code the mathematical descriptions into a transaction processing system using, e.g., C, C++, COBOL, or any other suitable programming language that can achieve the necessary transaction processing rates. However, such non-hardcoded computer programs or analytical systems also have disadvantages, e.g., they typically do not have the capability to handle high volume streams of data.
Although the preceding discussion has been made with reference primarily to fraud risk detection systems, similar issues exist in the design and implementation of bankruptcy prediction systems. As mentioned, the transaction data for prior art bankruptcy prediction systems differs from that for prior art fraud detection systems in that it typically represents historical payment data and account performance data. Nevertheless, the task of generating characteristic variables for prior art bankruptcy prediction systems using hard-coded computer programs and non-hardcoded approaches also involves the aforementioned flexibility and/or data handling penalties.
An efficient method and apparatus for transforming raw transaction data into characteristic variables, without requiring the reconfiguration of significant portions of hard-coded computer programs, while enabling high volume streams of data to be handled, is therefore desired. In other words, what is needed is a method and apparatus which enables substantially any characteristic variable to be readily created from raw transaction data. It would also be desirable if such a method and apparatus were capable of processing high volumes of data in real-time.
The present invention relates to methods and apparatus for transforming scaleable transaction data into financial data features. In one aspect, a computer-implemented method transforms scaleable transaction data into a financial data feature for use in assessing credit risk. The financial data feature is extracted from the transaction data. The method involves obtaining the transaction data from a data source, and performing a set of operations on the transaction data to transform the transaction data into the financial data feature. The set of operations is selected only from a predefined set of classes of operations which are interrelated by a predefined order of precedence. Each operation in the set of operations is performed in an order based on the predefined order of precedence of the class associated with that operation.
In one embodiment, the set of predefined classes of operations includes at most five classes of operations which are a data structure class, an atomic transformation class, an entity transformation class, a time transformation class, and a joining operator class. In another embodiment, the financial data feature is configured to be used in a risk prediction model, and the method also involves providing the financial data feature to the risk prediction model. In such an embodiment, the method further involves implementing the risk prediction model with the financial data feature and assessing a risk of bankruptcy based on a result of the implementation of the risk prediction model.
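By way of illustration only, the following Python sketch shows one way in which operations drawn from the five named classes might be performed in an order determined by the precedence of their classes. The numeric precedence values assigned to the classes, the names `OpClass`, `order_operations` and `apply_operations`, and the two example operations are assumptions made for this example; this section names the classes but does not state their relative precedence.

```python
from enum import IntEnum


class OpClass(IntEnum):
    """The five classes named in the text.

    The numeric precedence values (lower runs first) are assumed for
    illustration; the text does not give the actual order here.
    """
    DATA_STRUCTURE = 1
    ATOMIC_TRANSFORMATION = 2
    ENTITY_TRANSFORMATION = 3
    TIME_TRANSFORMATION = 4
    JOINING_OPERATOR = 5


def order_operations(operations):
    """Sort (op_class, name, func) tuples by class precedence.

    sorted() is stable, so operations of the same class keep the
    relative order in which they were supplied.
    """
    return sorted(operations, key=lambda op: op[0])


def apply_operations(data, operations):
    """Apply each operation to the data in precedence order."""
    for _op_class, _name, func in order_operations(operations):
        data = func(data)
    return data


# Hypothetical operations, deliberately listed out of precedence order.
example_operations = [
    (OpClass.ENTITY_TRANSFORMATION, "total per account", sum),
    (OpClass.ATOMIC_TRANSFORMATION, "absolute amount",
     lambda amounts: [abs(a) for a in amounts]),
]

example_feature = apply_operations([-5.0, 10.0, 2.5], example_operations)
ordered_names = [name for _c, name, _f in order_operations(example_operations)]
```

Because the operations are ordered by class rather than by the order in which they are listed, the per-amount transformation in this sketch is applied before the summation even though it is supplied second.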