1. Technical Field
The present invention relates generally to creating statistical models of transactional behavior, useful, for example, for detecting aberrant behavior of individuals or organizations, and more particularly to forming profiles of various entities and combinations of entities for development of such statistical models.
2. Background of the Invention
In many real-world problems involving prediction, detection, forecasting and the like, the problem setting consists of the interactions between different entities such as individuals, organizations or groups. In such cases, the activity related to the problem at hand is largely described by a body of transaction data (historical and/or ongoing) that captures the behaviors of the relevant entities. Examples of such problems abound in everyday life. A few sample settings along with the corresponding transaction data and related entities are described below in Table 1.
TABLE 1
Problem/Setting | Transactions | Entities
Healthcare fraud and abuse detection | Claims (inpatient and outpatient) | Client (Patient), Doctor, Hospital, Pharmacy, Lab
Credit Card fraud detection | Purchases, Payments, Non-monetary transactions | Account holder, Merchant, Credit Card issuer
Bank Checking System | Check processing transactions | Account holder, Bank, Teller
Food Stamp fraud detection | Food Stamp transactions | Retailer, Client
In each of these settings, the encounters between the different entities are captured in the form of the associated transactions.
An entity is an operational unit within a given setting, application or environment and represents objects that interact within that setting, application or environment. The members of an entity are generally objects of a similar type. Different entities interact with each other and their interactions are encapsulated in the transaction data corresponding to that application. Thus, examples of entities in a healthcare setting are clients, providers (this includes doctors, hospitals, pharmacies, etc.), clients' families, etc. and their interactions are captured in the claims data; i.e. the interaction of a healthcare provider and a patient is captured in a claim by the provider for reimbursement. In the credit card world, the interacting entities are account holders, merchants, credit card issuers, and the like and their interactions are captured through different types of transactions such as purchases and payments.
Usually, entities correspond to individuals or organizations that are part of the setting, as the examples in the previous paragraph illustrate. However, more abstract entities characterizing a transaction may also be defined. Examples include procedure codes (describing the type of healthcare service rendered), disease groups and SIC codes (Standard Industrial Classification codes).
A member of an entity is an individual instance of the entity. For example, a specific doctor is a member of the healthcare provider entity, a particular grocery store is a member of the credit card merchant entity and so on.
A target entity is the primary entity of interest for a given application. Usually, it is the focus of some type of analysis such as a statistical model or a rule. A target entity interacts with other entities through the transactions. Thus, in provider fraud and abuse detection, the healthcare providers are the target entity while the clients (patients), clients' families, other providers, etc. are the entities interacting with the target entity. In credit card fraud, the merchant would be one example of a target entity (depending upon the type of fraud being analyzed) and the interacting entities then are the cardholder, the credit card issuer, etc. Alternatively, a point of sale terminal could be another type of target entity, and the cashiers who use the terminal would be the interacting entities.
As noted above, a transaction captures the information associated with an interaction between a group of entities. A transaction may initially arise between two entities (e.g. a doctor and a patient) and then be processed by still other entities (e.g. a pharmacy providing a prescription and a laboratory providing a lab test required by the doctor). Different types of transactions will typically capture different types of interactions or interactions between different groups of entities. For example, in the credit card setting, a purchase transaction captures the interaction between the cardholder and the merchant, while a payment transaction encapsulates the information regarding the payments made by a cardholder to the credit card issuer. Similarly, in healthcare, an outpatient claim represents the service received by a client (i.e. patient) from a provider as part of an office or home visit, while an inpatient claim encodes data regarding a patient's stay at a hospital or another facility.
The word “profile” literally means “to draw in outline”. In the context of the present invention, the word “profile” is used to denote a set of behavioral features (profile variables) that figuratively represents the “outline” of an entity. A profile may be understood as a summary of the historical (and/or ongoing) transactional behavior of the entity, which ideally eliminates the need to store the details of all the historical transactions that are summarized by the profile variables. The values of the profile variables can be used to characterize the different members belonging to that entity. The primary intention of a profile is to capture the behavioral characteristics of an entity's members as exhibited through the transactions, in as complete a manner as possible.
In order to perform a meaningful analysis in settings that are described by a large number of transactions (and supporting data), a rich characterization of the target entities based on their transactional activity is required. This process has two key aspects:
    defining a set of profile variables for an entity, and
    setting up a process to derive the values of these variables for each member of the entity using the relevant set of transactions.
The profile variables that are thus defined and derived for an entity constitute that entity's profile, that is, constitute a summary of the entity's behavior. Thus, for instance, to build a model that assesses the risk of healthcare providers performing fraudulent/abusive activity, it is desirable to first define characteristics that would help distinguish fraudulent providers from legitimate providers and then build profiles for each provider that include their respective profile variables, derived from the relevant transactions, here claims. The method of transforming the raw transaction data into meaningful behavior features is significant to the effectiveness of any analysis that uses the derived features.
Each profile variable for an entity captures some aspect of the entity's behavior as observed through the transaction data. The comprehensiveness of a profile is determined by the diversity and depth of its profile variables.
A profile variable of an entity may be generally defined as follows:
    A formulation that converts data from a set of transactions involving the entity to a scalar quantity that summarizes some aspect of that entity's transactional activity.
Typically, a profile variable is derived by applying a distributional or statistical function to a series of numbers extracted either directly from the entity's transactions, or indirectly through an intermediate profile dataset. Note that a profile and hence a profile variable is generated for each individual member of an entity (e.g. in the case of healthcare providers, a profile will be generated for each individual provider). While the formulation of the profile variable is the same across all members of an entity, the value of the profile variable differs from one member to another depending on the specific transaction activity of the specific member. For example, one doctor (member of a healthcare provider entity) will likely have a different average number of services per month than another doctor (a different member).
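The derivation described above, a statistical function applied to a series extracted through an intermediate profile dataset, may be illustrated by the following minimal sketch. It is offered only as one possible reading; the record layout, the field names (`provider_id`, `month`) and the function names are assumptions made for illustration and are not part of the original disclosure.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"provider_id": "D1", "month": "2001-01"},
    {"provider_id": "D1", "month": "2001-01"},
    {"provider_id": "D1", "month": "2001-02"},
    {"provider_id": "D1", "month": "2001-02"},
    {"provider_id": "D1", "month": "2001-02"},
    {"provider_id": "D1", "month": "2001-02"},
    {"provider_id": "D2", "month": "2001-01"},
]

def services_per_month(claims):
    """Intermediate profile dataset: per-provider count of services by month."""
    monthly = defaultdict(Counter)
    for claim in claims:
        monthly[claim["provider_id"]][claim["month"]] += 1
    return monthly

def average_services_per_month(claims):
    """Apply one statistical function (the mean) to each member's monthly series."""
    monthly = services_per_month(claims)
    return {provider: mean(counts.values()) for provider, counts in monthly.items()}

profile_values = average_services_per_month(claims)
```

The same formulation is applied to every member, while the resulting value differs from one provider to another. In this sketch, months in which a provider has no claims do not appear in the series and therefore do not lower the average; a fuller treatment would have to decide how to handle such months.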
The simplest general example of a profile variable for an entity is the number of transactions. This is derived by applying the summation function to the series of numbers created (from the transaction dataset) by associating an indicator variable that is set to 1 for transactions in which the particular member of the entity is involved and set to 0 for all other transactions.
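The indicator-and-summation formulation of the transaction count may be sketched as follows. The transaction records, the field names and the function name `transaction_count` are hypothetical and serve only to illustrate the computation.

```python
# Hypothetical transaction records; field names are illustrative only.
transactions = [
    {"provider_id": "P1", "client_id": "C1"},
    {"provider_id": "P2", "client_id": "C1"},
    {"provider_id": "P1", "client_id": "C2"},
]

def transaction_count(transactions, entity_field, member_id):
    """Sum an indicator that is 1 when the member is involved and 0 otherwise."""
    indicators = [1 if t[entity_field] == member_id else 0 for t in transactions]
    return sum(indicators)

p1_count = transaction_count(transactions, "provider_id", "P1")
c1_count = transaction_count(transactions, "client_id", "C1")
```

The same function yields the count for a member of any entity present in the transaction data, here either a provider or a client, simply by changing the field on which the indicator is evaluated.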
The specific set of profile variables that should be included in a profile is highly dependent on the application that the profiles are going to be used for. However, even though the interpretation and the relevance of the variables depend on the specific problem at hand, the general definition above applies to any setting, and enables the construction of a common framework through which profile variables may be derived. Common techniques and formulations can be used to derive variables that have different interpretations in different environments.
For example, consider the healthcare application where the transaction is a claim, the entity is a healthcare provider and the profile variable is the average dollars paid to the provider per claim. This variable would typically be derived by summing the field in each transaction containing the dollar amount for that transaction, across all transactions of a member (provider) and then dividing by the total number of transactions for that member (provider).
Now consider the credit card environment, where the cardholder is an entity and each transaction represents a purchase made by a cardholder. Applying the same type of formulation (i.e. total spent by a cardholder for all purchases divided by that cardholder's number of purchases) yields the average dollars spent by a cardholder each time the card is used for a purchase. If instead of the dollar amount, the field contained the time passed since the last transaction, then the same computation yields the average time between purchases for the cardholder. Although these are simple examples, they serve to illustrate the fact that the same mathematical formulation may be applied to derive profile variables in different settings for different entities.
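The reuse of a single formulation across settings may be illustrated by the following sketch, in which one averaging function is applied to healthcare claims and to credit card purchases. The function name `mean_profile_variable`, the record layouts and the field names are assumptions introduced for illustration only.

```python
from collections import defaultdict

def mean_profile_variable(transactions, entity_field, value_field):
    """For each member, sum a transaction field and divide by the member's transaction count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in transactions:
        member = t[entity_field]
        totals[member] += t[value_field]
        counts[member] += 1
    return {member: totals[member] / counts[member] for member in totals}

# Hypothetical healthcare claims (entity: provider).
claims = [
    {"provider_id": "P1", "amount_paid": 100.0},
    {"provider_id": "P1", "amount_paid": 300.0},
    {"provider_id": "P2", "amount_paid": 50.0},
]

# Hypothetical credit card purchases (entity: cardholder).
purchases = [
    {"cardholder_id": "A1", "amount": 20.0, "days_since_last": 2.0},
    {"cardholder_id": "A1", "amount": 40.0, "days_since_last": 4.0},
    {"cardholder_id": "A2", "amount": 10.0, "days_since_last": 7.0},
]

avg_paid_per_claim = mean_profile_variable(claims, "provider_id", "amount_paid")
avg_spent_per_purchase = mean_profile_variable(purchases, "cardholder_id", "amount")
avg_days_between = mean_profile_variable(purchases, "cardholder_id", "days_since_last")
```

Only the entity field and the value field change between the three calls; the computation itself is identical, which is the sense in which a common framework for deriving profile variables is possible.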
In the past, profiles have been created for individual entities and used to develop statistical models based solely on the profiles of the individual entities. For example, U.S. Pat. No. 5,819,226 discloses, among other things, the use of profiles of individual credit card account holders for modeling credit card fraud by such individuals. While this approach is useful for particular applications, in other applications it is desirable to understand the complex interactions between different entities. Profiles based only on the transactions of individual members of an entity are insufficient to capture these rich interactions in a manner that yields statistically useful information for modeling the interactions between entities.