1. Field of the Invention
The present invention relates to systems and methods for managing data stored in a data warehouse, and in particular to a method and system for providing a simplified description of logical processing of user data stored in the data warehouse.
2. Description of the Related Art
Database management systems are used to collect, store, disseminate, and analyze data. These large-scale integrated database management systems provide an efficient, consistent, and secure data warehousing capability for storing, retrieving, and analyzing vast amounts of data. This ability to collect, analyze, and manage massive amounts of information has become a virtual necessity in business today.
The information stored by these data warehouses can come from a variety of sources. One important data warehousing application involves the collection and analysis of information collected in the course of commercial transactions between businesses and consumers. For example, when an individual uses a credit card to purchase an item at a retail store, the identity of the customer, the item purchased, the purchase amount and other related information are collected. Traditionally, this information is used by the retailer to determine if the transaction should be completed, and to control product inventory. Such data can also be used to determine temporal and geographical purchasing trends.
Similar uses of personal data occur in other industries. For example, in banking, the buying patterns of consumers can be divined by analyzing their credit card transaction profile or their checking/savings account activity, and consumers with certain profiles can be identified as potential customers for new services, such as mortgages or individual retirement accounts. Further, in the telecommunications industry, consumer telephone calling patterns can be analyzed from call-detail records, and individuals with certain profiles can be identified for selling additional services, such as a second phone line or call waiting.
Additionally, data warehouse owners typically purchase data from third parties, to enrich transactional data. This enrichment process adds demographic data such as household membership, income, employer, and other personal data.
The data collected during such transactions is also useful in other applications. For example, information regarding a particular transaction can be correlated to personal information about the consumer (age, occupation, residential area, income, etc.) to generate statistical information. In some cases, this personal information can be broadly classified into two groups: information that reveals the identity of the consumer, and information that does not. Information that does not reveal the identity of the consumer is useful because it can be used to generate information about the purchasing proclivities of consumers with similar personal characteristics. Personal information that reveals the identity of the consumer can be used for a more focused and personalized marketing approach in which the purchasing habits of each individual consumer are analyzed to identify candidates for additional or tailored marketing.
Another example of an increase in the collection of personal data is evidenced by the recent proliferation of "membership" or "loyalty" cards. These cards provide the consumer with reduced prices for certain products, but each time the consumer uses the card with a purchase, information about the consumer's buying habits is collected. The same information can be obtained in an on-line environment, or from purchases made with smart cards, telephone cards, and debit or credit cards.
Unfortunately, while the collection and analysis of such data can be of great public benefit, it can also be the subject of considerable abuse. In the case of loyalty programs, the potential for such abuse can prevent many otherwise cooperative consumers from signing up for membership awards or other programs. It can also discourage the use of emerging technology, such as cash cards, and foster continuation of more conservative payment methods such as cash and checks. In fact, public concern over privacy is believed to be a factor holding back the anticipated explosive growth in web commerce. For the foregoing reasons, a privacy-enhanced data warehouse has been developed, as described in the above cross-referenced patent applications.
As can be seen from the foregoing, the protection of private data is a growing consumer issue around the world. This consumer issue is reflected in legislation in many countries, which places certain requirements on organizations that collect, process, and disseminate information.
Much of this legislation is based on European Union (EU) Directive 95/46/EC regarding "the protection of individuals with regard to automatic processing of personal data", which went into effect in October 1998. One of the requirements of the EU Directive relates to explaining automated decisions: the data subject's right of access to data includes the right to obtain "knowledge of the logic involved in any automatic processing of data concerning him". This applies in particular to decisions that produce significant legal effects or evaluate certain personal aspects, such as "performance at work, creditworthiness, reliability, conduct, etc.". A similar requirement exists in the US and some other countries relative to credit decisions, whereby a financial institution declining credit is obligated to be able to explain the reasons for the decision.
Businesses frequently make automated decisions of this nature. Banks and other institutions that extend credit typically perform some kind of credit scoring. Automated checks for potential fraud or misuse are often made on credit card transactions, sometimes in real time, with the transaction being either denied or referred to a human for review. Similarly, long distance or mobile phone call activity is often monitored for possible fraudulent usage, sometimes resulting in the service being cut off.
Many institutions today use data mining techniques to help make better automated decisions. In determining the criteria for a decision, data mining algorithms can usually handle more data and more variables than a human can. For example, in credit scoring, a human might be able to discern that high income, home ownership, and number of children are key indicators of credit risk, and develop a simple set of rules based on the observed factors. But data mining techniques can examine large numbers of cases, each with tens or hundreds of such variables, and pick out and blend the five or ten variables that are the best propensity indicators. These would then typically represent a better set of "rules" for the credit scoring function. The most common data mining techniques used today in building such "propensity models" are neural networks and decision trees. Neural networks produce a model that combines the factors it selects into a "fuzzy" decision matrix that is not well understood by humans.
Conversely, decision trees produce a model that is well understood by humans, and can in fact be converted into a set of rules (two of which might be: "if income is greater than $50,000, and if homeowner, and if married with no more than two children, then credit risk is good"; and "if income is greater than $50,000, and if not homeowner, and if not married, then credit risk is good").
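By way of illustration only, the two example rules quoted above might be written out as follows. This is a minimal sketch: the function name, the parameter names, and the choice to return None for cases the two rules do not cover are assumptions made for the example and are not taken from any particular model.

```python
def credit_risk(income, homeowner, married, children):
    """Apply the two example rules; return "good" if either one matches.

    Returns None when neither rule applies, because the text gives
    no rule for those cases (an assumption of this sketch).
    """
    # Rule 1: income > $50,000, homeowner, married, at most two children.
    if income > 50_000 and homeowner and married and children <= 2:
        return "good"
    # Rule 2: income > $50,000, not a homeowner, not married.
    if income > 50_000 and not homeowner and not married:
        return "good"
    return None
```

A complete rule set derived from a decision tree would cover every path through the tree, so that each case matches exactly one rule.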
Decision trees are more explainable than neural networks. From a decision tree model, it is possible to create a full set of rules that govern the decisions made for all cases that are passed through the model. It is thus possible to explain credit decisions in a general sense, based on the set of rules that apply to all cases. But if an individual wanted an explanation of the specific reason(s) that applied in his case, this would not be possible without examining the set of rules and the specific set of data relating to the individual. This would be a difficult and laborious process, requiring some effort to gather up all of the relevant data for the individual, and to then examine the data and the rule set to determine which rule applied in this case.
From the foregoing, it can be seen that there is a need for providing a description of the logic that was used to make an automated processing decision based on a customer's personal data. The present invention satisfies that need.
To address the requirements described above, the present invention discloses a method, apparatus, and article of manufacture for providing a description of logic used in determining an outcome based on automatic processing of data.
The method comprises the steps of hierarchically applying a series of decision criteria to the data to arrive at the outcome, while recording a rule determined from application of each decision criterion to the data, and retrieving the recorded rules. The article of manufacture comprises a data storage device tangibly embodying instructions to perform the method steps outlined above, and the apparatus comprises a means for performing these method steps.
This provides an automated way of recording the rule that applies to each decision made by a decision tree model, thereby making all decisions easily explainable. During execution of the decision tree, as each case is passed through the tree, the rules that apply to the case are recorded, along with the decision.
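The recording of rules during execution of a decision tree can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the Leaf and Node classes, the decide function, the example tree, and the "poor" outcomes are all hypothetical and are introduced only to show one way that a rule could be recorded as each decision criterion is applied.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    outcome: str  # decision reached at the end of a path


@dataclass
class Node:
    label: str                        # human-readable decision criterion
    test: Callable[[dict], bool]      # applies the criterion to one case
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def decide(tree, case):
    """Walk the tree for one case, recording the rule applied at each level."""
    rules = []
    node = tree
    while isinstance(node, Node):
        passed = node.test(case)
        # Record the criterion as it applied to this case.
        rules.append(node.label if passed else f"not ({node.label})")
        node = node.if_true if passed else node.if_false
    return node.outcome, rules


# Hypothetical two-level tree; the "poor" outcomes are assumed for the example.
tree = Node(
    "income > 50000",
    lambda c: c["income"] > 50_000,
    if_true=Node(
        "homeowner",
        lambda c: c["homeowner"],
        if_true=Leaf("good"),
        if_false=Leaf("poor"),
    ),
    if_false=Leaf("poor"),
)

outcome, rules = decide(tree, {"income": 60_000, "homeowner": False})
```

In this sketch the recorded rules are returned together with the outcome, so that they could be stored with the decision and retrieved later when an individual asks for an explanation of his or her specific case.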