The present invention relates to extracting information from large data sets. More specifically, the present invention relates to a method for inferring behavioral characteristics of an individual entity based on a large volume of data about a multitude of entities.
Data mining is the name given to the process of extracting information from large data sets. It is a multi-disciplinary field that brings together the latest technology and research in database design, exploratory data analysis, algorithms for very large data sets, model fitting, visualization and systems.
Many will claim that data mining is nothing new. Database systems have been around for decades. For centuries, scientists have studied data sets for correlations and dependencies between the elements. They have developed models to describe data and invented ways of displaying both the data and the models.
However, data mining differs from "traditional" data analysis in one single but important dimension: scale. The size of data sets has grown tremendously with modern computers and data storage capabilities. Ten years ago a large data set would consist of thousands of observations/cases stored in 10 MB. Today, a large data set consists of millions of cases stored in 10 GB. The size and complexity of the available data are often so overwhelming that most algorithms from machine learning, statistics, and visualization become impossible to use in practice. An important problem in data mining is to determine how to scale these algorithms. This problem is referred to in "Statistical Inference and Data Mining", Glymour et al., Communications of the ACM, Vol. 39, No. 11 (Nov. 1996) 35-41 and "External Memory Graph Algorithms", Chiang et al., ACM-SIAM Symposium on Discrete Algorithms, 1996.
Large data sets are ubiquitous in science, technology, and the commercial sector as described in "Mining Business Databases", Brachman et al., Communications of the ACM, Vol. 39, No. 11 (November 1996) 42-48; "Citibank Mines Data", Information Week, No. 600 (October 1996); and "A Collaborative Filter Can Help You Mine Data For Jewels", InfoWorld, Vol. 18, No. 49 (December 1996), 47. For ease of reference and explanation, the commercial sector is addressed since this is where the inventors' recent experience lies. Here, the data sets may be the result of credit card transactions, retail sales, or calls carried by a phone company. In a highly competitive marketplace these data sets are an extremely valuable asset. From these data, businesses can learn who their customers are, where they are, what their needs are, how they use existing services and products, what makes customers stop using or buying the offered services and products, and what offers could attract new customers. Businesses that do not have a method for efficiently recording, handling, and analyzing data are at a serious competitive disadvantage. Thus, data mining is more than a research area; it is a commercial need.
Unfortunately, because the number of commercial transactions is so large, the amount of data recorded per transaction is often limited. Thus, only accounting-like information might be retained for each transaction. It would be beneficial, however, if further information about a transaction or the participants to the transaction could be gleaned from this data set.
The present invention provides techniques and methods for performing meaningful mining of gigabytes of data. In accordance with the present invention, a volume of data relating to a multitude of transactions is maintained. The volume of data is analyzed to determine inferences about characteristics of the entities or parties conducting the transactions.
Making inferences from data has a long tradition in the field of statistics. The present invention provides the ability to make inferences from data when the size of the data set precludes conventional analyses. Data streams may comprise transaction records (transactions) that capture the salient details of a transaction between two entities. Examples of types of transactions to which the present invention can be applied include credit card purchases, telephone call records, packet headers in data communications, and stock transactions.
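By way of illustration only, a stream of transaction records between two entities might be summarized per entity in a single pass, as in the following minimal Python sketch. The names `Transaction`, `EntitySummary`, and `summarize`, and the fields chosen for each record, are hypothetical and are not taken from the invention as described; the sketch merely shows one way to avoid holding the full data set in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record capturing salient details of one transaction."""
    source: str       # originating entity (e.g., calling party, card holder)
    target: str       # receiving entity (e.g., called party, merchant)
    timestamp: float  # time of the transaction, in seconds (assumed unit)
    amount: float     # size of the transaction (e.g., duration or price)


@dataclass
class EntitySummary:
    """Running per-entity totals; memory use is constant per entity."""
    count: int = 0
    total: float = 0.0


def summarize(stream: Iterable[Transaction]) -> Dict[str, EntitySummary]:
    """Make one pass over the stream, updating a summary per source entity."""
    summaries: Dict[str, EntitySummary] = defaultdict(EntitySummary)
    for tx in stream:
        summary = summaries[tx.source]
        summary.count += 1
        summary.total += tx.amount
    return dict(summaries)
```

Because each record is consumed once and then discarded, the stream could be a generator reading from disk or a network feed rather than an in-memory list.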
The present invention further provides the ability to make inferences pertaining to dynamic behaviors.
In one embodiment of the invention, provided to illustrate the principles of the present invention, the transactions are calls along a communication network. Billing records for the calls are maintained. Characteristics about individual calling or called parties can be inferred from the billing records. Examples of such characteristics include the likelihood that the party is a business or a resident, the likelihood that the communication is a facsimile transmission, the likelihood that the communication is somehow fraudulent in nature, or other similar types of characteristics about calls, or calling and/or called parties.
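As a purely hypothetical illustration of inferring a party characteristic from billing records, the Python sketch below computes, for each calling party, the fraction of calls placed during weekday business hours. This heuristic, the 9-to-17 window, and the function name `business_hours_fraction` are assumptions made for the example; this passage does not state how the likelihoods are computed, and the resulting score is a crude indicator rather than a calibrated probability.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each billing record is reduced to (calling_party, weekday, hour), where
# weekday is 0 (Monday) through 6 (Sunday) and hour is 0 through 23.
CallRecord = Tuple[str, int, int]


def business_hours_fraction(calls: Iterable[CallRecord]) -> Dict[str, float]:
    """Return, per calling party, the share of calls made Mon-Fri, 9:00-16:59.

    A higher share is treated here (by assumption) as weak evidence that
    the party is a business rather than a resident.
    """
    totals: Dict[str, int] = defaultdict(int)
    in_hours: Dict[str, int] = defaultdict(int)
    for party, weekday, hour in calls:
        totals[party] += 1
        if weekday < 5 and 9 <= hour < 17:
            in_hours[party] += 1
    return {party: in_hours[party] / totals[party] for party in totals}
```

A score of this kind could be one of several inputs to a classifier; on its own it would misjudge, for example, residents who work from home.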