The field of the invention is relationship modeling between multiple types of entities, in particular household constituencies, for data services provided in a computational environment with a very large number of records that must be processed.
Consumer marketing efforts today are extending across multiple channels, including, for example, on-line advertising, text advertisements, telephone calls, digital television, and other targeted forms of advertising. These efforts are also expanding from a focus on a single individual to groups of individuals that share a common social/economic relationship. Several products now offer models of household constituency and attributes in order to facilitate household-based marketing. These products are based first on identifying individuals and their postal addresses from a variety of available representations and attribute information. Once this is done, both direct and indirect evidence of common relationships that culminate in the construction of a representative household must be gathered and interpreted. Household-based marketing offers the potential advantage of reduced mailing costs. In addition, it may offer computational efficiencies if households are identified more accurately, since a smaller data set of households will result if more individuals are accurately categorized in a single household rather than inaccurately treated as separate entities.
The data to support householding efforts are gathered from consumer-generated forms, such as surveys, as well as public data sources such as telephone directory information. Also, a variety of data is compiled from larger sets of such information generated by marketing and business sources for the direct intention of marketing services. This data is primarily compiled and interpreted in terms of single point-in-time (PIT) instances of records that contain personally identifiable information (PII), i.e., each included individual is represented by a single record intended to represent a single “here and now” snapshot of the individual's representation and attributes. Therefore, the recency of the information for such data sets is highly critical in order to determine accurate residence addresses and association data such as telephone numbers, current name used by the individual, and age. Unfortunately, in spite of all efforts, collections of such data continue to contain a significant amount of “stale” or incorrect information, and the identification of such records is an extremely difficult, if not impossible, task.
Compounding the difficulty of collecting and validating trusted PII is the fact that there will always be transcription errors and compilation misinterpretations that create records that appear legitimate in isolation but create significant ambiguity when aggregated with other PII records. Such errors can include digit mistyping or flipping in personal identification strings, dates, and names (which can actually change the perceived gender of the represented individual), as well as name representation strings for which it is difficult to determine whether they identify a single individual or a pair of individuals.
In an attempt to mitigate the data problems just identified, some data services attempt to create a hierarchy of “trusted” sources from which quality decisions concerning individual PII records are made based on the ranking of the associated source. But the overall quality of a data source does not necessarily translate to the data quality of any single PII record, and it is not uncommon to create a set of PII records from a small set of highly trusted sources whose actual accuracy falls well below that of any one of the sources individually.
Individuals change their PII representations and attributes for a variety of reasons. This can happen due to marriages, divorces, moves, and changes in cellular telephone numbers. Similarly, individuals often create multiple “views” of themselves that they wish to be kept separate, such as using a name variant and a post office box address for all financial and legal business transactions and a different name and address for specific personal transactions. Hence ambiguity in sets of PII and associative data is not necessarily an indicator of any difference in the quality or recency of the corresponding PII records, further complicating householding efforts.
Moving from the identification of consumers and their most recent postal address to properties of representative households, individuals' moves from one location to another do not necessarily imply that the associated household has broken or become significantly different in terms of its attributes. Once representative households are identified and household links (unique identifiers) are assigned, these links must be carefully persisted (i.e., maintained) in these cases, as the assignments of new links primarily imply a significant change in the constituency of the entity. For example, the definition above requires a common residence for the individuals; however, a common change in the specific residence does not change the household. Similarly, a change in the name representation of an individual may or may not trigger a change in the associated household. Current householding methods perform poorly in these scenarios.
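By way of illustration only, the link persistence behavior described above might be sketched as follows. The `Household` record, the `update_household` function, the `new_link` helper, and the use of a membership-overlap threshold (`min_overlap`) are all assumptions made for this sketch; the text above does not specify how a significant change in constituency is measured.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Household:
    link: str            # hypothetical household link (unique identifier)
    members: frozenset   # consumer links (CL) of the household members
    address: str         # address link (AL) of the common residence

_ids = count(1)

def new_link() -> str:
    """Issue a fresh household link (hypothetical numbering scheme)."""
    return f"HH{next(_ids):06d}"

def update_household(prev: Household, members: frozenset, address: str,
                     min_overlap: float = 0.5) -> Household:
    """Persist prev.link unless the constituency changed significantly.

    Assumption: "significant change" is approximated by the Jaccard overlap
    of the old and new member sets falling below min_overlap. A change of
    address alone never triggers a new link.
    """
    union = prev.members | members
    overlap = len(prev.members & members) / len(union) if union else 1.0
    link = prev.link if overlap >= min_overlap else new_link()
    return Household(link, members, address)

# A household that moves together keeps its link at the new address.
before = Household("HH-A", frozenset({"CL1", "CL2"}), "AL1")
after_move = update_household(before, frozenset({"CL1", "CL2"}), "AL2")
```

Under these assumptions, a common change of residence leaves the link unchanged, while a wholesale replacement of the members at the same address results in a new link.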
As changes in social and economic relationships that affect marketing decisions also affect the resulting real-world households, the accurate and timely identification of such changes is important for users of a household relationship product. These relationship changes are often difficult to identify from a single PIT PII framework. Hence, the inventors hereof have recognized that significant improvements in the state of the art require a rich framework that includes data and metadata not capable of being captured in traditional PIT data sources.
Acxiom's Entity Graph Resolution Repository (EGRR) is a non-discoverable repository that allows for resolution of entities, where each entity consists of a set of PII representations, attributes, and metadata. These entities are given a persisted and maintained identification link using Acxiom's proprietary linking technology. (This linking process is described in certain implementations in U.S. Pat. Nos. 6,523,041 and 6,766,327, which are incorporated by reference herein in their entirety.) For purposes of this invention the primary entities represent “consumers” and “addresses” (consumer link, i.e., CL, and address link, i.e., AL). The EGRR contains PII representations that can be interpreted from a temporal perspective that is not possible from localized PIT data. On access to a particular entity representation in the EGRR, its internal metadata captures and aggregates data over a fixed, long-term time period. This aggregated data is used to infer possible changes in the behavior of the entities it represents. This method provides a historical view of possible entity representation changes that cannot be simulated with PIT data. In particular, the inventors hereof have recognized that this broad and anonymized coverage could be leveraged to construct representative households for every consumer link in the EGRR that directly address all of the issues noted earlier.
This aggregated metadata contains a time-sequenced (temporal) set of entity representations for an individual that both enriches the PIT data context and can directly identify and validate changes in PII information at a very granular level. The EGRR offers several such temporal views of each consumer relative to their identified PII and attribute data extending over many years. Finally, the sources that represent partial temporal PII information publish only the most recent changes in PII such as changes in address, and hence provide independent confirmatory information.
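As a non-limiting illustration of how a time-sequenced set of representations can expose changes that a single PIT record cannot, consider the following sketch. The `Observation` record and the `address_changes` function are hypothetical names introduced here, and the sketch assumes the history passed in belongs to a single consumer link; it does not describe the internal structure of the EGRR.

```python
from datetime import date
from typing import NamedTuple

class Observation(NamedTuple):
    seen: date           # when this representation was observed
    consumer_link: str   # CL of the individual
    address_link: str    # AL associated with the individual at that time

def address_changes(history):
    """Return (date, old_AL, new_AL) for each address change, in time order.

    Assumption: every observation in history refers to the same consumer.
    """
    ordered = sorted(history, key=lambda o: o.seen)
    changes = []
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier.address_link != later.address_link:
            changes.append((later.seen, earlier.address_link, later.address_link))
    return changes

history = [
    Observation(date(2019, 6, 1), "CL1", "AL2"),
    Observation(date(2015, 3, 1), "CL1", "AL1"),
    Observation(date(2017, 9, 1), "CL1", "AL1"),
]
moves = address_changes(history)
```

A PIT source would hold only one of these three observations, so the move from `AL1` to `AL2` would be invisible to it; the ordered history makes both the change and its approximate timing recoverable.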
What is desired then is a system and method that leverages the vast data source represented by the EGRR or a similar data store in order to build and maintain effective representative households using both PIT and temporal data within a computationally efficient contextual framework.