The present invention relates to systems for creating and maintaining a database of audience members, and in particular to systems for creating and maintaining such a database from source databases containing contact information about audience members.
A growing trend in business is to improve the quality of communication provided to individuals with whom a business interacts, including, but not limited to, customers, potential customers, employees, shareholders, service providers and other stakeholders. One way of improving such communications is to ensure that the information a business has about each individual is complete and up to date. Many businesses maintain multiple databases containing information about individuals with whom they communicate. For example, information about a business's employees might be found in a human resources database, a department database, a retirement plan database and an insurance database. Similarly, information about a business's customers might be found, for example, in an externally purchased mailing list database, a trade show lead database, a salesperson's contact manager database, and a customer service department database. In each case, information in these databases is often duplicated and out of date. Moreover, different databases are often updated by different departments of an organization, and a change made in one department's database about a particular individual or business may not necessarily cause the same information about that individual or business in a different department's database to be updated. Furthermore, when records from different databases that contain information about the same individual are inconsistent, it is difficult to determine which database is more likely to contain accurate data. This results in inaccurate or incomplete communications from the business to the recipient of the communication. Such inaccurate information diminishes the quality of communications with individuals, for example, in one-to-one marketing campaigns.
Data warehousing is a recently popular technique for assimilating information from disparate databases. However, databases created as a result of data warehousing generally have data no more accurate than the most accurate database from which the newly created database is derived. Accordingly, it is desirable to provide a system that can assimilate data from multiple data sources while maintaining data accuracy.
Moreover, for large businesses with millions of customers, exhaustively searching a database for prospective matches can quickly become an unmanageable task. Comparisons of fields in database records take time, and repeating such comparisons millions of times for each record to be matched is not practical.
In one embodiment, the invention comprises a software system and method comprising a plurality of source databases, each source database comprising a plurality of source audience member records and a plurality of source fields for each source audience member record; and a target database comprising a plurality of target audience member records, at least some of which identify the same audience member identified by at least one source database audience member record, and a plurality of target fields for each target audience member record. The system also includes means for mapping at least one target database field to corresponding source fields of each of a plurality of source databases, and, for each such mapping, means for ranking a relative priority between (a) the source fields of each of the plurality of source databases and the target field, and (b) the target database field. Software provides for selecting, from at least one of the plurality of source databases, a source audience member record that matches a target audience member record, and means for updating the fields in the matching target record of the target database from multiple mapped fields in the plurality of source databases, including selecting, from among the source database fields to which the target database fields are mapped, the highest ranked priority fields.
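The priority-ranked update described above can be illustrated with a minimal sketch. The database names, field names, and numeric priority values below are hypothetical and chosen only for illustration; the sketch assumes that a lower number means a higher priority and that the existing target value takes part in the ranking.

```python
# Hypothetical mapping: target field -> [(source database, source field, priority)].
# Assumption: a lower number means a higher priority.
FIELD_MAP = {
    "email": [("crm", "email_addr", 1), ("hr", "work_email", 2)],
    "phone": [("hr", "phone", 1), ("crm", "tel", 2)],
}

# Hypothetical priority of the value already held in the target database field.
TARGET_PRIORITY = {"email": 3, "phone": 0}


def update_target(target, matched_sources):
    """Return a copy of `target` in which each mapped field holds the value
    from the highest-priority field that actually has a value."""
    updated = dict(target)
    for field, mappings in FIELD_MAP.items():
        candidates = []
        if target.get(field):
            candidates.append((TARGET_PRIORITY[field], target[field]))
        for db_name, source_field, priority in mappings:
            record = matched_sources.get(db_name)
            if record and record.get(source_field):
                candidates.append((priority, record[source_field]))
        if candidates:
            # The candidate with the smallest priority number wins.
            updated[field] = min(candidates, key=lambda c: c[0])[1]
    return updated


target_record = {"email": "old@example.com", "phone": "555-0000"}
matching_source_records = {
    "crm": {"email_addr": "new@example.com", "tel": "555-1111"},
    "hr": {"work_email": "hr@example.com", "phone": "555-2222"},
}
merged = update_target(target_record, matching_source_records)
```

In this example the target email is replaced by the value from the hypothetical `crm` source because that field outranks both the `hr` field and the existing target value, while the target phone number is kept because the target field is ranked above both source fields.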
A method provides for generating, for a source audience member record having at least first and second fields, a set of matching candidate audience member records having multiple fields from a target audience member database, wherein there is no pre-defined relationship between the source audience member record and any record of the target audience member database, comprising the steps of:
a. providing first and second indices to the target audience member database based on at least first and second fields of the target database;
b. specifying a match-closeness parameter;
c. generating multiple references to records of the target audience member database by querying the first index for similarities based on the first field of the source audience member record, the quantity of multiple references being responsive to the match-closeness parameter; and
d. generating additional multiple references to records of the target audience member database by querying the second index for similarities based on the second field of the source audience member record, the quantity of additional multiple references being responsive to the match-closeness parameter.
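Steps a through d can be sketched as follows. This is an illustrative sketch only: it assumes that the match-closeness parameter is a similarity cutoff between 0 and 1 and that similarity is measured with Python's standard `difflib` module, neither of which is specified above. The field names and sample records are hypothetical.

```python
import difflib
from collections import defaultdict


def build_index(records, field):
    """Step a: index the target database on one field (key -> record ids)."""
    index = defaultdict(list)
    for record_id, record in records.items():
        index[record[field].lower()].append(record_id)
    return index


def query_index(index, value, closeness):
    """Steps c and d: return references to records whose indexed key is
    similar to `value`. A lower `closeness` cutoff yields more references."""
    keys = list(index)
    if not keys:
        return set()
    similar_keys = difflib.get_close_matches(
        value.lower(), keys, n=len(keys), cutoff=closeness
    )
    return {record_id for key in similar_keys for record_id in index[key]}


def candidate_references(source_record, indices, closeness):
    """Union of the references produced by querying every index."""
    references = set()
    for field, index in indices.items():
        references |= query_index(index, source_record[field], closeness)
    return references


targets = {
    1: {"last_name": "Smith", "city": "Boston"},
    2: {"last_name": "Smyth", "city": "Austin"},
    3: {"last_name": "Jones", "city": "Denver"},
}
indices = {field: build_index(targets, field) for field in ("last_name", "city")}
source = {"last_name": "Smith", "city": "Boston"}

strict = candidate_references(source, indices, closeness=1.0)  # step b: exact only
loose = candidate_references(source, indices, closeness=0.7)   # step b: looser
```

Because only the indices are consulted, the candidate set is produced without comparing the source record against every target record, and relaxing the closeness value widens the set of candidates.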
Additionally, software provides a method for updating a target record having at least three fields of an audience member database that identifies the same audience member of a source audience member record having at least three fields, wherein there is no pre-defined relationship between the source audience member record and any record of the target audience member database, comprising the steps of:
a. providing at least two non-encoded field indices to the target audience member database based on at least first and second fields of the database;
b. providing at least one encoded field index to the target audience member database based on a third field of the database;
c. querying each of the at least two non-encoded indices for matches to a field of the source audience member record, and storing references to target audience member database records having matching value fields in the set of matching candidate audience member records;
d. encoding a value of a field of the source audience member record that corresponds to an encoded index of the target audience member database;
e. querying the at least one encoded field index for matches to the encoded field value of the source audience member record, and storing references to target audience member database records having matching encoded value fields in the set of matching candidate audience member records;
f. selecting one of the references to a target audience member database record; and
g. replacing at least one field in the record of the target audience member database that matches the selected reference, with the at least one corresponding field of the source audience member record.
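Steps a through g can be sketched as follows. The encoding is not specified above; this sketch assumes a simplified Soundex-style phonetic code for the encoded field, and it assumes that the reference selected in step f is the one returned by the largest number of indices. The field names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

_CODES = {
    **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6",
}


def encode(name):
    """Simplified Soundex-style code (an assumed encoding, for illustration)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (out + "000")[:4]


def build_index(records, field, key=lambda value: value):
    """Steps a and b: index on a raw field value or on its encoded form."""
    index = defaultdict(set)
    for record_id, record in records.items():
        index[key(record[field])].add(record_id)
    return index


def update_from_source(source, targets, plain_indices, encoded_indices, copy_fields):
    hits = Counter()
    for field, index in plain_indices.items():        # step c
        hits.update(index.get(source[field], ()))
    for field, index in encoded_indices.items():      # steps d and e
        hits.update(index.get(encode(source[field]), ()))
    if not hits:
        return None
    chosen = hits.most_common(1)[0][0]                # step f (assumed rule)
    for field in copy_fields:                         # step g
        targets[chosen][field] = source[field]
    return chosen


targets = {
    1: {"email": "j.smyth@example.com", "zip": "02101", "last_name": "Smyth", "phone": "555-0000"},
    2: {"email": "a.jones@example.com", "zip": "02101", "last_name": "Jones", "phone": "555-9999"},
}
plain = {f: build_index(targets, f) for f in ("email", "zip")}
encoded = {"last_name": build_index(targets, "last_name", key=encode)}
source = {"email": "j.smyth@example.com", "zip": "02101", "last_name": "Smith", "phone": "555-1234"}

chosen_id = update_from_source(source, targets, plain, encoded, copy_fields=["phone"])
```

Here the encoded index lets "Smith" in the source record reach the target record spelled "Smyth", which an exact lookup on the last name would miss.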
The software further provides a method for selecting one of a set of target audience member records and updating at least one field of the selected record with information from at least one field of a source audience member record, comprising the steps of:
a. mapping a plurality of fields from the source audience member record to corresponding fields in the set of target audience member records;
b. for at least some of the plurality of corresponding mapped fields, generating encoded representations of such fields from the source audience member record;
c. for each record of the set of target audience member records:
1. for each mapped field of the target audience member record:
A. comparing the field or its encoded representation to the corresponding field or its encoded representation of the source audience member record;
B. assigning a match score value to the target audience member record for the field based on the extent of the match;
2. aggregating the plurality of match scores for the target audience member record;
d. selecting the target audience member record having the highest aggregated match score value; and
e. updating a plurality of fields in the selected target audience member record with information from corresponding fields from the source audience member record.
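The scoring and selection steps above can be sketched as follows. The field names, the score values, and the encoding (a lower-cased form with punctuation and spaces removed) are assumptions made for illustration and are not specified above.

```python
# Step a: hypothetical mapping of source fields to target fields.
FIELD_MAP = {"surname": "last_name", "mail": "email", "tel": "phone"}

# Source fields that are also compared by encoded representation (assumption).
ENCODED_FIELDS = {"surname"}

# Assumed score values reflecting the extent of the match.
EXACT_SCORE = 2
ENCODED_SCORE = 1


def encode(value):
    """Assumed encoding: lower-case, letters and digits only."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def select_and_update(source, targets):
    # Step b: encode the relevant source fields once, up front.
    source_encoded = {f: encode(source[f]) for f in ENCODED_FIELDS}

    best_id, best_score = None, -1
    for target_id, record in targets.items():            # step c
        total = 0
        for source_field, target_field in FIELD_MAP.items():
            # Steps c.1.A and c.1.B: compare and assign a per-field score.
            if source[source_field] == record[target_field]:
                total += EXACT_SCORE
            elif (source_field in ENCODED_FIELDS
                  and source_encoded[source_field] == encode(record[target_field])):
                total += ENCODED_SCORE
        # Step c.2 is the running total; step d keeps the highest one.
        if total > best_score:
            best_id, best_score = target_id, total

    if best_id is not None:                               # step e
        for source_field, target_field in FIELD_MAP.items():
            targets[best_id][target_field] = source[source_field]
    return best_id, best_score


source = {"surname": "O'Brien", "mail": "ob@example.com", "tel": "555-1234"}
targets = {
    10: {"last_name": "OBrien", "email": "ob@example.com", "phone": "555-0000"},
    11: {"last_name": "Brien", "email": "x@example.com", "phone": "555-1234"},
}
selected_id, selected_score = select_and_update(source, targets)
```

A practical system would presumably also apply a minimum score below which no record is updated; that threshold is omitted here to keep the sketch short.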