1. Field of the Invention
The present invention generally relates to techniques for compiling computer-processable market research databases and, in particular, to techniques for gathering and linking investor-specific data from multiple data providers without their disclosure of investor-identifying information.
2. Background Description
It has long been commonplace among marketers of consumer products and services to acquire and analyze externally-generated information about their markets in order to gain insight into market trends, consumer preferences, relative competitive performance, the behavior of their customers, and other metrics, with the objective of improving marketing decision-making. Among the methods that have been developed over the years for creating information of this kind, two of the most widely-used are Survey Research and Database Compilation, which can be briefly characterized as follows:
(i) Survey Research—This method entails drawing a scientifically-designed sample of respondents from a certain study population, administering a questionnaire to the respondents to obtain answers to various questions, making statistical projections of the responses to estimate what results would be obtained if such questions were posed to the entire population under study (rather than just a sample thereof) and then creating a range of reports/analyses based upon the survey results to satisfy information needs of the various end-users.
(ii) Database Compilation—This method entails gathering data provided by a set of organizations that participate in an industry (usually extracted from the computerized customer-related data records maintained by such organizations), integrating such data to form a database (a “Multi-Source Database”), and then creating a range of reports/analyses based upon the data in such Multi-Source Database to satisfy information needs of the various data providers.
Discussion of Survey Research
Survey Research is perhaps the earliest and most widely-practiced method of systematic market research within the business world. It is by now a very well-established method of estimating the characteristics of a population of interest or of any segment thereof (such as buyers of luxury automobiles, or retirees, or families with high incomes and children in school).
Typically, the respondents to a survey are not identifiable to the end-users of the survey results. Instead, each respondent data-record includes a set of respondent attributes, which provide a means for classifying respondents so that tabulations/cross-tabulations of responses can be produced that reveal differences among various types or classes of consumers (such as demographic groups, geographic areas, behavioral classes, or any combination of any of the foregoing).
The process typically involves the following set of steps: [a] stratifying the population into relatively homogeneous strata, [b] setting a sampling ratio for each stratum, [c] recruiting a panel of respondents in each stratum to fulfill the sampling plan, [d] assigning a weight to each respondent according to the population size of the stratum the respondent represents, [e] designing and testing a standard questionnaire, [f] administering such questionnaire to each of the respondents, [g] editing the results and loading them into a database of combined respondent records, [h] statistically projecting the results, based upon the weight assigned to each respondent, to estimate the behavior, attitudes, etc., of the various segments of the population of interest, and then [i] generating a set of statistical reports/analyses based upon such data for use by the intended end-users of the survey.
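The weighting and projection steps ([d] and [h]) can be illustrated with a minimal sketch. The strata, population sizes, responses and function names below are hypothetical and are given only to show the arithmetic; the sketch assumes that each respondent's weight is the population of the stratum divided by the number of respondents recruited in that stratum.

```python
from collections import Counter

# Hypothetical population size of each stratum (step [a]).
STRATUM_POPULATION = {"urban": 60_000, "rural": 40_000}

# Hypothetical edited survey responses (steps [f] and [g]).
RESPONSES = [
    {"stratum": "urban", "owns_product": True},
    {"stratum": "urban", "owns_product": True},
    {"stratum": "urban", "owns_product": False},
    {"stratum": "rural", "owns_product": True},
    {"stratum": "rural", "owns_product": False},
]


def assign_weights(responses, stratum_population):
    """Step [d]: weight = stratum population / respondents in that stratum."""
    sample_counts = Counter(r["stratum"] for r in responses)
    return [
        stratum_population[r["stratum"]] / sample_counts[r["stratum"]]
        for r in responses
    ]


def project_count(responses, weights, attribute):
    """Step [h]: weighted estimate of how many people have the attribute."""
    return sum(w for r, w in zip(responses, weights) if r[attribute])


weights = assign_weights(RESPONSES, STRATUM_POPULATION)
projected_owners = project_count(RESPONSES, weights, "owns_product")
print(projected_owners)
```

In this illustration every respondent happens to represent 20,000 people, so three positive responses project to an estimated 60,000 owners in a population of 100,000.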
Strengths of Survey Research
The main strength of the Survey Research method is that information about a large population can be gathered through the sampling/projection procedure on a very economical basis, as compared to the cost of surveying the entire population. In addition, this method is highly flexible in terms of the nature of questions asked, being limited only by the imagination of the questionnaire designer and the willingness and/or ability of the respondents to answer the questions posed and to do so accurately.
Limitations of Survey Research
The most important limitation of Survey Research is that the survey process is subject to bias, which can be so profound as to render the survey results invalid. Two main types of bias exist: [a] sampling bias, which arises when the sample of respondents is not representative of the population under study, and [b] response bias, which arises when respondents are either unable or unwilling to accurately respond to questions asked.
There are multiple, and oftentimes intractable, causes of bias in survey work, and it is almost always difficult (or sometimes even impossible) for a survey researcher to know the extent to which survey results have suffered from such distortions since there usually is no objective standard against which to judge the results. Often, the only means by which the validity and accuracy of a survey's outcome can be gauged is by comparing the results to those obtained from similar surveys conducted at other times, or, in the absence of comparable survey results, by making common-sense evaluations of the results. However, both of these approaches are notably fallible ways of determining if a given survey can be relied upon as accurately reflecting the population under study or if, on the contrary, the results are a misleading distortion of reality or, even worse, an invalid (and therefore potentially dangerous) misrepresentation of reality.
Another concern in using Survey Research is that the statistical reliability of projections of survey results depends upon respondent sample sizes. The required sample size usually varies from the hundreds of respondents to the low thousands of respondents, depending upon the percent-incidence in the population of the behavior and/or characteristics to be measured and the degree of projection accuracy required. In cases where the researcher desires to obtain estimates for distinct segments of the population, a comparable sample size is frequently needed for each segment to be measured. Given that survey costs generally vary proportionally with the number of respondents interviewed, the cost of surveying the overall population of interest must be multiplied by a factor that is proportional to the number of mutually-exclusive segments to be measured.
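The cost relationship described above can be sketched numerically. The sketch below uses the standard textbook sample-size formula for estimating a proportion under simple random sampling, which is an assumption introduced here for illustration and is not stated in this description; the function names, the 95% confidence multiplier and the cost figures are likewise hypothetical.

```python
import math


def required_sample_size(incidence, margin_of_error, z=1.96):
    """Respondents needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling and a large population; z=1.96
    corresponds to roughly 95% confidence.
    """
    return math.ceil(z ** 2 * incidence * (1 - incidence) / margin_of_error ** 2)


def survey_cost(incidence, margin_of_error, segments, cost_per_interview):
    """Total cost when each mutually-exclusive segment needs its own sample."""
    per_segment = required_sample_size(incidence, margin_of_error)
    return per_segment * segments * cost_per_interview


n_one_segment = required_sample_size(0.5, 0.05)
cost_one_segment = survey_cost(0.5, 0.05, segments=1, cost_per_interview=50)
cost_ten_segments = survey_cost(0.5, 0.05, segments=10, cost_per_interview=50)
print(n_one_segment, cost_one_segment, cost_ten_segments)
```

Under these assumptions, measuring ten segments to the same accuracy costs ten times as much as measuring the overall population once, which is the proportionality the paragraph above describes.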
This characteristic of Survey Research makes the method especially challenging from a cost perspective when researchers need to measure many local markets and a variety of discrete segments within them. The prohibitively high cost of conducting a survey on such a large scale is the main reason for the lack of surveys providing extensive geographic and segment detail.
Implications
The Survey Research method has enjoyed broad acceptance and application in a variety of industries as a tool for gaining insight into, and understanding of, consumer/buyer behavior. This is primarily because of the flexibility of content and administration of survey questionnaires and the relative cost-effectiveness of this method in situations where information is sought concerning a certain population in aggregate, or with respect to broad segments of such population. However, when information is required concerning very detailed geographic areas or narrow segments of the population, or when a high degree of accuracy is sought with respect to the study subjects' actual behavior, Survey Research has proved to be less effective than other methods of market research (notably, Database Compilation).
Discussion of Database Compilation
Database Compilation, as a method of conducting market research, has become increasingly common as computer technology has spread.
Depending upon the analytical purposes for which a Database Compilation is created, data from multiple sources may be merely pooled (i.e., amassed to form a database without interrelating source data records) or both pooled and linked (to further integrate the data by logically associating various sets of source data records received from different data providers).
In cases where input data are merely pooled into a database, reports/analyses can be produced either by comparing the data received from various data providers or, alternatively, summing the data reported so as to calculate totals for all data providers combined.
In cases where data sources are to be linked (after having been pooled), the Database Compilation process includes the creation of a composite logical data record for each person or other entity with respect to which data have been provided (each such person or other entity, a “measured entity”), by linking data received from all data providers that pertain to such measured entity. This procedure enables the creation of reports/analyses based upon a more complete picture of the behavior and characteristics of each measured entity than would be possible using the data from each data provider separately.
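The distinction between pooling and linking can be sketched as follows. The provider names, record layout and the `entity` key are hypothetical; in particular, the sketch simply assumes that every data provider reports the same key for the same measured entity, which is the condition that makes linking possible.

```python
from collections import defaultdict

# Hypothetical feeds from two data providers.
FEED_A = [{"entity": "E1", "assets": 100}, {"entity": "E2", "assets": 50}]
FEED_B = [{"entity": "E1", "assets": 30}, {"entity": "E3", "assets": 70}]


def pool(feeds):
    """Pooling: amass all source records, tagging each with its provider."""
    pooled = []
    for provider, records in feeds.items():
        for record in records:
            pooled.append({"provider": provider, **record})
    return pooled


def link(pooled):
    """Linking: build one composite logical record per measured entity."""
    composite = defaultdict(lambda: {"total_assets": 0, "providers": []})
    for record in pooled:
        entry = composite[record["entity"]]
        entry["total_assets"] += record["assets"]
        entry["providers"].append(record["provider"])
    return dict(composite)


pooled_records = pool({"A": FEED_A, "B": FEED_B})
linked_records = link(pooled_records)
print(len(pooled_records), linked_records["E1"])
```

The pooled database still holds four separate source records, while the linked database holds three composite records, one of which (E1) combines data from both providers.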
Examples of Database Compilations
The present-day practice of Database Compilation takes many forms. Three broad types can be cited to illustrate:
(i) Trade Association Compilations: Numerous trade associations gather data from their members and pool the data to create reports/analyses that satisfy common needs of their members.
One well-known example of this is the flow-of-funds compilation operated by the Investment Company Institute (the “ICI”) to track, at the national level, the flow of investment capital into and out of mutual fund families and individual funds. For this purpose, the ICI gathers from each mutual fund group a set of statistical measures related to each fund managed by such fund group and combines the data into a Multi-Source Database in which individual mutual funds and fund groups are measured entities and the input data are pooled for time-series analysis purposes at the level of individual fund and fund group.
From this database the ICI provides to each participant a tracking report that reveals the fund industry's overall status and progress, the trends of the market, and the competitive performance of each fund group.
In this application, there is no need to link data across data sources because the data pertaining to each fund are reported exclusively by the single applicable fund group and, therefore, require only to be pooled before reporting/analysis is performed using the data.
(ii) Government Compilations: The federal and state governments in the U.S. cause useful databases to be compiled for purposes of supporting regulatory functions in their jurisdictions. In many cases such databases are made available to the public for other private uses.
One well-known example is the FDIC's quarterly compilation of bank financial statements and other data. For this program, the FDIC gathers from each bank it regulates a computerized data feed reporting detailed balance sheet information and branch-level metrics, among other data, which it combines into a Multi-Source Database where individual banks and bank holding groups (and, for certain data, bank branches) are measured entities and the input data are pooled for comparative time-series analysis of all measured entities.
The FDIC uses the database for regulatory oversight purposes. Commercial enterprises then acquire the database and repackage the data it contains into products that are marketed to financial institutions and other commercial users for a range of purposes.
For this program, the FDIC does not link data across data sources because the data that pertain to each measured entity are only reported by the institution of which such entity is a part, and the data therefore require only to be pooled before reporting/analysis is performed using the data.
(iii) Commercial Compilations: Many commercial enterprises have established Database Compilation programs by gathering data from firms participating in an industry and then pooling, or pooling and linking, the data obtained in order to create a database from which reports/analyses can be generated to satisfy various information needs of the participants.
An example of a pooled database is the database of financial-advisor performance metrics compiled by the McLagan company. That company gathers data from U.S. securities brokers at the level of individual financial advisor, including a range of metrics related to revenue production and other performance criteria. McLagan pools the data received in order to create a Multi-Source Database in which the financial advisors are treated as measured entities. Because data for each financial advisor can only be provided by the brokerage firm for which he/she works, there is no need (or ability) to link the data across the data providers at that level.
McLagan uses the database to generate comparative reports at the level of individual financial advisor, enabling each participant to assess the performance of each of its individual financial advisors relative to his/her peers in the same geographic market area. McLagan also tabulates aggregate statistics from its database to make broader reports/analyses for its data contributors.
An example of a linked database is the database of financial asset data compiled by IXI Corporation. This firm gathers data from a variety of financial institutions at the ZIP+4 Code level, including data related to customer financial assets, broken down by investment product. IXI pools the data it receives into a Multi-Source Database in which ZIP+4 Codes are treated as measured entities. Because more than one financial institution may provide data with respect to any given ZIP+4 (each doing so to the extent that it has at least one customer who receives statements at an address in such ZIP+4), IXI links its input data across data providers at the ZIP+4 level, thereby enabling it to calculate wealth ratings (both aggregate and average) for each such geographic unit. IXI uses such data in a variety of applications, which include rating consumers based upon the average wealth level of their ZIP+4 neighborhoods and summing the combined data to higher levels in order to produce aggregate statistical analyses.
Strengths of Database Compilation
The principal strengths of Database Compilation as a market research method result from the fact that the source data provided for creating databases in this manner are extracted from computerized records maintained by the participating data providers. As such, input data can be far more accurate and precise than is typically possible using the Survey Research method. As an example, to calculate how much money consumers have invested in financial assets, it would clearly be advantageous to tally the actual statement balances of millions of investors, as reported by the institutions holding such assets, rather than to have to rely upon a few thousand survey respondents to recall balances and be willing to truthfully and accurately report such data, or to rely on the assumption that the persons agreeing to be respondents are actually representative of the population from which they were drawn.
In addition to the inherent accuracy advantages of Database Compilation, there is also the potential for virtually unlimited depth of detail with respect to geographic breakdowns, behavioral classification and other dimensions of segmentation because it is conceptually possible to create a database using every transaction record or account record held by the data providers, rather than using only a sample of records. By contrast to Survey Research processes, once a Database Compilation process has been established, the cost of expanding the quantity of subject records acquired and processed is typically trivial. As a consequence, a large-scale Database Compilation can usually be created at a substantially lower operational cost than is possible when conducting a survey of comparable scope and depth of detail.
Limitations of Database Compilation
Notwithstanding these compelling advantages, a number of natural limitations of Database Compilation can make this research method unsuitable for certain applications. One of the most important limitations is that a Database Compilation can be created solely with respect to data that are captured/maintained by the data providers. The method is therefore suitable for applications based upon data in customer accounts or transactions (for example) but not suitable if the data of interest are not captured, such as customer buying intentions or attitudes (the latter being research interests where the Survey Research method can be effective). In addition, data providers would not normally be able to provide data with respect to consumers who are not yet their customers, while survey researchers can theoretically interview virtually anyone (i.e., both customers and non-customers).
The functional possibilities of a Database Compilation may also be limited by the level of granularity of analysis enabled by the data providers through the structure and content of the data they provide. Any such built-in structural limitations are a function of how the input data records are coded, which reflects both the purpose of the database and any policy-based constraints the data providers elect to apply. Data providers normally do not disclose any more information about their customers than is strictly necessary to fulfill the purpose for which the data are gathered and used.
There are two main ways in which such limitations are imposed, as described in more detail below.
(i) Pre-aggregation—In some cases, input data are pre-aggregated to the lowest level at which data analysis is to be permitted. As an example, the data providers for the McLagan database might pre-tabulate (or otherwise summarize) their data by financial advisor before submission of their data. In such event, the McLagan database would not be capable of supporting reports/analyses based on entities at more granular levels of detail, such as individual customers or accounts.
(ii) Pre-coding—In other cases, the data may be submitted in disaggregated form (e.g., in the form of individual transactions) but coded so that entity-identifying codes are included only for the levels at which analysis is to be permitted. In such cases, all entity-identifying codes for lower levels of aggregation are excluded (or otherwise removed) from the data before submission, thereby making it impossible to conduct analysis at such levels. For example, data providers for the IXI database might provide disaggregated data records but pre-code each data record with the ZIP+4 Code of the customer to which the data pertain, excluding all personally-identifying information. In that way, IXI would be able to link data across data providers at the ZIP+4 level and tabulate data for that level, while not being able to link data at a more granular level, such as by household or individual person.
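The two approaches can be contrasted in a minimal sketch. The field names, sample records and helper functions below are hypothetical and do not represent the actual data formats of any of the databases named above; they only illustrate how each approach limits the granularity available to the database compiler.

```python
from collections import defaultdict

# Hypothetical customer-level records held internally by a data provider.
RAW_RECORDS = [
    {"customer": "Alice", "advisor": "FA1", "zip4": "10001-0001", "balance": 100},
    {"customer": "Bob", "advisor": "FA1", "zip4": "10001-0002", "balance": 50},
    {"customer": "Carol", "advisor": "FA2", "zip4": "10001-0001", "balance": 70},
]


def pre_aggregate(records, level):
    """Pre-aggregation: submit only totals at the permitted level."""
    totals = defaultdict(int)
    for record in records:
        totals[record[level]] += record["balance"]
    return dict(totals)


def pre_code(records, permitted_key):
    """Pre-coding: submit one row per record, keeping only the permitted code."""
    return [
        {permitted_key: record[permitted_key], "balance": record["balance"]}
        for record in records
    ]


by_advisor = pre_aggregate(RAW_RECORDS, "advisor")
coded_by_zip4 = pre_code(RAW_RECORDS, "zip4")
print(by_advisor)
print(coded_by_zip4)
```

In both outputs the customer names are absent, so a compiler receiving either submission could not analyze or link the data at the level of the individual customer.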
Implications
In sum, then, Database Compilation can be a very effective method for gathering information when data of interest are recorded and maintained in the computerized records of the firms in an industry. Moreover, this method can be used to create information of such scope, depth and accuracy as to make it economically infeasible or even impossible to create anything comparable using Survey Research.
In the case of the McLagan database, for example, financial advisors could be surveyed, but they would likely be unable (and perhaps unwilling, as well) to report their commission and other revenue production for each specific time period with an adequate degree of accuracy. Moreover, since it is essential (for the applications intended) to obtain information about each specific financial advisor, it is a far simpler solution to gather revenue/commission data from the computerized records of the brokerage firms that employ such financial advisors rather than to attempt to collect comprehensive and accurate data of this kind by surveying the financial advisors themselves.
A similar comparison to Survey Research could be made with respect to the other Database Compilations described as examples above. All of these examples demonstrate special strengths of Database Compilation as a market research tool and reveal the reasons why this method has emerged as an important complementary tool to Survey Research.
A Special Case: The U.S. Wealth Management Industry
Recent decades have brought exceptionally rapid growth to the U.S. retail wealth-management industry, which includes securities brokerage firms, mutual fund groups, retail banks, insurance companies and other firms that hold cash or invested financial assets on deposit for consumers. In response, many efforts have been launched to use Survey Research to track industry trends, gather useful insight into the behavior of investors, and gauge the effectiveness of marketing programs. However, through time, two key issues have emerged that, taken together, constitute a formidable obstacle to the acceptance of Survey Research as a credible and dependable method of creating market information for retail wealth-management firms.
Limitations of Survey Research
The first issue is the difficulty experienced by survey researchers in securing the cooperation of the affluent and wealthy to function as effective survey respondents. A high percentage of such individuals simply refuse to join respondent panels. Of those who agree to do so, many fail to complete the interview process (often as the result of “respondent fatigue,” concerns over how their answers might be utilized, or sensitivity to the questions being asked: e.g., “What is your approximate net worth?”). In addition, due to the all-too-frequent lack of plausibility of many survey “findings,” financial institutions have come increasingly to suspect that affluent and wealthy respondents, cautious or concerned about how their personal wealth information may be used, misrepresent and consistently understate to a substantial degree the assets they hold when questioned about them in surveys. The net effect of these phenomena is that many financial institutions doubt the representativeness (i.e., lack of bias) and accuracy of self-reported data obtained through surveys of affluent and wealthy individuals and are therefore reluctant to rely on such data for corporate decision-making.
The second issue is the prohibitively high cost of surveying with sample sizes large enough to yield statistically-reliable results for each significant geographic locality and market segment of interest to the wealth-management industry. This limitation is especially significant for the many wealth-management firms engaged in marketing through branch systems in local markets that differ sharply in demographic composition and competitive conditions.
Given the foregoing, and the fact that financial institutions participating in the wealth-management industry are almost exclusively interested in knowing about the affluent and high-net-worth segments of the market, it is all but inevitable that Survey Research would be accorded a far less significant position in the wealth-management industry than in other consumer-oriented industries, such as packaged goods, automotive, media and telecommunications.
Uses of Database Compilation
By contrast, the Database Compilation method has been broadly adopted for market research programs in the wealth-management industry, with applications for a variety of measured entities. As noted earlier in connection with the ICI, FDIC, McLagan and IXI databases, information has been gathered for mutual funds, competing firms, geographic areas, ZIP Codes, ZIP+4 Codes and bank branches as measured entities, among others. Some of these databases are focused on producing aggregate statistics to track an industry's totals and trends. That application is a noteworthy strength of Database Compilation (relative to Survey Research) because the use of actual customer records as source data makes it possible to achieve a very high level of accuracy in the resulting database (presuming of course that there is a high level of participation among the possible sources of data).
However, in order to compile a database that is capable of filling the information gap caused by the key shortcomings of Survey Research (relative to the needs of the wealth-management industry), source data would have to be gathered from multiple financial institutions, pooled and then linked at the individual investor level. If the provided data are not linked in this manner, then it would not be possible to conduct accurate analysis of investors' behavior (whether at an individual or aggregate level) since many investors allocate their financial assets across multiple financial institutions, thereby fragmenting their investment portfolios. In such cases, an accurate view of an investor's portfolio can be created only by joining together (i.e., linking) the various parts held by different institutions.
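The distortion caused by fragmented portfolios can be shown with a small worked example. The holdings, the one-million threshold and the `investor_key` field are hypothetical; the sketch assumes only that some common key exists by which an investor's records at different institutions can be associated, and says nothing about how such a key would be obtained.

```python
from collections import defaultdict

THRESHOLD = 1_000_000  # hypothetical cutoff for a "high-net-worth" investor

# Hypothetical holdings reported by two institutions.
HOLDINGS = [
    {"provider": "Bank A", "investor_key": "K1", "assets": 400_000},
    {"provider": "Broker B", "investor_key": "K1", "assets": 700_000},
    {"provider": "Bank A", "investor_key": "K2", "assets": 1_200_000},
]


def count_high_net_worth_unlinked(holdings):
    """Treat each institution's record as if it were a whole portfolio."""
    return sum(1 for h in holdings if h["assets"] >= THRESHOLD)


def count_high_net_worth_linked(holdings):
    """Join the parts held at different institutions before classifying."""
    totals = defaultdict(int)
    for h in holdings:
        totals[h["investor_key"]] += h["assets"]
    return sum(1 for total in totals.values() if total >= THRESHOLD)


unlinked_count = count_high_net_worth_unlinked(HOLDINGS)
linked_count = count_high_net_worth_linked(HOLDINGS)
print(unlinked_count, linked_count)
```

Without linking, investor K1 appears as two separate sub-threshold records and only one high-net-worth investor is counted; with linking, K1's combined portfolio of 1,100,000 is recognized and two are counted.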
The linkage of data across data providers at the individual investor level would enable the tabulation/cross-tabulation of data using investor-specific data records in much the same way as is done with survey-respondent data records. A database compiled in this manner would combine the analytical flexibility that is the hallmark of the Survey Research method with the accuracy and depth of detail that is the hallmark of the Database Compilation method. In spite of the considerable promise such a database would hold, there is hardly an example in the prior art, and the reasons for this are well-known, as explained below.
Consumer Privacy Issues and Regulation
Traditionally, financial institutions have been highly sensitive to the confidentiality commitments they have made to their customers. In addition, they have been very cautious when taking the business risk of exposing their customer-identifying information to processes that could (whether inadvertently or as a result of malfeasance) disclose their competitively sensitive data to a third party. Moreover, following the enactment by Congress of the Gramm-Leach-Bliley Act (“GLB”) in 1999, the disclosure by a financial institution of any customer-identifying data to a commercial enterprise for the purpose of compiling a Multi-Source Database would constitute a violation of GLB-based regulations. Given these factors, it is hardly surprising that the financial industry has not proven to be a fertile ground for the development of Database Compilations at the individual investor level.
The credit bureau industry could be considered a noteworthy exception to the foregoing in that the databases used in the field are created by gathering credit-history data from many different credit grantors and then pooling and linking such data at the individual-consumer level by matching records using identifying information such as name and address, or other identifiers. However, such activities fall under the jurisdiction of the Fair Credit Reporting Act (“FCRA”), not the GLB legislation, and, as such, do not constitute a true exception.
Some attempts have been made to initiate similar Database Compilations outside the scope of the FCRA, in which financial data other than credit data would be linked at the individual investor level. One case is a project initiated by Abacus Corporation over a decade ago to pool and link customer-related information it would gather from financial institutions. Abacus had earlier developed a database using catalog purchase records provided by various catalog marketers, which it linked at the person level via name-and-address matching. This yielded a composite purchase-history record for each distinct catalog buyer and a highly productive resource for targeted-marketing purposes. Abacus sought to extend its model into the financial industry but did not succeed. At first, financial institutions were reluctant to participate based on privacy policy considerations. With the passage of the GLB legislation, the Abacus model became legally infeasible and the program never reached a marketable status.
A Gap in Technology
There remains, then, a critical gap in Database Compilation methods used to create market information for the wealth-management industry. In order for a Database Compilation program to fill that gap, a new method is needed that makes it possible for financial institutions to provide investor-specific data to a database compiler, and for the database compiler to link such data across data providers at the individual investor level to create composite investor-specific logical data records, and do so without causing a financial institution to violate GLB-related regulations or to expose its confidential and competitively-sensitive customer information to significant risk of improper disclosure (whether inadvertent or otherwise).