The consumer lending industry bases its decisions to grant credit or make loans, or to give consumers preferred credit or loan terms, on the general principle of risk, i.e., risk of foreclosure. Credit and lending institutions typically avoid granting credit or loans to high-risk consumers, or may grant credit or loans to such consumers at higher interest rates or on other terms less favorable than those typically granted to low-risk consumers. Consumer data, including consumer credit information, is collected and used by credit bureaus, financial institutions, and other entities for assessing creditworthiness and aspects of a consumer's financial and credit history.
In many emerging and developing markets, the available consumer data may be of lower quality than the consumer data available in developed markets. For example, records of consumer data may not include a unique identification number, formats of addresses may vary, dates of birth may be unreliable or non-existent, naming conventions may vary, and particular names and surnames may be very common and shared by a large number of people. Traditional consumer data search algorithms that are often used in developed markets do not always perform well on consumer data in emerging markets. Such traditional algorithms rely on consistent formatting of consumer data, more complete information, and information that is in discrete fields, such as house number, street name, telephone, postal code, and identification number. In developed markets, searches on consumer data may be performed relatively quickly by using a well-indexed relational database key that uses a single field, e.g., identification number or telephone, or a composite key, e.g., date of birth and name, name and house number, etc.
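The keyed lookups described above can be illustrated with a minimal sketch. It assumes a hypothetical `consumer` table in an in-memory SQLite database; the table name, column names, index names, and sample rows are invented for illustration and are not taken from any particular system. It shows a single-field key, a composite key, and how a record that lacks the key fields cannot be reached through either index.

```python
import sqlite3

# Hypothetical schema with discrete fields, as assumed by traditional searches.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE consumer (
           id_number TEXT, name TEXT, date_of_birth TEXT,
           house_number TEXT, street_name TEXT)"""
)
# Single-field key and composite key, both indexed for fast lookup.
conn.execute("CREATE INDEX idx_id ON consumer (id_number)")
conn.execute("CREATE INDEX idx_dob_name ON consumer (date_of_birth, name)")

# Invented sample rows: the second record has no identification number or
# date of birth, and its whole address sits in a single field.
conn.executemany(
    "INSERT INTO consumer VALUES (?, ?, ?, ?, ?)",
    [
        ("A100", "Maria Silva", "1980-05-17", "12", "Main Street"),
        (None, "Maria Silva", None, None, "Flat 4, 12 Main St."),
    ],
)

def find_by_id(id_number):
    """Look up consumers by the single-field key."""
    cur = conn.execute(
        "SELECT name, street_name FROM consumer WHERE id_number = ?",
        (id_number,),
    )
    return cur.fetchall()

def find_by_dob_and_name(date_of_birth, name):
    """Look up consumers by the composite key (date of birth, name)."""
    cur = conn.execute(
        "SELECT name, street_name FROM consumer "
        "WHERE date_of_birth = ? AND name = ?",
        (date_of_birth, name),
    )
    return cur.fetchall()
```

In this sketch, both lookups return only the first record. The second record, which resembles lower-quality data of the kind described above, is invisible to both keys because the fields they depend on are missing.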
In particular, matching addresses in consumer data may be useful in many situations, such as determining whether database records should be merged, de-duplicating addresses for a particular consumer, verifying an address match during a dispute process, or other situations. Using traditional algorithms to match addresses that are contained in a single field may result in overmatching, i.e., false positives, for addresses with similar alphabetic and/or numerical values that are not actually matches; and/or undermatching, i.e., false negatives, for addresses that are actually matches but are not detected as matches. Accordingly, the usefulness of search results that are further filtered based on matching of addresses may be reduced if false positives are included and/or actual matches are excluded as false negatives. Furthermore, merging records based on false positives, or failing to merge records because of false negatives, may also contribute to incorrect database records.
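The overmatching and undermatching described above can be reproduced with two deliberately naive single-field comparisons. The following is a sketch only: the function names, the 0.8 threshold, and the sample addresses are assumptions made for illustration, and neither comparison is presented as the method of any particular system.

```python
import re

def tokens(address):
    """Split a single-field address into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", address.lower()))

def exact_match(a, b):
    """Naive comparison: equal after lowercasing and collapsing whitespace."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())

def token_overlap_match(a, b, threshold=0.8):
    """Naive comparison: Jaccard overlap of token sets meets a threshold.

    The threshold value is an arbitrary assumption for this sketch.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold

# Invented sample addresses.
same_place_a = "12 Main Street, Flat 4"
same_place_b = "Flat 4, 12 Main St."    # same address, different format
other_place = "4 Main Street, Flat 12"  # different address, same tokens

# Undermatching (false negative): the same address is not detected.
undermatch = not exact_match(same_place_a, same_place_b)

# Overmatching (false positive): a different address is reported as a match,
# because token order is ignored and the numbers are merely swapped.
overmatch = token_overlap_match(same_place_a, other_place)
```

Here the exact comparison misses the reformatted address, while the order-insensitive comparison accepts an address whose house and flat numbers are swapped. Loosening or tightening the threshold trades one kind of error for the other rather than removing both.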
Therefore, there is a need for an improved system and method that can accurately match addresses and account for the formatting and quality issues with consumer data that may be present in emerging markets, in order to, among other things, reduce overmatching and undermatching of addresses.