Consumers increasingly rely on online resources to research or locate businesses that they may be interested in patronizing. For example, a consumer might search for local Italian restaurants that specialize in southern Italian cuisine. As another example, a consumer may search for the closest hardware store where the consumer could purchase tools for a woodworking project. As yet another example, a consumer may browse a horticulture website in order to identify local businesses that can assist in the planning and execution of a backyard landscaping project. For each of these uses, the locality of the business may be important to the consumer, since patronizing a nearby business reduces travel time, gives the consumer access to local knowledge such as climate or community standards, and allows the consumer to support local establishments.
In order to aid consumers in identifying local businesses, many websites and other services have been launched that seek to provide local information to consumers. For example, online review sites such as CitySearch and Yelp allow users to search and/or browse a large database of business listings in order to find local goods and services. Such sites allow consumers to specify a number of filtering criteria to enable the consumer to find a desired business. In order to appeal to consumers, local search or information sites often seek to be as comprehensive as possible in the subject matter that they cover. The more business listings that a service can provide, the more likely it is that a consumer will find the business listing in which they are interested. Moreover, it is also important that such sites provide a high degree of accuracy in the business listings that are presented. Because consumers rely upon the sites for contact information for the business or driving directions to the business, inaccurate information can result in frustration for the consumer and lost sales for the business. If consumers do not have a high degree of trust in the information presented on a local site, they will not return to the site for additional information in the future.
One of the challenges in presenting comprehensive yet accurate information via a local site is collecting a reliable set of data to present to consumers. There is no single database that contains accurate information about all businesses in the U.S. or abroad, and a site or service operator is typically forced to cobble together business records from tens or even hundreds of different databases. Such databases may contain conflicting information, may contain incorrect or outdated information, and may be missing information. FIG. 1 depicts an example of three different business records 100, 105, and 110 that may be obtained by an operator of a local site. Each business record contains a number of fields of information that characterize a business. Such fields may include, for example, the name of the business, the street or mail address of the business, the primary or secondary phone number of the business, the latitude and longitude of the business, customer reviews of the business, a URL for the business, and any other information that might be used to characterize the business or its products and services. Each business record may have been obtained from a different data source. For example, the first business record 100 may have been obtained from a first data source, the second business record 105 may have been obtained from a second data source, and the third business record 110 may have been obtained from a third data source.
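As a rough illustration of the kind of record described above, the following sketch models a business record as a small data structure. The class name `BusinessRecord`, the field names, and the sample values are hypothetical and are not taken from FIG. 1. Every descriptive field is optional, reflecting that a record from any given source may be missing information.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BusinessRecord:
    """Hypothetical container for one business record from one data source."""
    source: str                              # which data source supplied the record
    name: Optional[str] = None
    street_address: Optional[str] = None
    mailing_address: Optional[str] = None
    primary_phone: Optional[str] = None
    secondary_phone: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    url: Optional[str] = None
    reviews: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of the fields this record leaves empty (None)."""
        return [name for name, value in vars(self).items() if value is None]


# Illustrative values only; not the contents of any real record.
rec_a = BusinessRecord(
    source="first data source",
    name="Example Hardware",
    street_address="100 Example St",
    primary_phone="555-0100",
)

print(rec_a.missing_fields())
```

Keeping the originating source on each record makes it possible to trace conflicting or outdated values back to the database that supplied them.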
One of the challenges in obtaining business records from different sources is determining whether the records are related to the same business. For example, street addresses may differ from mailing addresses, phone numbers may differ by one or more digits, businesses may operate under a consumer-facing name and a corporate name, and businesses having the same name but different owners may operate in similar geographic locations. The variety of business information and the lack of trustworthy sources of business information make it extremely difficult to reconcile business records and determine whether any two or more business records relate to the same business or to different businesses. With reference to FIG. 1, for example, the second business record 105 may or may not relate to the same business as the third business record 110. While both records ostensibly relate to a restaurant, the names of the restaurants in the two records differ slightly. Moreover, while the second business record has an incomplete street address, the address of the third business record is a post office box. In such circumstances, a human reviewer will typically need to review the data contained in each record and assess whether the two or more records relate to the same business entity. Such a manual process has many limitations, such as being overly reliant upon human judgment and being unable to easily scale to process thousands or tens of thousands of business records. Websites and other services are therefore without an effective mechanism to analyze large numbers of business records in order to compile and provide accurate local information to consumers.
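To make the difficulty concrete, the sketch below compares two records using exact string equality on name and address. The function `naive_same_business`, the field names, and the sample values are hypothetical (they are not the contents of FIG. 1), and the comparison is a deliberately simplistic strawman rather than a proposed matching method.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[str]]


def naive_same_business(a: Record, b: Record) -> bool:
    """Treat two records as the same business only if name and address match exactly."""
    return a.get("name") == b.get("name") and a.get("address") == b.get("address")


# Illustrative values only: slightly different names, an incomplete
# street address in one record and a post office box in the other.
second = {"name": "Example Trattoria", "address": "Main St"}
third = {"name": "Example's Trattoria Inc.", "address": "PO Box 100"}

print(naive_same_business(second, third))  # prints False
```

Exact comparison reports the two records as different even though a human reviewer might conclude that they describe the same restaurant. Loosening the rule, for instance by comparing names alone, would instead risk merging businesses that share a name but have different owners.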