The Internet is an easily accessible worldwide forum for sharing information about any object. Information about objects can be presented in a structured format, which is readily available online through various Web listing services, for example. Such information can be easily mapped into tables in a relational database system. Information can also be presented online in a largely unstructured, free-text format, such as blogs, news articles, discussion groups, or consumer feedback hosted at e-commerce sites. Since such unstructured information does not come clearly delineated with fields (e.g., location, price), it is much more difficult to establish a mapping between such information and a structured table in a typical database system. For example, a Web listing may provide information about a restaurant in a structured format, in which information on different aspects of the restaurant is available as values of different fields of the corresponding record. Numerous individual users, by contrast, may create online reviews of the restaurant in highly unstructured formats, such as a typical textual review hosted on an online review site, where different aspects of the restaurant are mentioned as part of the running text.
An online user may request that a search engine return online unstructured text (e.g., a review) relating to a tangible object (e.g., a restaurant) that is represented online by a structured object (e.g., a Web page). For instance, a person may ask that a search engine obtain all user reviews relating to a restaurant named “Casablanca Moroccan Restaurant,” which can be represented as a structured object such as a record in a database. In response to this search request, a search engine typically would seek to obtain as many reviews of this restaurant as possible. Today's Web is replete with restaurant reviews, which may be located in a wide range of different online sources such as newspaper articles, newsgroup discussions, or blogs, for example. However, given the restaurant name and additional information about the restaurant available through structured online listings, the challenge for the search engine is to match the unstructured online reviews to the structured object that represents the restaurant.
Matching unstructured online text descriptions with structured online objects is a pervasive problem in computer networks such as the Internet. A structured object contains text that is descriptive of attributes of some real-world physical entity such as a restaurant, a consumer product, or a movie. For a restaurant, the attributes might be name, address, and cuisine. For a consumer product such as a camera, the attributes might be price, image resolution, maximum optical zoom, etc. For a movie, the attributes might be title, director, and actors. Given the ambiguities in unstructured text, which are characteristic of natural language (as opposed to values in a database), matching unstructured text to structured objects is a challenging problem. For instance, when a restaurant review contains the word “Food”, it can be part of a general comment (“Food was great though price a little high”), or it can be used to refer to the restaurant whose name is Food (“‘Food’ is one of the best restaurants I have been to”). In contrast, the distinction would have been clear in a database system: the word is either a field describing one aspect (i.e., “food quality”) of restaurant objects, or it is the value of the “name” field. There has been a need for improvement in the matching of unstructured text, such as restaurant reviews, to a collection of structured objects, such as an online listing that serves as an online presence for a tangible object such as a restaurant. The present invention meets this need.
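The ambiguity of the word “Food” described above can be illustrated with a minimal sketch. It is not the method of the present invention. It is a deliberately naive baseline, written under the assumption that a listing is a record with name, address, and cuisine fields and that matching is done by looking for the listing name as a whole word in the review text. The Listing class, the naive_match function, and the addresses and cuisine values are hypothetical and were chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical structured object: one record per restaurant."""
    name: str
    address: str
    cuisine: str


def naive_match(review: str, listings: list[Listing]) -> list[Listing]:
    """Return every listing whose name occurs as a whole word or phrase
    in the review text (case-insensitive). This is a naive baseline only."""
    text = review.lower()
    matched = []
    for listing in listings:
        pattern = r"\b" + re.escape(listing.name.lower()) + r"\b"
        if re.search(pattern, text):
            matched.append(listing)
    return matched


# Illustrative data; the addresses and cuisine values are made up.
listings = [
    Listing("Food", "1 Hypothetical St", "American"),
    Listing("Casablanca Moroccan Restaurant", "2 Hypothetical Ave", "Moroccan"),
]

general_comment = "Food was great though price a little high"
name_reference = "'Food' is one of the best restaurants I have been to"

# Both reviews match the listing named "Food", although only the second
# review actually refers to that restaurant.
general_hits = [l.name for l in naive_match(general_comment, listings)]
reference_hits = [l.name for l in naive_match(name_reference, listings)]
print(general_hits, reference_hits)
```

Under these assumptions, both reviews are matched to the listing named Food, even though the first review uses the word only as a general comment. Simple name lookup therefore cannot separate the two uses, which is the difficulty this background describes.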