Records for many kinds of large-scale business applications are often stored in electronic form. For example, a global electronic retailer may use electronic records containing text as well as non-text attributes to store information about millions of items that are available for sale, and publish at least some portions of the item descriptions contained in the electronic records to enable customers to select and purchase the items. Similarly, a large medical organization may store medical records for millions of customers. Although some organizations may attempt to standardize the manner in which information about entities is stored internally, such standardized approaches may not always succeed. For example, in environments in which a variety of vendors or product suppliers sell their items through a common re-seller, different vendors may use their own approaches to describing items. Furthermore, the standardization approaches may differ from one organization to another, which may, for example, make it difficult to determine whether an item description at one e-retail web site refers to the same item as a differently-formatted item description at another web site.
The ability to resolve entity or product information-related ambiguities (such as slightly different descriptions of the same item, or very similar descriptions of distinct items) may be extremely important for many organizations. For example, consider a scenario in which the same product is being sold on behalf of several different product suppliers via a particular retailing web site, at which, for each available product, a “details” web page is made available to potential customers. If different details pages are provided for the same product, based on the differences in the way that the product suppliers describe the product, this may lead to customer confusion, lowered customer satisfaction or even lower sales than might have been achieved had the products been clearly and unambiguously identified as being identical. Resolving such ambiguities, given various natural-language descriptions of items originating at different sources, may present a non-trivial technical challenge, especially in environments in which the item catalog or inventory size is extremely large and tends to change rapidly.
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof.