A provider of network-accessible content may maintain a table which identifies salient information items pertaining to the content. For example, consider the merely illustrative case of a provider which offers content regarding movies. This provider may maintain a table which identifies the movies by listing their titles, actors, directors, and so on. The information items maintained by the table may be structured (or at least partially structured) in the sense that the table can organize the information items using a defined format.
In the above example, a user who wishes to access content regarding a particular movie may submit a query which attempts to identify one or more of the information items discussed above. For example, a user may submit a query that attempts to identify the name of a desired movie, or an actor who appears in the desired movie, or a combination thereof, and so on. However, this type of retrieval tactic is not always successful. The table may identify the titles and actors of the movies using a canonical (standard) form of these entries. A user may not know the precise form in which the table stores the information items. Hence, the user may enter a query which fails to match the way information is expressed in the table. For example, the user may enter an abbreviated form of a movie title, or a nickname associated with a movie actor. As a result, the user may be unable to obtain the information that he or she is seeking.
There are known strategies for broadening a user's input query in an attempt to mitigate the above problems. For example, one known technique can identify queries which are textual variants of the query input by a user. This technique may broaden an input query by removing suffixes and the like, or, more generally, by determining whether there is a matching information item that has a sufficiently small edit distance with respect to the input query. However, this type of technique may not be reliable in the above-described scenario because the common variants of the information items may have weak textual similarity (or virtually no textual similarity at all) with respect to the canonical forms of the information items. For example, the nickname of an actor may have very little textual similarity with his or her formal name. Further, the common variants can vary from the canonical forms by adding extra words, omitting words, and so on. In short, the queries entered by users may be non-trivial variations of the canonical form of the information items.
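The limitation of edit-distance matching can be illustrated with a minimal sketch. The sketch assumes the Levenshtein distance as the edit-distance measure; the function names `edit_distance` and `fuzzy_match`, the threshold of 2, and the sample table entries are hypothetical and are chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def fuzzy_match(query, canonical_items, max_distance=2):
    """Return the canonical items within max_distance edits of the query."""
    q = query.lower()
    return [item for item in canonical_items
            if edit_distance(q, item.lower()) <= max_distance]


# Hypothetical table of canonical actor names.
actors = ["Dwayne Johnson", "Tom Hanks"]

# A small misspelling lies within the threshold and is matched.
typo_matches = fuzzy_match("Dwane Johnson", actors)

# A nickname shares almost no characters with the formal name and is missed.
nickname_matches = fuzzy_match("The Rock", actors)
```

In this sketch the misspelled query is one insertion away from the canonical entry and is therefore found, whereas the nickname differs from every entry by far more than the threshold and produces no match. Raising the threshold enough to capture the nickname would also admit many unrelated entries.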
Another known technique allows a user to manually annotate a canonical form of an information item such that it includes one or more known variants. For example, a user who wishes to advertise a particular merchandise item for sale may list a set of keywords which identify the various ways that people refer to that merchandise item. However, this technique is not fully satisfactory because it requires a user to manually create and maintain the lists of variants. Further, the list of variants may fail to capture the myriad of ways in which the public refers to information items. Moreover, it is difficult to manually capture the most appropriate variants of information items because the most appropriate variants can dynamically change over time.
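The manual annotation approach can be sketched as a lookup table that maps hand-entered variants to canonical forms. The names `MANUAL_VARIANTS`, `build_index`, and `lookup`, as well as the listed variants, are hypothetical and serve only to illustrate the maintenance burden described above.

```python
# Hypothetical hand-maintained annotations: canonical form -> known variants.
MANUAL_VARIANTS = {
    "Dwayne Johnson": ["the rock", "dwayne the rock johnson"],
}


def build_index(variants):
    """Map each normalized variant (and the canonical form itself) to its canonical form."""
    index = {}
    for canonical, names in variants.items():
        index[canonical.lower()] = canonical
        for name in names:
            index[name.strip().lower()] = canonical
    return index


def lookup(query, index):
    """Return the canonical form for a query, or None if the variant was never listed."""
    return index.get(query.strip().lower())


index = build_index(MANUAL_VARIANTS)

listed = lookup("The Rock", index)      # variant was annotated by hand
unlisted = lookup("Rock Johnson", index)  # variant nobody thought to add
```

A query succeeds only if someone has already entered that exact variant. Any variant that was overlooked, or that came into use after the list was last updated, returns no result until the list is edited by hand.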
Known techniques for expanding information items may have yet additional shortcomings.