The amount of commerce conducted over the web has grown tremendously in the past decade. Most online transactions begin with search, both on dedicated e-commerce sites (e.g., Amazon™) and in search engine verticals (e.g., Bing™ or Google™ shopping). However, many characteristics differentiate commerce search from web search. While web search is predominantly performed over unstructured data such as the contents of web pages, commerce search is performed over structured data in the form of a catalog. The rich semantics of the structured catalog, when leveraged, enable effective query analysis and ranking. In particular, keyword queries annotated with type semantics enable effective retrieval from structured data sources.
However, not all commerce queries can be annotated with such clean semantics. For example, consider the query “designer hand-bags”. Based on the catalog, no explicit type semantics can be associated with the term “designer”. Possibly because they lack domain knowledge, users often express their information need using combinations of keywords, some of which map easily to an explicit semantic type and others of which do not. The former are referred to as typed tokens and the latter as free tokens.
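The distinction between typed and free tokens can be illustrated with a minimal sketch. It assumes a toy catalog vocabulary that maps each attribute type to its known values; the names `CATALOG_VOCAB` and `annotate` and the sample values are hypothetical and are not taken from any real catalog or from a particular annotation method.

```python
# Hypothetical toy catalog vocabulary: attribute type -> known values.
CATALOG_VOCAB = {
    "category": {"hand-bags", "shoes", "laptops"},
    "brand": {"gucci", "prada"},
    "color": {"black", "red"},
}


def annotate(query):
    """Split a keyword query into typed tokens and free tokens.

    A token is "typed" if it matches a value of some catalog attribute;
    otherwise it is "free". This is exact-match lookup only, an assumed
    simplification of real query annotation.
    """
    typed, free = [], []
    for token in query.lower().split():
        # Collect every attribute type whose value set contains the token.
        types = [t for t, values in CATALOG_VOCAB.items() if token in values]
        if types:
            typed.append((token, types[0]))
        else:
            free.append(token)
    return typed, free


typed_tokens, free_tokens = annotate("designer hand-bags")
print(typed_tokens)  # [('hand-bags', 'category')]
print(free_tokens)   # ['designer']
```

Under these assumptions, “hand-bags” resolves to the category attribute, while “designer” matches no catalog value and is left as a free token.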
One solution is to treat free tokens as keywords and to run keyword searches for them over the available unstructured data, such as the product descriptions and user reviews associated with the catalog. This solution has several drawbacks. First, these sources can be noisy: a seller has an incentive to label handbags with positive terms such as “designer” and “stylish” to boost sales. Second, the information can become dated: a handbag that was considered designer a year ago may be passé today. Finally, recall suffers if the free token is rare or is not mentioned in the unstructured data at all.
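The noise and recall drawbacks of this keyword-based solution can be seen in a small sketch. The product records, the `PRODUCTS` list, and the `keyword_fallback` function are all hypothetical, invented here for illustration; the matching is plain word lookup, not a real retrieval engine.

```python
# Hypothetical product records with seller-written descriptions.
PRODUCTS = [
    {"id": 1, "category": "hand-bags",
     "description": "Designer leather tote, stylish and roomy"},
    {"id": 2, "category": "hand-bags",
     "description": "Italian leather satchel from a luxury fashion house"},
    {"id": 3, "category": "hand-bags",
     "description": "Canvas shopper bag"},
]


def keyword_fallback(category, free_tokens):
    """Filter by the typed token, then keyword-match the free tokens.

    A product is returned only if every free token appears as a word in
    its description.
    """
    hits = []
    for product in PRODUCTS:
        if product["category"] != category:
            continue
        words = set(product["description"].lower().replace(",", " ").split())
        if all(token in words for token in free_tokens):
            hits.append(product["id"])
    return hits


print(keyword_fallback("hand-bags", ["designer"]))  # [1]
```

In this toy data, product 1 is returned only because its seller used the word “designer”, which illustrates the noise problem, and product 2 is missed even though its description suggests it would qualify, which illustrates the recall problem.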