1. Field
This invention pertains in general to natural language processing and in particular to automated sentiment classification to provide rankings of documents.
2. Description of the Related Art
Determining indicators of search result relevance and ranking the search results according to these indicators is an integral function of web search engines. Common indicators of search result relevance include indicators of popularity, such as the number of links to a web page or the number of page hits per day. Other indicators of popularity may be collected by monitoring user interaction with search results. Such monitoring produces metrics that indicate search result relevance, such as user click-through rates or the average time spent by the user at a web page associated with a search result.
Searches are often performed for entities about which public opinion is expressed, such as movies, restaurants and hotels. This opinion or sentiment is also a valuable indicator of the relevance of search results. For instance, a user who searches for French restaurants most likely wants to know which restaurants are the most favorably reviewed. Similarly, most users who search for a listing of hotels in a geographic area wish to see results containing the hotels with the best reviews. Users may also be interested in search results for reviewable entities, such as books and films, for which strong public opinion is expressed, whether that opinion is favorable or unfavorable.
Attempts to use sentiment as a ranking signal for search results have commonly used structured reviews. In structured reviews, the reviewer selects a rating in addition to providing a textual review of the entity. Structured reviews can be conveniently used in ranking systems because most structured reviews use a numeric rating (e.g., a 5 star system or a scale of 1 to 10) that can easily be used to rank results. Results are ranked by their average numeric rating from the structured reviews. However, in instances where an entity has mixed reviews, valuable information may be lost due to the averaging: an entity whose reviews are sharply divided can receive the same average rating as an entity that all reviewers rate as mediocre.
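The loss of information from averaging can be illustrated with a minimal sketch. The ratings below are hypothetical values on a 5 star scale, invented purely for illustration, and the helper name `summarize` is likewise an assumption rather than part of any described system. Both sets of ratings have the same average, yet only a measure of spread (here, the population standard deviation) reveals that one entity has sharply divided reviews.

```python
from statistics import mean, pstdev


def summarize(ratings):
    """Return (average rating, population standard deviation) for a list of ratings."""
    return mean(ratings), pstdev(ratings)


# Hypothetical 5 star ratings, invented for illustration only.
consistent = [3, 3, 3, 3]  # every reviewer rates the entity as mediocre
polarized = [1, 5, 1, 5]   # reviewers are sharply divided

for name, ratings in (("consistent", consistent), ("polarized", polarized)):
    average, spread = summarize(ratings)
    print(f"{name}: average={average}, spread={spread}")
```

A ranking based only on the average would treat the two entities identically, even though the second one is the subject of strong and divided opinion.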
Another limitation of using ratings from structured reviews as the sole indicators of search result relevance is that valuable information in the textual review regarding the sentiment or public opinion about the reviewable entities is discarded. In textual reviews, sentiment is expressed through statements, allowing a finer level of precision or “granularity” than numeric ratings and the ability to express different types of sentiment within a single review (e.g., “food great, service bad”).
Textual reviews may also help correct for inconsistencies in rating system normalization. For instance, a restaurant consistently rated at two stars by restaurant reviewers may be favorably reviewed by its patrons due to differences in rating system scales. Incorporating the sentiment expressed within the textual reviews that accompany the ratings from both reviewers and patrons can help correct for these inconsistencies. Additionally, there are many other textual sources of sentiment outside of structured reviews, such as blogs or personal web pages, that may not be integrated into search result rankings based solely on structured ratings.