The Internet provides access to a vast amount of information. Given that quantity, a major challenge is finding and discovering the information most relevant to a user in a particular circumstance. The most common tool for doing this today is a keyword-based search query submitted to a search engine. The search engine matches the received keywords against words or phrases in a search index to identify documents, web pages, or other content that is potentially relevant to the user's query. For example, if a user searches for “dinosaurs”, the search engine provides a list of search results that link to web pages containing that term.
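The keyword matching described above can be illustrated with a minimal sketch of an inverted index. This is an assumption-laden toy, not a description of how any particular search engine works: the names `build_index` and `search`, the sample pages, and the choice to require every keyword to match are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and split it into words with punctuation trimmed."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every keyword in the query."""
    keywords = tokenize(query)
    if not keywords:
        return set()
    # Start from the matches for the first keyword, then narrow the set.
    results = set(index.get(keywords[0], set()))
    for keyword in keywords[1:]:
        results &= index.get(keyword, set())
    return results


# Hypothetical sample content standing in for crawled web pages.
documents = {
    "page1": "Dinosaurs roamed the Earth millions of years ago.",
    "page2": "A guide to local restaurants and banks.",
    "page3": "Museum exhibit: dinosaurs and other fossils.",
}
index = build_index(documents)
print(sorted(search(index, "dinosaurs")))
```

In this sketch a query for “dinosaurs” returns the pages whose text contains that term, regardless of whether those pages are actually what the user wants. A match on the word alone says nothing about the entity behind the query or about what people think of it.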
User queries often contain one or more entities (e.g., a person, location, or organization), identified either by name or by properties associated with the entity. For example, one query might search for “Barack Obama”, while another might search for “President of the United States”. Both of these queries seek information related to a specific entity. Users may also search for locations, such as restaurants, banks, shopping centers, and so forth. An entity may be any type of nameable thing, whether a business, person, consumer good, service, and so forth.
Understanding how people feel about an entity (brand, product, person, business, etc.), identifying the most distinctive characteristics of that entity, and comparing two entities to understand their main differences are among the most common tasks people perform on the Web. These tasks are common for individuals, and they are also extremely important for businesses. Businesses spend considerable effort and money trying to understand how people feel about their brands and products relative to those of their competitors. The World Wide Web contains a great deal of data that answers these questions, but finding, filtering, and summarizing that data to obtain the answers is challenging. Web data is often noisy, and customer opinions about products may be distributed all over the Internet in formats and language that are difficult for automated tools to consume. Some web sites ask users to review entities and then display the reviews that users entered (e.g., yelp.com, epinions.com), but these sites do not provide a reliable way to summarize and use this information in an automated fashion. Opinion mining is an active research area in Natural Language Processing (NLP). Its goal is to perform linguistic analysis of a piece of content (e.g., a product review) to understand the author's opinion of the subject being discussed. However, this type of research is in its infancy, and these processes are still far from producing results that can be used in an automated fashion.