Embodiments of the invention generally relate to electronic natural language processing, and more particularly, to natural language processing based on textual polarity.
Generally, natural language processing (NLP) systems are designed to process unstructured data in natural language form. NLP systems seek to bridge the gap between the processing power of computers and the variable nature of natural language expression. Search engines and Question-Answering systems are two classes of NLP systems.
Search engines traditionally operate based on matching key terms in a search phrase to terms in a reference document (for example, a webpage). The matching may be enhanced by using Boolean search operators, wildcard characters, or the like. In this model, a search result is generally deemed relevant to a search phrase if there is a close mapping of words in the search phrase to words in the search result. The search engine generally ignores the disparate impact that a given word may have on the meaning of the search phrase as a whole, or on the meaning of a mapped phrase in a search result. For example, in response to receiving the search phrase “first president of the United States,” a traditional search engine may rank the following results closely to one another: “George Washington was the first president of the United States,” and “George Washington was not the first president of the United States.” While the two search results are substantially similar (they share the same ten words in the same sequence, differing only by the word “not” in the second sentence), they convey completely opposite meanings. The search engine likely presents both sentences as highly relevant in its search results, even though the two sentences contradict each other and at least one of them must be wrong.
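By way of illustration only, the term-matching behavior described above may be sketched as follows. The sketch is a simplified assumption and does not represent any particular search engine; the function names tokenize and overlap_score, and the scoring rule (the fraction of distinct search terms found in a result), are hypothetical and chosen for brevity.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Return the fraction of distinct query terms that occur in the document.

    Hypothetical scoring rule: every term counts equally, so a word such as
    "not" in the document has no effect on the score.
    """
    query_terms = set(tokenize(query))
    doc_terms = set(tokenize(document))
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


query = "first president of the United States"
affirmative = "George Washington was the first president of the United States"
negated = "George Washington was not the first president of the United States"

score_affirmative = overlap_score(query, affirmative)
score_negated = overlap_score(query, negated)

# Both results contain every search term, so the two scores are identical.
print(score_affirmative, score_negated)
```

Under this scoring rule, the affirmative and the negated sentence receive the same score, because the word “not” is not a search term and therefore does not affect the match.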
Question-answering (QA) systems generally are designed to receive a natural language question as input, analyze the question to determine its meaning beyond the mere words used in the question, and generate a natural language answer to the question. For example, in a typical QA use case, the QA system receives a natural language question from a user. The likelihood that the QA system arrives at a correct answer to the question can be improved by categorizing the question into a known question type, and by employing special techniques that take advantage of known properties of that question type and known properties of likely answers to that question type.
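As a minimal sketch of the categorization step described above, a question may be assigned to a coarse question type based on its leading interrogative words. The type labels, the rule table QUESTION_TYPES, and the function classify_question are hypothetical and are assumed here for illustration only; a practical QA system would likely rely on richer linguistic analysis.

```python
# Hypothetical rule table mapping a leading interrogative phrase to a coarse
# question type. Longer phrases are listed first so they are matched first.
QUESTION_TYPES = (
    ("how many", "QUANTITY"),
    ("who", "PERSON"),
    ("when", "DATE"),
    ("where", "LOCATION"),
)


def classify_question(question):
    """Return a coarse question type based on the question's leading words."""
    normalized = question.strip().lower()
    for prefix, label in QUESTION_TYPES:
        # Require a trailing space so that "who" does not match "wholesale".
        if normalized.startswith(prefix + " "):
            return label
    return "OTHER"


print(classify_question("Who was the first president of the United States?"))
print(classify_question("How many states are in the United States?"))
```

Once a question type is known, the expected form of the answer (for example, a person's name for a PERSON question) can be used to narrow the set of candidate answers.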