A technical problem exists in evaluating apparent trends in the perception of people regarding products, services, brands, events, etc., in a global community, where many, if not most, such opinions are found in documents and files available on the World Wide Web.
There is a need for an automated, computationally efficient method and system for analyzing this plethora of documents and files and for efficiently generating information that indicates people's perceptions (positive or negative) of such products and services, segmented geographically or in another similar manner.
For clarity, in the following description, “objects” means products, services, brands, events, etc. that can be identified by words in a document and for which a global satisfaction evaluation is desired.
On the web, numerous documents are published each day, and opinions are expressed in many different ways, for example by setting up web pages, exchanging information in newsgroups, and discussing subjects in chat rooms. These documents contain valuable information about the perception of objects.
Capturing or monitoring the perception of objects on the web or any other network makes it possible to evaluate the positive, negative and/or opinionless perception of users/customers. More generally, perception analysis allows a corporation, institution or the like to learn the opinion of actual or potential users/customers about its products, services, etc. This satisfaction opinion can then be used to modify or improve the object.
Known evaluation methods providing satisfaction indexes use document search engines to perform keyword searches for publications on the web. Such search engines can find all documents on the web containing one or more keywords and download documents related to a specific topic. The satisfaction index (positive, negative, opinionless) of each document is then examined and evaluated either manually, i.e. by readers extracting the global impression of the document, or automatically with artificial neural networks, which are very complex, in particular to configure for a new domain. The complexity of neural network analysis, including its configuration time, slows the processing of a voluminous database.
An approach that comes to mind when attempting to evaluate a satisfaction index automatically is to use a classification algorithm. Such a classification algorithm is a statistical text learning algorithm that can be trained to approximately classify documents, given a sufficient set of labeled training examples. Classification algorithms are already used to automatically catalog news articles, sort e-mail or learn the reading interests of users, and one might think that such algorithms could be trained to distinguish between positive and negative documents. However, not only do they require a large, often prohibitive, number of labeled (i.e. hand-classified) documents, but they also adapt poorly to the context of specific words (for example, positive references to a given product).
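To make the prior-art approach concrete, the following is a minimal sketch of a statistical text classifier of the kind described above. The text does not name a specific algorithm; this sketch assumes a multinomial naive Bayes model with add-one (Laplace) smoothing over a bag of words, and the function names `train`, `classify` and `tokenize` are hypothetical. It is an illustration of the general technique only, not of the claimed invention.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Train on (text, label) pairs, e.g. label in {"positive", "negative"}."""
    class_counts = Counter()           # number of training documents per label
    word_counts = defaultdict(Counter) # word frequencies per label
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothed log likelihood of the token given the label.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on a handful of hand-labeled sentences such as `("great product works well", "positive")` and `("awful service very slow", "negative")`, the model labels `"very happy with this great product"` as positive. It also labels `"not great at all"` as positive, because each word is scored independently and the unseen word "not" is ignored. This illustrates the two weaknesses noted above: the model depends entirely on labeled examples, and a bag-of-words representation does not capture the context in which a word appears.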
A purpose of the present invention is to overcome at least one disadvantage of the known solutions for automatically determining a satisfaction index about an object.
Another purpose of the present invention is to provide a computer-based apparatus and method for selecting and analyzing candidate documents in order to determine a satisfaction index about a predetermined object, which can be implemented as a very simple and fast software algorithm.
A further purpose of the present invention is to distinguish between an appreciation of the object and a description of the object.