The present invention relates to the field of computerized search and retrieval systems. More particularly, this invention relates to a method and apparatus for characterizing and retrieving information based on the affect content of the information.
Advances in electronic storage technology have resulted in the creation of vast databases of documents stored in electronic form. These databases can be accessed from remote locations around the world. Moreover, information is not only stored in electronic form but is also created in electronic form and disseminated throughout the world. Sources for the electronic creation of such information include news and periodicals, as well as radio, television and Internet services. All of this information is also made available to the world through computer networks, such as the worldwide web, on a real-time basis. As a result, vast amounts of information are available to a wide variety of individuals. The problem with this proliferation of electronic information, however, is the difficulty of accessing useful information in a timely manner. More particularly, it is difficult to personalize these vast sources of information and use them in decision support.
To assist in this effort, analyzing the characteristics of textual information and presenting those characteristics intuitively to the user become increasingly important. For example, to match an individual user's interest profile on the worldwide web, it would be particularly useful to understand how the user felt about various topics. The information from which this judgment is made, however, is simply text (or associated voice or video signals converted to a text format) without an associated characterization. We can nonetheless introduce a human emotional dimension into textual understanding and representation. The analysis of this emotional dimension of textual information is referred to as affect analysis. Affect analysis of a text has two sources of ambiguity: i) human emotions themselves and ii) the words used in natural language. The results of the analysis must be conveyed to the user in a form that allows the user to visualize the affect of the text quickly. In this way, responses to a web user's interest profile may be personalized on a real-time basis.
It is an object of the present invention to provide a method and apparatus for extracting information from data sources.
It is another object of the present invention to extract information from data sources by analyzing the affect of the information.
It is still another object of the present invention to extract information from data sources by analyzing the affect of information and creating a graphical representation of that affect.
It is still a further object of the present invention to analyze the affect of information by quantifying the ambiguity in human emotions and the ambiguity in the natural language.
It is still another object of the present invention to combine affect analysis with other characteristics of textual information to improve the characterization of the information.
The present invention is a technique for analyzing affect in which ambiguity in both emotion and natural language is explicitly represented and processed through fuzzy logic. In particular, textual information is processed to i) isolate a vocabulary of words belonging to one or more emotions, ii) use multiple emotion categories and scalar metrics to represent the meaning of various words, iii) compute profiles for text documents based on the categories and scores of their component words, and iv) manipulate the profiles to visualize the texts. Lexical ambiguity is dealt with by allowing a single lexicon entry (domain word) to belong to multiple semantic categories. Imprecision is handled not only via multiple category assignments but also by allowing degrees of relatedness (centrality) between lexicon entries and their various categories. In addition to centralities, lexicon entries are also assigned numerical intensities, which represent the strength of the affect level described by that word.
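The lexicon structure and steps i) through iii) can be sketched as follows. This is a minimal illustration, not the invention's actual implementation: the names `LexiconEntry`, `LEXICON`, `tag_affect_words` and `document_profile`, the toy vocabulary, and the [0, 1] scales are all hypothetical. The sketch shows how one ambiguous word ("tears") can belong to two categories with different centralities, and how a document profile collects (centrality, intensity) pairs per category. Tokens are assumed to be already lowercased and split.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One word-to-category assignment (hypothetical structure for illustration)."""
    category: str       # affect category, e.g. "sadness"
    centrality: float   # degree in [0, 1] to which the word belongs to the category
    intensity: float    # strength of the affect conveyed, assumed to lie in [0, 1]


# Hypothetical toy lexicon. "tears" is lexically ambiguous, so it carries two
# categories with different centralities.
LEXICON: dict[str, list[LexiconEntry]] = {
    "grief": [LexiconEntry("sadness", 1.0, 0.9)],
    "joy": [LexiconEntry("happiness", 1.0, 0.8)],
    "tears": [
        LexiconEntry("sadness", 0.7, 0.6),
        LexiconEntry("happiness", 0.3, 0.5),
    ],
}


def tag_affect_words(tokens: list[str]) -> list[str]:
    """Step i: keep only tokens that have an entry in the affect lexicon."""
    return [t for t in tokens if t in LEXICON]


def document_profile(tokens: list[str]) -> dict[str, list[tuple[float, float]]]:
    """Steps ii-iii: gather (centrality, intensity) pairs for every category
    reached by a tagged word."""
    profile: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for word in tag_affect_words(tokens):
        for entry in LEXICON[word]:
            profile[entry.category].append((entry.centrality, entry.intensity))
    return dict(profile)
```

For example, `document_profile(["her", "tears", "of", "joy"])` returns `{"sadness": [(0.7, 0.6)], "happiness": [(0.3, 0.5), (1.0, 0.8)]}`: the ambiguity of "tears" is kept rather than resolved to a single category.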
After the affect words in a document are tagged, the fuzzy logic part of the system handles them by using fuzzy combination operators, set extension operators and a fuzzy thesaurus to analyze fuzzy sets representing affects. Rather than narrowing down or even eliminating the ambiguity and imprecision pervasive in the words of a natural language, the system retains them, and fuzzy techniques provide an excellent framework for their computational management.
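The text names fuzzy combination operators, set extension operators and a fuzzy thesaurus without defining them. The sketch below assumes the standard Zadeh operators (pointwise max for union, pointwise min for intersection) and models set extension as max-min composition of an affect set with a fuzzy relation between categories. The names `AffectSet`, `THESAURUS`, `fuzzy_union`, `fuzzy_intersection` and `extend`, and the relatedness degrees, are hypothetical.

```python
# An affect set is modeled here as a mapping: category -> membership degree in [0, 1].
AffectSet = dict[str, float]

# Hypothetical fuzzy thesaurus: (category, related category) -> degree of relatedness.
# Each category is implicitly fully related (degree 1.0) to itself.
THESAURUS: dict[tuple[str, str], float] = {
    ("sadness", "fear"): 0.4,
    ("happiness", "love"): 0.8,
}


def fuzzy_union(a: AffectSet, b: AffectSet) -> AffectSet:
    """Zadeh union: pointwise maximum of membership degrees."""
    return {c: max(a.get(c, 0.0), b.get(c, 0.0)) for c in a.keys() | b.keys()}


def fuzzy_intersection(a: AffectSet, b: AffectSet) -> AffectSet:
    """Zadeh intersection: pointwise minimum. A category missing from either
    set has degree 0 there, so it is omitted from the result."""
    return {c: min(a[c], b[c]) for c in a.keys() & b.keys()}


def extend(a: AffectSet, thesaurus: dict[tuple[str, str], float]) -> AffectSet:
    """Set extension by max-min composition of `a` with the thesaurus relation:
    degree(dst) = max over src of min(a[src], relatedness(src, dst))."""
    out = dict(a)  # identity part of the relation: each category keeps its own degree
    for (src, dst), rel in thesaurus.items():
        if src in a:
            out[dst] = max(out.get(dst, 0.0), min(a[src], rel))
    return out
```

With `a = {"sadness": 0.7, "happiness": 0.3}`, `extend(a, THESAURUS)` adds `"fear": 0.4` (capped by the relatedness) and `"love": 0.3` (capped by the original degree). The min caps each inferred degree so that extension never asserts a related affect more strongly than the evidence supports.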
The representation vehicle in the system is a set of fuzzy semantic categories (affect categories) followed by their respective centralities and intensities, called an affect set. An affect set with attached centralities is always treated as a pure fuzzy set, and all fuzzy techniques applicable to fuzzy sets are applied to affect sets. Intensities are handled differently, in a more statistical way, since they involve less ambiguity and imprecision and more quantitative aspects of the text. Graphical representation of the affect set is used as a tool for decision making.
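One way to collapse per-category (centrality, intensity) observations into a single affect set, and to render it graphically, is sketched below. The aggregation rules are assumptions: centralities are combined with the fuzzy union (maximum), consistent with treating them as a pure fuzzy set, while intensities are averaged as one possible "statistical" treatment. The text bar chart is only a stand-in for the graphical representation; `affect_set` and `render_bars` are hypothetical names.

```python
from statistics import mean


def affect_set(
    profile: dict[str, list[tuple[float, float]]]
) -> dict[str, tuple[float, float]]:
    """Collapse each category's (centrality, intensity) pairs into one pair.

    Centrality: fuzzy union (maximum) across occurrences (an assumption).
    Intensity: arithmetic mean, one possible statistical treatment.
    """
    return {
        cat: (max(c for c, _ in pairs), mean(i for _, i in pairs))
        for cat, pairs in profile.items()
        if pairs
    }


def render_bars(aset: dict[str, tuple[float, float]], width: int = 20) -> str:
    """Text bar chart, strongest category first; bar length follows centrality."""
    lines = []
    for cat, (cent, inten) in sorted(aset.items(), key=lambda kv: -kv[1][0]):
        bar = "#" * round(cent * width)
        lines.append(f"{cat:<10} {bar:<{width}} c={cent:.2f} i={inten:.2f}")
    return "\n".join(lines)
```

For a profile `{"sadness": [(0.7, 0.6)], "happiness": [(0.3, 0.5), (1.0, 0.8)]}`, `affect_set` yields centrality 1.0 and mean intensity 0.65 for happiness, and `render_bars` lists happiness first with a full-width bar.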