One of the major areas of this invention is the computer analysis of text. Here, "text" refers to a stream of data bits. Individual data bits will be referred to as "characters" and will typically be ASCII (American Standard Code for Information Interchange) characters. Small strings of such data bits are grouped into units called "words," with each word having a small number of defined meanings. Unless otherwise stated, words need not begin and end with special characters like spaces, carriage returns or line feeds.
In some expert systems for computer text analysis, words in text are assigned to defined categories and the number of words in each category is counted. Final interpretations are based on the frequencies of words in different categories. Although this strategy is suitable for analyzing large quantities of text ranging to millions of characters or more, meanings due to relationships between words are lost.
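The category-counting strategy described above can be illustrated with a minimal sketch. The category dictionary, the category names, and the function `count_categories` are hypothetical and chosen only for illustration; the sketch also assumes whitespace-delimited words, which the text does not require in general.

```python
from collections import Counter

# Hypothetical word-to-category dictionary (an assumption for illustration).
CATEGORIES = {
    "more": "increase",
    "raise": "increase",
    "less": "decrease",
    "cut": "decrease",
}

def count_categories(text, categories=CATEGORIES):
    """Count how many words of `text` fall into each defined category."""
    counts = Counter()
    for word in text.lower().split():
        token = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        category = categories.get(token)
        if category is not None:
            counts[category] += 1
    return counts

# Two sentences with opposite meanings for defense spending.
counts_a = count_categories("Congress should cut taxes and raise defense spending.")
counts_b = count_categories("Congress should raise taxes and cut defense spending.")
```

Both sentences yield one "increase" word and one "decrease" word, so their category frequencies are identical even though they take opposite positions. This is the loss of meaning due to word relationships noted above.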
Other expert systems include the spatial relationships between words in the text analysis. This strategy has the disadvantage that only limited amounts of text can be examined without the rules for assessing word relationships becoming exceedingly complex and very time consuming to compute.
In addition, no general and systematic method has been proposed in either of these areas of artificial intelligence to obtain directly numerical scores for the extent to which a "story" supports different "positions" for an "issue." Here, an issue is basically a question such as: should there be More, Same, or Less spending for military defense? A position, also called an "idea," is a possible answer such as: More, Same, Less or Don't know. A story is a continuous segment of text.
The other major area of this invention concerns the determination of expected public opinion, and, more generally, the determination of expected social traits. Previous attempts to develop an artificial intelligence system for predicting public opinion are described in two previous publications: "Mathematical models for the impact of information on society," by David P. Fan in Political Methodology, Vol. 10, pp. 479-493, 1984; and "Ideodynamics: The kinetics of the evolution of ideas," by David P. Fan in Journal of Mathematical Sociology, Vol. 11, pp. 1-23, 1985. This system was based on a model called ideodynamics.
In ideodynamics, expected opinion is computed from scores for the extent to which information supports different positions. Expected opinion is determined as a time trend with the time intervals of the determinations being of arbitrarily small size.
For the opinion computations in the prior system, each persuasive message able to change the minds of people in a population was defined as an "infon" $I_{ijk}$, where $i=1,2$ was the index referring to whether the infon was available to all people ($i=1$) or only to those already aware of the issue ($i=2$), where $j=1,2$ was the index referring to the position (denoted by $Q_j$) the infon favored, and where $k$ was the index identifying the individual message. Each infon was defined to have the following characteristics: $t_{ijk}$ = the time the infon first arrived at the population; $a_{ijk}$ = the fraction of the audience reached immediately after the infon was sent; $v_{ijk}$ = the credibility of the infon; $c_{ijk}$ = the fraction of the infon favoring idea $Q_j$; and $d_{ijk}(t)$ = a function describing the fraction of the population accessing the infon at time $t$.
For opinion formation, the equations describing the effects of infons from the mass media on the population were

$$d_{ijk}(t) = \begin{cases} e^{-p_{ijk}(t - t_{ijk})} & \text{for } t \ge t_{ijk} \\ 0 & \text{for } t < t_{ijk} \end{cases} \tag{1}$$

where $p_{ijk}$ is constant,

$$f_{ijk}(t) = a_{ijk}\, c_{ijk}\, d_{ijk}(t), \tag{2}$$

##EQU1## for all $k$ with $t_{ijk} < t$, ##EQU2## where $A(t')$ is the fraction of the population aware of the issue at time $t'$, ##EQU3## for all $k$ with $t_{ijk} < t$,

$$H_{\cdot j} = \frac{G_{\cdot j}}{G_{\cdot\cdot} + w} \tag{6}$$

where $w$ is a weighting constant, and

$$\frac{dB}{dt} = k_2 (1 - B)\, H_{\cdot 1} - k_2 B\, H_{\cdot 2} \tag{7}$$
where $B$ is the fraction of the population believing in position $Q_1$ (given by subscript $j=1$), assuming that there was only one other possible position ($Q_2$) opposed to idea $Q_1$.
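As a rough illustration of how equations 1, 2, 6 and 7 of the prior system fit together, the following sketch evaluates them numerically. Several points are assumptions and not part of the prior system: all function names are hypothetical, the totals $G_{\cdot j}$ are supplied directly as inputs because the equations that define them are not reproduced here, $G_{\cdot\cdot}$ is taken to be the sum of the two $G_{\cdot j}$ values, the $H$ terms are held constant over time, and equation 7 is integrated with a simple forward-Euler step.

```python
import math

def access_fraction(t, t_arrival, p):
    """Equation 1: d_ijk(t), zero before arrival, exponential decay after."""
    if t < t_arrival:
        return 0.0
    return math.exp(-p * (t - t_arrival))

def f_ijk(t, a, c, t_arrival, p):
    """Equation 2: f_ijk(t) = a_ijk * c_ijk * d_ijk(t)."""
    return a * c * access_fraction(t, t_arrival, p)

def persuasive_share(g_j, g_total, w):
    """Equation 6: H_.j = G_.j / (G_.. + w)."""
    return g_j / (g_total + w)

def integrate_opinion(b0, h1, h2, k2, dt, steps):
    """Forward-Euler integration of equation 7 (solver choice is an assumption)."""
    b = b0
    for _ in range(steps):
        b += dt * (k2 * (1.0 - b) * h1 - k2 * b * h2)
    return b

# Illustrative values only: equal information for both positions.
g1, g2, w = 1.0, 1.0, 0.5
h1 = persuasive_share(g1, g1 + g2, w)  # G_.. assumed to be g1 + g2
h2 = persuasive_share(g2, g1 + g2, w)
b_final = integrate_opinion(b0=0.2, h1=h1, h2=h2, k2=1.0, dt=0.1, steps=200)
```

With equal values of $H_{\cdot 1}$ and $H_{\cdot 2}$, setting $dB/dt = 0$ in equation 7 gives $B = 1/2$, and the integration above approaches that value from the starting point $B = 0.2$.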
The previous ideodynamic system suffered from several drawbacks rendering it inoperable. First, an infon was defined as "a single packet of information transmitted in identical copies to a group of people" (the article by Fan in the Journal of Mathematical Sociology, 1985, described above). Since each infon is defined as $I_{ijk}$ with only one index $j$, the implication is that an infon can only support the single position $Q_j$. However, in defining content scores $c_{ijk}$, there is the implication that an infon can have contents supporting more than one position. Therefore, there is a contradiction in the terms defined earlier.
In the system of this invention, "a single packet of information transmitted in identical copies to a group of people" is redefined as a "persuasive message," with an "infon" $I_{ijk}$ now referring to "a component of a message favoring position or idea $Q_j$." In fact, it is possible to divide a message into several infons all favoring the same position $Q_j$. All such infons would have the same subscript $j$ but would have different subscripts $i$ and/or $k$.
Second, all persuasive messages were assumed to have the same total content score, since the content score $c_{ijk}$ for any one position was only the fraction of the message favored by infon $I_{ijk}$. This interpretation gives excessive weight to persuasive messages with very little information relevant to the issue. The result will therefore be highly inaccurate opinion determinations.
In this invention, $c_{ijk}$ is redefined as the total and not the fractional content of infon $I_{ijk}$ favoring position $Q_j$.
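The difference between fractional and total content scores can be seen in a small numerical sketch. The two messages, their content amounts, and the function `fractional_scores` are hypothetical; the unit in which content is measured is left unspecified here and is an assumption of the example.

```python
def fractional_scores(content):
    """Prior-system style: each position's share of the message's content."""
    total = sum(content.values())
    if total == 0:
        raise ValueError("message has no content relevant to the issue")
    return {position: amount / total for position, amount in content.items()}

# Hypothetical total content favoring each position in two messages.
brief_story = {"More": 1.0, "Less": 1.0}    # very little relevant information
long_story = {"More": 10.0, "Less": 10.0}   # ten times as much

brief_fractions = fractional_scores(brief_story)
long_fractions = fractional_scores(long_story)
```

Under fractional scoring both messages receive the same scores (0.5 for each position), so the brief message counts as much as the long one. Using the total content values directly, as in the redefinition above, keeps the tenfold difference between them.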
Third, the prior equation for opinion determination (equation 7) did not include the case of an issue having more than the two positions of pro and con.
In this invention, a new equation is used where any number of positions is possible. The extension to more positions could have resulted from a number of different assumptions so the formulation in this invention cannot be directly deduced from the prior system.
Fourth, the equations in the system of this invention no longer include the term $d_{ijk}(t)$. Also, the constant $a_{ijk}$ is redefined as $a_{ijk}(t)$, a function of time including features from both $a_{ijk}$ and $d_{ijk}(t)$ in the previous formulations of equations 1-7.
Fifth, equation 6 is now replaced by a totally new equation in which constant $w$ is eliminated and in which $G_{\cdot\cdot}$ no longer appears. The replacement equation is not a natural extension of equation 6 since the new constants have no relation to constant $w$ in equation 6. With this invention no longer using constant $w$, the entire sketch for the solution of equations 1-7 in the prior system is inoperative since that sketch required finding constant $w$.
Sixth, the prior system did not permit messages favoring different positions to have different abilities to cause opinion change. The possibility is now included by the introduction of constants $w_{ij'j''}$ (see equation A.29 of step III-4 of the Preferred Embodiment below).
Seventh, the prior system did not specify the method for solving differential equation 7, including the setting of the boundary conditions, so that system did not give a complete description of the opinion determination.