1. General Points
1.1 Summary
The claimed method of the computer-implemented invention “meaning-checking” (literally translated from German: “right-meaning-checking”) automatically and deterministically determines, for each sentence of a text in a high-level natural language, whether the sentence is univocally formulated. It does so by calculating, for each word that forms the sentence, whether exactly one relevant meaning of the word exists in the context and what that meaning is.
The meanings and coupled associations of all relevant words of the high-level natural language in which the sentence is written are stored in special pre-generated, standardized, numeric fields—so-called meaning-signals—and can be retrieved automatically.
In the invention these meaning-signals are automatically combined arithmetically and analyzed comparatively, controlled only by the input sentence and its context. As a result of the process, either a formulation error is reported (if the sentence is not univocal) or each word is permanently linked to the single associated meaning-signal that is valid for the word in this context.
This corresponds to the task of extracting information items from the sentence that are not explicitly, but normally only implicitly, present in it.
The invention calculates this implicit information from the context by arithmetically and logically combining the meaning-signals of the words present in the sentence, controlled solely by the special arrangement and morphology of the words in the sentence itself.
Note on Terminology:
Special technical vocabulary and invention-specific, novel terms (e.g. meaning-signal, complementary, or word ligature) are listed in Table 4. Standard technical terms from linguistics and computational linguistics are listed in Table 7.
1.2 Underlying Procedure
1.2.1 A method for automatically detecting meaning-patterns in a text using a plurality of input words, in particular a text with at least one sentence, comprising a database system containing words of a language (line 1 in FIG. 3.1), a plurality of pre-defined categories of meaning to describe the properties of the words (columns 1-4 in FIG. 3.1; see FIG. 3.1 and explanations thereof in section 3.2), and meaning-signals for all the words stored in the database, wherein a meaning-signal is a univocal numerical characterization of the meaning of the words using the categories of meaning, and wherein at least the following steps are carried out:
    a) reading of the text with input words into a device for data processing,
    b) comparison of all input words with the words in the database system,
    c) assignment of at least one meaning-signal to each of the input words, wherein in the case of homonyms two or more meaning-signals are assigned;
    d) in the event that the assignment of the meaning-signals to the input words is univocal, the meaning-pattern identification is complete,
    e) in the event that more than one meaning-signal could be assigned to an input word, the relevant meaning-signals are compared with one another in an exclusively context-controlled manner, wherein
    f) on the basis of the combination of the meaning-signals of the input words among one another, it is determined whether a contradiction or a match (particularly in the case of homonyms) is present in the meaning of the input word with respect to the context;
    g) meaning-signal combinations that lead to contradictions are rejected (see FIG. 3.2 and related explanations in section 3.3), and meaning-signal combinations for matches are automatically numerically evaluated in accordance with the degree of matching (meaning modulation) based on a pre-defined relevance criterion (see section 3.3) and recorded,
    h) automatic compilation of all input words resulting from steps d) and g), which are output as the meaning-pattern or the numeric meaning intersection matrix (FIG. 3.2) of the text, in particular of the sentence,
    i) in the case of text in which words with homophones are present, e.g. from speech recognition, and with appropriate triggering: checking the degree of meaning-signal correspondence as well as the morphological-syntactic compatibility of the word that is present and of its further homophonous spellings in relation to the context, with possible automatic replacement, or an error warning in case of insufficient differentiation among the meaning-signals of the words of an identical homophone group in the context of the sentence under test.
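Steps a) to h) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the specification: the three-word toy database, the encoding of a meaning-signal as a tuple of +1 / 0 / -1 values over three unnamed categories of meaning, the contradiction rule (opposite signs in the same category), and the relevance score (number of agreeing categories). The actual structure of meaning-signals is described with FIG. 3.1 and FIG. 3.2.

```python
from itertools import combinations, product

# Hypothetical toy database: word -> list of (meaning label, meaning-signal).
# Assumed encoding per category: +1 property present, -1 excluded, 0 no statement.
DATABASE = {
    "the": [("the#article", (0, 0, 0))],
    "bat": [("bat#animal", (1, -1, 0)), ("bat#club", (-1, 1, 0))],
    "squeaks": [("squeak#animal-cry", (1, 0, 1))],
}


def combine(sig_a, sig_b):
    """Return None on contradiction, else the number of agreeing categories."""
    products = [a * b for a, b in zip(sig_a, sig_b)]
    if any(p < 0 for p in products):
        return None  # opposite signs in one category: contradiction
    return sum(1 for p in products if p > 0)


def detect_meaning_pattern(text):
    """Return one meaning label per word, or None if the text is not univocal."""
    words = text.lower().split()                    # step a)
    candidates = [DATABASE[w] for w in words]       # steps b) and c)
    if all(len(c) == 1 for c in candidates):        # step d)
        return [c[0][0] for c in candidates]
    scored = []
    for combo in product(*candidates):              # steps e) and f)
        scores = [combine(x[1], y[1]) for x, y in combinations(combo, 2)]
        if None in scores:                          # step g): reject contradictions
            continue
        scored.append((sum(scores), [label for label, _ in combo]))
    if not scored:
        return None
    best = max(score for score, _ in scored)
    winners = [labels for score, labels in scored if score == best]
    # step h): output the pattern only if exactly one combination remains
    return winners[0] if len(winners) == 1 else None


pattern = detect_meaning_pattern("the bat squeaks")
ambiguous = detect_meaning_pattern("the bat")
```

In this toy setup "the bat squeaks" resolves to the animal meaning because the club meaning contradicts the signal of "squeaks", while "the bat" alone stays ambiguous and yields None, which corresponds to reporting a formulation error.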
1.2.2 Problem Solved
“Meaning-checking” solves the following technical problem in the automatic processing of texts: particularly for words with multiple meanings (homonyms), the text does not explicitly state in which of its meanings the homonym was actually used by the author of the sentence.
In spoken texts, “meaning-checking” solves the corresponding problem for homophones as well. For homophones, hearing a text does not determine the spelling of the word used.
Examples of homophonous words: Lehre / Leere (teaching / emptiness); DAX / Dachs (DAX / badger); also, especially in German, words that differ only in upper and lower case, e.g. wagen (to dare) / Wagen (car, vehicle) or wegen (because of) / Wegen (ways, dative plural of Weg);
in English, for example, to / two / too or knew / new / gnu.
Word ligatures (not compounds) are also affected: e.g. “an die” (to the) / “Andy”;
or, in Spanish, “del fin” (“from the end”) / “delfín” (dolphin).
The number of homophonous words (not counting common word ligatures) is, for example, about 8,000 in German, about 15,000 in English, 20,000 in French, and approx. 30,000 in Japanese.
This non-explicit information of a sentence, e.g. with respect to homonyms and homophones, is implicitly present in any univocal sentence of a natural language through the combination of the words used in sentence and context. Up to now it could only be determined by human beings who master the language in which the sentence was created (be it phonetically or alphanumerically).
Homonyms and homophones are among the most frequently used words in all languages. In German, for example, about 80% of the 2,000 most frequently used words are homonyms and approx. 15% are homophones. In other high-level languages these values are sometimes much larger.
To discern the meaning of each word of a sentence in a completely unknown language, for example, one must look up the meanings of each word in its basic form (e.g. in a dictionary) and then determine, in the unknown language, which of the meanings the author of the sentence likely intended in the context of the other words of the sentence.
This is all the more difficult the more homonyms the sentence contains.
In the case of sentences with 5 or 8 words it is already common for hundreds, or even thousands, of basic possible combinations of the meaning of the words of a sentence to exist, although only one of the possible combinations is correct in the context. See for example in FIG. 2 the sentences 2.1.A1 and 2.1.A2.
In sentence 2.1.A2, after the application of the invention, the meaning of each word is identified and can be recognized by the superscript on the respective word (see the individual meanings in the box to the right). This sentence from FIG. 2 is univocal, although nearly 2 million basic possible combinations of the meanings of its words exist. Refer to the information given in fields J4-J6 and J15-J17 in FIG. 2. More detailed information on other meanings of the homonyms of this example is given in Table 1.
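The size of this search space follows from simple multiplication: the number of basic possible combinations is the product of the number of meanings of each word. The short sketch below shows the arithmetic. The per-word meaning counts are hypothetical values chosen for illustration and are not the counts of the sentences in FIG. 2.

```python
from math import prod


def count_meaning_combinations(meanings_per_word):
    """Number of basic possible meaning combinations of a sentence."""
    return prod(meanings_per_word)


# Hypothetical numbers of meanings per word (not taken from FIG. 2).
five_word_sentence = [3, 12, 4, 8, 2]
eight_word_sentence = [6, 12, 4, 8, 2, 9, 7, 5]

five_total = count_meaning_combinations(five_word_sentence)
eight_total = count_meaning_combinations(eight_word_sentence)
```

With these assumed counts the five-word sentence already has 2,304 combinations and the eight-word sentence 1,451,520, which is the order of magnitude mentioned above.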
The invention automatically solves this problem for all words that are stored with meaning-signals in the databases linked to the invention: determining the basic form and the possible semantic variants of a word, and calculating the correct meaning combination in any given sentence and context.
This is done solely by automatic analysis and numerical comparison of the meaning-signals of the input text (sentence + sentence context) itself, without analyzing any other text databases, corpora, lexica, etc., whether statistically, by graph-based methods (e.g. calculation of edge lengths in Euclidean vector spaces), or by means of artificial neural networks.
The term “meaning-signals” is used deliberately, because the selected structure and arithmetic for the computational treatment of meaning-signals correspond to the computer-based treatment of numeric patterns, in contrast to a rather neurological term like “associations”.
Meaning-signals represent associations in a numerical way, but they are not themselves associations.
Two considerations recommend the new expression “meaning-signals”: the analogy between the mutual modulation of meaning-signals and signal modulation in communications technology, and the existence of electrical “currents” in the brain during the processing of associations when language is perceived by human beings.
1.3 Technical Applications/Comparison to the Prior Art
Direct, practical applications of the invention beyond meaning-checking include, for example:
    - High-quality automatic machine translation systems, because: firstly, only univocal sentences can be translated correctly. Secondly, a univocal sentence can only be assigned correct translations if the single relevant meaning of each individual word of the sentence in the context is known. Perceived state of the art based on well-known products, regardless of whether they are free of charge or not: 50% incorrect translations, e.g. in the case of statistical machine translation engines. The database to be searched in the invention is nevertheless smaller by a factor of 500 to 1,000 than those of conventional statistical machine translation systems, while the translation quality increases to better than 95% (cf. Tables 5 and 6).
    - The knowledge of the relevant unique meaning of each word in the context allows, among other things, a novel, automatic, semantic indexing of text databases according to meaning, which in turn allows much more accurate search results from search engines (99% to 99.99% fewer irrelevant hits) than the prior art. Perceived state of the art based on well-known products: if the search term is a homonym, the hits for all meanings of the word are displayed and not only those for the single intended meaning.
    - In addition, for speech recognition or human-machine dialogs, this knowledge of the relevant unique meaning of each word in the context allows a precise, meaning-related recognition and further processing of the input, also in the form of automatically generated, input-related, rationally intelligible, interactive dialogs, which have not existed up to now. Perceived state of the art based on well-known products: 100% erroneous interpretation of homophones, and no reliable detection of words that are important for logical inferences. See also example 2.2, sentences 2.2.B1 and 2.2.B2.
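The semantic indexing application can be pictured with a small Python sketch. It assumes that the documents have already been passed through meaning-checking, so that every word carries a single meaning identifier. The function name, the document contents, and the meaning identifiers are all hypothetical and serve only to contrast an index keyed by (word, meaning) with a lookup by word alone.

```python
from collections import defaultdict


def build_semantic_index(docs):
    """Map each (word, meaning_id) pair to the set of documents containing it.

    docs: {doc_id: list of (word, meaning_id) pairs}, assumed to be the
    output of a prior meaning-checking step (hypothetical format).
    """
    index = defaultdict(set)
    for doc_id, tagged_words in docs.items():
        for word, meaning_id in tagged_words:
            index[(word, meaning_id)].add(doc_id)
    return index


# Hypothetical, already disambiguated documents.
DOCS = {
    "doc1": [("bat", "animal"), ("cave", "hollow")],
    "doc2": [("bat", "club"), ("ball", "sphere")],
    "doc3": [("bat", "animal"), ("night", "time")],
}

index = build_semantic_index(DOCS)

# Query by intended meaning: only documents about the animal.
hits_by_meaning = sorted(index[("bat", "animal")])

# Query by word alone: documents for every meaning of the homonym.
hits_by_word = sorted(
    {doc for (word, _), docs in index.items() if word == "bat" for doc in docs}
)
```

In this toy data the meaning-based query returns doc1 and doc3, while the word-based query also returns doc2, the kind of irrelevant hit the text describes for homonymous search terms.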
    