The “Tower of Babel” problem is the data technology problem that data stores do not carry meaning with their data, whether they hold unstructured data (such as Big Data, text documents, natural language discourse or web sites) or structured data (such as relational databases). That is, computer systems do not understand the meaning of their data. Developers of these systems rely on humans to read and understand the data and then build that understanding into the software that manipulates it. The consequence is that data is not machine-understandable, either within a data store or between data stores.
Heterogeneity, i.e., differences in data structure, data morphology and semantics, combines with word sense ambiguity to make the same meaning in two different data stores mutually unintelligible. The effect is that each data store “speaks a different (coded) language”. Since computer systems do not understand the meaning of the data that they store, manual human engineering and ad hoc tools are required to integrate data, fuse data or analyze data. This is the consequence of the “Tower of Babel” problem. For large data stores, “Tower of Babel” solutions typically require hundreds of years of engineering effort. Many of these problems are so large that they cannot be solved with current technology.
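By way of illustration only, the following minimal Python sketch shows two data stores that record the same fact in mutually unintelligible forms. The record layouts, field names (`cust_nm`, `dob`, `client`, `birth_date`) and helper functions are hypothetical and are not drawn from any particular system. A comparison that relies only on the stored structure finds nothing in common, and the records agree only after a human writes one mapping per store.

```python
from datetime import datetime

# Two hypothetical data stores holding the same fact in different structures,
# field names and date formats (all names here are illustrative assumptions).
store_a = [{"cust_nm": "Ada Lovelace", "dob": "1815-12-10"}]
store_b = [{"client": {"first": "Ada", "last": "Lovelace"},
            "birth_date": "10/12/1815"}]


def naive_match(rec_a, rec_b):
    """Compare two records using only the field names they happen to share."""
    shared = set(rec_a) & set(rec_b)
    # With no shared field names, the stores cannot be compared at all.
    return bool(shared) and all(rec_a[k] == rec_b[k] for k in shared)


def normalize_a(rec):
    """Hand-written mapping for store A, encoding a human's understanding."""
    return {
        "name": rec["cust_nm"],
        "birth_date": datetime.strptime(rec["dob"], "%Y-%m-%d").date(),
    }


def normalize_b(rec):
    """Hand-written mapping for store B, encoding a human's understanding."""
    client = rec["client"]
    return {
        "name": f'{client["first"]} {client["last"]}',
        "birth_date": datetime.strptime(rec["birth_date"], "%d/%m/%Y").date(),
    }


if __name__ == "__main__":
    print(naive_match(store_a[0], store_b[0]))
    print(normalize_a(store_a[0]) == normalize_b(store_b[0]))
```

Each additional data store would need its own hand-written mapping of this kind, which is the manual engineering effort described above.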
There is no prior art providing an automated solution to this problem. Approaches to understanding meaning in unstructured data require resolution of lexical string (word) ambiguity as a first step. Two widely used and well-known approaches are WordNet at Princeton and FrameNet at Berkeley. WordNet is a lexical database that attempts to resolve lexical string ambiguity by providing “synsets” that group lexical strings into “cognitive synonyms”. On average, English lexical strings have about 7 different meanings. The word “run” has 179. WordNet has been built manually with human understanding. The “cognitive synonyms” do not account for ambiguity, nor do they account for ontological relationships. This approach is therefore of very limited usefulness.
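The ambiguity described above can be sketched with a toy, hand-built sense inventory. The following Python fragment is illustrative only: the `SYNSETS` table, its sense labels and the `synsets_for` helper are assumptions made for this example and do not reproduce WordNet's data or its programming interface. It shows that a lexical string such as “run” belongs to several groups of synonyms at once, so the string alone does not select a meaning.

```python
# Toy sense inventory: each hypothetical sense label maps to the set of
# lexical strings grouped under it. Built by hand for illustration only.
SYNSETS = {
    "move_fast_on_foot": {"run", "sprint", "dash"},
    "operate": {"run", "operate", "work"},
    "manage": {"run", "manage", "direct"},
    "financial_institution": {"bank", "depository"},
    "river_edge": {"bank", "shore"},
}


def synsets_for(word):
    """Return the sorted sense labels whose synonym group contains the word."""
    key = word.lower()
    return sorted(label for label, members in SYNSETS.items() if key in members)


def is_ambiguous(word):
    """A lexical string is ambiguous when it belongs to more than one group."""
    return len(synsets_for(word)) > 1


if __name__ == "__main__":
    for w in ("run", "bank", "sprint"):
        print(w, synsets_for(w), is_ambiguous(w))
```

Choosing among the candidate senses still requires information from outside the table, such as the surrounding sentence.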
FrameNet at Berkeley was developed by Charles Fillmore and provides a more robust approach to meaning because it includes both words and sentences. It is an attempt to link the meaning of a word to its grammatical function in a sentence. Fillmore's observation was that word sense, and therefore disambiguation, is semantically dependent on the sentence in which the word appears. Fillmore called this (sentential) Frame Semantics (Fillmore, Charles J. (1968), “The Case for Case”). However, Fillmore developed Frame Semantics within the context of Chomsky's TG (Transformational Grammar) and treated Frame Semantics as surface structure. He did not develop any mechanisms beyond TG to explain Frame Semantics. FrameNet is therefore an annotation of sentences and words built with human understanding. Without mechanisms or a theory of mechanisms, it is not computable and therefore cannot be used to compute meaning-data maps.
Machine learning has been applied in computational linguistics to sentence learning and sentence recognition. It has achieved recognition rates of about 70% in limited domains. The algorithms have not generalized, and successful recognition systems are not able to use the recognized meaning to automate work such as data integration, data fusion or data analytics. Crucially, machine learning does not account for the human ability to perform “one-shot” concept formation, which is the basis for human-level concept induction.
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventor, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.
There is no related art in general-purpose AI (Artificial Intelligence) precisely because the issue of meaning has not been solved. The history of AI, as currently practiced, has been a history of what Alan Turing called the “imitation game”. That is, computer systems have been able to pass one of Turing's proposed tests, such as playing chess (IBM's Deep Blue), but have been unable to hold a conversation, a second Turing test, because human guidance was necessary to solve each point application problem. In practice, each point solution required elaborate human specification and programming because the computer did not have access to the meaning of what it was doing. Without meaning, each point solution is nothing more than an elaborate programming challenge and not artificial intelligence in the human sense.
As such, there exists a need for a method and apparatus for determining a meaning of data. The need further includes generating the meaning of data for actionable computing operations beyond existing limited AI techniques.