The invention addresses the problem of efficiently representing, storing, retrieving and processing real world knowledge on a computer or network of computers.
The problem is a fundamental one which has been made all the more apparent since the invention and growth of the internet. It is also one that has enormous ramifications in every area of human endeavor. Historians have argued that much of the scientific and cultural progress that the human race has made can be traced to innovations which positively affected the storage and spread of knowledge between peoples and generations. The internet has the potential to be another innovation in this list but is hindered by technological barriers which prevent the knowledge contained in its millions of linked computers from being exploited to its full potential.
Typically, a computer system will store any knowledge it needs to keep in a local form understood only by the local system. The files recording this information will only be updated and read by the local software and the knowledge contained in them will only be usable by other computer systems after a great deal of programming work has been done to integrate the two systems. This problem applies even if the files are stored in a widely-recognized file format. For example, a database application may store an employee database in a recognized relational database format which can be read by other database systems. However, without specific programming, all the new system will see are rows and named columns containing numbers and strings. It will have no understanding, say, that the fields entitled “employee name” denote people and would certainly not be able to answer, for example, a query about monthly salary by dividing the number under “annual salary” by twelve.
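The employee database example can be illustrated with a minimal sketch. The row contents, the function names and the `DERIVED_FIELDS` table are all hypothetical and are introduced only for illustration; they are not part of any particular database system. The sketch shows that a lookup which knows only column names cannot answer a monthly salary query, whereas one supplied with hand-written integration knowledge can.

```python
# Hypothetical row as a second system would see it: named columns holding
# strings and numbers, with no indication of what they mean.
rows = [
    {"employee name": "A. Example", "annual salary": 36000},
]


def naive_lookup(row, field):
    """Return a value only if a column with exactly this name exists."""
    return row.get(field)


# Hypothetical integration knowledge that a programmer must write by hand:
# how a field that is not stored can be derived from one that is.
DERIVED_FIELDS = {
    "monthly salary": lambda row: row["annual salary"] / 12,
}


def informed_lookup(row, field):
    """Return a stored value, or derive it if a rule for the field is known."""
    if field in row:
        return row[field]
    rule = DERIVED_FIELDS.get(field)
    return rule(row) if rule is not None else None


print(naive_lookup(rows[0], "monthly salary"))     # no such column
print(informed_lookup(rows[0], "monthly salary"))  # derived from annual salary
```

In this sketch the derivation rule is exactly the "specific programming" referred to above: it has to be written separately for each field and for each pair of systems.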
Another limitation of a typical computer system is the narrow domain of the knowledge it can contain. The programming effort required even to handle very specific knowledge is huge, so a typical computer system can only deal with the very narrow scope that the application is designed to cover. Once that effort has been made, the resulting program generally cannot be reused elsewhere.
A common way to store general knowledge in some applications is to use natural language (e.g. English text) to store the information. This approach certainly allows the widest possible domain of knowledge to be stored but natural language is not a format that is understandable to computers in any realistic way. This means that although computers can store and display natural language to humans with ease they cannot fully exploit the real meaning of the text.
Nowhere are the limitations of natural language as a knowledge-storing mechanism more apparent than with the World Wide Web. The Web consists of billions of pages of text all of which are instantly retrievable and displayable by any computer on the internet. The amount of knowledge contained within these pages is phenomenal. However, if a human user wants to find something out using this knowledge the only practical technique that is available at the moment is keyword searching.
In order to find information using keyword searching, the human user first hopes that a page exists which answers the question, hopes again that this page has been copied and indexed by a search engine, and then tries to imagine what distinctive words will appear on this page. If any of the words guessed are wrong or the page has not been indexed by the search engine, the user will not find the page. If the combination of words requested is contained on too many other pages, the page may be listed but the human user will then have to manually read through hundreds or thousands of similar pages before finding the knowledge they require.
In addition, there is a certain arbitrariness about the words being used. Searching for general information on a person or product with a unique, distinctive name has a high probability of success, but if the search is for someone with a common name or for information on something where the name also means something else (the Japanese board game “Go” is a very good example), the search will fail or an extraordinary amount of extra human effort will be needed to locate the information. Furthermore, different ways of describing the same thing mean that several different queries often need to be made or the search may fail. For example, a search for information on “Bill Clinton” will not return documents in which he is referred to only as “President Clinton” or “William Jefferson Clinton”.
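Both failure modes can be reproduced with a minimal sketch of literal keyword matching. The three sample sentences and the `keyword_search` function are invented for illustration and do not represent how any particular search engine is implemented.

```python
# Hypothetical sample documents, made up for illustration.
documents = {
    1: "Bill Clinton spoke at the conference.",
    2: "President Clinton signed the bill.",
    3: "William Jefferson Clinton was born in Arkansas.",
}


def keyword_search(query, docs):
    """Return the ids of documents containing the query as a literal,
    case-insensitive phrase."""
    needle = query.lower()
    return sorted(doc_id for doc_id, text in docs.items()
                  if needle in text.lower())


# Different names for the same person: only document 1 matches, although
# all three documents are about the same individual.
print(keyword_search("Bill Clinton", documents))

# One word with several meanings: documents 1 and 2 both match "bill",
# once as a first name and once as a piece of legislation.
print(keyword_search("bill", documents))
```

The matcher has no notion that the three names denote one person, or that the two occurrences of "bill" denote different things, because it compares character strings rather than meanings.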
In summary, although innovations may be possible that can statistically improve the results produced by search engines, none can completely avoid the fundamental problems with the indexing and keyword searching approach. Overcoming these problems requires a strategy that includes representing knowledge in a form other than natural language.
Methods other than natural language of representing knowledge on a computer have been proposed previously. These include systems based on logic, where a mathematical language with syntax and semantics is used to represent the knowledge; semantic nets, where the information is modeled graphically using nodes which represent objects and links between the nodes which represent relationships between objects; and frame-based systems, where the knowledge is represented using frames which represent objects and slots which represent properties of those objects.
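Two of these representations, semantic nets and frames, can be sketched in a few lines. The data structures, the link and slot names, and the helper functions below are assumptions chosen for illustration; they are not taken from any specific prior system.

```python
# Semantic net: nodes joined by labelled links, stored here as
# (source node, link label, target node) triples. Names are hypothetical.
semantic_net = {
    ("Go", "is_a", "board game"),
    ("board game", "is_a", "game"),
}


def linked(net, source, link):
    """Return the nodes reachable from `source` over one link labelled `link`."""
    return sorted(target for s, label, target in net
                  if s == source and label == link)


# Frames: each frame represents an object and each slot holds one of
# its properties. Names and values are again hypothetical.
frames = {
    "board game": {"is_a": "game"},
    "Go": {"is_a": "board game", "players": 2},
}


def slot(frame_store, frame_name, slot_name):
    """Return the value of a slot, or None if the frame or slot is absent."""
    return frame_store.get(frame_name, {}).get(slot_name)


print(linked(semantic_net, "Go", "is_a"))
print(slot(frames, "Go", "players"))
```

Unlike the text string “Go”, the node or frame named "Go" in this sketch carries explicit relationships that a program can follow without interpreting natural language.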
However, these methods have serious limitations and have failed to be widely adopted except in narrow applications.