1. Technical Field
The invention relates to the organization and use of information. More particularly, the invention relates to a graph store.
2. Description of the Background Art
There is widespread agreement that the amount of knowledge in the world is growing so fast that even experts have trouble keeping up. Today not even the most highly trained professionals—in areas as diverse as science, medicine, law, and engineering—can hope to have more than a general overview of what is known. They spend a large percentage of their time keeping up on the latest information, and often specialize in highly narrow sub-fields because they find it impossible to keep track of broader developments.
Education traditionally meant the acquisition of the knowledge people needed for their working lives. Today, however, a college education can only provide an overview of knowledge in a specialized area, and a set of skills for learning new things as the need arises. Professionals need new tools that allow them to access new knowledge as they need it.
The World Wide Web
In spite of this explosion of knowledge, mechanisms for distributing it have remained pretty much the same for centuries: personal communication, schools, journals, and books. The World Wide Web is the one major new element in the landscape. It has fundamentally changed how knowledge is shared, and has given us a hint of what is possible. Its most important attribute is that it is accessible—it has made it possible for people to not only learn from materials that have now been made available to them, but also to easily contribute to the knowledge of the world in their turn. As a result, the Web's chief feature now is people exuberantly sharing their knowledge.
The Web also affords a new form of communication. Those who grew up with hypertext, or have otherwise become accustomed to it, find the linear arrangement of textbooks and articles confining and inconvenient. In this respect, the Web is clearly better than conventional text.
The Web, however, is lacking in many respects.
It has no mechanism for the vetting of knowledge. There is a lot of information on the Web, but very little guidance as to what is useful or even correct.
There are no good mechanisms for organizing the knowledge in a manner that helps users find the right information for them at any time. Access to the (often inconsistent or incorrect) knowledge on the Web is thus typically through search engines, which are all fundamentally based on keyword or vocabulary techniques. The documents found by a search engine are likely to be irrelevant, redundant, and often just plain wrong.
A Comparison of Knowledge Sources
There are several aspects to how learners obtain knowledge—they might look at how authoritative the source is, for example, or how recent the information is, or they might want the ability to ask the author a question or to post a comment. Those with knowledge to share might prefer a simple way to publish that knowledge, or they might seek out a well-known publisher to maintain their authority.
While books and journals offer the authority that comes with editors and reviewers, as well as the permanence of a durable product, the Web and newsgroups provide immediacy and currency, as well as the ability to publish without the bother of an editorial process. Table “A” is a summary of the affordances of various forms of publishing.
TABLE A
Affordances of Various Forms of Publishing

                             THE WEB    NEWSGROUPS   TEXTBOOKS   JOURNALS
Peer-to-peer publishing      Yes        Yes          No          Limited
Supports linking             Yes        Limited      No          Limited
Ability to add annotations   No         Yes          No          No
Vetting and certification    No         Limited      Yes         Yes
Supports payment model       Limited    No           Yes         Yes
Supports guided learning     Limited    No           Yes         No
The invention addresses the problem of providing a system that has a very large, e.g. multi-gigabyte, database of knowledge to a very large number of diverse users, which include both human beings and automated processes. There are many aspects of this problem that are significant challenges. Managing a very large database is one of them. Connecting related data objects is another. Providing a mechanism for creating and retrieving metadata about a data object is a third.
In the past, various approaches have been used to solve different parts of this problem. The World Wide Web, for example, is an attempt to provide a very large database to a very large number of users. However, it fails to provide reliability or data security, and provides only a limited amount of metadata, and only in some cases. Large relational database systems tackle the problem of reliability and security very well, but are lacking in the ability to support diverse data and diverse users, as well as in metadata support.
The ideal system should permit the diverse databases that exist today to continue to function, while supporting the development of new data. It should permit a large, diverse set of users to access this data, and to annotate it and otherwise add to it through various types of metadata. Users should be able to obtain a view of the data that is complete, comprehensive, valid, and enhanced based on the metadata.
The system should support data integrity, redundancy, availability, scalability, ease of use, personalization, feedback, controlled access, and multiple data formats. The system must accommodate diverse data and diverse metadata, in addition to diverse user types. The access control system must be sufficiently flexible to give different users access to different portions of the database, with distributed management of the access control. Flexible administration must allow portions of the database to be maintained independently, and must allow for new features to be added to the system as it grows.
U.S. patent application Ser. No. 12/049,145, filed 14 Mar. 2008, which is incorporated herein in its entirety by this reference thereto, provides a system to organize knowledge in such a way that users can find it, learn from it, and add to it as needed. In connection with such system, and especially with schema last approaches to information storage and retrieval, two observations are noted:
1. A collaborative database must, of necessity, support the creation or modification of schema long after data have been entered. While the relational model is quite general, current implementations map tables more-or-less directly into btree-based storage. This structure yields optimal performance but renders applications quite brittle. This invention supports a ‘Schema Last’ approach.
2. A conventional table-of-tuples implementation is problematic, even on a modern column store. The starting point, a table of tuples and indexes with compound keys which are permutations of subject-predicate-object, is well studied and subject to obvious limitations of index size and self-join performance. Attempting to optimize an existing relational store for this tuple access pattern, while possible, is burdened both by compatibility with a relational model that is far more general than needed, and by an SQL interface in which it is difficult to say what is really meant.
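The conventional starting point described in the second observation can be illustrated with a minimal sketch. It is not the invention's implementation and is not taken from the referenced application. It assumes a toy in-memory table of string triples kept in three sorted lists, one per key permutation. The names TripleTable, add, match, and two_hop are hypothetical. The sketch shows why every triple is stored once per index (the index-size limitation) and why a multi-hop query becomes a self-join over the same table.

```python
from bisect import bisect_left


class TripleTable:
    """Toy table of (subject, predicate, object) tuples. Hypothetical names throughout."""

    # Each index holds every triple, reordered so a different component leads.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self):
        self.indexes = {name: [] for name in self.ORDERS}

    def add(self, s, p, o):
        """Insert one triple into all three sorted indexes (three copies stored)."""
        triple = (s, p, o)
        for name, order in self.ORDERS.items():
            key = tuple(triple[i] for i in order)
            index = self.indexes[name]
            at = bisect_left(index, key)
            if at == len(index) or index[at] != key:  # skip duplicates
                index.insert(at, key)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the bound components; None is a wildcard."""
        pattern = (s, p, o)

        def bound_prefix(order):
            # Number of leading key components that the pattern binds.
            n = 0
            while n < 3 and pattern[order[n]] is not None:
                n += 1
            return n

        # Pick the permutation whose compound key has the longest bound prefix.
        name = max(self.ORDERS, key=lambda k: bound_prefix(self.ORDERS[k]))
        order = self.ORDERS[name]
        n = bound_prefix(order)
        prefix = tuple(pattern[i] for i in order[:n])
        index = self.indexes[name]
        results = []
        # Range scan: start at the first key >= prefix, stop when the prefix changes.
        for at in range(bisect_left(index, prefix), len(index)):
            key = index[at]
            if key[:n] != prefix:
                break
            triple = [None, None, None]
            for slot, component in zip(order, key):
                triple[slot] = component  # restore (s, p, o) order
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                results.append(tuple(triple))
        return results

    def two_hop(self, start, predicate):
        """Self-join: follow `predicate` twice from `start`, one lookup per hop."""
        found = set()
        for _, _, middle in self.match(s=start, p=predicate):
            for _, _, end in self.match(s=middle, p=predicate):
                found.add(end)
        return found


store = TripleTable()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("carol", "knows", "dave")
# A new predicate can be introduced after data exist, with no table definition.
store.add("bob", "born_in", "paris")
```

In this sketch, store.two_hop("alice", "knows") joins the table with itself, so each added hop costs another round of index lookups. The late-added "born_in" predicate needs no schema change, which is the schema-last property noted in the first observation.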
It would be advantageous to provide optimization techniques for such systems.