Many types of software systems have been developed to meet the needs of users in the area of storing and retrieving information. Existing systems have enabled the storage and retrieval of large amounts of information. Performance is an important design consideration for such systems, and operations performed on stored information must be completed in the shortest possible time. It is therefore desirable that any processing steps performed either in response to, or in preparation for, operations such as information storage and information retrieval be performed efficiently. However, in any information storage and retrieval system in which pieces of information must be categorized, system performance may be adversely impacted to a significant degree by the categorization process. This problem may be exacerbated when there are potentially large numbers of information categories and information pieces.
In addition, significant performance problems have been identified with regard to certain applications of existing relational and non-relational approaches to information storage and retrieval. First, it is well understood that the performance of relational database systems suffers when they are used to provide run-time flexibility in the information categories (i.e., tables) being stored. Second, non-relational systems, including those that store and retrieve “semi-structured” information such as XML (eXtensible Markup Language) documents, may in some cases be more suitable than relational systems for the storage, management, retrieval, and exchange of certain types of data. However, some areas that have traditionally been approached using non-relational systems are not cleanly reducible to a set of documents. As a result, existing “semi-structured” approaches are inadequate for a significant number of data storage and retrieval applications that are characterized by high variability in the structure of the stored information. Moreover, if it is desirable to share parts of documents in a document-based system, the problem of maintaining different document versions arises, and the resulting dependencies may become too complex for a system in which the documents are totally independent. File systems suffer from the same problem, since they are based on an independent container model. The above shortcomings of existing relational and non-relational systems are apparent in a number of specific areas, including the storage of personal information, such as contact information.
For the above reasons and others, it would be desirable to have a new approach to categorizing information that provides improved performance in an information storage and retrieval system. The new system should perform efficiently in the face of large numbers of categories and large amounts of information to be categorized. The system should be conveniently applicable to problems not amenable to solution using relational databases or using existing non-relational systems, such as existing semi-structured document-based systems. The system should further be applicable to problems in which the information structure is highly variable. Finally, the system should be conveniently applicable to the storage, management, retrieval, and exchange of various specific kinds of information, including personal information and/or information relating to information workers.