The present invention pertains to the art of document configuration and more particularly to configuration of a document in a document management system which stores the contents of a document separate from the properties of the document. The contents of the document are retrieved by a bit provider that delivers the content to the document from external storage repositories without informing the document as to the location where the content is stored.
The inventors have recognized that a large amount of a user's interaction with a computer has to do with document management, such as storing, filing, organizing and retrieving information from a variety of electronic documents. These documents may be found on a local disc, on a network system file server, an e-mail file server, the world wide web, or a variety of other locations. Modern communication delivery systems have had the effect of greatly increasing the flow of documents which may be incorporated within a user's document space, thereby increasing the need for better tools to visualize and interact with the accumulated documents.
The most common tools for organizing a document space rely on a single fundamental mechanism known as hierarchical storage systems, wherein documents are treated as files that exist in directories or folders, which are themselves contained in other directories, thereby creating a hierarchy that provides the structure for document space interactions. Each directory in a hierarchy of directories will commonly contain a number of individual files. Typically, files and directories are given alpha-numeric, mnemonic names in large storage volumes shared via a network. In such a network, individual users may be assigned specific directories.
A file located in a sub-directory is located by its compound path name. For example, the character string D:\TREE\LIMB\BRANCH\TWIG\LEAF.FIL could describe the location of a file LEAF.FIL whose immediate directory is TWIG and which is located deep in a hierarchy of files on the drive identified by the letter D. Each directory is itself a file containing file name, size, location data, and date and time of file creation or changes.
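As a minimal sketch of how such a compound path name decomposes, the following uses Python's standard `pathlib` module. The backslash separators are an assumption about the example string; the drive letter and the directory and file names are those given in the text.

```python
from pathlib import PureWindowsPath

# The names come from the example in the text; the backslash
# separators are assumed.
path = PureWindowsPath(r"D:\TREE\LIMB\BRANCH\TWIG\LEAF.FIL")

drive = path.drive                       # the drive identifier
directories = path.parts[1:-1]           # directories from the root down to the file's parent
file_name = path.name                    # the file itself
immediate_directory = path.parent.name   # the directory that directly holds the file

print(drive, directories, file_name, immediate_directory)
```

Every component of the path must be known to reach the file, which is why the hierarchy doubles as the search structure discussed next.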
Navigation through a file system, to a large degree, can be considered as navigation through semantic structures that have been mapped onto the file hierarchy. Such navigation is normally accomplished by the use of browsers and dialog boxes. Thus, when a user traverses through the file system to obtain a file (LEAF.FIL), this movement can be seen not only as a movement from one file or folder to another, but also as a search procedure that exploits features of the documents to progressively focus on a smaller and smaller set of potential documents. The structure of the search is mapped onto the hierarchy provided by the file system, since the hierarchy is essentially the only existing mechanism available to organize files. However, documents and files are not the same thing.
Since files are grouped by directories, associating a single document with several different content groupings is cumbersome. The directory hierarchy is also used to control the access to documents, with access controls placed at every node of the hierarchy, which makes it difficult to grant file access to only one or a few people. In the present invention, separation of a document's inherent identity from its properties, including its membership in various document collections, alleviates these problems.
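The idea of collection membership expressed through properties rather than through a directory location can be sketched as follows. This is an illustration only: the `Document` class, the `collection` helper, and the property names are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str                                      # identity, independent of any storage location
    properties: dict = field(default_factory=dict)   # hypothetical property bag


def collection(documents, key, value):
    """Return the ids of the documents whose property `key` equals `value`."""
    return [d.doc_id for d in documents if d.properties.get(key) == value]


docs = [
    Document("doc-1", {"project": "budget", "year": 1998}),
    Document("doc-2", {"project": "patents", "year": 1998}),
    Document("doc-3", {"project": "budget", "year": 1997}),
]

# The same document can belong to several groupings at once.
by_project = collection(docs, "project", "budget")
by_year = collection(docs, "year", 1998)
```

Here `doc-1` appears in both groupings without being copied or linked, which a single directory location cannot express directly.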
Another drawback is that existing hierarchical file systems provide a "single inheritance" structure. Specifically, files can only be in one place at a time, and so can occupy only one spot in the semantic structure. Links and aliases are attempts to work around this limitation. Thus, while a user's conception of a structure by which files should be organized may change over time, the hierarchy described above is fixed and rigid. While moving individual files within such a structure is a fairly straightforward task, reorganizing large sets of files is much more complicated, inefficient and time consuming. From the foregoing it can be seen that existing systems do not address a user's need to alter a file structure based on categories which change over time. At one moment a user may wish to organize the document space in terms of projects, while at some time in the future the user may wish to generate an organization according to time and/or according to document content. A strict hierarchical structure does not allow management of documents for multiple views in a seamless manner, resulting in a decrease in the efficiency of document retrieval.
Existing file systems also support only a single model for storage and retrieval of documents. This means a document is retrieved in accordance with a structure or concepts given to it by its author. On the other hand, a user who is not the author may wish to retrieve a document in accordance with a concept or grouping different from how the document was stored.
Further, since document management takes place on a device having computational power, there would be benefits to harnessing the computational power to assist in the organization of the documents. For example, attaching a spell-checker property to a document can extend the read operation of the document so that the content returned to the requesting application will be correctly spelled.
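A property that extends the read operation might be sketched as below. This is a toy illustration under stated assumptions: the `Document` class, its `attach` and `read` methods, and the tiny correction table are all hypothetical, and a real spell-checker would be far more involved.

```python
class Document:
    def __init__(self, content):
        self._content = content
        self._read_properties = []   # callables applied, in order, on every read

    def attach(self, prop):
        """Attach a property that transforms content as it is read."""
        self._read_properties.append(prop)

    def read(self):
        text = self._content
        for prop in self._read_properties:
            text = prop(text)
        return text


# Hypothetical correction table standing in for a real spell-checker.
CORRECTIONS = {"teh": "the", "recieve": "receive"}


def spell_checker(text):
    return " ".join(CORRECTIONS.get(word, word) for word in text.split(" "))


doc = Document("teh clients recieve teh report")
before = doc.read()          # stored content, returned unchanged
doc.attach(spell_checker)
after = doc.read()           # content corrected on the way out
```

The stored content is never modified; only what the requesting application receives changes.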
The inventors are aware that others have studied the area of document management/storage systems.
DMA is a proposed standard from AIIM designed to allow document management systems from different vendors to interoperate. The DMA standard covers both client and server interfaces and supports useful functionality including collections, versioning, renditions, and multiple-repository search. A look at the APIs shows that DMA objects (documents) can have properties attached to them. The properties are strongly typed in DMA and must be chosen from a limited set (string, int, date . . . ). To allow for rich kinds of properties, one of the allowable property types is another DMA object. A list type is allowed to build up big properties. Properties have unique IDs in DMA. Among the differences from the present invention are that properties are attached to documents without differentiation as to which user would like to see them, and that properties are stored in the document repository that provides the DMA interface, not independently from it. Similarly, DMA does not provide support for active properties.
WebDAV is another interface designed to allow an extended uniform set of functionality to be attached to documents available through a web server. WebDAV is a set of extensions to the HTTP 1.1 protocol that allow Web clients to create and edit documents over the Web. It also defines collections and a mechanism for associating arbitrary properties with resources. WebDAV also provides a means for creating typed links between any two documents, regardless of media type, where previously only HTML documents could contain links. Compared to the present invention, although WebDAV provides support for collections, these are defined by extension (that is, all components have to be explicitly defined). Although it provides arbitrary document properties, these live with the document itself and cannot be independently defined for different users. Furthermore, there is no support for active properties, and the properties are mostly geared toward having ASCII (or XML) values.
DocuShare is a simple document management system built as a web-server by Xerox Corporation. It supports simple collections of documents, limited sets of properties on documents and a few non-traditional document types like calendars and bulletin boards. It is primarily geared toward sharing of documents among small, self-defined groups (for the latter, it has support to dynamically create users and their permissions). DocuShare has notions of content providers, but these are not exchangeable for a document. Content providers are associated with the type of the document being accessed. In DocuShare properties are static, and the list of properties that can be associated with a document depends on the document type. Users cannot easily extend this list. System administrators must configure the site to extend the list of default properties associated with document types, which is another contrast to the present invention. Also, in DocuShare properties can be visible to anyone who has read access for the collection that contains the document. Properties are tightly bound to documents and it is generally difficult to maintain a personalized set of properties for a document, again a different approach than the one described in the present invention.
An operating system, "SPIN", from the University of Washington allows users to inject code into the kernel that is invoked when an appropriate system call or system state occurs (for example, users can inject code that alters paging decisions). This technology could be used to inject code into the file system so that a user's code is invoked on read and write. Among the differences between SPIN and the concepts of the present invention are that code injected into SPIN runs at the kernel level, and users can only express their behaviors in a restricted, safe language in which it is not possible to do "bad things." As such, expressiveness is limited. On the other hand, the properties in the present invention run at the user level, can have GUIs, can call out to third party libraries, and in general can be far more expressive than a kernel-injected spindle. Further, the properties of the present invention are expressed in terms of documents, as in "I attach property X to Document Y." The SPIN system, on the other hand, extends a system call such as "read" on all files. The example document-specific behaviors mentioned above are more easily mapped into a system such as the present invention, in which properties are explicitly attached to individual documents.
Other works which allow operating system calls to be extended into user's code include the article "Interposition Agents: Transparently Interposing User Code at the System Interface," by Michael B. Jones, in Proceedings of the 14th Symposium on Operating Systems Principles, Asheville, N.C., December, 1993, pages 80-93, and the article "SLIC: An Extensibility System for Commodity Operating Systems," by Douglas P. Ghormley, Steven H. Rodriguez, David Petrou, and Thomas E. Anderson, which is to appear in the USENIX 1998 Annual Technical Conference, New Orleans, La., June 1998.
Further, Windows NT (from Microsoft) has a function called "Filter Drivers" which, once installed, can see the accesses made to a file system. Like SPIN, a filter driver is invoked on operations on all files instead of on a document by document basis. Furthermore, installing filter drivers is a privileged operation; it is not available to normal users. As such, a user level mechanism, such as the document properties and event dispatching architecture of the present invention, would be needed to allow users to express their desired behaviors.
There are also systems which, in a very specific domain, allow users to apply behaviors when documents are accessed. An example is the Tandem e-mail system, which has a "screen cobol" language and has hooks to find out when things occur. This system allows users to code filters to do custom operations when documents arrive and/or are read. One of the differences between this system and the present invention is that the Tandem system solves the problem in a specific domain and invokes the user's behaviors only when the documents are accessed via the mail application. In the present invention, the behaviors are invoked regardless of the application and regardless of the interface.
The paper, "Finding and Reminding: File Organization From the Desktop", D. Barreau and B. Nardi, SIGCHI Bulletin, 27 (3) July, 1995, reviews filing and retrieval practices and discusses the shortcomings of traditional file and retrieval mechanisms. The paper illustrates that most users do not employ elaborate or deep filing systems, but rather show a preference for simple structures and "location-based searches", exploiting groupings of files (either in folders, or on the computer desktop) to express patterns or relationships between documents and to aid in retrieval. In response to the Barreau article, the article, "Finding and Reminding Reconsidered", by S. Fertig, E. Freeman and D. Gelernter, SIGCHI Bulletin, 28(1) January, 1996, defends deep structure and search queries, observing that location-based retrieval is "nothing more than a user-controlled logical search." There is, however, one clear feature of location-based searching which adds to a simple logical search: in a location-based system, the documents have been subject to some sort of pre-categorization. Additional structure is then introduced into the space, and this structure is exploited in search and retrieval.
The article "Information Visualization Using 3D Interactive Animation", by G. Robertson, S. Card and J. Mackinlay, Communications of the ACM 36 (4) April, 1993, discusses a location-based structure. An interesting feature is that the structure is exploited perceptually, rather than cognitively. This moves the burden of retrieval effort from the cognitive to the perceptual system. While this approach may be effective, the information that the systems rely on is content-based, and extracting this information to find the structure can be computationally expensive.
The article "Using a Landscape Metaphor to Represent a Corpus of Documents," Proc. European Conference on Spatial Information Theory, Elba, September, 1993, by M. Chalmers, describes a landscape metaphor in which relative document positions are derived from content similarity metrics. A system, discussed in "Lifestreams: Organizing your Electronic Life", AAAI Fall Symposium: AI Applications in Knowledge Navigation and Retrieval (Cambridge, Mass.), E. Freeman and S. Fertig, November, 1995, uses a timeline as the major organizational resource for managing document spaces. Lifestreams is inspired by the problems of a standard single-inheritance file hierarchy, and seeks to use contextual information to guide document retrieval. However, Lifestreams replaces one superordinate aspect of the document (its location in the hierarchy) with another (its location in the timeline).
The article "Semantic File Systems" by Gifford et al., Proc. Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, Calif.) October, 1991, introduces the notion of "virtual directories" that are implemented as dynamic queries on databases of document characteristics. The goal of this work was to integrate an associative search/retrieval mechanism into a conventional (UNIX) file system. In addition, their query engine supports arbitrary "transducers" to generate data tables for different sorts of files. Semantic File System research is largely concerned with direct integration into a file system so that it could extend the richness of command line programming interfaces, and so it introduces no interface features at all other than the file name/query language syntax. In contrast, the present invention is concerned with a more general paradigm based on a distributed, multi-principal property-based system and with how interfaces can be revised and augmented to deal with it; the fact that the present invention can act as a file system is simply in order to support existing file system-based applications, rather than as an end in itself.
DLITE is the Stanford Digital Libraries Integrated Task Environment, which is a user interface for accessing digital library resources as described in "The Digital Library Integrated Task Environment" Technical Report SIDL-WP-1996-0049, Stanford Digital Libraries Project (Palo Alto, Calif.) 1996, by S. Cousins et al. DLITE explicitly reifies queries and search engines in order to provide users with direct access to dynamic collections. The goal of DLITE, however, is to provide a unified interface to a variety of search engines, rather than to create new models of searching and retrieval. So although queries in DLITE are independent of particular search engines, they are not integrated with collections as a uniform organizational mechanism.
Multivalent documents define documents as comprising multiple "layers" of distinct but intimately-related content. Small dynamically-loaded program objects, or "behaviors", activate the content and work in concert with each other and layers of content to support arbitrarily specialized document types. To quote from one of their papers, "A document management infrastructure built around a multivalent perspective can provide an extensible, networked system that supports incremental addition of content, incremental addition of interaction with the user and with other components, reuse of content across behaviors, reuse of behaviors across types of documents, and efficient use of network bandwidth."
Multivalent document behaviors (analogs to properties) extend and parse the content layers, each of which is expressed in some format. Behaviors are tasked with understanding the formats and adding functionality to the document based on this understanding. In many ways, the Multivalent document system is an attempt at creating an infrastructure that can deal with the document format problem by incrementally adding layers of "understanding" of various formats. In contrast, the present invention has an explicit goal of exploring and developing a set of properties that are independent of document format. While properties could be developed that could parse and understand content, it is expected that most will be concerned with underlying storage, replication, security, and ownership attributes of the documents. Included among the differences between the present invention and the Multivalent concepts is that the Multivalent document system focuses on extensibility as a tool for content presentation and new content-based behaviors, whereas the present invention focuses on extensible and incrementally-added properties as a user-visible notion to control document storage and management.
File systems known as the Andrew File System (AFS), Coda, and Ficus provide a uniform name space for accessing files that may be distributed and replicated across a number of servers. Some distributed file systems support clients that run on a variety of platforms. Some support disconnected file access through caching or replication. For example, Coda provides disconnected access through caching, while Ficus uses replication. Although the distributed file systems just described support document (or file) sharing, they have a problem in that a file's hierarchical pathname and its storage location and system behavior are deeply related. The place in the directory hierarchy where a document gets stored generally determines on which servers that file resides.
Distributed databases such as Oracle, SQL Server, Bayou, and Lotus Notes also support shared, uniform access to data and often provide replication. Like some distributed file systems, many of today's commercial databases provide support for disconnected operation and automatic conflict resolution. They also provide much better query facilities than file systems. However, distributed databases suffer the same problems as file systems in that the properties of the data, such as where it is replicated and how it is indexed and so on, are generally associated with the tables in which that data resides. Thus, these properties cannot be flexibly managed and updated. Also, the set of possible properties is not extensible.
A digital library system, known as the Documentum DocPage repository, creates a document space called a "DocBase." This repository stores a document as an object that encapsulates the document's content along with its attributes, including relationships, associated versions, renditions, formats, workflow characteristics, and security. These document objects can be infinitely combined and re-combined on demand to form dynamic configurations of document objects that can come from any source.
DocPage supports organization of documents via folder and cabinet metaphors, and allows searching over both document content and attributes. The system also provides check in/checkout-style version control, full version histories of documents, and annotations (each with its own attributes and security rules). The system also supports workflow-style features including notification of updates. DocBase uses a replicated infrastructure for document storage (see: http://www.documentum.com).
Among the key differences between Documentum DocPage and the present invention are: First, in the present system properties are exposed as a fundamental concept in the infrastructure. Further, the present system provides for a radically extensible document property infrastructure capable of supporting an aftermarket in document attributes. Documentum seems to be rather closed in comparison; the possible attributes a document can acquire are defined a priori by the system for a particular application environment and cannot be easily extended. Second, Documentum does not have the vision of universal access to the degree of the present invention which supports near-universal access to document meta-data, if not document content. In comparison, the scope of Documentum narrows to document access within a closed setting (a corporate intranet).
A document is an entity comprising identity, properties, and content. This definition permits document functionality including accessing properties, renaming, comparing for equality, and passing references.
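The three-part definition above can be sketched as a small class. This is an illustrative assumption, not the described implementation: the class layout, the use of `uuid` for identity, and the `name` property key are all hypothetical.

```python
import uuid


class Document:
    def __init__(self, name, content=b""):
        self.identity = uuid.uuid4()        # fixed for the life of the document
        self.properties = {"name": name}    # descriptive properties, changeable at will
        self.content = content

    def rename(self, new_name):
        # Renaming touches only a property; identity and content are unaffected.
        self.properties["name"] = new_name

    def __eq__(self, other):
        # Equality is decided by identity, not by name or content.
        return isinstance(other, Document) and self.identity == other.identity

    def __hash__(self):
        return hash(self.identity)


report = Document("draft", b"quarterly figures")
alias = report                  # passing a reference, not a copy
alias.rename("final")
other = Document("final", b"quarterly figures")
```

After the rename, `report` and `alias` still compare equal, while `other` is a distinct document even though its name and content match.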
The present invention contemplates generation of a virtual document constructed within the environment of a document management system that separates the content of a document from properties which are used to describe the document. The document management system further includes bit providers which retrieve the content portion of the document and deliver it to the document.
The document is unaware of the storage location of the content. In the case where a document system provides an interface to an underlying document storage repository, the generated virtual document capitalizes on the concept that a one-to-one correlation between the offered documents and the underlying documents is not required in the document management system.
With attention to a more limited aspect of the present invention, a document according to the teachings of the present invention is comprised of documents stored in more than a single document storage repository.
With attention to yet another aspect of the present invention, the document is comprised of a sub-portion of a file located in a document storage repository.
With attention to still another aspect of the present invention, the document is composed from documents in a plurality of document storage repositories where at least one of the sub-documents of the document is a sub-portion of a file.
With attention to still yet another aspect of the present invention, a bit provider retrieves the content from the different repositories, and then combines the content into a form which is perceived by a viewer as a single document.
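The aspects above can be sketched together in a few lines. Every name here (`Repository`, `CompositeBitProvider`, `VirtualDocument`, the file names and offsets) is hypothetical and chosen for illustration; the sketch only shows a bit provider assembling one document from a whole file in one repository and a sub-portion of a file in another, without the document knowing where the content is stored.

```python
class Repository:
    """Stand-in for an external document storage repository."""

    def __init__(self, files):
        self._files = files

    def fetch(self, name, start=0, end=None):
        # Return the whole file, or only the requested sub-portion.
        return self._files[name][start:end]


class CompositeBitProvider:
    """Retrieves parts from several repositories and joins them."""

    def __init__(self, parts):
        self._parts = parts   # list of (repository, file name, start, end)

    def read(self):
        return "".join(
            repo.fetch(name, start, end) for repo, name, start, end in self._parts
        )


class VirtualDocument:
    """Holds no storage locations; it only asks its bit provider for content."""

    def __init__(self, bit_provider):
        self._bit_provider = bit_provider

    def content(self):
        return self._bit_provider.read()


file_server = Repository({"intro.txt": "Summary. "})
archive = Repository({"report.txt": "HEADER|Findings go here.|FOOTER"})

provider = CompositeBitProvider([
    (file_server, "intro.txt", 0, None),   # an entire file
    (archive, "report.txt", 7, 24),        # a sub-portion of a larger file
])
combined = VirtualDocument(provider).content()
```

Swapping the provider for one that reads from a different source would change where the content comes from without any change to `VirtualDocument`.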
A principal advantage of the present invention is creation of documents independent of their storage location such that a single document may consist of content from multiple repositories and/or segments of a larger file.
A further advantage of the present invention is that a bit provider combines the multiple portions of documents into a unified document, which from the perspective of users may be used and reviewed as if it were from a single document storage repository.
Yet another advantage of the present invention is that the virtual documents may be generated from sources independent of any repositories, e.g., on-line weather repositories, stock market tickers, etc., which may be accessible through protocols that are not document repository specific.
Another advantage of the present invention is that virtual documents are full-fledged DMS documents, affording all the benefits thereof. Particularly, virtual documents can be managed as full-fledged documents, with properties or as part of collections or extractions of information in repositories.
Still a further advantage of the present invention is greater ease in organizing a document: the user may review the overall document, or sub-sections of the document may be presented as their own stand-alone documents.
Still other advantages and benefits will become apparent to those skilled in the art upon a reading and understanding of the following detailed description.