The management and use of documents have changed drastically with the evolution of electronic communications, and in particular with the Internet and intranets. In the past, a document was simply a tangible medium, such as a piece of paper, conveying information or data. Today, a document is an object conveying information that is created at a given time; it may be manipulated by various people and tools; it may be duplicated and transported from place to place; and finally it may be deleted or simply forgotten on a storage medium at some location.
Only a few of the manipulations that may be performed on a document are traceable. If a document is created by a word processing program, for example, the program may track certain types of manipulations of the document, such as editing, printing and accessing, and this information may be stored with the document. However, if the document is copied, a record of the copying is usually not stored with the original document, with the copy, or elsewhere. If the document is translated, say from English to French via an automatic translator, the fact of the translation (even given the low quality of the translation) is not recorded with the original document. Nor is the translation itself recorded or accessible with the original document should someone desire the translation at a later date.
In addition to the information pertaining to word processing-type document manipulations, many documents are moved from site to site or from user to user. The path of distribution and the fact that a document undergoes changes through its travels add to the knowledge or information about the document. This kind of knowledge is generally not available to users, particularly users in an organization or users on an intranet or the Internet. In fact, most of the information about what happened to the document during its whole life (e.g., who read it, reviewed it, where it was sent as an email attachment, who liked it, etc.) is lost.
Generally when a document is considered important, it is simply duplicated in a large number of copies that are widely distributed. Users in an organization tend to share the feeling that the more copies are made, the more confident they are that the important knowledge contained in the document will be spread throughout the organization. In some organizations the document will be indexed and described in terms of important keywords and stored in a document management repository, where it may be accessed via an intranet or over the Internet. Then its URL will be forwarded to a certain number of users with a note to read the important information or knowledge contained in the document.
In order to store documents in a document management repository, certain additional data called metadata is stored with the document. In the simplest sense, metadata is data about data. Increasingly, however, the term has come to refer to data used to aid the identification, description and location of networked electronic resources, including documents. A variety of metadata formats currently exists, ranging from the basic proprietary records used in global Internet search services through a continuum that encompasses simple attribute/value records.
Metadata has been used to encode information about a document, such as historical data and activity-centered information. The use of metadata has also been recognized as having a role in the ongoing management and preservation of digital resources. For example, it has been suggested that metadata could be used for recording the technological context of a resource's origins, for managing and recording rights management information, for preserving the authenticity and reliability of resources as well as for resource discovery. Preservation metadata could be used for checking the integrity of document files.
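As an illustration of preservation metadata used for integrity checking, the following is a minimal sketch in Python. It assumes a simple attribute/value record holding a SHA-256 digest and a byte count; the function names and record fields are hypothetical and are not taken from any particular metadata format.

```python
import hashlib


def compute_digest(content: bytes) -> str:
    """Return the SHA-256 digest of the document content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def make_preservation_record(name: str, content: bytes) -> dict:
    """Build a hypothetical attribute/value preservation record for a document."""
    return {
        "name": name,
        "algorithm": "sha256",
        "digest": compute_digest(content),
        "size": len(content),
    }


def verify_integrity(record: dict, content: bytes) -> bool:
    """Check the current content against the digest and size stored in the record."""
    return (
        len(content) == record["size"]
        and compute_digest(content) == record["digest"]
    )
```

A record created when the document is stored can later be passed to `verify_integrity` together with the file's current bytes; a mismatch indicates that the file has changed since the record was made.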
Even if important documents are placed in a document management repository and broadcast messages are sent to interested parties, current document properties and repository management features do not ensure that the right knowledge or information will be made available to the right people exactly when they need it. The importance of information or knowledge is not the same for all users, and it depends heavily on the context. The importance of information also evolves over time; a piece of knowledge that was of little interest to a user and was deleted two months ago may suddenly become key to that user or to other users in the organization.
There is a need for a system and method of managing documents containing metadata which extracts as much metadata and information as possible from the documents. There is also a need for a system and method of managing documents which tracks all of the information about what happened to a document during its whole life (e.g., who read it, reviewed it, where it was sent as an email attachment, who liked it, etc.). There is also a need for a system and method of managing documents which stores as additional information the result of what happened to the document (for example, the comment associated with a review, the translation obtained from an automatic translator, the definitions of the terms recognized by a terminology checker tool, etc.). There is also a need for a system and method of managing documents that can track document distribution data. There is a further need for a system and method of managing documents that can track a document's path of distribution and a document's changes. There is also a need for a system and method of managing documents that can transfer information about or contained in the document to other sources and environments.
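The kind of life-cycle information described above (who acted on a document, what the action was, and what resulted from it) can be pictured as a simple event log. The following Python sketch is illustrative only; the class names, field names and action labels such as "read" and "sent" are assumptions made for the example and do not describe any particular system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DocumentEvent:
    """One thing that happened to a document (hypothetical record layout)."""
    action: str        # e.g. "read", "reviewed", "translated", "sent"
    actor: str         # the user or tool that performed the action
    result: str = ""   # e.g. a review comment, a translation, a recipient
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class DocumentHistory:
    """Accumulates the events recorded for a single document."""
    document_id: str
    events: list = field(default_factory=list)

    def record(self, action: str, actor: str, result: str = "") -> DocumentEvent:
        """Append a new event to the history and return it."""
        event = DocumentEvent(action, actor, result)
        self.events.append(event)
        return event

    def actors_for(self, action: str) -> list:
        """List the actors who performed the given action, in order."""
        return [e.actor for e in self.events if e.action == action]

    def distribution_path(self) -> list:
        """List the recipients of each "sent" event, in order."""
        return [e.result for e in self.events if e.action == "sent"]
```

In this sketch the result of an action, such as a translation or a review comment, is kept alongside the record of the action itself, so that both remain retrievable with the document.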