In a document management system, there is often a need to retain old versions of documents. Consider the case where a document is kept within the document management system, but needs to be modified. The document may be obtained from the document management system, updated, then stored back into the document management system. This new version of the document is now another item to be managed by the document management system.
There are many possible reasons why the new version is stored separately instead of replacing the previous version. These may include, among other reasons: recovering from an error by restoring an earlier version; providing a history of changes so that the process of creating the document can be reviewed; or adhering to policy rules or legal requirements for retaining documents.
Retaining older versions of documents increases the cost of owning a document because each version requires additional storage space. The number of versions created per document is also increasing, which in turn increases the cost of operating a document management system.
There are several possible reasons for the increase in the rate of document version creation, and these include:
Strong integration between document authoring and editing software and the document management system. This encourages even “work in progress” documents to be centrally stored in a document management system.
The growing popularity of collaboration software. With many people able to edit the same document in a short period of time, the number of versions of a document that are created can increase dramatically.
Both of these scenarios illustrate a common pattern in the lifecycle of a document: rapid changes and many versions occurring in a relatively short period, usually near the start of the document's life.
One approach to limiting the space required for versions is not to save versions at all. If a new version of a document is created, it replaces the existing version. This approach forfeits all of the benefits of retaining versions of documents, such as recovery and compliance.
Another solution is to retain a fixed number of versions of a document. For example, if ten versions of a document are to be saved, then version one is deleted when version eleven is created, and version ninety is deleted when version one hundred is created. A variation of this approach is to limit the number of versions by storage space. If the newest version causes the total space used by all versions to exceed an acceptable limit, older versions are deleted until the storage space used is within an acceptable range. While combinations of these solutions will reduce the storage requirements, deletion is driven by count or size rather than by age, so there is no guarantee that a version of the document that is, e.g., six months old will be available if it is needed.
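The two pruning rules described above can be sketched as follows. This is a minimal illustration only: the `Version` record and the names `prune_by_count` and `prune_by_size` are hypothetical, and always keeping the newest version in the size-based rule is an assumption not stated in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Version:
    number: int      # version number, 1 = oldest (hypothetical record)
    size_bytes: int  # storage consumed by this version


def prune_by_count(versions: List[Version], max_versions: int) -> List[Version]:
    """Keep only the newest max_versions entries (list is ordered oldest first)."""
    if max_versions <= 0:
        return []
    return versions[-max_versions:]


def prune_by_size(versions: List[Version], max_total_bytes: int) -> List[Version]:
    """Delete oldest versions until the total size is within the limit.

    Assumption: the newest version is always kept, even if it alone
    exceeds the limit.
    """
    kept = list(versions)
    total = sum(v.size_bytes for v in kept)
    while len(kept) > 1 and total > max_total_bytes:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
    return kept


# Ten versions retained: creating version 11 removes version 1.
history = [Version(n, 1000) for n in range(1, 12)]
remaining = prune_by_count(history, 10)
print([v.number for v in remaining])
```

Neither function consults the age of a version, which is why a burst of edits can push out a version that is only a few months old.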
A more comprehensive approach to managing document versions is often found in Records Management products. Records Management will often incorporate policies and retention schedules. In a typical arrangement, documents are classified into types, and each type has a policy that dictates how long a document of that type should be kept. Past this date, the document is deleted, unless other constraints such as legal holds apply. For example, letters to customers may be kept for seven years, and then deleted. Records Management does not weigh the storage space consumed by a version against the utility of that version, and when many versions are created in a short time, Records Management policies may not provide any relief from the cost of storage.
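A type-based retention schedule of this kind might look like the sketch below. The table `RETENTION_YEARS` and the functions `disposal_date` and `may_delete` are hypothetical names chosen for illustration; only the seven-year period for customer letters comes from the example above, and moving a 29 February creation date to 28 February is an assumption.

```python
from datetime import date

# Hypothetical retention schedule: document type -> years to keep.
RETENTION_YEARS = {"customer_letter": 7}


def disposal_date(doc_type: str, created: date) -> date:
    """Return the date on which a document of this type may be deleted."""
    target_year = created.year + RETENTION_YEARS[doc_type]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=target_year, day=28)


def may_delete(doc_type: str, created: date, today: date,
               legal_hold: bool = False) -> bool:
    """A document may be deleted once its retention period has passed,
    unless a legal hold applies."""
    if legal_hold:
        return False
    return today >= disposal_date(doc_type, created)


# Five versions created on consecutive days all remain stored for seven years.
burst = [date(2020, 3, day) for day in range(1, 6)]
deletable = [may_delete("customer_letter", d, date(2026, 12, 31)) for d in burst]
print(deletable)
```

Because every version inherits the same retention period, a burst of versions created within a few days is kept in full until the period expires, which is the lack of storage relief noted above.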
Another approach that some systems employ stores only the differences between versions to minimize storage requirements. For example, if a new version contains a change in one sentence, then only that changed sentence is recorded. There are many variations of this approach, but there are weaknesses. Often the difference analysis features must be built into the authoring/editing application, and not all sources of information support this type of capability, which means the solution is only applicable to specific types of documents. A general “binary” difference approach will also have problems if data is encrypted. In addition, if any of the intermediate copies are damaged or lost, it may not be possible to restore the older or newer versions that depend on them.
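The difference-based approach, and its dependence on intermediate copies, can be illustrated with the following sketch. It treats a document as a list of sentences and uses Python's standard `difflib.SequenceMatcher` to compute the changes; the delta format and the names `make_delta`, `apply_delta`, and `rebuild` are hypothetical and are not taken from any particular product.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record how to build `new` from `old`: copy ranges of old, or insert new text."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse old[i1:i2], no text stored
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only changed text is stored
        # "delete" needs no entry: the old range is simply not copied
    return ops


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def rebuild(base, deltas, target):
    """Rebuild version `target` (0 = base) by applying deltas in order."""
    doc = base
    for i in range(target):
        if deltas[i] is None:
            raise LookupError(f"delta {i + 1} is missing; "
                              f"version {target} cannot be rebuilt")
        doc = apply_delta(doc, deltas[i])
    return doc


v0 = ["First sentence.", "Second sentence.", "Third sentence."]
v1 = ["First sentence.", "Second sentence, revised.", "Third sentence."]
v2 = v1 + ["Fourth sentence."]
deltas = [make_delta(v0, v1), make_delta(v1, v2)]
print(rebuild(v0, deltas, 2) == v2)
```

In this sketch each version is reachable only through every delta before it, so replacing `deltas[0]` with `None` makes `rebuild` fail for versions 1 and 2 alike, even though the delta for version 2 is intact.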