Corporations and other organizations routinely copy data produced and/or stored by their computer systems in order to retain an archive of the data. For example, a company might retain data from computing systems related to e-commerce, such as databases, file servers, web servers, and so on. The company may also retain data from computing systems used by employees, such as those used by the accounting, marketing, and engineering departments, and so on.
Often, such retention and/or archiving amasses large amounts of data. There may be data copied or retained by way of periodic or one-time backups, continuous data protection (CDP) backups, snapshot backups, and so on. The data may include personal data, such as financial data, customer/client/patient contact data, audio/visual data, and other types of data. Organizations may also retain data related to the correct operation of their computer systems, such as operating system files, application files, user settings, and so on.
Once the stored data has aged a certain amount of time, the data storage systems may send the data to a data archive that stores the data for as long as is required. Typical data storage systems create a first storage copy for short-term data recovery and, after a certain time, send that copy to an archive for long-term storage. Thus, organizations are storing large amounts of data in their data archives at great expense.
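The age-based movement of storage copies described above can be illustrated with a minimal sketch. It is not taken from any particular data storage system; the names StorageCopy and age_out, the tier labels, and the 14-day threshold in the usage note are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StorageCopy:
    """Hypothetical record for one storage copy (illustrative only)."""
    name: str
    created: datetime
    tier: str = "short_term"  # assumed tier labels: "short_term" or "archive"


def age_out(copies, now, max_age):
    """Mark short-term copies at least max_age old as archived.

    Returns the copies that were moved on this pass.
    """
    moved = []
    for copy in copies:
        if copy.tier == "short_term" and now - copy.created >= max_age:
            copy.tier = "archive"
            moved.append(copy)
    return moved
```

For example, calling age_out(copies, datetime(2024, 1, 31), timedelta(days=14)) on copies created on January 1 and January 30 would archive only the older copy. The sketch also shows why archives grow: nothing in such a policy removes data once it has reached the archive tier.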
Organizations increasingly rely on computer systems to produce and store critical information, and the retention and recovery of that data may cause problems in their operation and overall effectiveness. For example, a data storage system may receive an identification of a file location to store, and may create one or more storage files containing the contents of the stored file and/or location. The data storage system can then restore data from these storage files (such as backup files) should anything happen to the original data.
At times, organizations may want to quickly access data stored in their data archives. For example, an organization may receive a discovery request for a small amount of email data. Although the amount of requested data may be small, the data storage system may need to search many archive files (such as backup tapes) to find the requested data.
Companies are often required to retain documents in archive files in order to comply with various regulations. For example, when a company is in litigation, the company may be required to retain documents related to the litigation. Employees are often asked not to delete any correspondence, emails, or other documents related to the litigation. Recently enacted amendments to the Federal Rules of Civil Procedure (FRCP) place additional document retention burdens on a company. According to Gartner, “Several legal commentators believe that the heart of the proposed changes to FRCP is the formal codification of ‘electronically stored information’ (ESI) and the recognition that the traditional discovery framework dealing with paper-based documents is no longer adequate.” Legal discovery of electronic information has emerged in recent years as a key requirement for enterprises, and the new federal rules both strengthen and expand those requirements.
Complying with all of the regulations related to document retention can be difficult, particularly when many employees may have documents under their control that are relevant to the issue at hand. Penalties for violation of regulations related to document retention can be steep, and executives and business managers want confidence that employees are taking appropriate steps to comply with the regulations. Employees may forget about requests to retain documents, or may not think that a particular document is relevant when others would disagree.
Companies also need provisions for finding retained documents. Traditional search engines accept a search query from a user and generate a list of search results. The user typically views one or two of the results and then discards the list. However, some queries are part of a longer-term, collaborative process. For example, when a company receives a legal discovery request, the company is often required to mine all of the company's data for documents responsive to the discovery request. This typically involves queries of different bodies of documents lasting days or even years. Many people often take part in the query, such as company employees, law firm associates, and law firm partners. The search results must often be viewed by more than one of these people in a well-defined set of steps (i.e., a workflow). For example, company employees may provide documents to a law firm, and associates at the law firm may perform an initial reading of the documents to determine if the documents contain relevant information. The associates may flag documents with descriptive classifications such as “relevant” or “privileged.” Then, the flagged documents may go to a law firm partner who will review each of the results and ultimately respond to the discovery request with the set of documents that satisfies the request.
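The review workflow described above can be sketched as a short sequence of steps, purely for illustration. In this sketch, keyword matching stands in for the associates' reading of each document, and the partner step is assumed to withhold anything flagged as privileged; both are simplifying assumptions, and the names Document, associate_review, and partner_review are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document record passed between reviewers."""
    doc_id: str
    text: str
    flags: set = field(default_factory=set)


def associate_review(documents, keywords):
    """First pass: flag documents as "relevant" and/or "privileged".

    Keyword matching is a stand-in for a human reading (assumption).
    Returns only the documents that received at least one flag.
    """
    for doc in documents:
        lowered = doc.text.lower()
        if any(keyword in lowered for keyword in keywords):
            doc.flags.add("relevant")
        if "attorney-client" in lowered:
            doc.flags.add("privileged")
    return [doc for doc in documents if doc.flags]


def partner_review(flagged):
    """Second pass: keep relevant documents that are not privileged.

    Withholding privileged documents is an assumed policy for this sketch.
    """
    return [
        doc for doc in flagged
        if "relevant" in doc.flags and "privileged" not in doc.flags
    ]
```

A response set could then be produced with partner_review(associate_review(docs, ["contract"])). Because each step consumes the output of the previous one, the sketch reflects the point that search results in such a process are handed from person to person rather than viewed once and discarded.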
Collaborative document management systems exist for allowing multiple users to participate in the creation and revision of content, such as documents. Many collaborative document management systems provide an intuitive user interface that acts as a gathering place for collaborative participants. For example, Microsoft SharePoint Server provides a web portal front end that allows collaborative participants to find shared content and to participate in the creation of new content and the revision of content created by others. In addition to directly modifying the content of a document, collaborative participants can add supplemental information, such as comments, to the document. Many collaborative document management systems also provide workflows for defining sets of steps to be completed by one or more collaborative participants. For example, a collaborative document management system may provide a set of templates for performing common tasks, and a collaborative participant may be guided through a wizard-like interface that asks interview-style questions for completing a particular workflow.
The foregoing examples of some existing problems with data storage, archiving, and restoration are intended to be illustrative and not exclusive. Other limitations will become apparent to those of skill in the art upon a reading of the Detailed Description below.